Data Analysis & Interpretation 4.4: K-Means Cluster Analysis

Week 4: This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based …
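The partitioning idea in this excerpt can be sketched in plain Python. This is a minimal illustration of Lloyd's algorithm (the usual k-means procedure), not the code from the post; the function name `kmeans`, its parameters, and the six toy points are all made up for the example.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid,
        # so it belongs to exactly one cluster.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
```

In practice a library implementation such as scikit-learn's `KMeans` would normally be used instead; the sketch only shows the assign-then-update loop that produces non-overlapping clusters.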

Data Analysis & Interpretation 4.3: Predicting Employed Rate with Lasso Regression

Week 3: Run a lasso regression analysis using k-fold cross validation to identify a subset of predictors from a larger pool of predictor variables that best predicts a quantitative response variable. The Features and Target of My Model: The target variable (what's being predicted) is going to be Employed Rate. Since Lasso works best with …

Data Analysis & Interpretation 4.2: Predicting County Poverty Group with Random Forest

Week 2: Run a Random Forest. You will need to perform a random forest analysis to evaluate the importance of a series of explanatory variables in predicting a binary, categorical response variable. The Features and Target of My Model: These will be the same as the previous week. The target is Poverty Group (1= >16%, …

Data Analysis & Interpretation 4.1: Predicting County Poverty Group with Decision Tree

Course 4: Machine Learning for Data Analysis. This course focuses on various machine learning algorithms: Decision Trees, Random Forests, Lasso Regression, and K-Means Cluster Analysis. Week 1: Run a Classification Tree. You will need to perform a decision tree analysis to test nonlinear relationships among a series of explanatory variables and a binary, categorical response …

Data Analysis & Interpretation 3.4: Testing a Logistic Regression Model

Week 4: This week's assignment is to test a logistic regression model. Summarize in a few sentences 1) what you found, making sure you discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (odds ratios, p-values, and 95% confidence intervals for the …
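For the statistics named in this excerpt, an odds ratio and its 95% confidence interval can be derived from a fitted logit coefficient and its standard error by exponentiating the Wald interval. The sketch below assumes that standard approach; the helper name `odds_ratio_ci` and the example coefficient are hypothetical and not taken from the post.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logit coefficient.

    beta: estimated log-odds coefficient
    se:   its standard error
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(beta=0.5, se=0.2)
```

An interval that excludes 1 corresponds to a coefficient whose Wald interval excludes 0, which is why the two are usually reported together.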

Data Analysis & Interpretation 3.3: Testing a Multiple Regression Model

Week 3: This week's assignment is to test a multiple regression model. 1) Discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (Beta coefficients and p-values) in your summary. 2) Report whether your results supported your hypothesis for the association between your primary …

Data Analysis & Interpretation 3.1: Describing the Data, Methodology, and Measures

Describe 1) your sample, 2) the data collection procedure, and 3) a measures section describing your variables and how you managed them to address your own research question. Sample: The sample combines three data sources: US Census (2015), US Crime (2015), and Full-time Law Enforcement Employees in the US (2015). The census …

Data Analysis & Interpretation 2.4: Hypothesis Tests Using a Moderator

Week 4: Run an ANOVA, Chi-Square Test or correlation coefficient that includes a moderator. A moderator is a third variable that has an effect on the direction and/or strength of the relationship between an explanatory variable and a response variable. I will run an ANOVA test and a correlation coefficient using Metropolitan as the moderator. …
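One common way to check for moderation with a correlation coefficient is to compute it separately within each level of the moderator and compare the results. The sketch below assumes that approach; the helper names, the group labels "metro" and "non-metro", and the deliberately exaggerated toy rows are all hypothetical rather than the post's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_by_group(rows):
    """rows: iterable of (x, y, group). Returns {group: r} within each moderator level."""
    groups = {}
    for x, y, group in rows:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    return {group: pearson_r(xs, ys) for group, (xs, ys) in groups.items()}


# Made-up rows: (explanatory value, response value, moderator level).
rows = [
    (1, 2, "metro"), (2, 4, "metro"), (3, 6, "metro"), (4, 8, "metro"),
    (1, 8, "non-metro"), (2, 6, "non-metro"), (3, 4, "non-metro"), (4, 2, "non-metro"),
]

overall_r = pearson_r([r[0] for r in rows], [r[1] for r in rows])
r_by_group = pearson_by_group(rows)
```

In this toy setup the pooled correlation hides two opposite within-group relationships, which is the kind of pattern a moderator analysis is meant to reveal.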

Data Analysis & Interpretation 2.3: Pearson Correlation

Week 3: Generate a correlation coefficient. Poverty vs. Population: How do poverty and population relate to one another?

    import matplotlib.pyplot as plt
    import seaborn as sns

    # plot regplot of poverty vs. population
    # (df is the DataFrame from the post; it is not shown in this excerpt)
    sns.regplot(x='TotalPop', y='Poverty', data=df, color='steelblue')
    sns.despine()
    plt.title('Poverty vs. Population', loc='left', fontweight='bold', y=1.02)

There's obviously a negative correlation, but let's exclude extremely high populations (>= 3,000,000) in order to "zoom in" …

Data Analysis & Interpretation 2.1: Analysis of Variance (ANOVA)

Course 2: Data Analysis Tools. This course builds on the previous one, exploring advanced statistical methods in the area of hypothesis testing: ANOVA, Chi-Square, and Pearson correlation. Keep in mind that I did perform hypothesis testing in the previous course even though it was not required. However, I performed z-tests. So part of this course …