Data Analysis & Interpretation 4.3: Predicting Employed Rate with Lasso Regression

Week 3: Run a lasso regression analysis using k-fold cross-validation to identify a subset of predictors from a larger pool of predictor variables that best predicts a quantitative response variable. The Features and Target of My Model: The target variable (what's being predicted) is going to be Employed Rate. Since Lasso works best with …
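To make the k-fold part of the assignment concrete, here is a minimal sketch of how k-fold cross-validation partitions the rows into training and test folds. It only shows the splitting step, not the lasso fit itself, and the function name `k_fold_indices` is a hypothetical helper, not something from the post or from a library.

```python
def k_fold_indices(n_samples, k):
    """Split row indices 0..n_samples-1 into k (train, test) pairs.

    Hypothetical helper for illustration: folds are contiguous and
    unshuffled; real analyses usually shuffle the rows first.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each row lands in exactly one test fold, so a model fitted k times is always evaluated on rows it did not see during that fit.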

Data Analysis & Interpretation 4.2: Predicting County Poverty Group with Random Forest

Week 2: Run a Random Forest. You will need to perform a random forest analysis to evaluate the importance of a series of explanatory variables in predicting a binary, categorical response variable. The Features and Target of My Model: These will be the same as the previous week. The target is Poverty Group (1= >16%, …

Data Analysis & Interpretation 4.1: Predicting County Poverty Group with Decision Tree

Course 4: Machine Learning for Data Analysis. This course focuses on various machine learning algorithms: Decision Trees, Random Forests, Lasso Regression, and K-Means Cluster Analysis. Week 1: Run a Classification Tree. You will need to perform a decision tree analysis to test nonlinear relationships among a series of explanatory variables and a binary, categorical response …

Data Analysis & Interpretation 3.4: Testing a Logistic Regression Model

Week 4: This week's assignment is to test a logistic regression model. Summarize in a few sentences 1) what you found, making sure you discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (odds ratios, p-values, and 95% confidence intervals for the …

Data Analysis & Interpretation 3.3: Testing a Multiple Regression Model

Week 3: This week's assignment is to test a multiple regression model. 1) Discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (Beta coefficients and p-values) in your summary. 2) Report whether your results supported your hypothesis for the association between your primary …

Data Analysis & Interpretation 3.2: Testing a Basic Linear Regression Model

Week 2: This week's assignment asks you to test a basic linear regression model for the association between your primary explanatory variable and a response variable. 1) If you have a categorical explanatory variable, make sure one of your categories is coded "0" and generate a frequency table for this variable to check your coding. …
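The coding check described in step 1 can be sketched as follows. This is a minimal illustration, assuming a two-level categorical variable; the labels, the `coding` mapping, and the sample values are hypothetical and not taken from the post's data.

```python
from collections import Counter

# Hypothetical category labels; not taken from the post's data.
metropolitan = ["metro", "non-metro", "metro", "metro", "non-metro"]

# Code one category as 0 so it serves as the reference level in the regression.
coding = {"non-metro": 0, "metro": 1}
coded = [coding[label] for label in metropolitan]

# Frequency table of the coded variable, used to verify the recoding.
frequency = Counter(coded)
for value in sorted(frequency):
    print(value, frequency[value])
```

If the counts for 0 and 1 match the counts of the original labels, the recoding did what was intended.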

Data Analysis & Interpretation 3.1: Describing the Data, Methodology, and Measures

Describe 1) your sample, 2) the data collection procedure, and 3) a measures section describing your variables and how you managed them to address your own research question. Sample: The sample is from a combination of data sources: US Census (2015), US Crime (2015), and Full-time Law Enforcement Employees in the US (2015). The census …

Data Analysis & Interpretation 2.4: Hypothesis Tests Using a Moderator

Week 4: Run an ANOVA, Chi-Square Test or correlation coefficient that includes a moderator. A moderator is a third variable that has an effect on the direction and/or strength of the relationship between an explanatory variable and a response variable. I will run an ANOVA test and a correlation coefficient using Metropolitan as the moderator. …
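One common way to look for moderation with a correlation coefficient is to compute the correlation separately within each level of the moderator and compare the results. The sketch below assumes that approach; the rows are toy values invented for illustration, and `pearson_r` and `r_by_group` are hypothetical names, not the post's code.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Toy rows (moderator level, explanatory value, response value); invented data.
rows = [
    (True, 1.0, 2.0), (True, 2.0, 4.0), (True, 3.0, 6.0),
    (False, 1.0, 6.0), (False, 2.0, 4.0), (False, 3.0, 2.0),
]

# Group the (x, y) pairs by moderator level.
groups = defaultdict(lambda: ([], []))
for is_metro, x, y in rows:
    groups[is_metro][0].append(x)
    groups[is_metro][1].append(y)

# One correlation per moderator level.
r_by_group = {level: pearson_r(xs, ys) for level, (xs, ys) in groups.items()}
for level, r in r_by_group.items():
    print("metropolitan =", level, "r =", round(r, 3))
```

In this toy data the relationship is positive in one group and negative in the other, which is the kind of change in direction a moderator analysis is meant to reveal.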

Data Analysis & Interpretation 2.3: Pearson Correlation

Week 3: Generate a correlation coefficient. Poverty vs. Population: How do poverty and population relate to one another?

    import matplotlib.pyplot as plt
    import seaborn as sns

    # plot regplot of poverty vs. population (df is the post's DataFrame)
    sns.regplot(x='TotalPop', y='Poverty', data=df, color='steelblue')
    sns.despine()
    plt.title('Poverty vs. Population', loc='left', fontweight='bold', y=1.02)

There's obviously a negative correlation, but let's exclude extremely high populations (>= 3,000,000) in order to "zoom in" …
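For reference, the correlation coefficient itself can be computed directly from its definition. The following is a minimal sketch using only the standard library; the two lists are toy values invented for illustration (not the post's census data), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Toy values, invented for illustration only.
population_thousands = [10, 50, 120, 400, 900]
poverty_pct = [22.0, 18.5, 16.0, 14.0, 11.5]

r = pearson_r(population_thousands, poverty_pct)
print("r =", round(r, 3))
print("r squared =", round(r ** 2, 3))
```

The sign of `r` gives the direction of the linear relationship, and `r ** 2` is the fraction of variance in one variable that the linear relationship with the other accounts for.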

Data Analysis & Interpretation 2.2: Chi-Square Test of Independence

Week 2: Run a Chi-Square Test of Independence. The null hypothesis is that the relative proportions of one variable are independent of the second variable; in other words, the proportions of one variable are the same for different values of the second variable. The alternative hypothesis is that the relative proportions of one variable are …
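The null hypothesis above translates into expected cell counts: if the two variables are independent, each cell should hold (row total × column total) / grand total observations. A minimal sketch of the resulting test statistic follows; the contingency table is invented for illustration, and `chi_square_independence` is a hypothetical helper, not the post's code.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented 2x2 table: rows are levels of one variable, columns of the other.
table = [[30, 10],
         [20, 40]]

stat, dof = chi_square_independence(table)
print("chi-square =", round(stat, 4), "dof =", dof)
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic above that would lead to rejecting independence at that level.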