Data Analysis & Interpretation 2.4: Hypothesis Tests Using a Moderator

Week 4

Run an ANOVA, Chi-Square Test or correlation coefficient that includes a moderator. A moderator is a third variable that affects the direction and/or strength of the relationship between an explanatory variable and a response variable.

I will run an ANOVA test and compute a correlation coefficient using Metropolitan as the moderator. The ANOVA test will use Poverty Group as the explanatory variable and Crime Rate as the response variable. We will see if Metropolitan has a statistically significant influence on the difference of means. The correlation coefficient will use Crime Rate as the explanatory variable and Employee Rate as the response variable. We will see if Metropolitan has a statistically significant influence on the correlation coefficients.

Subset Data by the Moderator

I need to subset the sub1 dataframe into two subsets using the Metropolitan variable.

 

In [103]:
metro_sub = sub1[sub1['Metropolitan']==1]
nonmetro_sub = sub1[sub1['Metropolitan']==0]
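Before running tests on each subset, it can help to confirm that the subsets together cover every row of the original dataframe. The sketch below uses a tiny made-up dataframe named demo (hypothetical values, not the sub1 data) and pandas' groupby as an alternative way to split by the moderator.

```python
import pandas as pd

# Tiny made-up stand-in for sub1 (hypothetical values, not the post's data)
demo = pd.DataFrame({
    'Metropolitan': [1, 0, 1, 0, 0],
    'Crime Rate': [700.0, 650.0, 910.0, 480.0, 530.0],
})

# groupby yields one (level, frame) pair per value of the moderator
subsets = {level: frame for level, frame in demo.groupby('Metropolitan')}

# Sanity check: the subsets should account for every row exactly once
total_rows = sum(len(frame) for frame in subsets.values())
print(total_rows == len(demo))

# Row counts per moderator level, useful when judging each test's sample size
print(demo['Metropolitan'].value_counts())
```

If the moderator column contains missing values, the row counts will not add up, which is worth knowing before comparing results across subsets.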

ANOVA with Moderator

In [104]:
# run ANOVA for metro
sub5 = metro_sub[['Poverty Group','Crime Rate']]
sub5.columns = ['Poverty_Group','Crime_Rate'] # need to remove spaces from column headers
metro_mod = smf.ols(formula='Crime_Rate ~ C(Poverty_Group)', data=sub5).fit()
print(metro_mod.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:             Crime_Rate   R-squared:                       0.034
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     27.33
Date:                Mon, 24 Sep 2018   Prob (F-statistic):           2.22e-07
Time:                        21:38:00   Log-Likelihood:                -6058.5
No. Observations:                 767   AIC:                         1.212e+04
Df Residuals:                     765   BIC:                         1.213e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   652.5876     30.048     21.718      0.000     593.602     711.573
C(Poverty_Group)[T.> 16%]   253.2780     48.450      5.228      0.000     158.167     348.389
==============================================================================
Omnibus:                      196.483   Durbin-Watson:                   1.154
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              449.640
Skew:                           1.370   Prob(JB):                     2.30e-98
Kurtosis:                       5.562   Cond. No.                         2.44
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [105]:
# run ANOVA for non-metro
sub6 = nonmetro_sub[['Poverty Group','Crime Rate']]
sub6.columns = ['Poverty_Group','Crime_Rate'] # need to remove spaces from column headers
nonmetro_mod = smf.ols(formula='Crime_Rate ~ C(Poverty_Group)', data=sub6).fit()
print(nonmetro_mod.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:             Crime_Rate   R-squared:                       0.035
Model:                            OLS   Adj. R-squared:                  0.034
Method:                 Least Squares   F-statistic:                     47.54
Date:                Mon, 24 Sep 2018   Prob (F-statistic):           8.34e-12
Time:                        21:38:00   Log-Likelihood:                -10269.
No. Observations:                1316   AIC:                         2.054e+04
Df Residuals:                    1314   BIC:                         2.055e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   654.7349     24.148     27.114      0.000     607.363     702.107
C(Poverty_Group)[T.> 16%]   226.2038     32.806      6.895      0.000     161.845     290.562
==============================================================================
Omnibus:                      605.667   Durbin-Watson:                   1.516
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4584.624
Skew:                           1.981   Prob(JB):                         0.00
Kurtosis:                      11.241   Cond. No.                         2.73
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [106]:
# use seaborn's catplot to plot avg. Crime Rate by Poverty Group and Metropolitan
sns.catplot(x='Poverty Group', y='Crime Rate', col='Metropolitan', data=sub1, kind='bar', ci=None)
plt.savefig('cr_pg_metro_mod_anova.png')

 

cr_pg_metro_mod_anova

ANOVA with Moderator Summary

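The ANOVA test for both Metropolitan subsets showed statistically significant results (metro: F = 27.33, p = 2.22e-07; non-metro: F = 47.54, p = 8.34e-12). However, as we can see from the above plot, Metropolitan does not prove to be a moderator for Crime Rate by Poverty Group. The gap in average crime rate between the poverty groups is about 253 in metropolitan counties and about 226 in non-metropolitan counties, and the confidence intervals for those two estimates overlap almost entirely. It does not matter whether the poverty group resides in a metropolitan county; the difference in average crime rate is essentially the same.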
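Comparing subsets by eye is one way to look for moderation. A more formal alternative, which is not what the cells above do, is to fit a single model with an interaction term and check whether the interaction coefficient is significant. The sketch below is a minimal illustration on synthetic data: the dataframe demo, its values, and the '<= 16%' label for the lower poverty group are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for sub1 (hypothetical values, not the post's data)
rng = np.random.default_rng(0)
n = 2000
demo = pd.DataFrame({
    'Poverty_Group': rng.choice(['<= 16%', '> 16%'], size=n),
    'Metropolitan': rng.integers(0, 2, size=n),
})

# Build a response where the poverty gap is the same in both settings,
# so there is no moderation by construction
demo['Crime_Rate'] = (
    650
    + 240 * (demo['Poverty_Group'] == '> 16%')
    + rng.normal(0, 600, size=n)
)

# '*' expands to both main effects plus their interaction
inter_mod = smf.ols(
    formula='Crime_Rate ~ C(Poverty_Group) * C(Metropolitan)', data=demo
).fit()

# The interaction term is the only coefficient whose name contains ':'
inter_name = [name for name in inter_mod.params.index if ':' in name][0]

# Estimated change in the poverty gap for metro counties, and its p-value
print(inter_name)
print(inter_mod.params[inter_name])
print(inter_mod.pvalues[inter_name])
```

A small, non-significant interaction coefficient would agree with the conclusion that the moderator does not change the relationship; a large, significant one would point the other way.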

Correlation with Moderator

In [107]:
# run correlation coefficient for metro
metro_mod_r = stats.pearsonr(metro_sub['Crime Rate'], metro_sub['Employee Rate'])
print('Correlation Coefficient')
print(metro_mod_r[0])
print('')
print('P-Value')
print(metro_mod_r[1])
print('')
print('r-squared')
print(metro_mod_r[0] ** 2)
Correlation Coefficient
0.5482366368863378

P-Value
2.1673779450494352e-61

r-squared
0.30056341002444226
In [108]:
# run correlation coefficient for non-metro
nonmetro_mod_r = stats.pearsonr(nonmetro_sub['Crime Rate'], nonmetro_sub['Employee Rate'])
print('Correlation Coefficient')
print(nonmetro_mod_r[0])
print('')
print('P-Value')
print(nonmetro_mod_r[1])
print('')
print('r-squared')
print(nonmetro_mod_r[0] ** 2)
Correlation Coefficient
0.3375036743776912

P-Value
2.0188200814413264e-36

r-squared
0.1139087302184426
In [109]:
# plot Employee Rate vs. Crime Rate for Metro
sns.regplot(x='Crime Rate', y='Employee Rate', data=metro_sub, color='steelblue', scatter_kws={'alpha':0.5})
sns.despine()
plt.title('Crime Rate vs. Employee Rate - Metropolitan', loc='left', fontweight='bold', y=1.02)
plt.savefig('cr_er_metro_mod_corr.png')

cr_er_metro_mod_corr

In [110]:

# plot Employee Rate vs. Crime Rate for Non-Metro
sns.regplot(x='Crime Rate', y='Employee Rate', data=nonmetro_sub, color='steelblue', scatter_kws={'alpha':0.5})
sns.despine()
plt.title('Crime Rate vs. Employee Rate - Non-Metropolitan', loc='left', fontweight='bold', y=1.02)
plt.savefig('cr_er_nonmetro_mod_corr.png')

cr_er_nonmetro_mod_corr

Correlation with Moderator Summary

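Both subsets show statistically significant positive linear relationships. However, you'll notice that the metro subset has a stronger correlation coefficient and r-squared value (r of about 0.55 and r-squared of about 0.30) than the non-metro subset (r of about 0.34 and r-squared of about 0.11). So it appears that Metropolitan does serve as a moderator between crime rate and employee rate.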
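One way to check whether two correlation coefficients from independent samples really differ is Fisher's z-transformation test. This is an extra step that is not part of the analysis above. The sketch below uses the two correlation coefficients printed earlier; the sample sizes (767 and 1316) are an assumption taken from the No. Observations lines in the ANOVA output, which only holds if the same rows were used for the correlations. The helper name fisher_z_test is made up for this example.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent Pearson correlations."""
    # Fisher's transformation makes the sampling distribution roughly normal
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference between the transformed values
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# r values come from the printed output; n values are assumed (see note above)
z_stat, p_value = fisher_z_test(0.5482366368863378, 767, 0.3375036743776912, 1316)
print(z_stat)
print(p_value)
```

A large absolute z statistic with a small p-value would support the reading that the relationship is stronger in metropolitan counties.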

Bonus: ANOVA with Regions as a Moderator

I wanted to try one more example of using a variable as a moderator. This time I want to see if Region has an influence on the direction and/or strength of the relationship between Poverty Group (explanatory variable) and Property Crime Rate (response variable).

 

In [111]:
# create subsets by region
south_sub = sub1[sub1['Region']=='South']
west_sub = sub1[sub1['Region']=='West']
midwest_sub = sub1[sub1['Region']=='Midwest']
northeast_sub = sub1[sub1['Region']=='Northeast']
In [112]:
# run ANOVA for south
sub7 = south_sub[['Poverty Group','Property Crime Rate']]
sub7.columns = ['Poverty_Group','Property_Crime_Rate'] # need to remove spaces from column headers
south_mod = smf.ols(formula='Property_Crime_Rate ~ C(Poverty_Group)', data=sub7).fit()
print(south_mod.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Property_Crime_Rate   R-squared:                       0.000
Model:                             OLS   Adj. R-squared:                 -0.001
Method:                  Least Squares   F-statistic:                    0.1881
Date:                 Mon, 24 Sep 2018   Prob (F-statistic):              0.665
Time:                         21:38:01   Log-Likelihood:                -7444.8
No. Observations:                  953   AIC:                         1.489e+04
Df Residuals:                      951   BIC:                         1.490e+04
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   825.7941     33.927     24.340      0.000     759.214     892.374
C(Poverty_Group)[T.> 16%]    17.9257     41.336      0.434      0.665     -63.194      99.045
==============================================================================
Omnibus:                      365.024   Durbin-Watson:                   1.418
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2105.839
Skew:                           1.651   Prob(JB):                         0.00
Kurtosis:                       9.491   Cond. No.                         3.26
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [113]:
# run ANOVA for west
sub8 = west_sub[['Poverty Group','Property Crime Rate']]
sub8.columns = ['Poverty_Group','Property_Crime_Rate'] # need to remove spaces from column headers
west_mod = smf.ols(formula='Property_Crime_Rate ~ C(Poverty_Group)', data=sub8).fit()
print(west_mod.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Property_Crime_Rate   R-squared:                       0.003
Model:                             OLS   Adj. R-squared:                 -0.000
Method:                  Least Squares   F-statistic:                    0.8781
Date:                 Mon, 24 Sep 2018   Prob (F-statistic):              0.349
Time:                         21:38:01   Log-Likelihood:                -2309.2
No. Observations:                  300   AIC:                             4622.
Df Residuals:                      298   BIC:                             4630.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   669.5041     42.410     15.787      0.000     586.044     752.964
C(Poverty_Group)[T.> 16%]    57.9683     61.861      0.937      0.349     -63.771     179.708
==============================================================================
Omnibus:                      136.419   Durbin-Watson:                   1.959
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              619.790
Skew:                           1.903   Prob(JB):                    2.60e-135
Kurtosis:                       8.925   Cond. No.                         2.55
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [114]:
# run ANOVA for midwest
sub9 = midwest_sub[['Poverty Group','Property Crime Rate']]
sub9.columns = ['Poverty_Group','Property_Crime_Rate'] # need to remove spaces from column headers
midwest_mod = smf.ols(formula='Property_Crime_Rate ~ C(Poverty_Group)', data=sub9).fit()
print(midwest_mod.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Property_Crime_Rate   R-squared:                       0.020
Model:                             OLS   Adj. R-squared:                  0.019
Method:                  Least Squares   F-statistic:                     14.45
Date:                 Mon, 24 Sep 2018   Prob (F-statistic):           0.000157
Time:                         21:38:01   Log-Likelihood:                -5236.3
No. Observations:                  701   AIC:                         1.048e+04
Df Residuals:                      699   BIC:                         1.049e+04
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   482.2070     18.934     25.467      0.000     445.032     519.382
C(Poverty_Group)[T.> 16%]   135.7572     35.717      3.801      0.000      65.632     205.883
==============================================================================
Omnibus:                      410.138   Durbin-Watson:                   1.900
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4799.871
Skew:                           2.411   Prob(JB):                         0.00
Kurtosis:                      14.878   Cond. No.                         2.44
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [115]:
# run ANOVA for northeast
sub10 = northeast_sub[['Poverty Group','Property Crime Rate']]
sub10.columns = ['Poverty_Group','Property_Crime_Rate'] # need to remove spaces from column headers
northeast_mod = smf.ols(formula='Property_Crime_Rate ~ C(Poverty_Group)', data=sub10).fit()
print(northeast_mod.summary())
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Property_Crime_Rate   R-squared:                       0.026
Model:                             OLS   Adj. R-squared:                  0.018
Method:                  Least Squares   F-statistic:                     3.374
Date:                 Mon, 24 Sep 2018   Prob (F-statistic):             0.0686
Time:                         21:38:01   Log-Likelihood:                -888.22
No. Observations:                  129   AIC:                             1780.
Df Residuals:                      127   BIC:                             1786.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                   150.9947     23.730      6.363      0.000     104.037     197.953
C(Poverty_Group)[T.> 16%]    93.5539     50.935      1.837      0.069      -7.238     194.346
==============================================================================
Omnibus:                       32.325   Durbin-Watson:                   1.091
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               46.436
Skew:                           1.382   Prob(JB):                     8.25e-11
Kurtosis:                       4.002   Cond. No.                         2.56
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [116]:
# plot mean Property Crime Rate by Poverty Group
sns.catplot(x='Poverty Group', y='Property Crime Rate', data=sub1, kind='bar', ci=None)  # factorplot was renamed to catplot
plt.title('Without Region as Moderator', loc='left', fontweight='bold', y=1.02)
plt.savefig('pg_pcr_nomod_anova.png')

pg_pcr_nomod_anova

In [117]:

# plot mean Property Crime Rate by Poverty Group and Region
sns.catplot(x='Poverty Group', y='Property Crime Rate', col='Region', data=sub1, kind='bar', ci=None)
plt.savefig('pc_pcr_region_mod_anova.png')

pc_pcr_region_mod_anova

ANOVA with Region as Moderator Summary

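We can clearly see that Region does in fact serve as a moderator for the relationship between Poverty Group and Property Crime Rate. Midwest is the only region that has a statistically significant relationship (p = 0.000157). Northeast comes close with a p-value just under 0.07 (p = 0.0686), while South (p = 0.665) and West (p = 0.349) show no significant difference between the poverty groups.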
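The four region cells above repeat the same steps, so the per-region ANOVA can also be written as a loop. The sketch below is one possible way to do that, shown on synthetic data: the dataframe demo, its values, and the '<= 16%' label are made up for illustration, and only the column names follow the ones used in this post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for sub1 (hypothetical values, not the post's data)
rng = np.random.default_rng(1)
regions = ['South', 'West', 'Midwest', 'Northeast']
demo = pd.DataFrame({
    'Region': rng.choice(regions, size=800),
    'Poverty Group': rng.choice(['<= 16%', '> 16%'], size=800),
    'Property Crime Rate': rng.normal(600, 300, size=800),
})

results = {}
for region, group in demo.groupby('Region'):
    # replace spaces in column headers so the names work in the formula
    renamed = group.rename(columns=lambda c: c.replace(' ', '_'))
    fit = smf.ols('Property_Crime_Rate ~ C(Poverty_Group)', data=renamed).fit()
    # keep only the overall F-statistic and its p-value for each region
    results[region] = (fit.fvalue, fit.f_pvalue)

for region, (f_stat, p_val) in results.items():
    print(f'{region}: F = {f_stat:.3f}, p = {p_val:.4f}')
```

Collecting the F-statistics and p-values in one dictionary makes it easier to compare regions side by side than scrolling through four full regression summaries.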

 
