Data Analysis & Interpretation 2.2: Chi-Square Test of Independence

Week 2

Run a Chi-Square Test of Independence. The null hypothesis is that two categorical variables are independent; in other words, the proportions of one variable are the same at every value of the second variable. The alternative hypothesis is that the two variables are associated, so the proportions of one variable differ across the values of the second.

H0 (null hypothesis): the proportions of variable 2 are independent of variable 1

H1 (alternative hypothesis): the proportions of variable 2 depend on variable 1

 

In [91]:
# import scipy.stats to run chi-squared test
import scipy.stats as stats

Metropolitan and Poverty Group Chi-Squared Test

Is a county being metro or non-metro associated with its poverty group? In other words, are Metropolitan and Poverty Group dependent? If the test finds no evidence that they are, then we fail to reject the null hypothesis. If it does, then we reject the null hypothesis in favor of the alternative hypothesis.

 

In [92]:
# add metropolitan to sub1
sub1['Metropolitan'] = df['Metropolitan']

In [93]:

# contingency table of observed counts for Metropolitan and Poverty Group
ct1 = pd.crosstab(sub1['Metropolitan'], sub1['Poverty Group'])
ct1
Out[93]:
Poverty Group  <= 16%  > 16%
Metropolitan
0                 603    713
1                 472    295
In [94]:
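# column totals (number of counties in each poverty group)
ct1_sum = ct1.sum(axis=0)

# column proportions: each column is divided by its total, so each column sums to 1
ct1_pct = ct1 / ct1_sum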
ct1_pct
Out[94]:
Poverty Group   <= 16%     > 16%
Metropolitan
0              0.56093  0.707341
1              0.43907  0.292659
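
The column proportions above describe the makeup of each poverty group. It can also help to look from the other direction: what share of non-metro and of metro counties fall in each poverty group. A minimal sketch, assuming the counts shown in Out[93]; the table is rebuilt by hand so the cell runs on its own, and the names `counts` and `row_pct` are only illustrative:

```python
import pandas as pd

# observed counts copied from Out[93] (rebuilt by hand for a standalone example)
counts = pd.DataFrame(
    [[603, 713], [472, 295]],
    index=pd.Index([0, 1], name='Metropolitan'),
    columns=pd.Index(['<= 16%', '> 16%'], name='Poverty Group'),
)

# divide each row by its row total so every row sums to 1
row_pct = counts.div(counts.sum(axis=1), axis=0)
print(row_pct.round(3))
```

Read this way, roughly 54% of non-metro counties (713 of 1316) are in the > 16% group, against roughly 38% of metro counties (295 of 767). With the real data, `pd.crosstab(sub1['Metropolitan'], sub1['Poverty Group'], normalize='index')` should produce the same table.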
In [95]:
# run chi-squared test
cs1 = stats.chi2_contingency(ct1)
print(cs1)
(47.30791446629259, 6.066676927330312e-12, 1, array([[679.16466635, 636.83533365],
       [395.83533365, 371.16466635]]))
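
The array at the end of that output holds the counts we would expect if the two variables were independent: each cell is its row total times its column total, divided by the grand total. Here is a rough sketch of that arithmetic, using the counts from Out[93]. The variable names are illustrative, and the 0.5 adjustment is Yates' continuity correction, which chi2_contingency applies by default to 2x2 tables:

```python
import numpy as np
import scipy.stats as stats

# observed counts copied from Out[93]
observed = np.array([[603, 713],
                     [472, 295]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
n = observed.sum()

# expected count per cell under independence
expected = row_totals * col_totals / n

# chi-square statistic with Yates' continuity correction
# (every |observed - expected| here is far above 0.5, so subtracting 0.5 is safe)
chi2_manual = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

# compare against scipy's own result
chi2_scipy, p, dof, expected_scipy = stats.chi2_contingency(observed)
print(chi2_manual, chi2_scipy)
print(expected)
```

The hand-computed statistic should agree with the first value that chi2_contingency prints, and `expected` should match the array it returns.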

Metropolitan and Poverty Group Chi-Squared Results: Reject the Null Hypothesis

The first value is the chi-square statistic, the second is the p-value, the third is the degrees of freedom, and the fourth is the array of counts expected under independence. Because this is a 2x2 table, chi2_contingency applies Yates' continuity correction by default, so the statistic is slightly smaller than the uncorrected Pearson value. If we didn’t have the p-value, we could use a chi-squared distribution table to decide whether or not to reject the null hypothesis. With a significance level (alpha) of 0.05 and 1 degree of freedom, the chi-squared statistic must be at least 3.841 to be statistically significant. Our chi-squared statistic is 47.3, well above 3.841, so we can reject the null hypothesis. We could also just look at the p-value, about 6.07e-12, which is well below 0.05 and leads to the same conclusion.
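
Instead of a printed table, the critical value can be taken from scipy's chi-squared distribution. A small sketch, assuming alpha = 0.05 and 1 degree of freedom as above; the statistic is copied from the printed output, and the names `critical` and `p_value` are illustrative:

```python
import scipy.stats as stats

alpha = 0.05
dof = 1
chi2_stat = 47.30791446629259   # copied from the chi2_contingency output

# critical value: the point with probability alpha to its right
critical = stats.chi2.ppf(1 - alpha, dof)

# p-value: probability of a statistic at least this large under H0
p_value = stats.chi2.sf(chi2_stat, dof)

print(round(critical, 3))    # 3.841
print(p_value)
print(chi2_stat > critical)  # True, so reject H0
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision, so they always agree.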

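In other words, a county's poverty group is associated with whether or not the county is metropolitan: about 71% of the counties in the > 16% poverty group are non-metropolitan, compared with about 56% of the counties in the <= 16% group.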

 

 
