Data Analysis & Interpretation 2.3: Pearson Correlation

Week 3

Generate a correlation coefficient.

Poverty vs. Population

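How do poverty and population relate to one another?

In [96]:
# imports (assumed to come from an earlier cell)
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# plot regplot of poverty vs. population
sns.regplot(x='TotalPop', y='Poverty', data=df, color='steelblue')
sns.despine()
plt.title('Poverty vs. Population', loc='left', fontweight='bold', y=1.02)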

[Figure: regression plot, Poverty vs. Population]

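The fitted line slopes slightly downward, but a handful of very large populations stretch the x-axis. Let’s exclude extremely high populations (>= 3,000,000) in order to “zoom in” on the area where most of the data points reside.

In [97]:
# plot regplot of poverty vs. population, restricted to populations below 3,000,000
sns.regplot(x='TotalPop', y='Poverty', data=df[df['TotalPop'] < 3000000], scatter_kws={'alpha': 0.5})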
sns.despine()
plt.title('Poverty vs. Population (< 3m)', loc='left', fontweight='bold', y=1.02)
plt.savefig('poverty_population_corr.png')

[Figure: regression plot, Poverty vs. Population (< 3m)]
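
The zoomed-in plot filters the rows, but a correlation coefficient computed on the full data still includes the very large populations. As a rough sketch (the data here are made up, and `demo` and `subset` are hypothetical names, not part of the original notebook), the same cutoff can be applied before calling `stats.pearsonr` so the two coefficients can be compared:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up data standing in for the real DataFrame; only the column names match the post.
rng = np.random.default_rng(0)
population = np.concatenate([
    rng.lognormal(mean=10, sigma=1.5, size=495),   # bulk of the data
    [4e6, 5e6, 6e6, 8e6, 1e7],                     # a few very large populations
])
demo = pd.DataFrame({
    'TotalPop': population,
    'Poverty': rng.uniform(5, 35, size=population.size),
})

# Same cutoff as the zoomed-in plot
subset = demo[demo['TotalPop'] < 3000000]

r_all, p_all = stats.pearsonr(demo['TotalPop'], demo['Poverty'])
r_sub, p_sub = stats.pearsonr(subset['TotalPop'], subset['Poverty'])

print('all rows:     n =', len(demo), ' r =', r_all, ' p =', p_all)
print('below cutoff: n =', len(subset), ' r =', r_sub, ' p =', p_sub)
```

Whether to report the filtered or the unfiltered coefficient is a judgment call; the point is only that the plot and the statistic should describe the same rows.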

In [98]:

# generate correlation coefficient, p-value, and r-squared
pov_pop_r = stats.pearsonr(df['TotalPop'], df['Poverty'])
print('Correlation Coefficient')
print(pov_pop_r[0])
print('')
print('P-Value')
print(pov_pop_r[1])
print('')
print('r-squared')
print(pov_pop_r[0] ** 2)
Correlation Coefficient
-0.051289678238339985

P-Value
0.019232619896592522

r-squared
0.002630631093792446

Poverty vs. Population Summary

There is a very weak negative linear relationship between population and poverty (r ≈ -0.05). The p-value (≈ 0.019) is below the conventional 0.05 threshold, so the relationship is statistically significant; a correlation of this size would be unlikely to arise from sampling variation alone if there were no linear relationship. R-squared tells us that population explains only about 0.3% of the variability in poverty, so knowing the population tells us almost nothing about the poverty level.
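
To make the r-squared reading concrete, here is a minimal sketch on made-up data (the variables `x` and `y` are synthetic, not the post's columns). It fits a least-squares line and checks that the squared Pearson coefficient equals the share of the variance in `y` that the line accounts for:

```python
import numpy as np
from scipy import stats

# Synthetic data: y depends weakly on x, plus noise
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)

r, p = stats.pearsonr(x, y)

# Fit a straight line by least squares and measure what it leaves unexplained
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
ss_res = np.sum(residuals ** 2)            # variation left after the fit
ss_tot = np.sum((y - y.mean()) ** 2)       # total variation in y
explained = 1 - ss_res / ss_tot

print('r squared:                  ', r ** 2)
print('share of variance explained:', explained)
```

The two printed numbers agree up to floating-point error, which is why r-squared is read as a proportion of variability explained rather than as a probability.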

Crime Rate vs. Poverty

How do poverty and crime rate relate to one another?

In [99]:
# plot regplot of crime rate vs. poverty
sns.regplot(x='Poverty', y='Crime Rate', data=sub1, color='steelblue', scatter_kws={'alpha': 0.5})
sns.despine()
plt.title('Crime Rate vs. Poverty', loc='left', fontweight='bold', y=1.02)
plt.savefig('crime_poverty_corr.png')

[Figure: regression plot, Crime Rate vs. Poverty]

In [100]:

# generate correlation coefficient, p-value, and r-squared
pov_crime_r = stats.pearsonr(sub1['Poverty'], sub1['Crime Rate'])
print('Correlation Coefficient')
print(pov_crime_r[0])
print('')
print('P-Value')
print(pov_crime_r[1])
print('')
print('r-squared')
print(pov_crime_r[0] ** 2)
Correlation Coefficient
0.17200848920623796

P-Value
2.6853393840320555e-15

r-squared
0.02958692035901248

Crime Rate vs. Poverty Summary

There is a weak positive linear relationship between poverty and crime rate (r ≈ 0.17). The p-value is far below 0.05, so the relationship is statistically significant. R-squared tells us that the percentage of poverty explains about 3% of the variability in crime rate.
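
A weak correlation can still be highly significant because the p-value depends on the sample size as well as on r. The sketch below uses the standard t-test for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; `p_from_r` is a hypothetical helper name, and the sample sizes are illustrative rather than taken from the data set:

```python
import numpy as np
from scipy import stats

def p_from_r(r, n):
    """Two-sided p-value for a Pearson r from n paired observations (t-test with n - 2 df)."""
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

# The same weak correlation at three illustrative sample sizes
for n in (30, 300, 3000):
    print('n =', n, ' p =', p_from_r(0.17, n))

# Sanity check against scipy on synthetic data
rng = np.random.default_rng(7)
x = rng.normal(size=150)
y = 0.2 * x + rng.normal(size=150)
r, p = stats.pearsonr(x, y)
print('scipy p:', p, ' manual p:', p_from_r(r, len(x)))
```

The p-value shrinks as n grows even though r stays fixed, so significance alone says little about how strong a relationship is.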

Employee Rate vs. Crime Rate

How do crime rate and employee rate relate to one another?

In [101]:
# plot regplot of employee rate vs. crime rate
sns.regplot(x='Crime Rate', y='Employee Rate', data=sub1, color='steelblue', scatter_kws={'alpha': 0.5})
sns.despine()
plt.title('Employee Rate vs. Crime Rate', loc='left', fontweight='bold', y=1.02)
plt.savefig('crime_employee_corr.png')

[Figure: regression plot of Employee Rate against Crime Rate]

In [102]:

# generate correlation coefficient, p-value, and r-squared
cr_er_r = stats.pearsonr(sub1['Crime Rate'], sub1['Employee Rate'])
print('Correlation Coefficient')
print(cr_er_r[0])
print('')
print('P-Value')
print(cr_er_r[1])
print('')
print('r-squared')
print(cr_er_r[0] ** 2)
Correlation Coefficient
0.3742427673455067

P-Value
3.0456058959706253e-70

r-squared
0.14005764891042308

Employee Rate vs. Crime Rate Summary

There is a weak-to-moderate positive linear relationship between crime rate and employee rate (r ≈ 0.37). The p-value tells us that the relationship is statistically significant. R-squared tells us that crime rate explains about 14% of the variability in employee rate.
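
The three correlation cells above repeat the same steps, so they could be wrapped in one function. This is only a sketch: `pearson_summary` is a hypothetical helper, the data are made up, and dropping rows with missing values first is an assumption about how gaps should be handled, not something the original notebook does:

```python
import numpy as np
import pandas as pd
from scipy import stats

def pearson_summary(data, x, y):
    """Return n, r, the p-value, and r-squared for two columns of a DataFrame."""
    clean = data[[x, y]].dropna()          # assumption: ignore rows with missing values
    r, p = stats.pearsonr(clean[x], clean[y])
    return {'n': len(clean), 'r': r, 'p_value': p, 'r_squared': r ** 2}

# Made-up data; only the column names match the post
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'Crime Rate': rng.normal(50, 10, 100),
    'Employee Rate': rng.normal(20, 5, 100),
})
demo.loc[0, 'Crime Rate'] = np.nan         # one missing value to exercise dropna

summary = pearson_summary(demo, 'Crime Rate', 'Employee Rate')
for name, value in summary.items():
    print(name, value)
```

Reporting n alongside r and the p-value also makes it easier to judge how much weight the significance test deserves.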
