# Data Analysis & Interpretation 2.3: Pearson Correlation

## Week 3

Generate a correlation coefficient.

## Poverty vs. Population

How do poverty and population relate to one another?

```
# plot regplot of poverty vs. population
import matplotlib.pyplot as plt
import seaborn as sns

sns.regplot(x='TotalPop', y='Poverty', data=df, color='steelblue')
sns.despine()
plt.title('Poverty vs. Population', loc='left', fontweight='bold', y=1.02)
```

The fitted line slopes downward, which suggests a negative correlation, but let's exclude extremely high populations (>= 3,000,000) in order to zoom in on the area where most of the data points reside.

```
# same plot, restricted to populations below 3,000,000
sns.regplot(x='TotalPop', y='Poverty', data=df[df['TotalPop'] < 3000000], scatter_kws={'alpha': 0.5})
sns.despine()
plt.title('Poverty vs. Population (< 3m)', loc='left', fontweight='bold', y=1.02)
plt.savefig('poverty_population_corr.png')
```

```
# generate correlation coefficient, p-value, and r-squared
from scipy import stats

# pearsonr returns (correlation coefficient, two-sided p-value)
pov_pop_r = stats.pearsonr(df['TotalPop'], df['Poverty'])
print('Correlation Coefficient')
print(pov_pop_r[0])
print('')
print('P-Value')
print(pov_pop_r[1])
print('')
print('r-squared')
print(pov_pop_r[0] ** 2)
```
```
Correlation Coefficient
-0.051289678238339985

P-Value
0.019232619896592522

r-squared
0.002630631093792446
```

## Poverty vs. Population Summary

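There is a weak negative linear relationship between population and poverty (r ≈ -0.05). The p-value (≈ 0.019) is below the conventional 0.05 threshold, so the relationship is statistically significant: a correlation this far from zero would be unlikely to arise from sampling variation alone if there were no underlying relationship. R-squared tells us that population accounts for less than 1% (about 0.3%) of the variability in poverty, so knowing the population tells us almost nothing about the poverty level.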
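To make the r-squared figure concrete, here is a minimal sketch, using only the standard library, of what the correlation coefficient measures and why its square can be read as a share of explained variance. The helper names `pearson_r` and `explained_variance` and the sample values are made up for illustration; they are not part of the analysis above, which relies on `stats.pearsonr`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def explained_variance(xs, ys):
    """Share of the variance in ys captured by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative values, not taken from the data set used in this post.
population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
poverty = [14.0, 11.0, 15.0, 12.0, 13.0, 10.0]

r = pearson_r(population, poverty)
print('r         =', r)
print('r-squared =', r ** 2)
print('explained =', explained_variance(population, poverty))
```

For a simple linear fit the last two printed values agree up to floating-point error, which is why r-squared is usually described as a proportion of variance rather than as a probability.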

## Crime Rate vs. Poverty

How do poverty and crime rate relate to one another?

```
# plot regplot of crime rate vs. poverty
sns.regplot(x='Poverty', y='Crime Rate', data=sub1, color='steelblue', scatter_kws={'alpha': 0.5})
sns.despine()
plt.title('Crime Rate vs. Poverty', loc='left', fontweight='bold', y=1.02)
plt.savefig('crime_poverty_corr.png')
```

```
# generate correlation coefficient, p-value, and r-squared
# pearsonr returns (correlation coefficient, two-sided p-value)
pov_crime_r = stats.pearsonr(sub1['Poverty'], sub1['Crime Rate'])
print('Correlation Coefficient')
print(pov_crime_r[0])
print('')
print('P-Value')
print(pov_crime_r[1])
print('')
print('r-squared')
print(pov_crime_r[0] ** 2)
```
```
Correlation Coefficient
0.17200848920623796

P-Value
2.6853393840320555e-15

r-squared
0.02958692035901248
```

## Crime Rate vs. Poverty Summary

There is a weak positive linear relationship between poverty and crime rate (r ≈ 0.17). The p-value is far below 0.05, so the relationship is statistically significant. R-squared tells us that the percentage of poverty accounts for about 3% of the variability in crime rate.
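A weak correlation can still come with a very small p-value when the sample is large. The sketch below illustrates this with the usual t-statistic for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sample sizes are hypothetical (the number of rows in `sub1` is not stated here), and the p-value uses a normal approximation that is only reasonable for large n; it is not how `stats.pearsonr` computes its result.

```python
import math
from statistics import NormalDist


def t_statistic(r, n):
    """t-statistic for testing whether a Pearson r from n pairs differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def approx_two_sided_p(r, n):
    """Normal approximation to the two-sided p-value (assumes large n)."""
    t = t_statistic(r, n)
    return 2 * (1 - NormalDist().cdf(abs(t)))


r = 0.172  # rounded coefficient from the output above
for n in (50, 500, 2000):  # hypothetical sample sizes, for illustration only
    print(n, round(t_statistic(r, n), 2), approx_two_sided_p(r, n))
```

The same r produces a larger t-statistic, and therefore a smaller p-value, as n grows. Significance says the relationship is unlikely to be zero; r and r-squared say how strong it is.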

## Employee Rate vs. Crime Rate

How do crime rate and employee rate relate to one another?

```
# plot regplot of employee rate vs. crime rate
sns.regplot(x='Crime Rate', y='Employee Rate', data=sub1, color='steelblue', scatter_kws={'alpha': 0.5})
sns.despine()
plt.title('Employee Rate vs. Crime Rate', loc='left', fontweight='bold', y=1.02)
plt.savefig('crime_employee_corr.png')
```

```
# generate correlation coefficient, p-value, and r-squared
# pearsonr returns (correlation coefficient, two-sided p-value)
cr_er_r = stats.pearsonr(sub1['Crime Rate'], sub1['Employee Rate'])
print('Correlation Coefficient')
print(cr_er_r[0])
print('')
print('P-Value')
print(cr_er_r[1])
print('')
print('r-squared')
print(cr_er_r[0] ** 2)
```
```
Correlation Coefficient
0.3742427673455067

P-Value
3.0456058959706253e-70

r-squared
0.14005764891042308
```

## Employee Rate vs. Crime Rate Summary

There is a weak-to-moderate positive linear relationship between crime rate and employee rate (r ≈ 0.37). The p-value is far below 0.05, so the relationship is statistically significant. R-squared tells us that crime rate accounts for about 14% of the variability in employee rate.