## Week 4

This week's assignment focuses on plotting univariate and bivariate graphs.

STEP 1: Create graphs of your variables one at a time (univariate graphs).

Examine both their center and spread.

STEP 2: Create a graph showing the association between your explanatory and response variables (bivariate graph).

Your output should be interpretable (i.e. organized and labeled).

I have already performed several such visualizations during weeks 2 and 3. If you have not done so already, I encourage you to skim through the material for those weeks. Those graphs certainly fulfill the requirements of step 1. That being said, I will display summary stats of those variables for this week. Further, at least part of step 2 was fulfilled during my week 3 analysis; however, I will perform additional graphing of bivariate data for this week.

## Summary Stats of Key Variables

```
# view summary stats of key variables
sub1.describe()
```
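As a reminder of what "center and spread" mean in practice, here is a minimal sketch on a small made-up table. The `toy` DataFrame and its values are hypothetical stand-ins for two numeric columns of `sub1`, not real county data.

```python
import pandas as pd

# Hypothetical toy data standing in for two numeric columns of sub1.
toy = pd.DataFrame({
    'Poverty': [10.0, 12.0, 14.0, 16.0, 48.0],
    'Violent Crime Rate': [100.0, 150.0, 200.0, 250.0, 300.0],
})

# Center: the mean is pulled toward outliers, the median is not.
center = toy.agg(['mean', 'median'])

# Spread: standard deviation and interquartile range (IQR).
iqr = toy.quantile(0.75) - toy.quantile(0.25)
spread = pd.DataFrame({'std': toy.std(), 'iqr': iqr}).T

print(center)
print(spread)
```

In the toy `Poverty` column the single large value (48.0) drags the mean up to 20.0 while the median stays at 14.0, which is why I look at both when reading the `describe()` output.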

## Correlations Between Key Variables

```
# plot correlations of key variables
import matplotlib.pyplot as plt
import seaborn as sns

sns.pairplot(sub1.drop('State', axis=1),
             plot_kws={'color': 'steelblue', 'lw': 0, 'alpha': 0.5},
             diag_kws={'color': 'darkgray', 'edgecolor': 'white', 'lw': 0.25})
plt.savefig('pairplot_key_variables.png')
```
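A pair plot shows the shape of each relationship, but it can help to have the numbers next to it. The sketch below computes a Pearson correlation matrix with `DataFrame.corr()`. The `toy` DataFrame is hypothetical; with the real `sub1`, the non-numeric columns would need to be dropped first.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror sub1 but the values are made up.
toy = pd.DataFrame({
    'Poverty': [10.0, 12.0, 14.0, 16.0, 18.0],
    'Violent Crime Rate': [100.0, 150.0, 200.0, 250.0, 300.0],
    'Employee Rate': [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Pairwise Pearson correlations (the default method of DataFrame.corr).
corr = toy.corr()
print(corr.round(2))
```

Each off-diagonal cell corresponds to one scatter panel of the pair plot: values near 1 or -1 indicate a tight linear pattern, values near 0 a diffuse cloud.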

## Stats by Poverty Group

```
# add poverty groups (<=16% or >16%) to sub1
def poverty_group(row):
    if row['Poverty'] <= 16:
        return '<= 16%'
    else:
        return '> 16%'

sub1['Poverty Group'] = sub1.apply(poverty_group, axis=1)
# check
sub1[['County','Poverty','Poverty Group']].head()
```

| | County | Poverty | Poverty Group |
|---|---|---|---|
| 0 | Baldwin | 13.4 | <= 16% |
| 1 | Bibb | 16.8 | > 16% |
| 2 | Bullock | 24.6 | > 16% |
| 3 | Clay | 16.7 | > 16% |
| 4 | Cleburne | 17.0 | > 16% |
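The same grouping can be written without a row-wise function. This is a sketch of a vectorized alternative using `np.where`, rebuilt on the five rows shown in the table above. The `toy` DataFrame name is my own, and this is not the code I actually ran.

```python
import numpy as np
import pandas as pd

# The five rows from the table above, rebuilt by hand for illustration.
toy = pd.DataFrame({
    'County': ['Baldwin', 'Bibb', 'Bullock', 'Clay', 'Cleburne'],
    'Poverty': [13.4, 16.8, 24.6, 16.7, 17.0],
})

# Label every row in one pass: True -> '<= 16%', False -> '> 16%'.
toy['Poverty Group'] = np.where(toy['Poverty'] <= 16, '<= 16%', '> 16%')
print(toy)
```

As with the row-wise version, a missing poverty value would fail the `<= 16` comparison and land in the `> 16%` group, so missing data should be handled before grouping.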

```
# how many counties are represented in each poverty group?
poverty_group_count = sub1['Poverty Group'].value_counts(sort=False)
poverty_group_count
```

[Inputs 68-69 have been removed because they were not relevant.]

```
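# plot avg. violent crime rate by poverty group
# (factorplot was renamed catplot in seaborn 0.9 and is no longer available in recent releases;
# errorbar=None needs seaborn >= 0.12, use ci=None on older versions)
sns.catplot(x='Poverty Group', y='Violent Crime Rate', data=sub1, kind='bar', errorbar=None, color='steelblue')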
plt.title('Avg. Violent Crime Rate by Grouping', loc='left', fontweight='bold', y=1.02)
plt.savefig('avg_vcr_grouping.png')
```
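The bar heights in a chart like this are simply the group means, so they can also be read off as numbers with a `groupby`. A minimal sketch, using a hypothetical `toy` DataFrame with made-up rates:

```python
import pandas as pd

# Hypothetical toy data; the group labels match sub1, the rates are made up.
toy = pd.DataFrame({
    'Poverty Group': ['<= 16%', '<= 16%', '> 16%', '> 16%'],
    'Violent Crime Rate': [100.0, 200.0, 300.0, 500.0],
})

# Mean violent crime rate per poverty group (the values a bar chart would draw).
group_means = toy.groupby('Poverty Group')['Violent Crime Rate'].mean()
print(group_means)
```

Printing the means alongside the figure makes it easier to quote exact values when interpreting the graph.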

## Hypothesis Test of Average Violent Crime Rate by Poverty Group

I want to determine whether the difference in mean violent crime rate between these two poverty groups is statistically significant. I will use a significance level (alpha) of 0.05. Writing m1 for the mean of the <= 16% group and m2 for the mean of the > 16% group, my hypotheses are as follows:

H0 (null hypothesis): m1 = m2

H1 (alternative hypothesis): m1 != m2

```
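# ztest is assumed here to be the two-sample z-test from statsmodels
from statsmodels.stats.weightstats import ztest

# assign variables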
group1_vcr = sub1[sub1['Poverty Group']=='<= 16%']['Violent Crime Rate']
group2_vcr = sub1[sub1['Poverty Group']=='> 16%']['Violent Crime Rate']
# run test
vcr_group_ztest = ztest(x1=group1_vcr, x2=group2_vcr)
# print results
print('Test Statistic')
print(vcr_group_ztest[0])
print('\n' + 'P-Value')
print('{:.25f}'.format(float(vcr_group_ztest[1])))
```
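To make the test less of a black box, here is a sketch of the same kind of calculation done by hand with the standard library only. It uses a pooled variance, which, as far as I understand, is what the statsmodels `ztest` does by default. The function name `two_sample_ztest` and the sample values are hypothetical.

```python
import math
import statistics


def two_sample_ztest(x1, x2):
    """Two-sided, pooled-variance z-test for a difference in means."""
    n1, n2 = len(x1), len(x2)
    mean_diff = statistics.fmean(x1) - statistics.fmean(x2)
    # Pooled sample variance (each group's variance uses n - 1).
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    z = mean_diff / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical values, not taken from sub1.
low = [1.0, 2.0, 3.0, 4.0, 5.0]
high = [3.0, 4.0, 5.0, 6.0, 7.0]
z, p = two_sample_ztest(low, high)
print(z, p)
```

The sign of the statistic only reflects which group is listed first; the two-sided p-value depends on its absolute value.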

## Violent Crime Rate by Poverty Group Z-Test Results: Reject the Null Hypothesis

Since the p-value is well below 0.05, we reject the null hypothesis. The difference in mean violent crime rate between the poverty groups is statistically significant. Recall that the earlier hypothesis test comparing the average violent crime rate of metro and non-metro counties did not find a statistically significant difference. Taken together, this suggests that violent crime rate is more strongly related to poverty than to metro/non-metro status, although a p-value alone does not measure the strength of a relationship.
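One way to compare the size of two group differences on a common scale is a standardized effect size such as Cohen's d (the difference in means divided by the pooled standard deviation). This was not part of my original analysis; the sketch below only illustrates the idea, and the function name `cohens_d` and the sample values are hypothetical.

```python
import math
import statistics


def cohens_d(x1, x2):
    """Difference in means expressed in pooled standard deviations."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return (statistics.fmean(x1) - statistics.fmean(x2)) / math.sqrt(pooled_var)


# Hypothetical values, not taken from sub1.
low = [1.0, 2.0, 3.0, 4.0, 5.0]
high = [3.0, 4.0, 5.0, 6.0, 7.0]
d = cohens_d(low, high)
print(round(d, 3))
```

Unlike the p-value, this quantity does not shrink or grow just because the groups contain more counties, so it is better suited to "which relationship is stronger" comparisons.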

```
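# plot avg. property crime rate by poverty group
# (factorplot was renamed catplot in seaborn 0.9 and is no longer available in recent releases;
# errorbar=None needs seaborn >= 0.12, use ci=None on older versions)
sns.catplot(x='Poverty Group', y='Property Crime Rate', data=sub1, kind='bar', errorbar=None, color='steelblue')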
plt.title('Avg. Property Crime Rate by Grouping', loc='left', fontweight='bold', y=1.02)
plt.savefig('avg_pcr_grouping.png')
```

## Hypothesis Test of Average Property Crime Rate by Poverty Group

I want to determine whether the difference in mean property crime rate between these two poverty groups is statistically significant. I will use a significance level (alpha) of 0.05. Writing m1 for the mean of the <= 16% group and m2 for the mean of the > 16% group, my hypotheses are as follows:

H0 (null hypothesis): m1 = m2

H1 (alternative hypothesis): m1 != m2

```
# assign variables
group1_pcr = sub1[sub1['Poverty Group']=='<= 16%']['Property Crime Rate']
group2_pcr = sub1[sub1['Poverty Group']=='> 16%']['Property Crime Rate']
# run test
pcr_group_ztest = ztest(x1=group1_pcr, x2=group2_pcr)
# print results
print('Test Statistic')
print(pcr_group_ztest[0])
print('\n' + 'P-Value')
print('{:.25f}'.format(float(pcr_group_ztest[1])))
```

## Property Crime Rate by Poverty Group Z-Test Results: Reject the Null Hypothesis

Since the p-value is less than 0.05, we reject the null hypothesis. The difference in means between poverty groups for property crime rate is statistically significant.

## Employee Rate by Violent Crime and Property Crime Groups

```
# add violent crime rate groups (split by median) to sub1
vcr_med = sub1['Violent Crime Rate'].median()

def vcr_group(row):
    if row['Violent Crime Rate'] <= vcr_med:
        return 'lower half'
    else:
        return 'upper half'

sub1['Violent Crime Group'] = sub1.apply(vcr_group, axis=1)
# check
print(vcr_med)
print(sub1[['County','Violent Crime Rate','Violent Crime Group']].head(10))
```

```
# add property crime rate groups (split by median) to sub1
pcr_med = sub1['Property Crime Rate'].median()

def pcr_group(row):
    if row['Property Crime Rate'] <= pcr_med:
        return 'lower half'
    else:
        return 'upper half'

sub1['Property Crime Group'] = sub1.apply(pcr_group, axis=1)
# check
print(pcr_med)
print(sub1[['County','Property Crime Rate','Property Crime Group']].head(10))
```
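The two grouping cells above differ only in the column they split, so the logic could be shared. A minimal sketch of a reusable median split follows; the helper name `median_split` and the sample rates are hypothetical.

```python
import numpy as np
import pandas as pd


def median_split(values):
    """Label each value 'lower half' (<= median) or 'upper half' (> median)."""
    med = values.median()
    labels = np.where(values <= med, 'lower half', 'upper half')
    # Keep the original index so the result aligns with the source DataFrame.
    return pd.Series(labels, index=values.index)


# Hypothetical rates, not taken from sub1.
rates = pd.Series([120.0, 80.0, 300.0, 200.0, 150.0])
groups = median_split(rates)
print(groups)
```

With the real data this would be used as `sub1['Violent Crime Group'] = median_split(sub1['Violent Crime Rate'])`. Values equal to the median go to the lower half, so the two groups are not guaranteed to be the same size.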


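```
# plot avg. employee rate by violent crime group
# (factorplot was renamed catplot in seaborn 0.9 and is no longer available in recent releases;
# errorbar=None needs seaborn >= 0.12, use ci=None on older versions)
sns.catplot(x='Violent Crime Group', y='Employee Rate', data=sub1, kind='bar', errorbar=None, color='steelblue')
plt.title('Avg. Employee Rate by Grouping', loc='left', fontweight='bold', y=1.02)
plt.savefig('avg_er_vcr_grouping.png')
```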

## Hypothesis Test of Average Law Enforcement Employee Rate by Violent Crime Group

I want to determine whether the difference in mean law enforcement employee rate between these two violent crime rate groups is statistically significant. I will use a significance level (alpha) of 0.05. Writing m1 for the mean of the lower-half group and m2 for the mean of the upper-half group, my hypotheses are as follows:

H0 (null hypothesis): m1 = m2

H1 (alternative hypothesis): m1 != m2

```
# assign variables
group1_vcr_er = sub1[sub1['Violent Crime Group']=='lower half']['Employee Rate']
group2_vcr_er = sub1[sub1['Violent Crime Group']=='upper half']['Employee Rate']
# run test
vcr_er_group_ztest = ztest(x1=group1_vcr_er, x2=group2_vcr_er)
# print results
print('Test Statistic')
print(vcr_er_group_ztest[0])
print('\n' + 'P-Value')
print('{:.25f}'.format(float(vcr_er_group_ztest[1])))
```

## Law Enforcement Employee Rate by Violent Crime Rate Group Z-Test Results: Reject the Null Hypothesis

Since the p-value is well below 0.05, we reject the null hypothesis. The difference between the means of the two groups is statistically significant.

```
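# plot avg. employee rate by property crime group
# (factorplot was renamed catplot in seaborn 0.9 and is no longer available in recent releases;
# errorbar=None needs seaborn >= 0.12, use ci=None on older versions)
sns.catplot(x='Property Crime Group', y='Employee Rate', data=sub1, kind='bar', errorbar=None, color='steelblue')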
plt.title('Avg. Employee Rate by Grouping', loc='left', fontweight='bold', y=1.02)
plt.savefig('avg_er_pcr_grouping.png')
```

## Hypothesis Test of Average Law Enforcement Employee Rate by Property Crime Group

I want to determine whether the difference in mean law enforcement employee rate between these two property crime rate groups is statistically significant. I will use a significance level (alpha) of 0.05. Writing m1 for the mean of the lower-half group and m2 for the mean of the upper-half group, my hypotheses are as follows:

H0 (null hypothesis): m1 = m2

H1 (alternative hypothesis): m1 != m2

```
# assign variables
group1_pcr_er = sub1[sub1['Property Crime Group']=='lower half']['Employee Rate']
group2_pcr_er = sub1[sub1['Property Crime Group']=='upper half']['Employee Rate']
# run test
pcr_er_group_ztest = ztest(x1=group1_pcr_er, x2=group2_pcr_er)
# print results
print('Test Statistic')
print(pcr_er_group_ztest[0])
print('\n' + 'P-Value')
print('{:.25f}'.format(float(pcr_er_group_ztest[1])))
```

## Law Enforcement Employee Rate by Property Crime Rate Group Z-Test Results: Reject the Null Hypothesis

Since the p-value is well below 0.05, we reject the null hypothesis. The difference between the means of the two groups is statistically significant.

## Summary of Analysis

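This week we saw that, on average, counties with a poverty rate above 16% have higher violent crime and property crime rates than counties at or below 16%. We also saw that, on average, counties in the upper half of either crime rate have higher rates of full-time law enforcement employees. These are associations between group averages and do not by themselves show that one variable causes the other.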