White Men are More Prone to Suicide

Gun Deaths 2012-2014

The data for this analysis can be found on FiveThirtyEight’s GitHub page here or on Data.World (user: azel) here. Documentation, along with an analysis of the data, can be found on FiveThirtyEight’s site here.

The primary purpose of this project is to gain insight into the distributions of gun deaths by age and intent, as well as the proportions of gun deaths by intent, sex, and ethnicity. One of the most surprising findings of this analysis is that over 60% of gun deaths are suicides. Over 85% of those suicide victims are men and, broken down by ethnicity, over 85% are white.

 

# Setup environment
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Read in data, clean, and review
gd = pd.read_csv('gun_deaths_2012-2014.csv')
gd.head()
   Unnamed: 0  year  month   intent  police  sex   age                    race  hispanic            place     education
0           1  2012      1  Suicide       0    M  34.0  Asian/Pacific Islander       100             Home           BA+
1           2  2012      1  Suicide       0    F  21.0                   White       100           Street  Some college
2           3  2012      1  Suicide       0    M  60.0                   White       100  Other specified           BA+
3           4  2012      2  Suicide       0    M  64.0                   White       100             Home           BA+
4           5  2012      2  Suicide       0    M  31.0                   White       100  Other specified        HS/GED
gd.tail()
        Unnamed: 0  year  month    intent  police  sex   age      race  hispanic              place     education
100793      100794  2014     12  Homicide       0    M  36.0     Black       100               Home        HS/GED
100794      100795  2014     12  Homicide       0    M  19.0     Black       100             Street        HS/GED
100795      100796  2014     12  Homicide       0    M  20.0     Black       100             Street        HS/GED
100796      100797  2014     12  Homicide       0    M  22.0  Hispanic       260             Street  Less than HS
100797      100798  2014     10  Homicide       0    M  43.0     Black       100  Other unspecified        HS/GED
# Data types
gd.dtypes
Unnamed: 0      int64
year            int64
month           int64
intent         object
police          int64
sex            object
age           float64
race           object
hispanic        int64
place          object
education      object
dtype: object
# Remove Unnamed: 0 column
gd = gd.drop('Unnamed: 0', axis='columns')
gd.columns
Index(['year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic',
       'place', 'education'],
      dtype='object')
# The hispanic column is for clarifying the hispanic origin
# (e.g. Mexican, Puerto Rican, Cuban) if the race column is Hispanic.
# This is not necessary for my analysis, so I can remove the hispanic column.
gd = gd.drop('hispanic', axis='columns')
gd.columns
Index(['year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'place',
       'education'],
      dtype='object')
# Rename race to ethnicity
gd.rename(columns={'race':'ethnicity'}, inplace=True)
gd.columns
Index(['year', 'month', 'intent', 'police', 'sex', 'age', 'ethnicity', 'place',
       'education'],
      dtype='object')
gd.shape
(100798, 9)
# Which columns have nulls?
gd.isna().any()
year         False
month        False
intent        True
police       False
sex          False
age           True
ethnicity    False
place         True
education     True
dtype: bool
# How many nulls are in each column?
gd.isnull().sum()
year            0
month           0
intent          1
police          0
sex             0
age            18
ethnicity       0
place        1384
education    1422
dtype: int64
# There are only 18 nulls in age.
# Remove the rows with nulls in age.
gd = gd[gd['age'].notnull()]
gd.shape
(100780, 9)
# Replace the nulls in the other columns with placeholder categories
gd.update(gd['intent'].fillna('Undetermined'))
gd.update(gd['place'].fillna('Other unspecified'))
gd.update(gd['education'].fillna('Unknown'))

# Check that there are no more nulls
gd.isnull().sum()
year         0
month        0
intent       0
police       0
sex          0
age          0
ethnicity    0
place        0
education    0
dtype: int64
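The same null handling can also be written as one chained expression, with `fillna` taking a per-column mapping. This is only a sketch on a small made-up frame (the `toy` rows are illustrative, not records from the dataset), and it assumes the same placeholder labels used above.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; not real records from the dataset.
toy = pd.DataFrame({
    'intent': ['Suicide', np.nan, 'Homicide'],
    'age': [34.0, 21.0, np.nan],
    'place': ['Home', 'Street', np.nan],
    'education': [np.nan, 'BA+', 'HS/GED'],
})

# Drop rows with a missing age, then fill the remaining nulls per column.
cleaned = (
    toy.dropna(subset=['age'])
       .fillna({'intent': 'Undetermined',
                'place': 'Other unspecified',
                'education': 'Unknown'})
)

remaining_nulls = int(cleaned.isnull().sum().sum())
print(cleaned)
print('remaining nulls:', remaining_nulls)
```

Because the result is assigned to a new frame rather than updated in place, this form avoids modifying a filtered slice of the original data.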
# Overall age distribution
sns.distplot(gd['age'], bins=range(0,100,5), color='darkred', kde=False, 
             hist_kws=dict(edgecolor='white', linewidth=0.5, alpha=0.8))
sns.despine()
plt.title('Age Distribution of Gun Deaths (2012-2014)', fontweight='bold', loc='left', y=1.05)
plt.tight_layout()
plt.savefig('age_dist')
plt.show()

[Figure: Age Distribution of Gun Deaths (2012-2014)]

Age Distribution

Gun deaths peak between ages 20 and 25, decrease steadily, peak again between 50 and 55, and decrease steadily from there. This bimodal distribution suggests we may be dealing with two distinct groups. Let’s overlay the age distributions by intent, specifically homicide and suicide, to see whether each intent dominates a different age range.
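The two peaks can also be checked numerically rather than read off the plot. The sketch below counts deaths per 5-year bin with `np.histogram`, using the same bin edges as the histogram, and flags bins that are taller than both neighbors. The ages are synthetic, built to have two modes purely for illustration, and the strict "taller than both neighbors" rule is an assumption that may be too noisy for real data.

```python
import numpy as np

# Synthetic ages for illustration only: each bin start gets n copies of (start + 2).
bin_starts = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60])
n_per_bin = np.array([2, 5, 4, 3, 2, 3, 4, 5, 3, 2])
ages = np.repeat(bin_starts + 2, n_per_bin)

# Same 5-year bin edges as the histogram: 0, 5, ..., 95.
edges = np.arange(0, 100, 5)
counts, _ = np.histogram(ages, bins=edges)

# A bin is a local peak if it is strictly taller than both of its neighbors.
padded = np.concatenate(([0], counts, [0]))
is_peak = (counts > padded[:-2]) & (counts > padded[2:])
peaks = [(int(edges[i]), int(edges[i + 1])) for i in np.flatnonzero(is_peak)]
print(peaks)
```

On the real data, `ages` would be `gd['age'].to_numpy()`. Note that with edges ending at 95, `np.histogram` ignores any values above 95.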

 

# Age distribution by homicide and suicide
fig, ax = plt.subplots()
sns.distplot(gd['age'][gd['intent']=='Homicide'], bins=range(0,100,5), color='darkred', label='Homicide', kde=False, 
             hist_kws=dict(edgecolor='white', linewidth=0.5, alpha=0.6))
sns.distplot(gd['age'][gd['intent']=='Suicide'], bins=range(0,100,5), color='gray', label='Suicide', kde=False, 
             hist_kws=dict(edgecolor='white', linewidth=0.5, alpha=0.7))
plt.legend()
sns.despine()
plt.title('Homicide and Suicide Gun Deaths\nAge Distribution (2012-2014)', fontweight='bold', loc='left', y=1.05)
plt.tight_layout()
plt.savefig('homicide_suicide_dist')
plt.show()

[Figure: Homicide and Suicide Gun Deaths, Age Distribution (2012-2014)]

Homicide and Suicide Age Distributions

Gun homicides are more common between the ages of 15 and 35, whereas gun suicides are more common between the ages of 35 and 95. Homicides peak between 20 and 25, whereas suicides peak between 50 and 55. The increase in suicides among older individuals could be due to factors such as a “mid-life crisis”, increased hardship (e.g. job loss), or the loss of aging friends and family. Additionally, gun suicides have a greater spread across ages, whereas gun homicides are strongly skewed to the right, with a long tail toward older ages.
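The claims about spread and skew can be quantified with the standard deviation and sample skewness of age within each intent. The following is a minimal sketch on made-up ages (not drawn from the dataset), assuming the column names `intent` and `age` used in this analysis.

```python
import pandas as pd

# Made-up ages for illustration only; not drawn from the dataset.
toy = pd.DataFrame({
    'intent': ['Homicide'] * 10 + ['Suicide'] * 9,
    'age': [18, 19, 20, 21, 22, 23, 25, 30, 45, 70,
            25, 35, 45, 50, 55, 60, 65, 75, 85],
})

by_intent = toy.groupby('intent')['age']
summary = pd.DataFrame({
    'std': by_intent.std(),    # spread of ages within each intent
    'skew': by_intent.skew(),  # > 0 means a longer tail toward older ages
})
print(summary.round(2))
```

A positive skew corresponds to the "skewed to the right" shape described above. On the real data the same two lines would run on `gd.groupby('intent')['age']`.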

 

# How many gun deaths were there for each year?
deaths_by_yr = gd['year'].value_counts().sort_index()
deaths_by_yr
2012    33560
2013    33625
2014    33595
Name: year, dtype: int64
deaths_by_yr.mean()
33593.333333333336

Gun Deaths per Year

On average, over 33,500 people were killed by guns each year from 2012 to 2014.

 

# What are gun deaths by year and intent?
gd['event'] = gd.index + 1
yr_intent_pvt = gd.pivot_table(values='event', index=['year','intent'], aggfunc='count')
yr_intent_pvt
                   event
year intent
2012 Accidental      548
     Homicide      12093
     Suicide       20663
     Undetermined    256
2013 Accidental      505
     Homicide      11666
     Suicide       21172
     Undetermined    282
2014 Accidental      585
     Homicide      11408
     Suicide       21333
     Undetermined    269
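The helper `event` column exists only to give the pivot table something to count. As an alternative sketch, `groupby(...).size()` counts rows per group directly, with no helper column. The rows below are made up for illustration, and the name `deaths` is an arbitrary label chosen here.

```python
import pandas as pd

# Toy rows for illustration only; not real records from the dataset.
toy = pd.DataFrame({
    'year':   [2012, 2012, 2012, 2013, 2013, 2014],
    'intent': ['Suicide', 'Suicide', 'Homicide',
               'Suicide', 'Homicide', 'Suicide'],
})

# size() counts rows per (year, intent) group, so no helper column is needed.
counts = toy.groupby(['year', 'intent']).size().rename('deaths')
print(counts)

# unstack() moves intent into columns for a wide year-by-intent table.
wide = counts.unstack(fill_value=0)
print(wide)
```

The long form (`counts`) has the same shape as the pivot table above, and the wide form makes year-over-year comparisons within an intent easier to read.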

 

# Plot the above pivot table of gun deaths by year and intent
yr_intent_pvt.sort_index(ascending=False).plot.barh(legend=False, color='darkred', width=0.7)
sns.despine()
plt.title('Gun Deaths by Year and Intent', loc='left', fontweight='bold', y=1.05)
plt.xlabel('')
plt.ylabel('')
plt.xticks(range(0,25000,5000))
plt.tight_layout()
plt.savefig('gun_deaths_by_yr_intent')
plt.show()

[Figure: Gun Deaths by Year and Intent]

# What are gun deaths by year and sex?
yr_sex_pvt = gd.pivot_table(values='event', index=['year','sex'], aggfunc='count')
yr_sex_pvt.sort_index(ascending=False).plot.barh(legend=False, color='darkred', width=0.7)
sns.despine()
plt.title('Gun Deaths by Year and Sex', loc='left', fontweight='bold', y=1.05)
plt.xlabel('')
plt.ylabel('')
plt.xticks(range(0,35000,5000))
plt.tight_layout()
plt.savefig('gun_deaths_by_yr_sex')
plt.show()

[Figure: Gun Deaths by Year and Sex]

Gun Deaths: Majority Suicide and Male

So far we’ve seen that the majority of gun deaths are suicides. In fact, whereas gun homicides decreased slightly from 2012 to 2014 (12,093 to 11,408), gun suicides increased (20,663 to 21,333). Additionally, the vast majority of gun death victims are male. Let’s explore suicide further by looking at its share of all gun deaths.

 

# Overall deaths by intent
intent_total = gd[['intent','event']].groupby('intent')['event'].count()
intent_total
intent
Accidental       1638
Homicide        35167
Suicide         63168
Undetermined      807
Name: event, dtype: int64
# Intent proportions
total_deaths = gd.shape[0]
intent_prop = round(intent_total / total_deaths * 100,1)
intent_prop = intent_prop.to_dict()
intent_prop
{'Accidental': 1.6, 'Homicide': 34.9, 'Suicide': 62.7, 'Undetermined': 0.8}
# Since Undetermined makes up less than 1% of the data, remove it from the analysis
intent_prop.pop('Undetermined')
intent_prop
{'Accidental': 1.6, 'Homicide': 34.9, 'Suicide': 62.7}
# What is the average count of suicides per year?
gd_suicide = gd[gd['intent']=='Suicide']
gd_suicide.pivot_table(values='event', index='year', aggfunc='count').mean()
event    21056.0
dtype: float64
# Plot intent proportions with waffle chart
from pywaffle import Waffle

def waffle_chart(width, height, nrows, values, lgnd_loc, lgnd_cols, lgnd_s, icon, icon_s, colors, title):
    fig = plt.figure(
        figsize=(width,height),
        FigureClass=Waffle,
        rows=nrows,
        values=values,
        labels=["{0} ({1}%)".format(k,v) for k, v in values.items()],
        legend={'loc':lgnd_loc, 'bbox_to_anchor': (1,1), 'ncol':lgnd_cols, 'framealpha': 0, 'fontsize':lgnd_s},
        icons=icon,
        icon_size=icon_s,
        icon_legend=True,
        colors=colors
    )
    plt.title(title, loc='left', fontweight='bold', y=1.05)
 
waffle_chart(8, 5, 5, intent_prop, 'upper left', 1, 'large', 'child', 18, ('darkgray','gray','darkred'), 
 'Over 60% of gun deaths are due to suicide')
plt.text(4.25, .18, 'An average of 21,056 suicides\nby gun occurred each year.', size=12, color='darkred')
plt.savefig('intent_prop', bbox_inches='tight')
plt.show()

[Figure: Over 60% of gun deaths are due to suicide]

Proportion of Gun Deaths by Intent

Roughly 63% (nearly two-thirds) of gun deaths are suicides. Gun deaths with an intent of “Undetermined” were left out of the chart, as they made up less than 1% of the data.
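As a worked check of these shares, the sketch below recomputes the percentages from the counts by intent printed earlier. The counts are copied from that output, and the variable name `share` is introduced here for illustration.

```python
import pandas as pd

# Counts by intent, copied from the groupby output shown earlier.
intent_total = pd.Series({
    'Accidental': 1638,
    'Homicide': 35167,
    'Suicide': 63168,
    'Undetermined': 807,
})

total_deaths = int(intent_total.sum())
share = (intent_total / total_deaths * 100).round(1)
print(total_deaths)
print(share)

# On the cleaned frame, the same percentages should also come from:
# gd['intent'].value_counts(normalize=True).mul(100).round(1)
```

Note that the denominator includes the Undetermined deaths, so the three shares kept for the chart sum to slightly less than 100%.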

 

# Out of suicides, what are the proportions by sex?
gender_suicide = gd_suicide[['sex','event']].groupby('sex')['event'].count()
total_suicides = len(gd_suicide)
gender_suicide_prop = round(gender_suicide / total_suicides * 100, 1)
gender_suicide_prop = gender_suicide_prop.to_dict()
gender_suicide_prop
{'F': 13.8, 'M': 86.2}
waffle_chart(8, 5, 5, gender_suicide_prop, 'upper left', 1, 'large', 'child', 18,
 ('gray','darkred'), 'Over 85% of Suicide Gun Deaths are Men')
plt.savefig('suicide_gender_prop', bbox_inches='tight')
plt.show()

[Figure: Over 85% of Suicide Gun Deaths are Men]

# Out of suicides, what are the proportions by ethnicity?
ethnicity_suicides = gd_suicide[['ethnicity','event']].groupby('ethnicity')['event'].count()
ethnicity_suicides_prop = round(ethnicity_suicides / total_suicides * 100, 1)
ethnicity_suicides_prop = ethnicity_suicides_prop.to_dict()
ethnicity_suicides_prop
{'Asian/Pacific Islander': 1.2,
 'Black': 5.3,
 'Hispanic': 5.0,
 'Native American/Native Alaskan': 0.9,
 'White': 87.7}
# Create new dict where non-white are grouped together as 'other'
# Exclude 'Native American/Native Alaskan', as they make up less than 1%
# and the waffle chart will display better this way
d = ethnicity_suicides_prop.copy()
white_suicides_prop = {
 'Other':sum((d['Black'],d['Hispanic'],d['Asian/Pacific Islander'])),
 'White':d['White']
}
white_suicides_prop
{'Other': 11.5, 'White': 87.7}
waffle_chart(8, 5, 5, white_suicides_prop, 'upper left', 1, 'large', 'child', 18, 
 ('gray','darkred'), 'Whites make up over 85% of suicide gun deaths')
plt.savefig('white_suicides_prop', facecolor='#d3d3d3', bbox_inches='tight')
plt.show()

[Figure: Whites make up over 85% of suicide gun deaths]
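The two breakdowns above report sex and ethnicity separately, so neither one gives the share of gun suicides who are both white and male. A cross-tabulation would give that joint share directly. The sketch below uses made-up rows (not records from the dataset) and assumes the column names `sex` and `ethnicity` from the cleaned frame. The percentages it prints describe only the toy rows.

```python
import pandas as pd

# Toy suicide rows for illustration only; not real records from the dataset.
toy_suicide = pd.DataFrame({
    'sex':       ['M', 'M', 'M', 'F', 'M', 'F', 'M', 'M'],
    'ethnicity': ['White', 'White', 'Black', 'White',
                  'White', 'Hispanic', 'White', 'Hispanic'],
})

# normalize='all' divides every cell by the grand total, giving joint shares.
joint = pd.crosstab(toy_suicide['sex'], toy_suicide['ethnicity'],
                    normalize='all').mul(100).round(1)
print(joint)

# Joint share of rows that are both male and white (toy data only).
white_male_share = joint.loc['M', 'White']
print(white_male_share)
```

On the real data the call would be `pd.crosstab(gd_suicide['sex'], gd_suicide['ethnicity'], normalize='all')`.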

 

 

 
