NamUS: Missing Persons in Hillsborough County, FL

NamUS stands for National Missing and Unidentified Persons System. It is administered by the National Institute of Justice and managed through a cooperative agreement with the UNT Health Science Center in Fort Worth, TX. Accessing the database requires an account, which is free to create.

The following analysis is of missing persons data in Hillsborough County, Florida. It is important to keep in mind that this dataset does not necessarily represent all missing persons cases in Hillsborough County. The records available on NamUS are submitted by individuals and various law enforcement departments. Keep in mind, too, that this dataset spans 1960-2018 and contains only 67 missing persons records for that period. Additionally, the dataset does not contain latitude and longitude data, which would allow for more in-depth analysis, such as density mapping.

Hillsborough Cities with Missing Persons

There are a total of 67 missing persons records in Hillsborough County from the NamUS site. Thirteen Hillsborough cities, out of 31, are represented in the NamUS records. The majority of these cases are from Tampa (48 or 71.6%). The city with the second-most cases is Plant City (3 or 4.5%).

Tampa             0.716418
Plant City        0.044776
Brandon           0.029851
Ruskin            0.029851
St. Petersburg    0.029851
Gibsonton         0.029851
Riverview         0.029851
Dover             0.014925
Lithia            0.014925
Seffner           0.014925
Temple Terrace    0.014925
Valrico           0.014925
Thonotosassa      0.014925

Age Distributions

The majority of missing persons cases in Hillsborough consist of people between the ages of 15 and 45, with a peak between 30 and 35. The overall distribution is slightly right skewed, meaning the bulk of the persons are on the younger side of the age spectrum.

Breaking out the age distribution by sex shows far more spread for females than for males. Although their average ages are similar, their distributions are quite different. Males have a high peak between the ages of 30-35. Females peak between 15-20 but have other high points between 10-15 and 25-30. Missing females are therefore concentrated at younger ages than missing males, even though the averages are close. Females make up about 52% of the recorded cases and males about 48%.

missing_age_dist

missing_age_dist_sex

Percentage of Missing Persons by Race/Ethnicity

In this dataset, 70% of the missing persons are White. Hispanic and Black are the next highest at 11.9% and 10.4% respectively. Let’s break down the age distribution by White, Hispanic, and Black. Keep in mind that this is a small dataset: Hispanic and Black combined make up only 15 persons.

White / Caucasian                      0.701493
Hispanic / Latino                      0.119403
Black / African American               0.104478
White / Caucasian,Hispanic / Latino    0.044776
Other                                  0.014925
Asian                                  0.014925

Age Distributions by Ethnicity

As would be expected, since White has more records, there is much more spread in the distribution. White has a majority of missing persons between the ages of 20-35, with a peak between 30-35. Hispanic is clustered between 0-20 and Black is clustered between 25-35.

missing_age_dist_ethnicity

Ethnic Age Distribution by City and Final Conclusions

This visual lets us see both the age distribution by city and the ethnic make-up of said distribution. Again, strong conclusions should not be made on this limited dataset, but some insights can be gathered. Assuming that the dataset is representative, to some degree, of all missing persons cases in Hillsborough County, we can gather that the majority of missing persons cases occur in Tampa and these primarily consist of middle-aged white persons.

ethnic_age_dist_by_city


The full code for this analysis is provided below.

 

In [1]:

# Setup environment
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [2]:
# Import data; identities have been removed from the data
missing = pd.read_csv('NamUS_Missing_Persons_Hillsborough_FL_cleaned.csv')
In [3]:
# Summarize data
missing.head()
Out[3]:
DLC Missing Age City County State Sex Race / Ethnicity Date Modified
0 6/1/2018 30 Tampa Hillsborough FL Male Hispanic / Latino 7/5/2018
1 5/29/2018 21 St. Petersburg Hillsborough FL Male White / Caucasian 7/5/2018
2 12/6/2017 45 Tampa Hillsborough FL Female White / Caucasian 6/28/2018
3 9/1/2017 33 Tampa Hillsborough FL Male Other 6/1/2018
4 8/6/2017 47 Tampa Hillsborough FL Female White / Caucasian 2/13/2018
In [4]:
missing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67 entries, 0 to 66
Data columns (total 8 columns):
DLC                 67 non-null object
Missing Age         67 non-null int64
City                67 non-null object
County              67 non-null object
State               67 non-null object
Sex                 67 non-null object
Race / Ethnicity    67 non-null object
Date Modified       67 non-null object
dtypes: int64(1), object(7)
memory usage: 4.3+ KB
In [5]:
# Convert DLC (Date of Last Contact) to a datetime data type
missing['DLC'] = pd.to_datetime(missing['DLC'])

# Extract the month and year into separate columns
missing['DLC_Month'] = missing['DLC'].dt.month
missing['DLC_Year'] = missing['DLC'].dt.year
In [6]:
missing.head()
Out[6]:
DLC Missing Age City County State Sex Race / Ethnicity Date Modified DLC_Month DLC_Year
0 2018-06-01 30 Tampa Hillsborough FL Male Hispanic / Latino 7/5/2018 6 2018
1 2018-05-29 21 St. Petersburg Hillsborough FL Male White / Caucasian 7/5/2018 5 2018
2 2017-12-06 45 Tampa Hillsborough FL Female White / Caucasian 6/28/2018 12 2017
3 2017-09-01 33 Tampa Hillsborough FL Male Other 6/1/2018 9 2017
4 2017-08-06 47 Tampa Hillsborough FL Female White / Caucasian 2/13/2018 8 2017
In [7]:
# Using DLC_Year, how many years does this data span?
min_year = min(missing['DLC_Year'])
max_year = max(missing['DLC_Year'])
print('Based on Date of Last Contact, this data spans from ' + str(min_year) + ' to ' + str(max_year) + '.')
Based on Date of Last Contact, this data spans from 1960 to 2018.
In [8]:
# How many cases are recorded for each year?
missing_year_count = missing['DLC_Year'].value_counts().sort_index(ascending=False)
missing_year_count
Out[8]:
2018    2
2017    5
2016    2
2015    3
2014    3
2013    1
2012    1
2011    3
2010    2
2008    2
2007    3
2006    1
2005    5
2002    1
2001    2
1998    2
1997    3
1995    1
1994    4
1993    2
1992    1
1991    1
1989    2
1988    2
1985    1
1984    1
1983    1
1982    2
1981    1
1979    1
1978    2
1975    1
1972    1
1970    1
1960    1
Name: DLC_Year, dtype: int64

Note that there are gaps between years.
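
The gaps can be listed explicitly. A minimal sketch in plain Python, with the yearly counts copied from Out[8] into a dict (the names `year_counts` and `gap_years` are illustrative and not part of the original notebook):

```python
# Yearly case counts copied from Out[8] above (year -> number of records).
year_counts = {
    2018: 2, 2017: 5, 2016: 2, 2015: 3, 2014: 3, 2013: 1, 2012: 1,
    2011: 3, 2010: 2, 2008: 2, 2007: 3, 2006: 1, 2005: 5, 2002: 1,
    2001: 2, 1998: 2, 1997: 3, 1995: 1, 1994: 4, 1993: 2, 1992: 1,
    1991: 1, 1989: 2, 1988: 2, 1985: 1, 1984: 1, 1983: 1, 1982: 2,
    1981: 1, 1979: 1, 1978: 2, 1975: 1, 1972: 1, 1970: 1, 1960: 1,
}

# Every calendar year inside the span that has no record in the dataset.
first_year, last_year = min(year_counts), max(year_counts)
gap_years = [y for y in range(first_year, last_year + 1) if y not in year_counts]

print(f"{len(gap_years)} of {last_year - first_year + 1} years have no records")
print(gap_years)
```

With the DataFrame loaded, the same check could probably be written as `missing_year_count.reindex(range(min_year, max_year + 1), fill_value=0)`, which keeps the zero-count years in the series.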

In [9]:
# How many records are there in this dataset?
missing_count = len(missing)
missing_count
Out[9]:
67
In [10]:
# There are 31 cities in Hillsborough County.  Which of these cities are represented in the data
# and how many NamUS records do they have?
cities_count = len(missing['City'].unique())
cities_count
Out[10]:
13
In [11]:
missing_city_counts = missing['City'].value_counts()
missing_city_counts
Out[11]:
Tampa             48
Plant City         3
Brandon            2
Ruskin             2
St. Petersburg     2
Gibsonton          2
Riverview          2
Valrico            1
Lithia             1
Dover              1
Seffner            1
Temple Terrace     1
Thonotosassa       1
Name: City, dtype: int64
In [12]:
missing_city_prop = missing['City'].value_counts(normalize=True)
missing_city_prop
Out[12]:
Tampa             0.716418
Plant City        0.044776
Brandon           0.029851
Ruskin            0.029851
St. Petersburg    0.029851
Gibsonton         0.029851
Riverview         0.029851
Valrico           0.014925
Lithia            0.014925
Dover             0.014925
Seffner           0.014925
Temple Terrace    0.014925
Thonotosassa      0.014925
Name: City, dtype: float64

In [13]:
# Plot age distribution
gray_blue = '#00264c'
mean_age = missing['Missing Age'].mean()

age_dist = sns.distplot(missing['Missing Age'], color=gray_blue, bins=range(0,70,5), kde=False, hist_kws={'edgecolor':'white'})
age_dist
sns.despine(left=True, bottom=True)
plt.title('Missing Age Distribution', loc='left', fontweight='bold', y=1.02)
plt.tick_params(left=False, bottom=False)
plt.axvline(mean_age, color='black', label='Avg. Age')
plt.legend(frameon=False)
plt.savefig('missing_age_dist.png')

missing_age_dist
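
The direction of the skew in the histogram can also be checked numerically. A minimal sketch using pandas' `Series.skew()` on a small made-up sample (the values in `sample_ages` are hypothetical, not taken from the NamUS data); a positive result, with the mean above the median, indicates a right skew:

```python
import pandas as pd

# Hypothetical ages for illustration only; these are not the NamUS records.
sample_ages = pd.Series([18, 20, 22, 25, 30, 32, 34, 45, 60, 68], name='Missing Age')

skewness = sample_ages.skew()          # > 0 means a longer tail toward older ages
sample_mean = sample_ages.mean()
sample_median = sample_ages.median()

print(f"skew={skewness:.2f}, mean={sample_mean:.1f}, median={sample_median:.1f}")
```

On the real data the equivalent call would be `missing['Missing Age'].skew()`.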

In [14]:

# Plot age distribution by sex
male_mean_age = missing['Missing Age'][missing['Sex']=='Male'].mean()
female_mean_age = missing['Missing Age'][missing['Sex']=='Female'].mean()

fig, ax = plt.subplots()
num_bins = range(0,70,5)  # bin edges in 5-year steps
male_age_dist = sns.distplot(missing['Missing Age'][missing['Sex']=='Male'], kde=False, color=gray_blue, 
                             bins=num_bins, hist_kws={'edgecolor':'white'}, label='Male')

female_age_dist = sns.distplot(missing['Missing Age'][missing['Sex']=='Female'], kde=False, color='sandybrown',
                              bins=num_bins, hist_kws={'edgecolor':'white', 'alpha':0.5}, label='Female')

male_age_dist
female_age_dist
sns.despine(left=True, bottom=True)
plt.tick_params(left=False, bottom=False)
plt.title('Missing Age Distribution by Sex', loc='left', fontweight='bold', y=1.02)
plt.axvline(male_mean_age, color=gray_blue, label='Male Avg. Age')
plt.axvline(female_mean_age, color='sandybrown', label='Female Avg. Age')
plt.legend(frameon=False)
plt.savefig('missing_age_dist_sex.png')
plt.show()

missing_age_dist_sex

In [15]:

# What are the proportions by sex?
missing_sex_prop = missing['Sex'].value_counts(normalize=True)
missing_sex_prop
Out[15]:
Female    0.522388
Male      0.477612
Name: Sex, dtype: float64
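
The difference in spread between the two groups can be quantified with a grouped summary rather than read off the plot. A minimal sketch using `groupby` and `agg`, with hypothetical ages standing in for the real records (the `sample` frame is made up for illustration):

```python
import pandas as pd

# Hypothetical records for illustration only; not the NamUS data.
sample = pd.DataFrame({
    'Sex': ['Male'] * 4 + ['Female'] * 4,
    'Missing Age': [28, 30, 32, 34, 12, 17, 27, 60],
})

# Count, central tendency, and spread of age for each sex.
age_by_sex = sample.groupby('Sex')['Missing Age'].agg(['count', 'mean', 'median', 'std'])
print(age_by_sex)
```

Replacing `sample` with `missing` would give the actual figures. Similar means alongside a larger female standard deviation and a lower female median would match what the histogram suggests.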

In [16]:
# How many cases are there by race/ethnicity?
missing_race_count = missing['Race / Ethnicity'].value_counts()
missing_race_count
Out[16]:
White / Caucasian                      47
Hispanic / Latino                       8
Black / African American                7
White / Caucasian,Hispanic / Latino     3
Other                                   1
Asian                                   1
Name: Race / Ethnicity, dtype: int64
In [17]:
# What are the proportions by race/ethnicity?
missing_race_prop = missing['Race / Ethnicity'].value_counts(normalize=True)
missing_race_prop
Out[17]:
White / Caucasian                      0.701493
Hispanic / Latino                      0.119403
Black / African American               0.104478
White / Caucasian,Hispanic / Latino    0.044776
Other                                  0.014925
Asian                                  0.014925
Name: Race / Ethnicity, dtype: float64
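
Counts and rounded percentages are easier to quote when they sit side by side. A small sketch that rebuilds both from the counts shown in Out[16] (the names `race_counts` and `race_summary` are illustrative and not part of the original notebook):

```python
import pandas as pd

# Counts copied from Out[16] above.
race_counts = pd.Series({
    'White / Caucasian': 47,
    'Hispanic / Latino': 8,
    'Black / African American': 7,
    'White / Caucasian,Hispanic / Latino': 3,
    'Other': 1,
    'Asian': 1,
})

# One table with the raw count and the share of all records, to one decimal place.
race_summary = pd.DataFrame({
    'count': race_counts,
    'percent': (race_counts / race_counts.sum() * 100).round(1),
})
print(race_summary)
```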

In [18]:
# Plot age distribution by White, Hispanic, and Black
whb_subset = missing[missing['Race / Ethnicity'].isin(['White / Caucasian','Hispanic / Latino','Black / African American'])]

white_age = whb_subset['Missing Age'][whb_subset['Race / Ethnicity']=='White / Caucasian']
hispanic_age = whb_subset['Missing Age'][whb_subset['Race / Ethnicity']=='Hispanic / Latino']
black_age = whb_subset['Missing Age'][whb_subset['Race / Ethnicity']=='Black / African American']

fig, ax = plt.subplots()
num_bins = range(0,70,5)  # bin edges in 5-year steps

white_age_dist = sns.distplot(white_age, kde=False, bins=num_bins, color=gray_blue,
                              hist_kws={'edgecolor':'white', 'alpha':0.3}, label='White')

hispanic_age_dist = sns.distplot(hispanic_age, kde=False, bins=num_bins, color='orangered',
                                 hist_kws={'edgecolor':'white', 'alpha':0.4}, label='Hispanic')

black_age_dist = sns.distplot(black_age, kde=False, bins=num_bins, color='rebeccapurple',
                             hist_kws={'edgecolor':'white', 'alpha':0.4}, label='Black')

white_age_dist
hispanic_age_dist
black_age_dist
sns.despine(left=True, bottom=True)
plt.title('Missing Age Distribution by Ethnicity', loc='left', fontweight='bold', y=1.02)
plt.tick_params(left=False, bottom=False)
plt.legend(frameon=False)
plt.savefig('missing_age_dist_ethnicity.png')
plt.show()

missing_age_dist_ethnicity
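
Overlaid histograms can be hard to read when the groups are this small, so a table of counts per age bin is a useful companion. A minimal sketch using `pd.cut` and `pd.crosstab` on hypothetical ages (the rows in `sample` are made up; the bin edges match the 5-year bins used in the plots):

```python
import pandas as pd

# Hypothetical records for illustration only; not the NamUS data.
sample = pd.DataFrame({
    'Race / Ethnicity': ['White', 'White', 'White', 'White',
                         'Hispanic', 'Hispanic', 'Black', 'Black'],
    'Missing Age': [22, 31, 33, 48, 4, 17, 27, 33],
})

edges = list(range(0, 70, 5))                        # 0, 5, ..., 65
labels = [f'{lo}-{lo + 5}' for lo in edges[:-1]]     # '0-5', ..., '60-65'
# right=False makes each bin include its lower edge and exclude its upper edge.
age_bin = pd.cut(sample['Missing Age'], bins=edges, labels=labels, right=False)

# Rows are age bins, columns are groups, cells are record counts.
bin_counts = pd.crosstab(age_bin, sample['Race / Ethnicity'])
print(bin_counts)
```

Run against `whb_subset` instead of `sample`, a table like this would show exactly how many records sit behind each cluster described in the text.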

In [19]:
# How are ages dispersed by city?
fig = plt.subplots(figsize=(8,8))
sns.stripplot(x=missing['Missing Age'], y=missing['City'], jitter=0.25, orient='h', hue=missing['Race / Ethnicity'],
              palette=['orangered','royalblue','gray','rebeccapurple','lightgreen','goldenrod'])
sns.despine(left=True, bottom=True)
plt.tick_params(left=False, bottom=False)
plt.title('Ethnic Age Distribution by City', loc='left', fontweight='bold', y=1.02)
plt.legend(frameon=False, loc='right', bbox_to_anchor=(1.5,0.9))
plt.tight_layout()
plt.savefig('ethnic_age_dist_by_city.png', bbox_inches='tight')
plt.show()

ethnic_age_dist_by_city
