The following is a final report in completion of the Data Analysis & Interpretation Specialization by Wesleyan University. You can view a PDF version of this report here. Introduction The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as … Continue reading Capstone Project: An Algorithm for Predicting the Probability of the Presence of Heart Disease from Cardiovascular Results and Demographics
This week's assignment is as follows: Submit a blog entry that includes 1) a description of your preliminary statistical analyses and 2) some plots or graphs to help you convey the message. Descriptive Statistics The data consists of 180 patients: 124 males (69%) and 56 females (31%). Of these, 80 patients have been diagnosed … Continue reading Data Analysis & Interpretation 5.3: Preliminary Statistical Analyses
For this week's assignment, we are required to post a draft of the section of our report that details our research methods. Methods Sample The data consists of n=180 observations of patients who have undergone cardiovascular tests and have been diagnosed as either having or not having heart disease. There are 14 features/fields in the data … Continue reading Data Analysis & Interpretation 5.2: Methods (Sample, Measures, Analysis)
An Algorithm to Predict the Probability of Heart Disease from Cardiovascular Results [This capstone project is part of the Data Analysis & Interpretation Specialization program by Wesleyan University on Coursera.] The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as … Continue reading Data Analysis & Interpretation 5.1: Introducing My Capstone Project on Heart Disease
Data is provided by IHME through their GBD Results Tool. The data consists of 29 cancer types broken down by three measures (Incidence, Prevalence, Deaths), from the years 1990-2016.

    # setup environment
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline

Data

    # read in data
    cancer = pd.read_csv('IHME-GBD_2016_DATA-b922583c-1.csv')

… Continue reading How are Cancer Rates Trending? 1990-2016
The following data visualization was developed in Microsoft Power BI. It allows you to explore cancer rates by cancer type, sex, and US state across several metrics: prevalence, deaths, DALYs (disability-adjusted life years), and YLLs (years of life lost). The data is compiled by IHME (Institute for Health Metrics & Evaluation) and can be … Continue reading What Are the Cancer Rates of Various Metrics for Each US State in 2016?
This dataset is part of the Scikit-learn dataset package. It is from the Breast Cancer Wisconsin (Diagnostic) Database and contains 569 instances of tumors that are identified as either benign (357 instances) or malignant (212 instances). This machine learning project seeks to predict the classification of breast tumors as either malignant or benign. More information … Continue reading Predicting Breast Cancer Using Logistic Regression
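The setup described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's actual code: the 25% test split, the random seed, the scaling step, and `max_iter=1000` are all assumptions chosen only to make the example run cleanly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Breast Cancer Wisconsin (Diagnostic) data bundled with scikit-learn.
data = load_breast_cancer()
X, y = data.data, data.target  # target encoding: 0 = malignant, 1 = benign

# Class counts, which should match the figures quoted in the text.
n_malignant = int((y == 0).sum())
n_benign = int((y == 1).sum())

# Hold out a stratified test set (split size and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"malignant={n_malignant}, benign={n_benign}, test accuracy={accuracy:.3f}")
```

Scaling is included here because the raw features span very different ranges, which can slow or stall the solver; the original post may have handled this differently.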
Maternal Mortality Ratio 1990-2015 The project analyzes the maternal mortality ratio (MMR, maternal deaths per 100,000 live births) across countries from 1990-2015. The data for this project can be accessed on the UNICEF website here. A detailed description of how the data was compiled and measured can be found here. This analysis clearly shows that … Continue reading How Has the Maternal Mortality Ratio Improved from 1990-2015?
The data used, Gapminder, originally comes from http://www.gapminder.org. The version used here is provided by Jennifer Bryan and can be found here. The purpose of this project is to demonstrate how to read in data using the pandas read_csv function, inspect the data, perform basic manipulation and visualization, and communicate my findings. Note: "Per-capita GDP (Gross … Continue reading Gapminder: Life Expectancy and Per Capita GDP by Continent
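The read-inspect-summarize workflow mentioned in this excerpt might look something like the sketch below. The rows are placeholders, not real Gapminder values, and the column names (`country`, `continent`, `year`, `lifeExp`, `pop`, `gdpPercap`) are assumed from the commonly distributed version of the dataset. An in-memory string stands in for the CSV file so the example is self-contained.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; these are NOT real Gapminder values.
csv_text = """country,continent,year,lifeExp,pop,gdpPercap
Country A,Asia,2007,70.0,1000000,5000.0
Country B,Europe,2007,80.0,2000000,30000.0
Country C,Asia,2007,60.0,3000000,1000.0
"""

# read_csv accepts a file path or any file-like object.
gap = pd.read_csv(io.StringIO(csv_text))

# Inspect the first rows and the inferred column types.
print(gap.head())
print(gap.dtypes)

# Basic manipulation: mean life expectancy and per-capita GDP by continent.
summary = gap.groupby("continent")[["lifeExp", "gdpPercap"]].mean()
print(summary)
```

With the real file, the `io.StringIO(...)` argument would simply be replaced by the path to the downloaded CSV.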