Capstone Project: An Algorithm for Predicting the Probability of the Presence of Heart Disease from Cardiovascular Results and Demographics

The following is a final report in completion of the Data Analysis & Interpretation Specialization by Wesleyan University. You can view a PDF version of this report here.

Introduction

The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as …

Data Analysis & Interpretation 5.3: Preliminary Statistical Analyses

This week's assignment is as follows: Submit a blog entry that includes 1) a description of your preliminary statistical analyses and 2) some plots or graphs to help you convey the message.

Descriptive Statistics

The data comprises 180 patients: 124 males (69%) and 56 females (31%). Of these, 80 patients have been diagnosed …
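The sex breakdown quoted in this excerpt can be checked with a few lines of arithmetic. This is a minimal sketch: only the two counts come from the excerpt, and the variable names are illustrative.

```python
# Counts taken from the excerpt; the variable names are illustrative.
counts = {"male": 124, "female": 56}
total = sum(counts.values())  # number of patients in the data set

# Share of each group, rounded to the nearest whole percent.
shares = {sex: round(100 * n / total) for sex, n in counts.items()}

for sex, pct in shares.items():
    print(f"{sex}: {counts[sex]} of {total} ({pct}%)")
```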

Visual Hack: Dynamic Scale with Horizontal Stacked Bar Chart

https://www.youtube.com/watch?v=twKJiA_Aqv4&feature=youtu.be

The video above shows what looks like a custom visual in Power BI. In fact, I built it in a few simple steps using Power BI's built-in horizontal stacked bar chart and a color scale created in Excel, which serves as the background of the plot area. Here's how …

How are Cancer Rates Trending? 1990-2016

Data is provided by IHME through their GBD Results Tool. The data consists of 29 cancer types broken down by three measures (Incidence, Prevalence, Deaths), from the years 1990-2016.

In [1]:
# setup environment
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Data

In [2]:
# read in data
cancer = pd.read_csv('IHME-GBD_2016_DATA-b922583c-1.csv')

In [3]: …

What Are the Cancer Rates of Various Metrics for Each US State in 2016?

The following data visualization was developed in Microsoft Power BI. It allows you to explore cancer rates by cancer type, sex, and US state from various metric perspectives: prevalence, death, DALYs (Disability-adjusted life years), and YLLs (Years of life lost). The data is compiled by IHME (Institute for Health Metrics & Evaluation) and can be …

Data Analysis & Interpretation 4.4: K-Means Cluster Analysis

Week 4

This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based …
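To make the partitioning idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code from the post: the `kmeans` helper, its parameters, and the six 2-D points are all made up for illustration, and a real analysis would more likely rely on a library implementation.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen observations
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its single nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Made-up 2-D observations forming two well-separated groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
labels, centroids = kmeans(data, k=2)
print(labels)
```

Each observation receives exactly one label, which is the "belongs to only one cluster" property described in the excerpt. Which group is numbered 0 and which is numbered 1 depends on the random initialisation.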

Data Analysis & Interpretation 3.3: Testing a Multiple Regression Model

Week 3

This week's assignment is to test a multiple regression model. 1) Discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (Beta coefficients and p-values) in your summary. 2) Report whether your results supported your hypothesis for the association between your primary …

Data Analysis & Interpretation 2.4: Hypothesis Tests Using a Moderator

Week 4

Run an ANOVA, Chi-Square Test or correlation coefficient that includes a moderator. A moderator is a third variable that has an effect on the direction and/or strength of the relationship between an explanatory variable and a response variable. I will run an ANOVA test and a correlation coefficient using Metropolitan as the moderator. …
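One common way to probe a moderator is to compute the same statistic separately within each level of the moderating variable and compare the results. The sketch below does this for a Pearson correlation. It is an illustration only: the rows, the `metro` / `non-metro` level names, and the `pearson_r` helper are made up and are not taken from the post's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up rows: (explanatory value, response value, moderator level).
rows = [
    (1, 2.0, "metro"), (2, 4.1, "metro"), (3, 5.9, "metro"), (4, 8.2, "metro"),
    (1, 5.0, "non-metro"), (2, 4.8, "non-metro"),
    (3, 5.3, "non-metro"), (4, 4.9, "non-metro"),
]

# Split the observations by moderator level.
by_level = {}
for x, y, level in rows:
    xs, ys = by_level.setdefault(level, ([], []))
    xs.append(x)
    ys.append(y)

# One correlation coefficient per level of the moderator.
r_by_level = {level: pearson_r(xs, ys) for level, (xs, ys) in by_level.items()}
for level, r in r_by_level.items():
    print(f"{level}: r = {r:.2f}")
```

If the coefficients differ clearly in strength or sign across levels, as they do in this made-up data, that pattern is what a moderating effect looks like. A formal analysis would also test whether the difference is statistically significant.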

Data Analysis & Interpretation 2.3: Pearson Correlation

Week 3

Generate a correlation coefficient.

Poverty vs. Population

How do poverty and population relate to one another?

In [96]:
# plot regplot of poverty vs. population
# (x and y are passed as keywords; recent seaborn versions no longer accept them positionally)
sns.regplot(x='TotalPop', y='Poverty', data=df, color='steelblue')
sns.despine()
plt.title('Poverty vs. Population', loc='left', fontweight='bold', y=1.02)

There's obviously a negative correlation, but let's exclude extremely high populations (>= 3,000,000) in order to "zoom in" …
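The coefficient itself, and the effect of dropping very large populations, can be sketched without any plotting. This is a hedged illustration: the `TotalPop` and `Poverty` names and the 3,000,000 cutoff come from the excerpt, but the data values and the `pearson_r` helper are made up and are not the post's results.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up (TotalPop, Poverty) pairs; not the values used in the post.
rows = [
    (12_000, 22.5), (45_000, 18.0), (150_000, 16.2),
    (600_000, 14.8), (2_500_000, 15.5), (10_000_000, 17.0),
]

# Exclude extremely high populations (>= 3,000,000), as the excerpt describes.
cutoff = 3_000_000
kept = [(pop, pov) for pop, pov in rows if pop < cutoff]

r_all = pearson_r(*zip(*rows))    # coefficient on every row
r_kept = pearson_r(*zip(*kept))   # coefficient after the exclusion
print(f"all rows: r = {r_all:.3f}; below cutoff: r = {r_kept:.3f}")
```

Comparing the two coefficients shows how much a handful of very large populations can pull the estimate.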

Data Analysis & Interpretation 1.4: More Data Visualizations and Z-Tests

Week 4

This week we are supposed to focus on plotting univariate and bivariate graphs.

STEP 1: Create graphs of your variables one at a time (univariate graphs). Examine both their center and spread.

STEP 2: Create a graph showing the association between your explanatory and response variables (bivariate graph). Your output should be interpretable …