Data Analysis & Interpretation 5.2: Methods (Sample, Measures, Analysis)

For this week's assignment, we are required to post a draft of the section of our report that details our research methods. Methods: Sample. The data consists of n=180 observations of patients who have undergone cardiovascular tests and have been diagnosed as either having or not having heart disease. There are 14 features/fields in the data …

Visual Hack: Dynamic Scale with Horizontal Stacked Bar Chart

https://www.youtube.com/watch?v=twKJiA_Aqv4&feature=youtu.be In the above video you see what looks like a custom visual in Power BI. However, I created this with a few simple steps by utilizing the built-in horizontal stacked bar chart in Power BI and a color scale created in Excel that is used as the background in the plot area. Here's how …

Data Analysis & Interpretation 5.1: Introducing My Capstone Project on Heart Disease

An Algorithm to Predict the Probability of Heart Disease from Cardiovascular Results [This capstone project is part of the Data Analysis & Interpretation Specialization program by Wesleyan University on Coursera.] The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as …

How are Cancer Rates Trending? 1990-2016

Data is provided by IHME through their GBD Results Tool. The data consists of 29 cancer types broken down by three measures (Incidence, Prevalence, Deaths), from the years 1990-2016.

In [1]:
    # setup environment
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline

Data

In [2]:
    # read in data
    cancer = pd.read_csv('IHME-GBD_2016_DATA-b922583c-1.csv')

In [3]: …

Predicting CO2 Emissions from Vehicles with Multivariate Linear Regression

The data is provided by the Government of Canada and gives model-specific fuel consumption ratings and estimated carbon dioxide (CO2) emissions for new light-duty vehicles for retail sale in Canada. For this project I used the datasets from 2010 to 2018. What I want to see is how well CO2 emissions from these vehicles can be predicted …
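As a rough illustration of the multivariate linear regression idea mentioned in this excerpt, here is a minimal sketch that fits an ordinary least squares model with NumPy. The two predictors and their coefficients are made-up toy values, not the Canadian fuel consumption data, and the variable names are assumptions chosen for the example.

```python
import numpy as np

# Toy data only: two hypothetical predictors and a synthetic response.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
true_coef = np.array([5.0, 2.0, 3.0])  # intercept, slope 1, slope 2 (assumed)

# Design matrix with a leading column of ones for the intercept.
A = np.column_stack([np.ones(len(X)), X])
y = A @ true_coef + rng.normal(scale=0.5, size=len(X))

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# R^2 on the training data: share of variance explained by the fit.
pred = A @ coef
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print("estimated coefficients:", coef)
print("R^2:", r2)
```

With real data, the columns of `X` would be the vehicle attributes chosen as predictors, and the fit would normally be evaluated on held-out rows rather than the training set.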

What Are the Cancer Rates of Various Metrics for Each US State in 2016?

The following data visualization was developed in Microsoft Power BI. It allows you to explore cancer rates by cancer type, sex, and US state from various metric perspectives: prevalence, death, DALYs (Disability-adjusted life years), and YLLs (Years of life lost). The data is compiled by IHME (Institute for Health Metrics & Evaluation) and can be …

Data Analysis & Interpretation 4.4: K-Means Cluster Analysis

Week 4: This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based …
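The partitioning described in this excerpt can be sketched in plain Python. This is a minimal version of the standard k-means loop (assign each observation to its nearest centroid, then move each centroid to the mean of its members), not the code from the post. The farthest-first initialization, the helper names, and the six toy points are assumptions made to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=100):
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each observation belongs to exactly one cluster.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy observations forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice the variables are usually standardized first so that no single feature dominates the distance calculation.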

Predicting Breast Cancer Using Logistic Regression

This dataset is part of the Scikit-learn dataset package. It is from the Breast Cancer Wisconsin (Diagnostic) Database and contains 569 instances of tumors that are identified as either benign (357 instances) or malignant (212 instances). This machine learning project seeks to predict the classification of breast tumors as either malignant or benign. More information …
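A minimal sketch of this kind of model, assuming scikit-learn is installed, might look like the following. The train/test split, the scaling step, and the `max_iter` setting are illustrative choices rather than the settings used in the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target  # scikit-learn encodes 0 = malignant, 1 = benign

# Class counts, to check against the figures quoted in the text.
n_malignant = int((y == 0).sum())
n_benign = int((y == 1).sum())
print(X.shape, n_malignant, n_benign)

# Hold out a quarter of the tumors for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("test accuracy:", accuracy)
```

Stratifying the split keeps the benign/malignant ratio roughly the same in the training and test sets.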

Data Analysis & Interpretation 4.3: Predicting Employed Rate with Lasso Regression

Week 3: Run a lasso regression analysis using k-fold cross validation to identify a subset of predictors from a larger pool of predictor variables that best predicts a quantitative response variable. The Features and Target of My Model: The target variable (what's being predicted) is Employed Rate. Since Lasso works best with …
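The assignment described in this excerpt (lasso regression with k-fold cross validation to select predictors) can be sketched with scikit-learn's `LassoCV`. The data here is synthetic, with only the first three of ten predictors actually related to the response; it stands in for the post's dataset, which is not shown. The fold count and the standardization step are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 10 candidate predictors, 3 of them informative.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
y = (
    3.0 * X[:, 0]
    - 2.0 * X[:, 1]
    + 1.5 * X[:, 2]
    + rng.normal(scale=0.5, size=n_samples)
)

# Put predictors on a common scale so the penalty treats them evenly.
X_std = StandardScaler().fit_transform(X)

# 5-fold cross validation picks the penalty strength alpha.
model = LassoCV(cv=5, random_state=0).fit(X_std, y)

# Predictors whose coefficients were not shrunk to exactly zero.
selected = [i for i, c in enumerate(model.coef_) if c != 0.0]
print("chosen alpha:", model.alpha_)
print("retained predictors:", selected)
print("coefficients:", model.coef_)
```

The retained set can still include a few weak predictors with small coefficients, so it is worth inspecting coefficient sizes as well as which ones are nonzero.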

Data Analysis & Interpretation 4.2: Predicting County Poverty Group with Random Forest

Week 2: Run a Random Forest. You will need to perform a random forest analysis to evaluate the importance of a series of explanatory variables in predicting a binary, categorical response variable. The Features and Target of My Model: These will be the same as the previous week. The target is Poverty Group (1= >16%, …
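To illustrate the task described in this excerpt, ranking explanatory variables by importance for a binary response, here is a minimal sketch using scikit-learn's `RandomForestClassifier`. The data is synthetic and the binary target is an assumed stand-in for the poverty group variable; it depends strongly on feature 0, weakly on feature 1, and not at all on the rest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 5 explanatory variables, binary response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first

for idx in ranking:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

Impurity-based importances are computed on the training data, so they are best read as a relative ranking rather than as effect sizes.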