Capstone Project: An Algorithm for Predicting the Probability of the Presence of Heart Disease from Cardiovascular Results and Demographics

The following is the final report submitted in completion of the Data Analysis & Interpretation Specialization by Wesleyan University. You can view a PDF version of this report here. Introduction: The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as …

Data Analysis & Interpretation 5.3: Preliminary Statistical Analyses

This week's assignment is as follows: Submit a blog entry that includes 1) a description of your preliminary statistical analyses and 2) some plots or graphs to help you convey the message. Descriptive Statistics: The data set comprises 180 patients: 124 males (69%) and 56 females (31%). Of these, 80 patients have been diagnosed …
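The percentages quoted in the excerpt can be checked with a few lines of Python. This is only a sketch of the arithmetic: the counts come from the excerpt, while the variable names are my own and not taken from the original analysis.

```python
# Counts taken from the excerpt: 124 males and 56 females.
counts = {"male": 124, "female": 56}
total = sum(counts.values())  # 180 patients in all

# Share of each group, rounded to a whole-number percentage.
percentages = {group: round(100 * n / total) for group, n in counts.items()}

for group, pct in percentages.items():
    print(f"{group}: {counts[group]} of {total} ({pct}%)")
```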

Data Analysis & Interpretation 5.2: Methods (Sample, Measures, Analysis)

For this week's assignment, we are required to post a draft of the section of our report that details our research methods. Methods. Sample: The data set consists of n=180 observations of patients who have undergone cardiovascular tests and have been diagnosed as either having or not having heart disease. There are 14 features/fields in the data …

Data Analysis & Interpretation 5.1: Introducing My Capstone Project on Heart Disease

An Algorithm to Predict the Probability of Heart Disease from Cardiovascular Results [This capstone project is part of the Data Analysis & Interpretation Specialization program by Wesleyan University on Coursera.] The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as …

Data Analysis & Interpretation 4.4: K-Means Cluster Analysis

Week 4 This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based …
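To make the partitioning idea concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the code from the original post: the function names, the toy data, and the choice to seed the centroids with the first k observations are all assumptions made for illustration. Real analyses usually use random or k-means++ initialization from a library.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Simplifying assumption: the first k observations seed the centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each observation joins exactly one cluster,
        # the one whose centroid is nearest.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids


# Made-up toy data: two visibly separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because every observation receives a single label, the output is a partition of the data, which is the property the excerpt describes.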

Data Analysis & Interpretation 4.3: Predicting Employed Rate with Lasso Regression

Week 3 Run a lasso regression analysis using k-fold cross validation to identify a subset of predictors from a larger pool of predictor variables that best predicts a quantitative response variable. The Features and Target of My Model The target variable (what's being predicted) is going to be Employed Rate. Since Lasso works best with …
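The k-fold cross validation mentioned in the assignment can be sketched as follows. This shows only how the observations are split into folds, not the lasso fit itself, and the function name `k_fold_indices` is hypothetical. It also assumes the rows have already been shuffled, which a library routine would normally handle.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Illustrative sizes only: 10 observations split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each observation is held out exactly once, so a model fitted on each training set can be scored on data it has not seen, and the penalty strength with the best average score is kept.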

Data Analysis & Interpretation 4.2: Predicting County Poverty Group with Random Forest

Week 2 Run a Random Forest. You will need to perform a random forest analysis to evaluate the importance of a series of explanatory variables in predicting a binary, categorical response variable. The Features and Target of My Model These will be the same as the previous week. The target is Poverty Group (1= >16%, …

Data Analysis & Interpretation 4.1: Predicting County Poverty Group with Decision Tree

Course 4: Machine Learning for Data Analysis This course focuses on various machine learning algorithms: Decision Trees, Random Forests, Lasso Regression, and K-Means Cluster Analysis. Week 1 Run a Classification Tree. You will need to perform a decision tree analysis to test nonlinear relationships among a series of explanatory variables and a binary, categorical response …

Data Analysis & Interpretation 3.4: Testing a Logistic Regression Model

Week 4 This week's assignment is to test a logistic regression model. Summarize in a few sentences 1) what you found, making sure you discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (odds ratios, p-values, and 95% confidence intervals for the …
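As a reminder of where the requested statistics come from, an odds ratio is the exponential of a logistic regression coefficient, and a Wald-type 95% confidence interval exponentiates the coefficient plus or minus 1.96 standard errors. The sketch below uses a made-up coefficient and standard error. The function name is hypothetical, and none of the numbers come from the original post.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for one coefficient."""
    return (
        math.exp(beta),                # point estimate of the odds ratio
        math.exp(beta - z * std_err),  # lower limit of the interval
        math.exp(beta + z * std_err),  # upper limit of the interval
    )


# Made-up values, for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=0.7, std_err=0.2)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at roughly the 5% level.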

Data Analysis & Interpretation 3.3: Testing a Multiple Regression Model

Week 3 This week's assignment is to test a multiple regression model. 1) Discuss the results for the associations between all of your explanatory variables and your response variable. Make sure to include statistical results (Beta coefficients and p-values) in your summary. 2) Report whether your results supported your hypothesis for the association between your primary …