# Capstone Project: An Algorithm for Predicting the Probability of the Presence of Heart Disease from Cardiovascular Results and Demographics

The following is a final report in completion of the Data Analysis & Interpretation Specialization by Wesleyan University. You can view a PDF version of this report here.

Introduction: The purpose of this study is to identify the best predictors of the presence of heart disease using multiple health and demographic factors such as …

# Category: Machine Learning

# Predicting CO2 Emissions from Vehicles with Multivariate Linear Regression

The data are provided by the Government of Canada and give model-specific fuel consumption ratings and estimated carbon dioxide (CO2) emissions for new light-duty vehicles for retail sale in Canada. For this project I used the datasets from 2010 to 2018. What I want to see is how well CO2 emissions from these vehicles can be predicted …

# Data Analysis & Interpretation 4.4: K-Means Cluster Analysis

Week 4: This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based …
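To make the partitioning idea in the excerpt above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code from the post: the function name `kmeans`, the toy points, and the choice to seed the centroids with the first `k` points are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Partition points into k clusters; each point gets exactly one label."""
    # Assumption: seed centroids with the first k points (deterministic, not robust).
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical toy data: two visually separate groups of 2-D points.
toy_points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)  # [0, 1, 0, 1, 0, 1]
```

A real analysis would typically standardize the variables first and try several random initializations, since the result depends on the starting centroids.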

# Predicting Breast Cancer Using Logistic Regression

This dataset is part of the Scikit-learn dataset package. It is from the Breast Cancer Wisconsin (Diagnostic) Database and contains 569 instances of tumors that are identified as either benign (357 instances) or malignant (212 instances). This machine learning project seeks to predict the classification of breast tumors as either malignant or benign. More information …
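The class counts quoted above give a useful reference point for any classifier on this data: the accuracy of always predicting the majority class. The short calculation below is a sketch added for illustration and is not taken from the post; the variable names are assumptions.

```python
# Class counts as stated in the excerpt above.
benign = 357
malignant = 212
total = benign + malignant  # 569 instances

# Accuracy of a trivial model that always predicts the majority class (benign).
baseline_accuracy = max(benign, malignant) / total
print(f"{baseline_accuracy:.3f}")  # 0.627
```

A logistic regression model is only informative here if its held-out accuracy clearly exceeds this roughly 63% baseline.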

# Data Analysis & Interpretation 4.3: Predicting Employed Rate with Lasso Regression

Week 3: Run a lasso regression analysis using k-fold cross validation to identify a subset of predictors from a larger pool of predictor variables that best predicts a quantitative response variable. The Features and Target of My Model: The target variable (what's being predicted) is Employed Rate. Since Lasso works best with …
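The k-fold cross validation mentioned above can be sketched independently of the lasso model itself. The helper below, `k_fold_indices`, is a hypothetical name, and the sketch assumes contiguous folds with no shuffling; it only shows how the observations are divided so that each one is used for validation exactly once.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


# Hypothetical example: 10 observations split into 3 folds.
for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```

In a lasso workflow, the model would be fit on each training split for a range of penalty strengths, and the penalty with the lowest average validation error would be kept.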

# Data Analysis & Interpretation 4.2: Predicting County Poverty Group with Random Forest

Week 2: Run a Random Forest. You will need to perform a random forest analysis to evaluate the importance of a series of explanatory variables in predicting a binary, categorical response variable. The Features and Target of My Model: These will be the same as the previous week. The target is Poverty Group (1= >16%, …

# Data Analysis & Interpretation 4.1: Predicting County Poverty Group with Decision Tree

Course 4: Machine Learning for Data Analysis. This course focuses on various machine learning algorithms: Decision Trees, Random Forests, Lasso Regression, and K-Means Cluster Analysis. Week 1: Run a Classification Tree. You will need to perform a decision tree analysis to test nonlinear relationships among a series of explanatory variables and a binary, categorical response …
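A classification tree chooses its splits by comparing how mixed the response classes are before and after a candidate split. One common measure is Gini impurity; whether the post's tree used Gini or another criterion is not stated, so treat the sketch below as an assumption-labeled illustration with hypothetical function names.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 0.0 means perfectly pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical binary response values for four observations.
print(gini([0, 0, 1, 1]))              # 0.5 (evenly mixed node)
print(split_impurity([0, 0], [1, 1]))  # 0.0 (split separates the classes)
```

The tree-building procedure would evaluate `split_impurity` for many candidate thresholds on each explanatory variable and keep the split with the lowest value.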

# Predicting Survival on the Titanic with Logistic Regression

There are two datasets used in this project: train and test. Both are from the Kaggle machine learning training competition, Titanic: Machine Learning from Disaster. In this project, I use a logistic regression model to predict whether or not a passenger survived. My accuracy ended up being 75.6%, but I plan on revisiting this dataset …

# Predicting Boston House Prices Using a Linear Regression Model

This machine learning project uses a real-world test dataset for housing statistics in Boston during the 1970s. I used a linear regression model to predict the price of homes based on key features. In [1]: # Setup environment %matplotlib inline import pandas as pd import numpy as np import matplotlib.pyplot as plt In [2]: # Import …

# Classifying the Iris (Flower) Species Using K-Nearest Neighbor

Iris Species Classification: This was a tutorial machine learning project from the book Introduction to Machine Learning with Python (Ch. 1). It uses the k-nearest neighbors algorithm to predict the classification of iris flowers based on sepal and petal width and length. In [25]: import pandas as pd import matplotlib.pyplot as plt import numpy as …
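The k-nearest neighbors idea described above can be sketched in a few lines of plain Python. This is not the book's or the post's code: the function names are hypothetical, and the measurements and class labels below are invented for illustration rather than taken from the real iris dataset.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    order = sorted(range(len(train_X)), key=lambda i: squared_distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Invented rows: (sepal length, sepal width, petal length, petal width) in cm.
train_X = [
    (5.0, 3.4, 1.5, 0.2), (4.8, 3.0, 1.4, 0.2), (5.2, 3.5, 1.5, 0.3),
    (6.5, 3.0, 5.5, 2.0), (6.9, 3.1, 5.4, 2.1), (6.3, 2.8, 5.6, 1.9),
]
train_y = ["class_a"] * 3 + ["class_b"] * 3  # placeholder labels, not species names

print(knn_predict(train_X, train_y, (5.1, 3.3, 1.6, 0.2)))  # class_a
```

With `k=1` the prediction is simply the label of the single closest training flower; larger `k` smooths the decision at the cost of blurring class boundaries.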