A Comparison of Supervised Machine Learning Models
Goal: To explore supervised machine learning models using education data
Research Question: Can a supervised machine learning model classify degree-granting postsecondary institutions as located in the Northeast vs. the West?
Rationale: Relied on existing domain knowledge of regional differences so that the project could focus on learning supervised machine learning methods.
Data
Integrated Postsecondary Education Data System (IPEDS)
Institutional characteristics (categorical)
Enrolled student characteristics (continuous)
College Scorecard (US Department of Education)
Financial aid applicant characteristics
1,100+ institutions; 27 predictor variables
Models Tested
Baseline
K-Nearest Neighbors
Decision Tree
Random Forest & Optimized Random Forest
AdaBoost
XGBoost
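The summary does not say how the baseline was built. One common choice, assumed here purely for illustration, is a majority-class predictor: it always predicts the most frequent region in the training data, which sets the floor the other models must beat. The function names and toy labels below are hypothetical and are not taken from the project.

```python
from collections import Counter


def fit_majority_baseline(train_labels):
    """Return the most frequent label seen in the training data."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(majority_label, n_rows):
    """Predict the same (majority) label for every row."""
    return [majority_label] * n_rows


# Hypothetical toy labels, not the project's data.
train_labels = ["Northeast", "West", "Northeast", "Northeast", "West"]
majority = fit_majority_baseline(train_labels)
predictions = predict_majority(majority, 3)
print(majority, predictions)
```

A baseline like this ignores all 27 predictors, so any model that cannot outperform it has learned nothing useful from the features.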
Final Model
Optimized Random Forest
Process
Train-Test Split (70:30)
Optimize & Evaluate
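As a rough sketch of the 70:30 split named above: the rows are shuffled with a fixed seed, 30% are held out for testing, and the rest are used for training. The function name, seed, and toy data are assumptions. The summary does not say which tool was used or whether the split was stratified by region, so this version is a plain random split.

```python
import random


def split_70_30(rows, labels, test_fraction=0.3, seed=42):
    """Shuffle row indices and hold out test_fraction of them for testing."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical toy data: ten institutions identified by an integer.
rows = list(range(10))
labels = ["Northeast" if i % 2 == 0 else "West" for i in rows]
x_train, x_test, y_train, y_test = split_70_30(rows, labels)
print(len(x_train), len(x_test))
```

Any tuning of model settings would be done on the training portion only, so that the held-out 30% gives an honest estimate of performance.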
Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
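The four metrics above can be computed from the counts of true and false positives and negatives. The sketch below treats "Northeast" as the positive class, which is an assumption: the summary does not say which region was coded as positive. The function name and toy predictions are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels and predictions, not the project's results.
y_true = ["Northeast", "Northeast", "Northeast", "West", "West", "West", "West", "Northeast"]
y_pred = ["Northeast", "Northeast", "West", "West", "West", "Northeast", "West", "Northeast"]
metrics = binary_metrics(y_true, y_pred, positive="Northeast")
print(metrics)
```

Accuracy alone can mislead when one region has many more institutions than the other, which is why precision, recall, and F1 are reported alongside it.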