Course Overview
Dramatic increases in computing power have enabled new areas of data science to develop in statistical modeling and artificial intelligence, often collectively called "machine learning." Machine learning covers both predictive and descriptive learning, and it bridges theoretical and empirical ideas across disciplines. We will focus on concepts and methods for predictive learning: estimating models from data to predict unknown outcomes. Model types will include decision trees, linear models, nearest-neighbor methods, and others as time permits. We will cover classification and regression with these models, as well as methods for handling large datasets. Finally, we will discuss deep neural networks and other methods at the forefront of machine learning. Throughout, we situate the course components within the "data science life cycle," as part of the broader set of practices for discovering and communicating scientific findings.
The course will include lectures, readings, homework assignments, in-class quizzes, exams, and a class project. Most course activities will use Python with the Pandas and scikit-learn libraries.
Instructor
Teaching Assistant
Prerequisites
Familiarity with tabular data and data types as implemented in Python using Pandas. Recommended: one of STAT/CS/IS 107, IS 205, or INFO 407, or equivalent Python/Pandas experience. Sophomore, junior, or senior standing.
Textbook
- Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 (3rd ed.), by Raschka & Mirjalili. Packt Publishing, 2019.
Assessment
| Component | Weight |
|---|---|
| Weekly homework | 35% |
| In-class quizzes | 10% |
| Exams (midterm + final) | 35% |
| Class project | 20% |
Policies at a Glance
Late Work
Assignments are accepted late for 80% of the points until the last lecture. Exams and the project are not accepted late without prior approval. You receive 4 late days (all-or-nothing; at most 2 days per assignment).
Academic Integrity
Use of AI tools (e.g., ChatGPT) is permitted, provided that you attribute the source and clearly indicate which parts were adapted from the tool's output and which are your own original work.
Readings
Schedule
Weekly assignments are released each Friday and are due by the end of the following Friday. Readings reinforce class discussion; reading ahead is encouraged.
| Week | Topics | Readings (R) / Assignments (A) Due |
|---|---|---|
| 1 | Course introduction: Machine learning, AI, and data science | R – Ch. 1 (pp. 1–17) |
| 2 | Classification with k-nearest neighbors | R – Ch. 3 k-NN (pp. 103–107) · A – Week 1 homework |
| 3 | Decision tree concepts | R – Ch. 3 decision trees (pp. 90–100) · A – Week 2 homework |
| 4 | Cross-validation | R – Ch. 4 (pp. 121–124); Ch. 6 k-fold (pp. 195–201) · A – Week 3 homework |
| 5 | Regression with linear models | R – Ch. 10 (pp. 315–341) · A – Week 4 homework |
| 6 | Logistic regression and linear SVM | R – Ch. 3 (pp. 60–84) · A – Week 5 homework |
| 7 | Regression with k-NN and trees | R – Ch. 10 (pp. 325–350) · A – Week 6 homework |
| 8 | Midterm exam | A – Midterm (CBTF) · A – Week 7 homework |
| 9 | Evaluating ML accuracy and fairness | R – Ch. 6 (pp. 207–222) · A – Project proposal due |
| 10 | Feature selection and dimensionality reduction | R – Ch. 4 (pp. 127–143); Ch. 5 PCA (pp. 145–159) · A – Week 9 homework |
| 11 | Clustering | R – Ch. 11 (pp. 353–367) · A – Week 10 homework |
| 12 | Deep neural networks | R – Ch. 12–13 (pp. 383–423, 462–470) · A – Week 11 homework |
| 13 | Language model concepts | R – "What are Large Language Models?" · A – Week 12 homework |
| 14 | Class project presentations | A – Project report due |
| 15 | Finals week | A – Final exam (CBTF) |