School of Information Sciences Computer Vision and Machine Learning Group

IS517: Methods of Data Science

Fall 2025

Time: Tuesdays, 09:00 AM - 11:50 AM
Location: 46 Grad Sch of Lib & Info Science

[Syllabus]

Course Description

A dramatic increase in computing power has enabled new areas of data science to develop in statistical modeling and analysis. These areas cover predictive and descriptive learning and bridge ideas and theory in statistics, computer science, and artificial intelligence. We will cover methods including predictive learning: estimating models from data to predict future outcomes. Regression topics include linear regression with recent advances using large numbers of variables, smoothing techniques, additive models, and local regression. Classification topics include linear regression, regularization, logistic regression, discriminant analysis, splines, support vector machines, generalized additive models, naive Bayes, mixture models, and nearest neighbor methods, as time permits. We situate the course components in the “data science lifecycle” as part of the larger set of practices in the discovery and communication of scientific findings.

This course will move rapidly. The course will include computer exercises using Python and other relevant computing languages.
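As a rough illustration of the predictive learning described above (estimating a model from data, then using it to predict a new outcome), the sketch below fits a one-variable least-squares line in plain Python. It is not taken from the course materials: the function names `fit_line` and `predict` and the small dataset are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sum of cross-deviations over the sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new input value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: hours studied vs. exam score (invented values).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

model = fit_line(hours, scores)   # estimate the model from data
print(predict(model, 6.0))        # predict an unseen outcome
```

The same fit-then-predict pattern carries over to the richer regression and classification methods listed in the description; only the model family and the fitting procedure change.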

Pre- and Co-requisites

LIS542 Data, Stat, Info, or equivalent (e.g. intro probability/stats STAT100, CS361, or ECON202) and LIS490IDS/CS398ID/STAT490 or CS101 or equivalent; or consent of the instructor. Linear Algebra recommended at the level of MATH125; Calculus recommended at the level of MATH220.

Course Materials

Our main textbook is An Introduction to Statistical Learning, by James, Witten, Hastie, and Tibshirani (ISL). The supplemental text is The Elements of Statistical Learning, 2nd Edition, by Hastie, Tibshirani, and Friedman (ESL).

Assignments and Methods of Assessment

Class Project

This is a group project. Ideally, every group should have two students. If this is not possible, please discuss with the instructor.

The project proposal will describe the proposed dataset(s), the original research question(s), and the proposed method of solution. The method will likely be a novel application of a regression or classification technique from class. The proposal is at most one page in length.

The final project will carry out the research in the proposal. There will be a project proposal presentation describing the research question(s), dataset(s), method, and possibly the expected results. There will be a final project presentation describing the research question(s), dataset(s), method, and comparing the expected and actual results.

Team

Instructor: Yaoyao Liu
Office location: Room 5125, 614 East Daniel St
Email: lyy@illinois.edu

Schedule (subject to revision)

Week Topic
Week 1 Syllabus + Data Science Intro
Week 2 Statistical Learning Foundations and Review (ISL Chapter 2)
Week 3 Linear Regression (ISL Chapter 3)
Week 4 Classification (ISL Chapter 4)
Week 5 Resampling (ISL Chapter 5)
Week 6 Linear Model Selection and Regularization (ISL Chapter 6)
Week 7 Splines / Generalized Additive Models (ISL Chapter 7)
Week 8 Project Proposal + Slides Due / Project Proposal Presentation
Week 9 Tree Based Methods (ISL Chapter 8)
Week 10 Support Vector Machines (ISL Chapter 9)
Week 11 Unsupervised Learning (ISL Chapter 10)
Week 12 Final Presentation Slides Due / Final Project Presentations
Week 13 Final Project Presentations
Week 14 Final Project Report Due

614 E. Daniel St. MC-314
Champaign, IL 61820-7999

Phone: (217) 300-0910

Email: vision@ischool.illinois.edu