Time: Tuesdays, 09:00 AM - 11:50 AM
Location: 46 Grad Sch of Lib & Info Science
[Syllabus]
A dramatic increase in computing power has enabled new areas of data science to develop in statistical modeling and analysis. These areas cover predictive and descriptive learning and bridge ideas and theory in statistics, computer science, and artificial intelligence. We will cover methods including predictive learning: estimating models from data to predict future outcomes. Regression topics include linear regression with recent advances using large numbers of variables, smoothing techniques, additive models, and local regression. Classification topics include linear regression, regularization, logistic regression, discriminant analysis, splines, support vector machines, generalized additive models, naive Bayes, mixture models, and nearest neighbor methods as time permits. We situate the course components in the “data science lifecycle” as part of the larger set of practices in the discovery and communication of scientific findings.
This course will move rapidly. The course will include computer exercises using Python and other relevant computing languages.
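As a rough illustration of what "estimating models from data to predict future outcomes" can look like in Python, here is a minimal sketch of simple linear regression evaluated on held-out data. The synthetic data, the function names, and the 150/50 split are assumptions made for illustration only; they are not taken from the course materials.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error for y ~ a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared prediction error of the fitted line on (xs, ys)."""
    residuals = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Fit on the first 150 points and hold out the last 50 to estimate
# how well the model predicts observations it has not seen.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope = fit_simple_linear_regression(train_x, train_y)
test_mse = mean_squared_error(test_x, test_y, intercept, slope)

print(f"intercept={intercept:.3f}, slope={slope:.3f}, test MSE={test_mse:.3f}")
```

The held-out error, rather than the training error, is the quantity that speaks to prediction of future outcomes.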
Prerequisites: LIS542 Data, Stat, Info, or equivalent (e.g., intro probability/stats STAT100, CS361, or ECON202) and LIS490IDS/CS398ID/STAT490 or CS101 or equivalent; or consent of the instructor. Linear Algebra recommended at the level of MATH125; Calculus recommended at the level of MATH220.
Our main textbook is An Introduction to Statistical Learning, by James, Witten, Hastie, and Tibshirani (ISL). The supplemental text is The Elements of Statistical Learning, 2nd Edition, by Hastie, Tibshirani, and Friedman (ESL).
The course project is a group project. Ideally, every group should have two students. If this is not possible, please discuss with the instructor.
The project proposal will describe the proposed dataset(s), the original research question(s), and the proposed method of solution. This will likely be the novel application of a regression or classification technique from class. It is at most one page in length.
The final project will carry out the research in the proposal. There will be a project proposal presentation describing the research question(s), dataset(s), method, and possibly the expected results. There will be a final project presentation describing the research question(s), dataset(s), method, and comparing the expected and actual results.
Instructor: Yaoyao Liu
Office location: Room 5125, 614 East Daniel St
Email: lyy@illinois.edu
Week | Topic |
---|---|
Week 1 | Syllabus + Data Science Intro |
Week 2 | Statistical Learning Foundations and Review (ISL Chapter 2) |
Week 3 | Linear Regression (ISL Chapter 3) |
Week 4 | Classification (ISL Chapter 4) |
Week 5 | Resampling (ISL Chapter 5) |
Week 6 | Linear Model Selection and Regularization (ISL Chapter 6) |
Week 7 | Splines / Generalized Additive Models (ISL Chapter 7) |
Week 8 | Project Proposal + Slides Due / Project Proposal Presentation |
Week 9 | Tree Based Methods (ISL Chapter 8) |
Week 10 | Support Vector Machines (ISL Chapter 9) |
Week 11 | Unsupervised Learning (ISL Chapter 10) |
Week 12 | Final Presentation Slides Due / Final Project Presentations |
Week 13 | Final Project Presentations |
Week 14 | Final Project Report Due |