An Empirical Introduction to Statistical Modeling
Course: BBS706 - An Empirical Introduction to Statistical Modeling
Professor: Manuel Garber, manuel.garber@umassmed.edu
Semester Offered: Fall 2021, Fall 2023
Last Taught: Fall 2019
Syllabus: Lecture- and textbook-based course on statistical modeling and machine learning, with exercises on analyzing real data.
Course Summary and Objectives: This course covers the most common approaches to modeling high-dimensional data. We begin with a brief introduction to linear algebra and to methods that rely heavily on it: clustering and dimensionality reduction. We then focus on regression models (linear, non-linear and logistic) as well as non-linear classification (support vector machines, neural networks). The goal is twofold: i) to understand, both conceptually and mathematically, how and why each approach works, and ii) to be able to apply each technique to a real dataset.
Methodology: Students will take turns leading the discussion of the book chapter scheduled for the week. The course will include an experimental dataset that will serve as the basis for applying and comparing different modeling and classification methods. After covering the theory of each method, we will discuss its applicability first to the datasets in the book and then to the course dataset. Because the course emphasizes student-led discussions, no audits will be allowed; only students taking the course for credit will be able to participate in class.
Course Topics:
- Introduction to statistical modeling
  - What it can be used for
  - Examples
- Mathematical background
  - Linear algebra
  - Data pre-processing
- Linear regression models
  - Linear regression
  - Penalized linear regression
- Non-linear regression models
  - Support vector machines
  - Neural networks
- Regression trees
  - Regression trees
  - Random forests
- Discriminant analysis
  - Logistic regression
  - Linear discriminant analysis
- Nonlinear classifiers
  - Support vector machines
  - K-nearest neighbors
  - Classification trees
- Filter methods
  - Consequences of non-informative features
  - Feature reduction methods
Course Materials:
Textbook: Applied Predictive Modeling, Max Kuhn and Kjell Johnson, Springer, 2013
Suggested reading: An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Springer, 2013
Evaluation:
Following each section, students will be expected to apply the method covered to the class dataset. Course evaluation will be based on three components:
- Their application of each method to the dataset
- The discussions they lead
- Their participation in class