Computational topics will include gradient methods, stochastic methods, Bayesian methods, splitting methods, interior point methods, linear programming, and methods for large matrices. Theoretical topics will include convex analysis, duality, rates of convergence, and advanced topics in linear algebra. Homework assignments will require both mathematical work on paper and implementation of algorithms.
The basics
Communications and questions for this course will be managed through the course slack channel.
If you have a question for the professor of the TAs I suggest you use slack rather than email to make sure it doesn’t get lost in the aether.
This is an in-person class. Some lectures may be posted online when they are available. Some will not be posted online.
When: TuTh 11:00pm - 12:15pm | Where: CHE 2110 | |
Instructor: Tom Goldstein | Office Hours: Th 12:15-1:30pm, Iribe 4144 | |
TA: Hamid Kazemi | Office Hours: TBD | use slack |
TA: Shweta Bhardwaj | Office Hours: TBD | use slack |
Exams
Exam problems will be reflective of homework problems in their style and content coverage. Problems can cover material presented in the slides, in addition to material from the homework. You are expected to understand slide material to the level of detail presented in the slides (but no more). You can be tested on the content of and solutions to homework problems.
You won’t be expected to recall complex formulas (e.g., rank-1 update formulas). But you may be expected to know how to use formulas if you are provided them
Exam information below is subject to change
Midterm exam topics
- Bayesian statistics, maximum likelihood, and priors
- Norms
- Objective functions for linear classifiers
- Fourier transforms and convolutions
- Continuous derivative and gradient
- discrete gradient operator and total variation
- chain rule / backprop / autdiff
- quadratic forms
- second-order taylor expansion
- matrix factorizations
- methods for solving linear systems (direct, iterative, randomized)
- definition of convex sets
- convex functions
- epigraph
- quasiconvexity
- convexity preserving operations
- differential properties of convex function
- Lipschitz constants and strong convexity constants
- Solutions to all homework problems up to and including homework 5.
Final exam
This exam will be on Sat May 11 at 8am. Topics include
- Gradient descent methods: stepsize restrictions, backtracking, and convergence rates
- Quasi-newton methods: know the conceptual basics of how they work, and the convergence rates. You will not be asked to recite specific update formulas for BFGS or CG.
- Duality: Be able to write the Lagrangian of a problem. Know how to find the dual of a problem by removing primal variables from the Lagrangian. Know what the conjugate function is, and how to use it to dualize a problem. Know Slater’s condition, and the difference between strong/weak duality.
- Prox methods: Know what a proximal operator is, what it does, and how to write the prox operator of a simple function. Know how to use a characteristic function to add a constraint to a problem, and what it means to compute the prox operator of a characteristic function. Know the steps of forward-backward splitting. Know stepsize restrictions and convergence rates. Be able to write the steps of a forward-backward solver for a particular problem.
- Lagrangian methods: be able to write the augmented Lagrangian for a problem. Be able to write the steps of an ADMM solver for a particular problem. Know the steps of PDHG. Be able to write a PDHG solver for a particular problem.
- Stochastic methods: know the basics of how SGD and MCMC work. When would you use these methods? What sort of convergence rates do you get with them?
Things you DON’T need to know:
- You don’t need to know the specific objective functions for the applications/problems given as examples in class.
- You don’t need to know the specific steps of methods with “complicated” update formulas, like Nesterov’s method, Adaptive (BB) stepsize rules, BFGS, etc, although you are expected to know the differences between these methods and how they work on a conceptual level.
Coursework & grading
Homework will be assigned approximately once per week. Homework assignments will consist of both programming exercises and theoretical problem sets. Programming assignments will be cumulative - you will need the results of early assignments to complete assignments that are given later in the course. The completed homework assignments you turn in must represent your own work.
The approximate grade breakdown of the course will be
- 50% homework
- 25% midterm exam
- 25% final exam.
I reserve the right to change the relative weighting of homework/exams at any time. Furthermore, because this course has gone online, the structure of exams will be quite different than we’ve had in the past and I’m still looking into different options.
Note: This is a PhD qualifying course for computer science.
Homework
Assignments will be distributed as Jupyter notebooks using Python 3. If you’ve never used notebooks before, you can find a tutorial here. For each assignment, you’ll turn in your completed and runnable notebook. You should complete your assignments using a Python version between 3.6 and 3.9 (inclusive).
Homework will be turned in using Elms/Canvas. Follow this link to Elms, and then look for the course page after logging in.
Homework 1: Linear algebra review (Due Wed Feb 9)
Solutions
Lecture Slides
These slides evolve from year to year, sometimes dramatically. For this reason, slides might get updated shortly before or after a lecture.
Slides below have not been updated…
Course Overview
Linear Algebra Review
Convolutions and FFT
Optimization Problems
Machine learning basics
Calculus
Quadratic Forms
Convex Functions
Gradient Methods
Quasi-Newton Methods
Duality
Proximal Methods
Lagrangian Methods
Random topics | MCMC code example
Bonus topics (not on exam):
Neural Net Design
Linear Programming
Semidefinite Programming
Interior Point Method
Recorded lectures
This is an in-person class. Recordings of lectures may not always be provided, and recordings will not be made available immediately after lecture. Here are some recordings from 2023 and 2022 that are available.
Optimization problems part 1
Optimization problems part 2
Optimization problems part 3
Calculus
Quadratics
Numerical linear algebra
Large matrices
Convex functions
Convex functions, part 2
Gradient descent, part 1
Gradient descent, part 2
Gradient descent, part 3
Convnets, part 1
Convnets, part 2
Quasi-Newton, part 1
Quasi-Newton, part 2
Duality, part 1
Duality, part 2
Duality, part 3
Forward-backward, part 1
Forward-backward, part 2
Lagrangian, part 1
Lagrangian, part 2
Lagrangian, part 3
SGD
Monte-Carlo
Book & Other Sources
All course materials are available for free online. Suggested reading material for various topics includes:
- Derivatives and gradients: Convex Optimization by Boyd and Vandenberghe, Appendix A
- Numerical Linear Algebra: Numerical Linear Algebra by Trefethen and Bau
- Randomized matrix factorizations: Finding Structure with Randomness
- L1 models and sparsity: Sparse modeling for Image and Vision Processing
- Convex functions and gradient methods: Convex Optimization by Boyd and Vandenberghe
- Line search methods: Chapter 3 of Nocedal and Wright
- Convergence rates for gradient methods: Optimal Rates in Convex Optimization
- Quasi-Newton methods: Chapter 3 of Nocedal and Wright
- Proximal methods: A Field Guide to Forward-Backward Splitting
- ADMM: Fast Alternating Direction Optimization Methods
- Consensus ADMM: Distributed Optimization and Statistical Learning
- Unwrapped ADMM: Unwrapping ADMM
- PDHG: Adaptive Primal-Dual Hybrid Gradient Methods
- SGD: Incremental Gradient, Subgradient, and Proximal Methods
- SGD convergence rates: Stochastic Gradient Descent for Non-Smooth Optimization
- Monte-Carlo: An Introduction to MCMC for Machine Learning
- Barrier Methods: Convex Optimization by Boyd and Vandenberghe, chapter 11
- Primal-Dual Interior Point Methods: Nocedal and Wright, chapter 14
- Interior Point Methods overview: IP methods: 25 years later
- Semi-definite programming: Vandenbergh and Boyd
- Metric learning: Distance Metric Learning for LMNN