CMSC 828L Deep Learning
Staff
Professor David Jacobs, 4421 AV Williams
Email: djacobs-at-cs
TAs: Justin Terry, Email: justinkterry-at-gmail
Chen Zhu, Email: chenzhu-at-cs
Office Hours:
Monday, 5-6: Justin
Tuesday, 3-4: David
Wednesday, 10-11: David
Wednesday, 5-6: Justin
Thursday, 4-6: Chen
Location: TA office hours will be in 4101 or 4103 AV Williams, depending on availability (check both rooms). Prof. Jacobs's office hours will be in 4421 AV Williams.
Readings
The following two books are available online:
Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Neural Networks and Deep Learning, by Michael Nielsen
Other reading material appears in the schedule below.
Requirements
Students registered for this class must complete the following assignments:
Presentation: Students will form groups of three. Each group will prepare a 30-minute presentation on a topic of their choice. Students will select two papers and will present a summary and critical analysis of the material in these papers, along with any other appropriate background or related material. Students should video record their presentation and submit a link to their video. Presentations will be graded on the choice of topic (is the material interesting?), clarity of presentation (do we understand the key points?), focus (does the presentation highlight the most important parts of the work, rather than uniformly summarizing everything?), and analysis (does the presentation help us understand the strengths and limitations of the presented work?). Six leading presentations will be selected for live presentation to the full class.
Problem Sets: There will be three problem sets assigned during the course. These will include programming projects and may also include written exercises.
Midterm: There will be a one week, take-home midterm. This will include paper and pencil exercises.
Final Exam: There will be an in-class final exam.
Course Policies

Course work, late policies, and grading
Homework and the take-home midterm are due at the start of class. Problems may be turned in late, with a penalty of 10% for each day they are late, but may not be turned in after the start of the next class after they are due. For example, if a problem set is due on Tuesday, it may be turned in before Wednesday at 12:30pm with a 10% penalty, or before Thursday at 12:30pm with a 20% penalty, but no later than Thursday at 12:30pm. Some homeworks and the exams may have a special challenge problem. Points from the challenge problems are extra credit; I do not consider these points until after the final course grade cutoffs have been set. Students participating in class discussion or asking good questions may also receive extra credit. Each problem set and the presentation will count for 10% of the final grade, the midterm will count for 20%, and the final exam will count for 40%.
Academic Honesty
All class work is to be done independently. You are allowed to discuss class material, homework problems, and general solution strategies with your classmates, but when it comes to formulating, writing, or programming solutions you must work alone. If you make use of other sources in coming up with your answers, you must cite these sources clearly (papers or books in the literature, friends or classmates, information downloaded from the web, whatever). It is best to try to solve problems on your own, since problem solving is an important component of the course, but I will not deduct points if you make use of outside help, provided that you cite your sources clearly. Representing other people's work as your own, however, is plagiarism and is in violation of university policies. Instances of academic dishonesty will be dealt with harshly, and usually result in a hearing in front of a student honor council and a grade of XF. (Note: this and other course policies are taken from those of Prof. David Mount.)
Excused Absences
Any student who needs to be excused for an absence from a single lecture, recitation, or lab due to a medically necessitated absence shall: (a) make a reasonable attempt to inform the instructor of his/her illness prior to the class; and (b) upon returning to the class, present the instructor with a self-signed note attesting to the date of the illness. Each note must contain an acknowledgment by the student that the information provided is true and correct. Providing false information to University officials is prohibited under Part 9(h) of the Code of Student Conduct (V-1.00(B) University of Maryland Code of Student Conduct) and may result in disciplinary action. The self-documentation may not be used for the Major Scheduled Grading Events as defined below, and it may be used for only one class meeting (or more, if you choose) during the semester. Any student who needs to be excused for a prolonged absence (two or more consecutive class meetings), or for a Major Scheduled Grading Event, must provide written documentation of the illness from the Health Center or from an outside health care provider. This documentation must verify dates of treatment and indicate the timeframe during which the student was unable to meet academic responsibilities. In addition, it must contain the name and phone number of the medical service provider, to be used if verification is needed. No diagnostic information will ever be requested. The Major Scheduled Grading Events for this course include: the final exam, as given in the University schedule.
Academic Accommodations
Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester.
Assignments
Assignment | Assigned | Due
Problem Set 1 | 9/11/18 | 9/25/18
Problem Set 2 | 9/25/18 | 10/9/18
Problem Set 3 | 10/9/18 | 10/23/18
Midterm (take-home) | 10/23/18 | 10/30/18
Presentation | | 11/13/18
Tentative Schedule
Class 1, 8/28: Introduction

Class 2, 8/30: Intro to Machine Learning
Reading: Deep Learning, Chapter 5

Class 3, 9/4: Intro to Machine Learning: Linear models (SVMs and perceptrons, logistic regression)
Reading: For logistic regression, see this chapter from Cosma Shalizi

Class 4, 9/6: Intro to Neural Nets: What a network computes
Reading: Deep Learning, Chapter 6; Neural Networks and Deep Learning, Chapter 2

Class 5, 9/11: Training a network: loss functions, backpropagation
Reading: A Tutorial on Energy-Based Learning, by LeCun et al.; Neural Networks and Deep Learning, Chapter 3

Class 6, 9/13: Neural networks as universal function approximators
Reading: Approximation by Superpositions of a Sigmoidal Function, by George Cybenko (1989); Multilayer Feedforward Networks are Universal Approximators, by Kurt Hornik, Maxwell Stinchcombe, and Halbert White (1989); Neural Networks and Deep Learning, Chapter 4

Class 7, 9/18: Convolution and Fourier transforms

Class 8, 9/20: CNNs cont'd: stochastic gradient descent, batch normalization, Siamese networks, early stopping, transfer learning, brief history of neural networks
Reading: Deep Learning, Chapters 7 and 9

Class 9, 9/25: Implementation of deep learning: deep learning frameworks and the software stack, hyperparameter optimization, hardware acceleration, debugging (Presenter: Justin)

Class 10, 9/27: Implementation of deep learning, cont'd (Presenter: Justin)

Class 11, 10/2: Deeper networks: the vanishing gradient, skip connections, ResNet
Reading: Very Deep Convolutional Networks for Large-Scale Image Recognition, by Simonyan and Zisserman; Deep Residual Learning for Image Recognition, by He et al.; Residual Networks are Exponential Ensembles of Relatively Shallow Networks, by Veit et al.; Densely Connected Convolutional Networks, by Huang et al. Also of interest: Neural Networks and Deep Learning, Chapter 5; On the Difficulty of Training Recurrent Neural Networks, by Pascanu et al.

Class 12, 10/4: Optimization: convex vs. non-convex functions, convergence of GD and SGD, the Adam optimizer, initialization, leaky ReLU, momentum, changing step sizes
Reading: Deep Learning, Chapter 8

Class 13, 10/9: Convergence in deep networks: minima that do/don't generalize, broad vs. narrow minima, GD vs. SGD, the loss landscape
Reading: Understanding Deep Learning Requires Rethinking Generalization, by Zhang et al.; Visualizing the Loss Landscape of Neural Nets, by Li et al.; VC dimension and Rademacher complexity are discussed in many places, e.g., these notes; On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, by Keskar et al.

Class 14, 10/11: Dimensionality reduction, linear (PCA, LDA) and manifolds, random projections
Reading: PCA (slides from Olga Veksler); LDA (slides from Olga Veksler); An Elementary Proof of the Johnson-Lindenstrauss Lemma, by Dasgupta and Gupta

Class 15, 10/16: Low-dimensional embedding, metric learning
Reading: Efficient Estimation of Word Representations in Vector Space, by Mikolov et al.; FaceNet: A Unified Embedding for Face Recognition and Clustering, by Schroff et al.; Metric Learning: A Survey, by Brian Kulis

Class 16, 10/18: Adversarial attacks
Reading: Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey, by Akhtar and Mian; Intriguing Properties of Neural Networks, by Szegedy et al.; Explaining and Harnessing Adversarial Examples, by Goodfellow et al.; A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples, by Tanay and Griffin; Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks, by Shafahi et al.

Class 17, 10/23: AI safety and the future of AI (Presenter: Justin)

Class 18, 10/25: Autoencoders, variational autoencoders, and dimensionality reduction in networks (Presenter: Chen)
Reading: Deep Learning, Chapter 14; Tutorial on Variational Autoencoders, by Carl Doersch

Class 19, 10/30: Generative models, GANs
Reading: Generative Adversarial Networks, by Goodfellow et al.; Towards Principled Methods for Training Generative Adversarial Networks, by Arjovsky and Bottou; Wasserstein GAN, by Arjovsky et al.

Class 20, 11/1: Go over midterm; image-to-image translation
Reading: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, by Zhu et al.

Class 21, 11/6: Reinforcement learning
Reading: Reinforcement Learning: An Introduction, by Sutton and Barto. Understanding Chapters 3 and 6 is important, but reading Chapters 4 and 5 will probably help with 6. Chapter 1 is fun and quick to read.

Class 22, 11/8: Deep reinforcement learning
Reading: Reinforcement Learning: An Introduction, by Sutton and Barto; Deep Learning, Sections 16.1, 16.5, and 16.6

Class 23, 11/13: Why are deep networks better than shallow?
Reading: On the Number of Linear Regions of Deep Neural Networks, by G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, NIPS 2014, pages 2924-2932; The Power of Depth for Feedforward Neural Networks, by Eldan and Shamir; Benefits of Depth in Neural Networks, by Matus Telgarsky

Class 24, 11/15: Catching up on previous topics

Class 25, 11/20: Recurrent neural nets
Reading: Deep Learning, Chapter 10, especially from the beginning through 10.2, and Section 10.10

Class 26, 11/27: Student presentations: Visual Question Answering (Ishita, Pranav, and Shlok); Bayesian Deep Learning (Sam and Susmija)
Reading: Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, by Agrawal et al.; Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, by Yarin Gal and Zoubin Ghahramani

Class 27, 11/29: Conclusions

Class 28, 12/4: Student presentations: Capsule Networks (Mansi, Sahil, and Saumya); Graph Convolutional Networks (Kamal, Sneha, and Uttaran)
Reading: Dynamic Routing Between Capsules, by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton; Spectral Networks and Locally Connected Networks on Graphs, by Bruna et al.

Class 29, 12/6: Student presentations: Memory-Augmented Neural Networks and Meta-Learning (Samuel, Alex, and Alex); Depth, Pose, and Flow from Images (Abhishek, Nirat, Snehesh, and Chahat)
Reading: One-shot Learning with Memory-Augmented Neural Networks, by Santoro et al.; GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose, by Yin and Shi

Final exam: 12/17, 1:30-3:30