CMSC 828L Deep Learning
Staff
Professor David Jacobs, 4421 AV Williams
Email: djacobs-at-cs
TAs: Justin Terry, Email: justinkterry-at-gmail
Chen Zhu, Email: chenzhu-at-cs
Office Hours:
Monday, 5-6: Justin
Tuesday, 3-4: David
Wednesday, 10-11: David
Wednesday, 5-6: Justin
Thursday, 4-6: Chen
Location: TA office hours will be in 4101 or 4103 AV Williams, depending on availability (check both rooms). Prof. Jacobs's office hours will be in 4421 AV Williams.
Readings
The following two books are available online:
Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Neural Networks and Deep Learning, by Michael Nielsen
Other reading material appears in the schedule below.
Requirements
Students registered for this class must complete the following assignments:
Presentation: Students will form groups of three. Each group will prepare a 30-minute presentation on a topic of their choice. Students will select two papers and will present a summary and critical analysis of the material in these papers, along with any other appropriate background or related material. Students should video record their presentation and submit a link to their video. Presentations will be graded on the choice of topic (is the material interesting?), clarity of presentation (do we understand the key points?), focus (does the presentation highlight the most important parts of the work, rather than uniformly summarizing everything?), and analysis (does the presentation help us understand the strengths and limitations of the presented work?). Six leading presentations will be selected for live presentation to the full class.
Problem Sets: There will be three problem sets assigned during the course. These will include programming projects and may also include written exercises.
Midterm: There will be a one week, take-home midterm. This will include paper and pencil exercises.
Final Exam: There will be an in-class final exam.
Course Policies

Course work, late policies, and grading
Homework and the take-home midterm are due at the start of class. Problems may be turned in late, with a penalty of 10% for each day they are late, but may not be turned in after the start of the next class after they are due. For example, if a problem set is due on Tuesday, it may be turned in before Wednesday at 12:30pm with a 10% penalty, or before Thursday at 12:30pm with a 20% penalty, but no later than Thursday at 12:30pm. Some homeworks and the exams may have a special challenge problem. Points from the challenge problems are extra credit; I do not consider these points until after the final course grade cutoffs have been set. Students participating in class discussion or asking good questions may also receive extra credit. Each problem set and the presentation will count for 10% of the final grade, the midterm will count for 20%, and the final exam will count for 40%.
Academic Honesty
All class work is to be done independently. You are allowed to discuss class material, homework problems, and general solution strategies with your classmates, but when it comes to formulating, writing, or programming solutions you must work alone. If you make use of other sources in coming up with your answers, you must cite these sources clearly (papers or books in the literature, friends or classmates, information downloaded from the web, whatever). It is best to try to solve problems on your own, since problem solving is an important component of the course, but I will not deduct points if you make use of outside help, provided that you cite your sources clearly. Representing other people's work as your own, however, is plagiarism and is in violation of university policies. Instances of academic dishonesty will be dealt with harshly, and usually result in a hearing in front of a student honor council and a grade of XF. (Note: this and other course policies are taken from those of Prof. David Mount.)
Excused Absences
Any student who needs to be excused for an absence from a single lecture, recitation, or lab due to a medically necessitated absence shall: (a) make a reasonable attempt to inform the instructor of his/her illness prior to the class; and (b) upon returning to the class, present the instructor with a self-signed note attesting to the date of the illness. Each note must contain an acknowledgment by the student that the information provided is true and correct. Providing false information to University officials is prohibited under Part 9(h) of the Code of Student Conduct (V-1.00(B) University of Maryland Code of Student Conduct) and may result in disciplinary action. The self-documentation may not be used for the Major Scheduled Grading Events as defined below, and it may be used for only one class meeting (or more, if you choose) during the semester. Any student who needs to be excused for a prolonged absence (two or more consecutive class meetings), or for a Major Scheduled Grading Event, must provide written documentation of the illness from the Health Center or from an outside health care provider. This documentation must verify dates of treatment and indicate the timeframe during which the student was unable to meet academic responsibilities. In addition, it must contain the name and phone number of the medical service provider, to be used if verification is needed. No diagnostic information will ever be requested. The Major Scheduled Grading Events for this course include: the final exam, as given in the University schedule.
Academic Accommodations
Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester.
Assignments
Assignment | Assigned | Due
Problem Set 1 | 9/11/18 | 9/25/18
Problem Set 2 | 9/25/18 | 10/9/18
Problem Set 3 | 10/9/18 | 10/23/18
Midterm (take-home) | 10/23/18 | 10/30/18
Presentation | | 11/13/18
Tentative Schedule
Class 1, 8/28: Introduction

Class 2, 8/30: Intro to Machine Learning
Reading: Deep Learning, Chapter 5

Class 3, 9/4: Intro to Machine Learning: Linear models (SVMs and perceptrons, logistic regression)
Reading: For logistic regression, see this chapter from Cosma Shalizi

Class 4, 9/6: Intro to Neural Nets: What a network computes
Reading: Deep Learning, Chapter 6; Neural Networks and Deep Learning, Chapter 2

Class 5, 9/11: Training a network: loss functions, backpropagation
Reading: A Tutorial on Energy-Based Learning, by LeCun et al.; Neural Networks and Deep Learning, Chapter 3

Class 6, 9/13: Neural networks as universal function approximators
Reading: Approximation by Superpositions of a Sigmoidal Function, by George Cybenko (1989); Multilayer Feedforward Networks are Universal Approximators, by Kurt Hornik, Maxwell Stinchcombe, and Halbert White (1989); Neural Networks and Deep Learning, Chapter 4

Class 7, 9/18: Convolution and Fourier transforms

Class 8, 9/20: CNNs cont'd: stochastic gradient descent, batch normalization, Siamese networks, early stopping, transfer learning, brief history of neural networks
Reading: Deep Learning, Chapters 7 and 9

Class 9, 9/25: Implementation of deep learning: deep learning frameworks and the software stack, hyperparameter optimization, hardware acceleration, debugging (Presenter: Justin)

Class 10, 9/27: Implementation of deep learning, cont'd (Presenter: Justin)

Class 11, 10/2: Deeper networks: the vanishing gradient, skip connections, ResNet
Reading: Very Deep Convolutional Networks for Large-Scale Image Recognition, by Simonyan and Zisserman; Deep Residual Learning for Image Recognition, by He et al.; Residual Networks are Exponential Ensembles of Relatively Shallow Networks, by Veit et al.; Densely Connected Convolutional Networks, by Huang et al. Also of interest: Neural Networks and Deep Learning, Chapter 5; On the Difficulty of Training Recurrent Neural Networks, by Pascanu et al.

Class 12, 10/4: Optimization: convex vs. non-convex functions, convergence of GD and SGD, the Adam optimizer, initialization, leaky ReLU, momentum, changing step sizes
Reading: Deep Learning, Chapter 8

Class 13, 10/9: Convergence in deep networks: minima that do/don't generalize, broad vs. narrow minima, GD vs. SGD, the loss landscape
Reading: Understanding Deep Learning Requires Rethinking Generalization, by Zhang et al.; Visualizing the Loss Landscape of Neural Nets, by Li et al.; VC dimension and Rademacher complexity are discussed in many places, e.g., these notes; On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, by Keskar et al.

Class 14, 10/11: Dimensionality reduction, linear (PCA, LDA) and manifolds, random projections
Reading: PCA (slides from Olga Veksler); LDA (slides from Olga Veksler); An Elementary Proof of the Johnson-Lindenstrauss Lemma, by Dasgupta and Gupta

Class 15, 10/16: Low-dimensional embedding, metric learning
Reading: Efficient Estimation of Word Representations in Vector Space, by Mikolov et al.; FaceNet: A Unified Embedding for Face Recognition and Clustering, by Schroff et al.; Metric Learning: A Survey, by Brian Kulis

Class 16, 10/18: Adversarial attacks
Reading: Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey, by Akhtar and Mian; Intriguing Properties of Neural Networks, by Szegedy et al.; Explaining and Harnessing Adversarial Examples, by Goodfellow et al.; A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples, by Tanay and Griffin; Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks, by Shafahi et al.

Class 17, 10/23: AI safety and the future of AI (Presenter: Justin)

Class 18, 10/25: Autoencoders, variational autoencoders, and dimensionality reduction in networks (Presenter: Chen)
Reading: Deep Learning, Chapter 14; Tutorial on Variational Autoencoders, by Carl Doersch

Class 19, 10/30: Generative models, GANs
Reading: Generative Adversarial Networks, by Goodfellow et al.; Towards Principled Methods for Training Generative Adversarial Networks, by Arjovsky and Bottou; Wasserstein GAN, by Arjovsky et al.

Class 20, 11/1: Go over midterm; image-to-image translation
Reading: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, by Zhu et al.

Class 21, 11/6: Reinforcement learning
Reading: Reinforcement Learning: An Introduction, by Sutton and Barto. Understanding Chapters 3 and 6 is important, but reading Chapters 4 and 5 will probably help with 6. Chapter 1 is fun and quick to read.

Class 22, 11/8: Deep reinforcement learning
Reading: Reinforcement Learning: An Introduction, by Sutton and Barto; Deep Learning, Sections 16.1, 16.5, and 16.6

Class 23, 11/13: Why are deep networks better than shallow?
Reading: On the Number of Linear Regions of Deep Neural Networks, by G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, NIPS 2014, pages 2924-2932; The Power of Depth for Feedforward Neural Networks, by Eldan and Shamir; Benefits of Depth in Neural Networks, by Matus Telgarsky

Class 24, 11/15: Catching up on previous topics

Class 25, 11/20: Recurrent neural nets
Reading: Deep Learning, Chapter 10, especially from the beginning through 10.2, and Section 10.10

Class 26, 11/27: Student presentations: Visual Question Answering (Ishita, Pranav, and Shlok); Bayesian Deep Learning (Sam and Susmija)
Reading: Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, by Agrawal et al.; Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, by Yarin Gal and Zoubin Ghahramani

Class 27, 11/29: Conclusions

Class 28, 12/4: Student presentations: Capsule Networks (Mansi, Sahil, and Saumya); Graph Convolutional Networks (Kamal, Sneha, and Uttaran)
Reading: Dynamic Routing Between Capsules, by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton; Spectral Networks and Locally Connected Networks on Graphs, by Bruna et al.

Class 29, 12/6: Student presentations: Memory-Augmented Neural Networks and Meta-Learning (Samuel, Alex, and Alex); Depth, Pose, and Flow from Images (Abhishek, Nirat, Snehesh, and Chahat)
Reading: One-shot Learning with Memory-Augmented Neural Networks, by Santoro et al.; GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose, by Yin and Shi

Final exam: 12/17, 1:30-3:30