Machine Learning at Scale
|
In the last few years machine learning has grown from using large data sets to solve specific tasks (e.g., training models to perform object classification using one million images in ImageNet) to using much larger datasets to train general-purpose representations that can be applied to a broad range of tasks. For example, foundation image models are trained using billions of image-text pairs and used to solve a wide range of image understanding problems. Most dramatically, Large Language Models are trained using trillions of tokens and then customized to address a wide range of applications. In this course we will study the techniques that have enabled machine learning systems to make use of such large training sets, and the results these systems produce. For example, using very large amounts of data has shifted some of the focus of training from supervised to self-supervised learning. Automatic data curation and cleaning have also become more pressing problems. Much of this data has been collected from society at large, so we will also discuss some of the societal implications of the collection and use of this data by large companies.
|
Assignments
Students will work throughout the semester on a project. This may take several forms.
1) Students may produce novel research. In this case the final project should resemble a research paper suitable for submission to a workshop or conference, including motivation for the problem, discussion of relevant prior work, methods developed, and experimental results.
2) Students may analyze the performance of an existing system or published research work. This could include, for example, an analysis of bias in foundation models, or an attempt to better understand what information is captured by image or video embeddings produced by foundation models. This analysis should probably include new experimental results, though theoretical analysis is also welcome.
3) Students may produce a research proposal. This could take the form of an NSF-style proposal (https://www.nsf.gov/pubs/policydocs/pappguide/nsf16001/gpg_index.jsp). We can distribute sample NSF proposals to interested students. The proposal should have a compelling overall vision, take account of all relevant past work to explain what is innovative, and provide a detailed description of the proposed work. The best proposals will often contain persuasive initial results.
Students may work on these projects in groups of up to three students. The more students working on a common project, the more comprehensive it will be expected to be.
Due Dates:
Students should turn in a short (less than 5 pages) proposal for their intended project by Wed., October 16. This is meant to help me provide feedback on proposed projects.
The final paper is due on the last day of classes, December 9.
In addition, students will be required to turn in one-page critiques of ten of the papers assigned as reading for the class. Each critique should contain two paragraphs. The first paragraph should summarize the paper. The second paragraph should provide some critical insight into the paper and/or interesting discussion questions for it. These critiques are due prior to the class in which the paper will be discussed; late critiques will not be accepted. Please hand these in as printed papers prior to the start of class. Students will be expected to be prepared to discuss their critiques in class. Do not turn in more than one critique per class.
Do not use LLMs to generate the critiques.
|
Disability Support
Any student eligible for and requesting reasonable academic accommodations due to a disability should provide the instructor, during office hours, with a letter of accommodation from the Office of Disability Support Services (DSS) within the first TWO weeks of the semester.
|
Academic Integrity
In this course you are
responsible for both the University’s Code of Academic
Integrity and the University of Maryland
Guidelines for Acceptable Use of Computing Resources. Any
evidence of unacceptable use of computer accounts or unauthorized cooperation
on tests, quizzes and assignments will be submitted to the Student Honor
Council, which could result in an XF for the course, suspension, or expulsion
from the University.
Any work that you hand in must be your own. Any sources that you draw from, including other students or LLMs, should be appropriately acknowledged. Plagiarism is a serious offense and will not be tolerated.
|
Anti-Harassment
The open exchange of
ideas, the freedom of thought and expression, and respectful scientific
debate are central to the aims and goals of this course. These require a
community and an environment that recognizes the inherent worth of every
person and group, that fosters dignity, understanding, and mutual respect,
and that embraces diversity. Harassment and hostile behavior are unwelcome in
any part of this course. This includes speech or behavior that intimidates,
creates discomfort, or interferes with a person’s participation or
opportunity for participation in the course. We aim for this course to be an
environment where harassment in any form does not happen, including but not
limited to: harassment based on race, gender, religion, age, color, national origin,
ancestry, disability, sexual orientation, or gender identity. Harassment
includes degrading verbal comments, deliberate intimidation, stalking,
harassing photography or recording, inappropriate physical contact, and
unwelcome sexual attention. Please contact an instructor or CS staff member
if you have questions or if you feel you are the victim of harassment (or
otherwise witness harassment of others).
|
Course evaluations
We welcome your suggestions for improving this class; please don't hesitate to share them with the instructor or the TAs during the semester! You will also be asked to give feedback using the CourseEvalUM system at the end of the semester. Your feedback will help us make the course better.
|
Office Hours
Prof. Jacobs will hold office hours Wednesdays, 10-11, in Iribe 4240. In addition, students should feel free to schedule meetings with Prof. Jacobs at other times.
|
|
Topic
|
Assigned Reading
|
Additional resources
|
1. 8/26
|
Introduction
|
|
|
2. 8/28
|
Review of
some ML concepts and history
|
If you aren’t
completely familiar with machine learning, Hal’s book (http://ciml.info/) provides a good undergraduate
introduction. You should probably read
it.
The Deep
Learning book (https://www.deeplearningbook.org/)
provides a comprehensive discussion of deep learning. Chapter 5 provides an introduction to ML,
which will serve as a reference for this lecture. Chapters 6-9 will introduce concepts in Deep
Learning that we’ll use in class.
http://neuralnetworksanddeeplearning.com/
provides a short, very clear introduction to neural networks.
|
|
3. 9/4
|
Markov Chains
and Language sequence prediction
|
Prediction and Entropy of
Printed English, by Shannon, Bell System Technical Journal.
Efficient Estimation of Word Representations in Vector Space, by Mikolov et al., Arxiv, 2013.
|
The Mixture Transition Distribution Model for High-Order Markov Chains and Non-Gaussian Time Series, by Berchtold and Raftery.
|
4. 9/9
|
Transformers
and word embeddings
|
Attention Is All You Need, Vaswani et al., Neurips 2017.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, by Devlin et al., Proc. of NAACL-HLT, 2019.
|
Formal
Algorithms for Transformers by Phuong and Hutter,
Arxiv. I
highly recommend this for a clear and precise description of Transformers.
Electra: Pre-training text
encoders as discriminators rather than generators, Clark et al., ICLR
2020.
|
5. 9/11
|
LLM pretraining
|
LLaMA: Open and Efficient Foundation Language Models,
by Touvron et al., Arxiv
2023.
|
|
6. 9/16
|
Vision
Transformers
|
Masked
Autoencoders Are Scalable Vision Learners, by He et al., CVPR 2022.
|
|
7. 9/18
|
Self-supervised
learning - Vision
|
Emerging
Properties in Self-Supervised Vision Transformers, by Caron, et al., ICCV
2021.
Vision Transformers Need Registers, Darcet et al., ICLR 2024.
|
SimCLR slides
|
8. 9/23
|
Vision-Language
Models
|
Learning
Transferable Visual Models From Natural Language
Supervision by Radford et
al., ICML 2021.
|
Demystifying
CLIP data, Xu et al., Arxiv 2024.
|
9. 9/25
|
Reinforcement
learning
|
Reinforcement Learning: An Introduction, Sutton and Barto. Reading the first six chapters will give you a good intro.
Proximal
Policy Optimization Algorithms, by Schulman et al., Arxiv,
2017.
|
OpenAI discussion of PPO
|
10. 9/30
|
Video
Generation
|
Reducing
Activation Recomputation in Large Transformer
Models, by Korthikanti et al., Arxiv, 2022.
Emu:
Enhancing Image Generation Models Using Photogenic Needles in a Haystack,
by Dai et al., Arxiv, 2023.
|
|
11. 10/2
|
LLM fine-tuning
|
Training
language models to follow instructions with human feedback, by Ouyang et
al., Arxiv 2022.
|
Learning
to Summarize from Human Feedback, by Stiennon
et al., Neurips 2020. Gives more detail on how InstructGPT works.
Towards Understanding
Sycophancy in Language Models, by Sharma et al., ICLR 2024.
Llama 2:
Open Foundation and Fine-Tuned Chat Models by Touvron
et al., Arxiv 2023.
SELF-INSTRUCT: Aligning Language
Models with Self-Generated Instructions by Wang et al., ACL 2023.
The Curious Case of Neural Text Degeneration, by Holtzman et al., ICLR 2020.
|
12. 10/7
|
Data
|
AI models collapse when trained on recursively
generated data, by Shumailov et al., Nature, 2024.
|
|
13. 10/9
|
Class Cancelled
|
|
|
14. 10/14
|
Multi-modal
LLMs
|
Blip-2:
Bootstrapping language-image pre-training with frozen image encoders and
large language models, by Li et al., ICML 2023.
Flamingo:
a Visual Language Model for Few-Shot Learning by Alayrac
et al., Neurips 2022.
|
Visual
Instruction Tuning, Liu et al., Neurips 2023.
The
Llama 3 Herd of Models, Meta, 2024.
The
Claude 3 Model Family: Opus, Sonnet, Haiku, Anthropic.
Gemini: A Family of Highly Capable
Multimodal Models, DeepMind
GPT-4 Technical Report, OpenAI
|
15. 10/16
|
Understanding fine-tuning and in-context learning? + Bias, …
|
LIMA: Less Is More for Alignment, Zhou et al., Neurips 2023.
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, by Min et al., EMNLP, 2022.
On
the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, by Bender et al., Proceedings
of the 2021 ACM conference on fairness, accountability, and transparency.
|
Women
also Snowboard, by Hendricks et al., ECCV 2018
|
16. 10/21
|
Privacy and
Trust in Foundation models
|
Open
Sesame! Universal Black-box Jailbreaking of Large Language Models, by Lapid et al., ICLR Workshop on Secure and Trustworthy
LLMs, 2024.
Scalable Extraction of Training Data
from (Production) Language Models by Nasr, et al., Arxiv,
2023.
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models, by Wang et al., Neurips 2023.
|
Badllama 3: removing safety finetuning from Llama 3 in
minutes, by Volkov, Arxiv, 2024.
Foundational Challenges in Assuring
Alignment and Safety of Large Language Models, by Anwar et al., Arxiv, 2024.
|
17. 10/23
|
Efficiency in
Training
|
FLASHATTENTION: Fast and Memory-Efficient Exact Attention with IO-Awareness, by Dao et al., Neurips, 2022.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, by Dettmers et al., Neurips, 2022.
QLoRA: Efficient Finetuning of Quantized LLMs, by Dettmers et al., Neurips 2023.
|
LLM.int8()
blog
Training Deep
Nets with Sublinear Memory Cost by Chen et al., Arxiv
2016.
|
18. 10/28
|
Efficiency
cont’d
|
|
|
19. 10/30
|
Parallelism
|
Megatron-LM:
Training Multi-Billion Parameter Language Models Using Model Parallelism,
by Shoeybi et al., Arxiv,
2020.
Ring
Attention with Blockwise Transformers for
Near-Infinite Context, by Liu et al., ICLR, 2024.
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, by Rajbhandari et al., SC20, 2020.
|
Large
Scale Distributed Deep Networks by Dean et al., Neurips
2012.
Scaling
Distributed Machine Learning with the Parameter Server, by Li et al., 11th
USENIX Symposium on Operating Systems Design and Implementation, 2014.
|
20. 11/4
|
Climate
Foundation Models
|
Aurora:
A Foundation Model of the Atmosphere by Bodnar et al., Arxiv
2024.
GenCast: Diffusion-based ensemble forecasting for
medium-range weather, Price et al., Arxiv
2024.
|
GraphCast: Learning skillful medium-range global
weather forecasting, Lam et al., Arxiv,
2022.
ClimaX: A foundation model for weather and climate, Nguyen et al., Arxiv 2023.
|
21. 11/6
|
Near-duplicate Detection
|
Mining of Massive Datasets, Chapter 3, by Leskovec, Rajaraman, and Ullman.
On the Resemblance and Containment of Documents, by Broder, Proceedings of Compression and Complexity of Sequences, 1997.
A
Self-Supervised Descriptor for Image Copy Detection, by Pizzi et al., CVPR, 2022.
|
|
22. 11/11
|
Data cleaning
|
DataComp-LM:
In search of the next generation of training sets for language models, by
Li et al., Arxiv, 2024.
Filtering,
Distillation, and Hard Negatives for Vision-Language Pre-Training, by Radenovic et al., CVPR, 2023.
DINOv2: Learning Robust Visual
Features without Supervision, by Oquab et al., TMLR,
2024.
The FineWeb
Datasets: Decanting the Web for the Finest Text Data at Scale, by Penedo et al., Arxiv, 2024.
|
LAION
and CSAM
|
23. 11/13
|
Data
ownership, issues of restricted data
|
Surveillance
Capitalism, by Zuboff, Project Syndicate, 2020.
Why Technology Favors Tyranny, by Harari, The Atlantic, 2018.
AI
Art and its Impact on Artists, by Jiang et al., AIES 2023.
|
|
24. 11/18
|
Generative
models overview
Image
Generation Diffusion
|
Denoising
Diffusion Probabilistic Models, by Ho et al., Neurips,
2020.
High-Resolution
Image Synthesis with Latent Diffusion Models, by Rombach
et al., CVPR 2022.
Scalable
Diffusion Models with Transformers, by Peebles and Xie,
ICCV 2023.
|
Tutorials:
Arash, Yang
Trolls Used Her Face to Make Fake Porn. There Was Nothing She Could Do, by Kraft, 2024, New York Times Magazine.
|
25. 11/20
|
Auto-regressive generation
|
|
|
26. 11/25
|
Molecular machine learning at scale
|
An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage, https://arxiv.org/abs/2010.09435
A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems, https://arxiv.org/abs/2312.07511
Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations, https://openreview.net/forum?id=0jP2n0YFmKG
|
|
27. 12/2
|
Scaling Laws
|
Scaling Laws
for Neural Language Models, by Kaplan, et al., Arxiv,
2020.
Training Compute-Optimal Large Language Models, by Hoffmann et al., Neurips, 2022.
|
|
28. 12/4
|
Chain of
thought, Emergent Properties
|
Chain-of-Thought
Prompting Elicits Reasoning in Large Language Models, by Wei et al., Neurips 2022.
Are Emergent Abilities of Large Language Models a Mirage?, by Schaeffer et al., Neurips 2023.
|
|
29. 12/9
|
Conclusions
|
|
|
??
|
Other
Possible Topics
|
Distillation
and/or model pruning
Other
applications (autonomous driving, …)
Existential
risks of AI
|
|
Other Papers
of Interest
|
InstructBLIP:
Towards General-purpose Vision-Language Models with Instruction Tuning by
Dai et al., Neurips 2023.
What learning
algorithm is in-context learning?, by Akyurek
et al., ICLR 2023.
The Platonic Representation Hypothesis,
by Huh et al., Arxiv, 2024.
Sparks of Artificial General
Intelligence: Early experiments with GPT-4, by Bubeck
et al., Arxiv, 2023.
Sora,
by OpenAI
VideoPoet: A Large Language Model for Zero-Shot Video
Generation, by Kondratyuk et al., ICML 2024.
Genie: Generative Interactive
Environments, by Bruce et al., 2024.
More Sora
Video Poet web page, with
videos
|