Course Description: The future of Artificial Intelligence demands a paradigm shift towards multimodal perception, enabling systems to interpret and fuse information from diverse sensory inputs. While we humans perceive the world by looking, listening, touching, smelling, and tasting, traditional forms of machine intelligence have primarily focused on a single sensory modality, often vision. To truly understand the world around us, AI must learn to jointly interpret multimodal signals. This graduate-level seminar course explores computer vision from a multimodal perspective, focusing on learning algorithms that augment vision with other essential modalities, such as audio, touch, language, and more. The majority of the course will consist of student presentations, experiments, and paper discussions, through which we will delve into the latest research and advancements in multimodal perception.
Instructor: Ruohan Gao (rhgao [at] umd.edu)
Office: IRB-4248
Office hours: by appointment (send email)
TA: Kelin Yu (kyu85 [at] umd.edu)
Office: IRB-3116
Office hours: by appointment (send email)
- Lectures: Tuesday 3:30PM - 6:00PM Eastern Time at CSI 2120.
- Piazza: We will be using Piazza as the primary platform for communication.
- Canvas: Grades will be released on Canvas.
- Gradescope: The coding assignment is submitted through Gradescope.
- Topic Preferences: Submit your topic preferences for presentations through this Google Form.
Course Prerequisites: Familiarity with introductory courses in computer vision (CMSC426 or similar) and machine/deep learning (CMSC422 or similar) is recommended. The ability to understand and analyze conference papers in this area is required. Programming with deep learning frameworks is needed for experiment presentations and projects. I would strongly suggest scanning through a few papers and the topics on the syllabus to gauge what kind of background is expected. You don't have to know every single algorithm/tool/feature a given paper mentions, but you should feel comfortable following the key ideas. Please talk to me if you are unsure whether the course is a good match for your background.
Requirements Summary:
- Paper Reviews: writing two paper reviews each week (except for the two weeks you are presenting) and submitting a PDF on Canvas.
- Class Discussion: participating in discussions during class.
- Paper Presentation: presenting one optional paper for the assigned topic individually.
- Experiment Presentation: presenting experiment results for one of the required papers of the assigned topic with a partner.
- Coding Assignment: one warmup coding assignment during the first half of the course.
- Midterm Exam: a take-home midterm exam with questions based on readings and lectures, plus a mock peer review task.
- Final Project: completing a research-oriented final project with one or two partners.
Grading Summary:
- 20% Paper Presentations (one paper presentation and one experiment presentation)
- 20% Homework Assignments (weekly paper reviews and one coding assignment)
- 30% Midterm Exam
- 30% Final Project (including project proposal, extended abstract, final report, and presentation)
- 5% Extra Credit on Class Participation (paper discussions, debate, exceptional presentation, etc.)
- Monday each week: paper reviews for that week are due at 8pm ET.
- One week before your paper presentation date: send presentation slides draft to instructor by email.
- Tuesday, Feb 4: paper presentation topic preference due.
- Friday, Feb 21: coding assignment released.
- Monday, March 10: coding assignment due at 11:59pm ET.
- Monday, March 24: one-page project proposal due at 11:59pm ET.
- Friday, April 25: four-page extended abstract for peer review due at 11:59pm ET.
- Monday, April 28: take-home midterm exam released.
- Friday, May 2: take-home midterm exam due at 11:59pm ET.
- Friday, May 16: final project report due at 11:59pm ET.
- Marked in Green: denotes required papers (2 each week), which the entire class should read and review; experiment presenters will present these papers.
- Marked in Blue: denotes optional papers (4 each week), which paper presenters will present; optional reading for the rest of the class.
- Unmarked: denotes reference papers, listed in case you want to read further on that topic or need them for your final project.