PhD Proposal: Learning Structured Alignment: From Visual Understanding to Robot Control

Talk
Shuaiyi Huang
Time: 01.10.2025, 16:30 to 17:30
Location: IRB-3137

The past decade has witnessed remarkable progress in computer vision and robotics, driven by deep learning. However, a fundamental challenge remains: how to efficiently align and establish correspondences across different domains and modalities. In this talk, we introduce novel frameworks for learning structured alignment, progressing from visual understanding to robot control.
First, we present our work SCorrSAN for learning pixel-level semantic alignment between image pairs using sparse keypoint supervision within a teacher-student framework. Next, we highlight our works PointVIS and UVIS on instance-level alignment in videos, where we achieve high-quality video instance segmentation using only point supervision or no supervision at all. Then, transitioning to robotics, we introduce ARDuP, a method for video-based policy learning that aligns generated visual plans with language instructions for effective control.
Finally, we discuss proposed future research directions, including aligning agent behavior with human preferences under noisy feedback and learning dense point alignment to enhance video action recognition.