PhD Defense: Advancing Visual Assets: Taming Deep Priors for Editing and Enhancement

Talk
Yiran Xu
Time: 
04.01.2025 12:30 to 15:00
Location: 

Visual data, including images and videos, are valuable assets with broad commercial, artistic, and personal significance. As digital content consumption continues to grow, there is an increasing demand for methods that can enhance visual quality, improve adaptability across different formats, and enable efficient content editing. However, achieving these enhancements manually is both labor-intensive and technically challenging. Recent advances in deep learning have introduced powerful generative models (e.g., GANs, diffusion models) and human-aligned visual representations (e.g., VGG, DINOv2) that offer promising capabilities for improving visual assets. Yet, directly applying these models to real-world editing and enhancement tasks often introduces artifacts and inconsistencies, such as temporal flickering in videos, limited generalization to out-of-distribution (OOD) data, and misalignment between high-level priors and low-level structures. This thesis explores strategies to “tame” these deep priors, turning them into more controllable and reliable tools for visual asset enhancement.

This dissertation presents four key contributions in editing and enhancement, each demonstrating how to adapt deep priors to improve the usability, quality, and consistency of visual content. First, we introduce a video editing framework that enforces temporal consistency by optimizing both the latent codes and the generator itself, reducing flickering artifacts in edited videos. Second, we propose a method to improve generative priors for OOD data using a volumetric decomposition approach, enabling high-fidelity image reconstructions while maintaining editability. Third, we develop VideoGigaGAN, a large-scale video super-resolution model that extends an image super-resolution model to video, enhancing both spatial resolution and temporal coherence. Finally, we explore image retargeting, leveraging perceptual priors to intelligently adapt content to different aspect ratios without compromising visual coherence.

By addressing these challenges, this thesis contributes to the broader goal of harnessing deep priors for real-world visual asset enhancement. The proposed approaches demonstrate that by adapting and refining generative priors, we can develop more reliable, higher-quality, and scalable solutions for visual editing tasks. These contributions have potential applications in media production, content creation, digital art, and real-time video processing, paving the way for future research in deep learning-driven visual content adaptation.