Video Decomposition Prior: Editing Videos Layer by Layer
Accepted at the International Conference on Learning Representations (ICLR 2024)

 


 

Abstract

 

Most deep learning techniques for video editing rely on extensive datasets of paired observed-input and ground-truth sequences to perform well. This reliance becomes a problem when such data is hard to acquire, especially in tasks like video dehazing and relighting, where replicating identical motions and camera angles in both the corrupted and ground-truth sequences is difficult. Moreover, these conventional methods perform best only when the test distribution closely mirrors the training distribution. Recognizing these challenges, this paper introduces a novel video decomposition prior (VDP) framework that draws inspiration from professional video editing practices. Our method does not require collecting a task-specific external data corpus; instead, it exploits the motion and appearance of the input video itself. The VDP framework decomposes a video sequence into a set of multiple RGB layers and their associated opacity levels. These layers are then manipulated individually to obtain the desired results. We address tasks such as video object segmentation, dehazing, and relighting. Moreover, we introduce a novel logarithmic video decomposition formulation for video relighting tasks, setting a new benchmark over existing methods. We evaluate our approach on standard video datasets such as DAVIS, REVIDE, and SDSD, and show qualitative results on a diverse array of internet videos.
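To make the layered representation concrete, the following is a minimal sketch of how a set of RGB layers and opacity maps can be recombined into a video, and how editing a single layer changes the result. It assumes standard back-to-front alpha ("over") compositing; the exact compositing equation used by VDP is not stated on this page, and the names `composite_layers`, `background`, `foreground`, and `alpha_fg` are hypothetical, chosen only for illustration.

```python
import numpy as np

def composite_layers(rgb_layers, alphas):
    """Back-to-front 'over' compositing (an assumed, standard formulation).

    rgb_layers: array of shape (L, T, H, W, 3), values in [0, 1];
                layer 0 is treated as the fully opaque background.
    alphas:     array of shape (L, T, H, W, 1), values in [0, 1];
                the alpha of layer 0 is ignored.
    Returns the composited video of shape (T, H, W, 3).
    """
    rgb_layers = np.asarray(rgb_layers, dtype=np.float64)
    alphas = np.asarray(alphas, dtype=np.float64)
    out = rgb_layers[0].copy()
    # Blend each remaining layer over the running result.
    for rgb, a in zip(rgb_layers[1:], alphas[1:]):
        out = a * rgb + (1.0 - a) * out
    return out

# Toy video: 2 frames of 4x4 pixels, grey background, bright square object.
T, H, W = 2, 4, 4
background = np.full((T, H, W, 3), 0.2)
foreground = np.full((T, H, W, 3), 0.9)
alpha_fg = np.zeros((T, H, W, 1))
alpha_fg[:, 1:3, 1:3] = 1.0  # object occupies the central 2x2 block

alphas = np.stack([np.ones((T, H, W, 1)), alpha_fg])
video = composite_layers(np.stack([background, foreground]), alphas)

# Layer-wise edit: swap the background for a blue one, keep the object layer.
new_background = np.zeros((T, H, W, 3))
new_background[..., 2] = 1.0
edited = composite_layers(np.stack([new_background, foreground]), alphas)
```

Because the edit touches only one layer and the opacity maps are reused, the object region is unchanged across all frames, which is the property that makes layer-wise edits temporally coherent.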

 

 

Video Relighting (Qualitative Results - Internet Videos)

 

 

Qualitative Results on SDSD dataset:

 

Low-lit Input Video
ZeroDCE++
SDSD
StableLLVE
Ours

 

Comparison with ZeroDCE++ (Zoomed-in):

 

ZeroDCE++
Ours
Note that in the video examples above, the output produced by ZeroDCE++ exhibits noticeable flickering noise.

 
 
 

 

Video Decomposition - UVOS (Qualitative Results - DAVIS Dataset)

 

Input Video
Alpha
Object Layer 1
Background Layer

 

Multiple Layers Decomposition:

 

 
 
 

 

Coherent Edits (Qualitative Results)

 

Edit propagation (Added Decals):

 

Original Video
Edits
Ours
Deformable Sprites
Naive Flow warping

 

Edit propagation (Salient Object Stylized):

 

Edit
Foreground Stylized

 

Edit propagation (Swap Background):

 

 
 
 

Video Dehazing (Qualitative Results - Internet Videos)

 

 

Qualitative Evaluations (On REVIDE dataset):

 

Hazy Video
Ours
CGIDN
DoubleDIP