Research
I am interested in solving problems with less supervision, using uncurated as well as synthetic data. Recently, I have been working on improving recognition using generation, especially using diffusion models as sources of synthetic data. Previously, I explored tasks across recognition and generation, focusing on different supervision strategies and proposing modified architectures and losses that make better use of the data under different settings.
|
|
Gen2Det: Generate to Detect
Saksham Suri,
Fanyi Xiao,
Animesh Sinha,
Sean Chang Culatana,
Raghuraman Krishnamoorthi,
Chenchen Zhu,
Abhinav Shrivastava
Under Submission
Paper
Utilizing synthetic data from state-of-the-art diffusion models to improve object detection and segmentation performance.
|
|
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri*,
Matthew Walmer*,
Kamal Gupta,
Abhinav Shrivastava
Under Submission
Project Page | Paper | Code
Self-supervised and lightweight technique to learn a feature transform for generating dense features from pre-trained ViTs.
|
|
GRIT: GAN Residuals for Image-to-Image Translation
Saksham Suri*,
Moustafa Meshry*,
Larry S. Davis,
Abhinav Shrivastava
Winter Conference on Applications of Computer Vision (WACV), 2024
Project Page | Paper
Decouple the optimization of reconstruction and adversarial losses by synthesizing an image as a combination of its reconstruction (low-frequency) and GAN residual (high-frequency) components.
|
|
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization
Soumik Mukhopadhyay,
Saksham Suri,
Ravi Teja Gadde,
Abhinav Shrivastava
Winter Conference on Applications of Computer Vision (WACV), 2024
Project Page
We propose Diff2Lip, an audio-conditioned diffusion-based model that performs lip synchronization in the wild while preserving image fidelity and identity.
|
|
SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining
Saksham Suri*,
Saketh Rambhatla*,
Rama Chellappa,
Abhinav Shrivastava
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Project Page | Paper | Code
Propose an end-to-end system that tackles sparsely annotated object detection by learning to separate proposals into labeled and unlabeled regions using pseudo-positive mining.
|
|
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Matthew Walmer*,
Saksham Suri*,
Kamal Gupta,
Abhinav Shrivastava
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Project Page | Paper | Code
Study the effect of supervision and losses in training ViTs through attention-, feature-, and downstream-task-based analysis.
|
|
Towards Discovery and Attribution of Open-world GAN Generated Images
Sharath Girish*,
Saksham Suri*,
Saketh Rambhatla,
Abhinav Shrivastava
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Project Page | Paper | arXiv
Propose an iterative algorithm for discovering GAN-generated images in an open-world setup. Also show applications to online, never-ending discovery.
|
|
Learned Spatial Representations for Few-shot Talking-Head Synthesis
Moustafa Meshry,
Saksham Suri,
Larry S. Davis,
Abhinav Shrivastava
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Project Page | Paper | arXiv
We propose a novel framework that disentangles spatial and style information for image synthesis. A latent spatial layout for the target image is generated and used to produce per-pixel style modulation parameters for the final synthesis.
|
|
Improving Face Recognition Performance using TeCS2 Dictionary
Saksham Suri,
Anush Sankaran,
Mayank Vatsa,
Richa Singh
Pattern Recognition Letters, 2020
Paper
Incorporating task-agnostic color, shape, texture, and symmetry attributes into task-specific deep learning classifiers for face recognition.
|
|
An Interpretable Generative Model for Handwritten Digits Synthesis
Yao Zhu,
Saksham Suri,
Pranav Kulkarni,
Yueru Chen,
Jiali Duan,
C.-C. Jay Kuo
International Conference on Image Processing (ICIP), 2019
Paper
Propose a non-deep-learning approach to handwritten digit synthesis that is more interpretable and does not require backpropagation.
|
|
Angel or Demon? Characterizing Variations Across Twitter Timeline of Technical Support Campaigners
S. Gupta,
G. S. Bhatia,
Saksham Suri,
D. Kuchhal,
P. Gupta,
M. Ahamad,
M. Gupta,
P. Kumaraguru
The Journal of Web Science, Vol. 6, 2019
Paper
Analyzing and identifying the presence of fake tech support accounts on Twitter.
|
|
On Matching Faces with Alterations due to Plastic Surgery and Disguise
Saksham Suri,
Anush Sankaran,
Mayank Vatsa,
Richa Singh
IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), 2018
Paper
A novel approach to perform face recognition in the presence of plastic surgery and disguise.
|
|