Saksham Suri

I am a Ph.D. student in the Department of Computer Science at the University of Maryland (UMD), College Park, advised by Prof. Abhinav Shrivastava. My research lies in the field of computer vision and machine learning.

During my Ph.D., I have been fortunate to intern with Meta and Amazon. I completed my undergraduate degree at IIIT Delhi (2019), majoring in Computer Science and Engineering, where I worked at IAB Lab and Precog.

In the past, I have been fortunate to work with Rama Chellappa, Mayank Vatsa, Richa Singh, Ponnurangam Kumaraguru, C.-C. Jay Kuo, and Anush Sankaran, who have helped me grow as a researcher and a person.

Email  /  CV  /  Google Scholar  /  Twitter

Research

I am interested in solving problems with less supervision, using uncurated as well as synthetic data. Recently, I have been working on improving recognition using generation, especially using diffusion models as synthetic data sources. I have previously explored tasks across recognition and generation, focusing on different supervision strategies and proposing modified architectures and losses to make better use of the data under different settings.

Gen2Det: Generate to Detect
Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava
Under Submission
Paper

Utilizing synthetic data from state-of-the-art diffusion models to improve object detection and segmentation performance.

LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri*, Matthew Walmer*, Kamal Gupta, Abhinav Shrivastava
Under Submission
Project Page | Paper | Code

Self-supervised and lightweight technique to learn a feature transform for generating dense features from pre-trained ViTs.

GRIT: GAN Residuals for Image-to-Image Translation
Saksham Suri*, Moustafa Meshry*, Larry S. Davis, Abhinav Shrivastava
Winter Conference on Applications of Computer Vision (WACV), 2024
Project Page | Paper

Decouple the optimization of reconstruction and adversarial losses by synthesizing an image as a combination of its reconstruction (low-frequency) and GAN residual (high-frequency) components.

Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization
Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, Abhinav Shrivastava
Winter Conference on Applications of Computer Vision (WACV), 2024
Project Page

We propose Diff2Lip, an audio-conditioned diffusion-based model that performs lip synchronization in the wild while preserving image fidelity and identity.

SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining
Saksham Suri*, Saketh Rambhatla*, Rama Chellappa, Abhinav Shrivastava
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Project Page | Paper | Code

Propose an end-to-end system that uses pseudo-positive mining to separate proposals into labeled and unlabeled regions, tackling sparsely annotated object detection.

Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Matthew Walmer*, Saksham Suri*, Kamal Gupta, Abhinav Shrivastava
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Project Page | Paper | Code

Study the effect of supervision and losses in training ViTs through attention-, feature-, and downstream-task-based analysis.

Towards Discovery and Attribution of Open-world GAN Generated Images
Sharath Girish*, Saksham Suri*, Saketh Rambhatla, Abhinav Shrivastava
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Project Page | Paper | arXiv

Propose an iterative algorithm for discovering GAN-generated images in an open-world setup. Also show applications to online, never-ending discovery.

Learned Spatial Representations for Few-shot Talking-Head Synthesis
Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Project Page | Paper | arXiv

We propose a novel framework that disentangles spatial and style information for image synthesis. A latent spatial layout for the target image is generated and then used to produce per-pixel style modulation parameters for the final synthesis.

Improving Face Recognition Performance using TeCS2 Dictionary
Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh
Pattern Recognition Letters, 2020
Paper

Incorporating task-agnostic color, shape, texture, and symmetry attributes into task-specific deep learning classifiers for face recognition.

An Interpretable Generative Model for Handwritten Digits Synthesis
Yao Zhu, Saksham Suri, Pranav Kulkarni, Yueru Chen, Jiali Duan, C.-C. Jay Kuo
International Conference on Image Processing (ICIP), 2019
Paper

Propose a non-deep-learning approach to handwritten digit synthesis that is more interpretable and does not require backpropagation.

Angel or Demon? Characterizing Variations Across Twitter Timeline of Technical Support Campaigners
S. Gupta, G. S. Bhatia, Saksham Suri, D. Kuchhal, P. Gupta, M. Ahamad, M. Gupta, P. Kumaraguru
The Journal of Web Science, Vol. 6, 2019
Paper

Analyzing and identifying the presence of fake tech support accounts on Twitter.

On matching faces with alterations due to plastic surgery and disguise
Saksham Suri, Anush Sankaran, Mayank Vatsa, Richa Singh
IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), 2018
Paper

A novel approach to perform face recognition in the presence of plastic surgery and disguise.

