PhD Defense: Foundations of Trustworthy Deep Learning: Fairness, Robustness, and Explainability

Talk
Vedant Nanda
Time: 
04.01.2024 14:00 to 15:00
Location: 

IRB 5105

Deep Learning (DL) models, especially with the rise of so-called foundation models, are increasingly used in real-world applications: as autonomous systems (e.g., facial recognition), as decision aids (e.g., medical imaging, writing assistants), and even to generate novel content (e.g., chatbots, image generators). This naturally raises concerns about the trustworthiness of these systems: do the models systematically perform worse for certain subgroups? Are their outputs reliable under perturbations to the inputs? This thesis aims to strengthen the foundations of DL models so that they can be trusted in deployment. I will cover three important aspects of trust: fairness, robustness, and explainability. I will argue that we need to expand the scope of each of these aspects when applying them to DL models, and carefully consider possible tradeoffs between these desirable but sometimes conflicting notions of trust.

In the first part, I will present two works that show how thinking about fairness for DL introduces new challenges, especially due to the overparametrized nature of DL models and their susceptibility to adversarial attacks. In the second part, I will argue that to obtain truly robust models, we must focus on a more general notion of robustness: measuring how well the invariances of DL models align with those of other models of perception, such as humans. Finally, in the last part, I will show how even a small subset of randomly chosen neurons from a pre-trained representation can transfer very well to downstream tasks, which challenges existing beliefs in the explainability literature that individual neurons learn disjoint, semantically meaningful concepts.