Privacy, Copyright, and Data Integrity: The Cascading Implications of Generative AI

Talk
Niloofar Mireshghallah
Time: 02.17.2025, 11:00 to 12:00

The rapid adoption of generative AI has created a cycle in which personal information cascades perpetually: from people to models to applications and online platforms, then back through scrapers into the system. Simple blanket rules such as "don't train on this data" or "don't share sensitive information" are inadequate, as we face training data scarcity while these models are already deeply integrated into people's daily lives. In this talk, rather than examining data, people, and models in isolation and setting rigid rules, we will reason about their interplay by discussing three research directions: (1) measuring the imprint of data on models through novel membership inference attacks and uncovering memorization patterns, (2) developing algorithmic approaches to help people control the exposure of their data while preserving utility, and (3) grounding model evaluations in legal and social frameworks, particularly the theory of contextual integrity. Looking ahead, we discuss emerging directions in building on-device privacy controls and nudging mechanisms, formalizing semantic memorization, and developing model capabilities such as abstraction, composition, and inhibition to enable controllable output generation.