PhD Proposal: On Provable Robustness against Data Poisoning
IRB 4105
https://umd.zoom.us/j/2766083717
In data poisoning, attackers maliciously manipulate training data to influence the behavior of learning algorithms. In this talk, I will present the progress my coauthors and I have made in provably mitigating the threat of data poisoning. First, I will introduce aggregation-based certified defenses against general data poisoning. Next, I will discuss the Lethal Dose Conjecture, a theoretical framework that characterizes the fundamental limits of robustness against data poisoning. The conjecture connects optimal robustness to few-shot learning from clean data, and I will present theoretical results verifying it in multiple cases. I will also explain its significance: if the conjecture holds for a given task, aggregation-based defenses are asymptotically optimal. In essence, given the most data-efficient learner for a task, we can transform it into one of the most robust defenses against data poisoning, thereby reducing the challenge of defending against data poisoning to that of few-shot learning. Given that defending against general data poisoning can be theoretically very difficult, where should we go from here? I will present an idea that uses temporal concepts to measure attack budgets, leading to novel threat models of data poisoning that apply in practical scenarios where traditional threat models fall short.
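For readers unfamiliar with the conjecture, here is a hedged paraphrase of its core quantitative claim (my wording, not necessarily the talk's): if the most data-efficient learner for a task needs n clean samples to learn it reliably, then given a training set of size N, the largest number of poisoned samples any defense can tolerate, the "lethal dose", is on the order of Θ(N/n). Aggregation matches this from below: splitting the N samples into Θ(N/n) partitions of size Θ(n), each large enough for the data-efficient base learner, yields a majority vote that withstands corruption of a constant fraction of the partitions.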
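To make the aggregation idea concrete, below is a minimal runnable sketch in the style of Deep Partition Aggregation: partition the training set, train one base learner per partition, and predict by majority vote with a pointwise certificate. This is an illustration under my own assumptions (the function names and the toy NearestCentroid base learner are mine), not the implementation from the talk.

import numpy as np

def partition(X, y, k, seed=0):
    # Split the training set into k disjoint parts after a fixed shuffle.
    # Note: a real implementation hashes sample *contents* rather than
    # indices, so that inserting or removing one sample touches only one
    # partition; index-based splitting here is for brevity.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return [(X[idx[i::k]], y[idx[i::k]]) for i in range(k)]

class NearestCentroid:
    # Toy base learner; any (ideally data-efficient) learner works here.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        return self.classes_[d.argmin(axis=1)]

def certified_majority_vote(parts, x, n_classes):
    # Majority vote over the k base learners plus a certificate: if the top
    # class beats the runner-up by g votes, then (ignoring tie-breaking
    # subtleties) changing the prediction requires flipping more than g // 2
    # base learners, i.e., poisoning at least that many partitions.
    votes = np.zeros(n_classes, dtype=int)
    for X_part, y_part in parts:
        pred = NearestCentroid().fit(X_part, y_part).predict(x[None, :])[0]
        votes[pred] += 1
    order = np.sort(votes)[::-1]
    gap = order[0] - (order[1] if len(order) > 1 else 0)
    return int(votes.argmax()), int(gap // 2)

# Usage on synthetic two-class data.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
parts = partition(X, y, k=51)
label, radius = certified_majority_vote(parts, np.array([2.5, 2.5]), n_classes=2)
print(f"prediction={label}, certified against >= {radius} poisoned partitions")

Swapping in a more data-efficient base learner lets each partition be smaller, so k grows and the certificate strengthens, which is the learner-to-defense transformation described above.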