**This is a tentative schedule, and will be filled out with more details over the first two weeks of the semester.**
Weeks 1 and 2 (Jan 26/28, Feb 2/4): Introduction, Regulations
Brief Description: We will go through the privacy and ethics issues in some detail, and discuss regulations such as GDPR and how they relate to data management systems and data architectures.
Comparing the benefits of pseudonymisation and anonymisation under the GDPR; Hintze and El Emam; Data Protection and Privacy, 2018.
Week 3 (Feb 9/11): New Systems Proposals
Brief Description: In the last few years, several vision papers have laid out thoughts on how to build data management systems that support GDPR and similar regulations. These papers will help make the challenges in designing privacy-first systems more concrete.
Weeks 4-5 (Feb 16/18/23/25): Statistical Privacy, Pseudonymization, and De-Anonymization
Brief Description: We will discuss the original proposals for statistical privacy (k-anonymity, l-diversity, etc.) and the work on differential privacy, covering both theory and systems. We will also discuss some of the papers on de-anonymization.
Required Readings - Feb 16, 2021:
Broken promises of privacy: responding to the surprising failure of anonymization; Paul Ohm; 2009 (Chapter I only)
Unique in the crowd: The privacy bounds of human mobility; De Montjoye et al.; Nature Scientific Reports; 2013
Required Readings - Feb 18, 2021:
Practical privacy: the SuLQ framework (Sections 1-3.2); Avrim Blum, Cynthia Dwork, Frank McSherry, Kobbi Nissim; PODS 2005
Dwork, Roth; Algorithmic Foundations of Differential Privacy (First Two Chapters); Foundations and Trends 9(3-4), 2014
Required Readings - Feb 23, 2021:
Privacy: Theory meets practice on the map; Ashwin Machanavajjhala, Daniel Kifer, John Abowd, Johannes Gehrke, Lars Vilhuber; ICDE 2008
Relevant Papers (Required readings to be posted later):
A critical appraisal of the Article 29 Working Party Opinion 05/2014 on data anonymization techniques; El Emam and Alvarez; International Data Privacy Law, 2014.
Robust De-anonymization of Large Sparse Datasets; A. Narayanan and V. Shmatikov; IEEE Symposium on Security and Privacy, 2008
The Fienberg Problem: How to Allow Human Interactive Data Analysis in the Age of Differential Privacy; Cynthia Dwork and Jonathan Ullman; J. Priv. Confidentiality; 2018
PCPs and the hardness of generating private synthetic data; Jonathan Ullman and Salil Vadhan; Theory of Cryptography Conference, 2011
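As a small illustration of the k-anonymity notion mentioned in the description for Weeks 4-5, the following is a minimal sketch rather than anything taken from the readings. It assumes a table is k-anonymous when every combination of quasi-identifier values that occurs in it is shared by at least k records. The function name `is_k_anonymous`, the column names, and the toy rows are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every occurring combination of quasi-identifier
    values is shared by at least k records (hypothetical helper)."""
    # Count how many records fall into each quasi-identifier group.
    groups = Counter(
        tuple(row[attr] for attr in quasi_identifiers) for row in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized toy table; values are made up.
toy_table = [
    {"zip": "207**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "207**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "208**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "208**", "age": "30-39", "diagnosis": "flu"},
]

print(is_k_anonymous(toy_table, ["zip", "age"], k=2))
print(is_k_anonymous(toy_table, ["zip", "age"], k=3))
```

In this toy table, each (zip, age) group has two records, so the check passes for k=2 and fails for k=3. Note that both records in the second group share the same diagnosis, which is the kind of homogeneity that l-diversity was proposed to address.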
Week 6 (Mar 4): Privacy and Machine Learning
Brief Description: We will discuss some of the recent work on machine learning in a privacy-preserving way, focusing primarily on work on using differential privacy and federated learning.
Relevant Papers (Required readings to be posted later):
Abadi et al., "Deep Learning with Differential Privacy", ACM CCS 2016
Advances and Open Problems in Federated Learning; Kairouz et al. (Large Survey-style Paper, with Chapter 4 focusing on Privacy Issues).
Weeks 7-8-9 (Mar 9/11/23/25/30, April 1/6/8): Encrypted Databases/Secure Computation
Brief Description: We will discuss the work on using encryption to hide data from curious or malicious adversaries while allowing users to run queries, and different types of attacks on the data.
Brief Description: We will discuss the work on supporting fairness, explainability, and other ethics issues, focusing on the work in databases or systems.
Brief Description: We will discuss, at a high level, a few other technologies that are relevant in this context. These include, e.g., blockchain-based proposals to bring auditability and multi-party transactions to database systems.
Relevant Papers (Required readings to be posted later):
SAQE: Practical Privacy-Preserving Approximate Query Processing for Data Federations; Bater et al.; VLDB 2020.
Veritas: Shared verifiable databases and tables in the cloud; Allen et al.; CIDR 2019
IntegriDB: Verifiable SQL for outsourced databases; Zhang et al.; SIGSAC 2015
Sieve: A Middleware Approach to Scalable Access Control for Database Management Systems; Primal Pappachan, Roberto Yus, Sharad Mehrotra, Johann-Christoph Freytag; VLDB 2020
Obscure: Information-theoretic oblivious and verifiable aggregation queries; Gupta et al.; VLDB 2019
PANDA: Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data; Mehrotra et al.; ACM Transactions on Management Information Systems (TMIS), 2020