diff --git a/website/docs/resources/slides/lecture-welcome.md b/website/docs/resources/slides/lecture-welcome.md
index 10d2620..c9d06e5 100644
--- a/website/docs/resources/slides/lecture-welcome.md
+++ b/website/docs/resources/slides/lecture-welcome.md
@@ -14,10 +14,6 @@ date: September 04, 2019
 
 ![](images/broken.png)
 
-## It's really difficult!
-
-![](images/netflix.png)
-
 # What topics to cover?
 
 ## A really, really vast field
@@ -173,7 +169,7 @@ date: September 04, 2019
 ## Todos for you
 0. Complete the [course survey](https://forms.gle/NvYx3BM7HVkuzYdG6)
 1. Explore the [course website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/)
-2. Think about which lecture you want to present and summarize
+2. Think about which lecture you want to present
 3. Think about which lecture you want to summarize
 4. Form project groups and brainstorm topics
 
@@ -217,10 +213,28 @@ date: September 04, 2019
 - Really, really hard to think about side information
 - May not even be public at time of data release!
 
-## Netflix challenge
+## Netflix prize
 - Database of movie ratings
 - Published: ID number, movie rating, and rating date
-- Attack: from public IMDB ratings, recover names for Netflix data
+- Competition: predict which movies each ID will like
+- Result
+  - Tons of teams competed
+  - Winner: beat Netflix's best by **10%**
+
+> A triumph for machine learning contests!
+
+##
+
+![](images/netflix.png)
+
+
+## Privacy flaw?
+- Attack
+  - Public info on IMDB: names, ratings, dates
+  - Reconstruct names for Netflix IDs
+- Result
+  - Netflix settled lawsuit ($10 million)
+  - Netflix canceled future challenges
 
 ## "Blending in a crowd"
 - Only release records that are similar to others
@@ -233,14 +247,17 @@ date: September 04, 2019
 - First few queries fine, then suddenly total violation
 - Again, interacts poorly with side-information
 
-## Differential privacy
-- Proposed by Dwork, McSherry, Nissim, Smith (2006)
+# Differential privacy
+
+## Yet another privacy definition
 
 > A new approach to formulating privacy goals: the risk to one’s privacy, or in
 > general, any type of risk... should not substantially increase as a result of
 > participating in a statistical database. This is captured by differential
 > privacy.
 
+- Proposed by Dwork, McSherry, Nissim, Smith (2006)
+
 ## Basic setting
 - Private data: set of records from individuals
 - Each individual: one record
@@ -256,3 +273,32 @@ subset $S$ of outputs, we have:
 $$
 \Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
 $$
+
+## Basic reading
+> Output of program doesn't depend too much on any single person's data
+
+- Property of the algorithm/query/program
+  - No: "this data is differentially private"
+  - Yes: "this query is differentially private"
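+
+## Example: a differentially private query
+A minimal sketch of the classic Laplace mechanism from the Dwork-McSherry-Nissim-Smith
+paper; the function name and parameters are illustrative, not from these slides:
+
+```python
+import numpy as np
+
+def private_count(db, predicate, epsilon):
+    """Release a counting query with Laplace noise.
+
+    A counting query changes by at most 1 between databases differing
+    in one record (sensitivity 1), so adding Laplace(1/epsilon) noise
+    makes the released value epsilon-differentially private (delta = 0).
+    """
+    true_count = sum(1 for record in db if predicate(record))
+    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
+    return true_count + noise
+```
+
+E.g. `private_count(db, lambda r: r["age"] > 40, epsilon=0.1)`; each
+call consumes its own epsilon (composition)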