Complete first lecture.

2018-09-04 22:31:32 -05:00
commit a82541f53a
parent 89fa5195f8
4 changed files with 57 additions and 2 deletions
--- a/website/docs/resources/slides/images/broken.png
+++ b/website/docs/resources/slides/images/broken.png
--- a/website/docs/resources/slides/images/iot-cameras.png
+++ b/website/docs/resources/slides/images/iot-cameras.png
--- a/website/docs/resources/slides/images/netflix.png
+++ b/website/docs/resources/slides/images/netflix.png
--- a/website/docs/resources/slides/lecture01.md
+++ b/website/docs/resources/slides/lecture01.md
@@ -8,10 +8,16 @@ date: September 05, 2018
 ## It's everywhere!
 ![](images/iot-cameras.png)
 ## Stuff is totally insecure!
 ![](images/broken.png)
 ## It's really difficult!
 ![](images/netflix.png)
 # What topics to cover?
 ## A really, really vast field
@@ -109,19 +115,68 @@ date: September 05, 2018
 # Defining privacy
 ## What does privacy mean?
- Many meanings of privacy
+- Many kinds of "privacy breaches"
    - Obvious: third party learns your private data
    - Retention: you give data, company keeps it forever
    - Passive: you don't know your data is collected
 ## Why is privacy hard?
 - Hard to pin down what privacy means!
 - Once data is out, can't put it back into the bottle
 - Privacy-preserving data release today may violate privacy tomorrow, combined
  with "side-information"
 - Data may be used many times, often doesn't change
 ## Hiding private data
- Remove "personally identifiable information"
+- Delete "personally identifiable information"
    - Name and age
    - Birthday
    - Social security number
    - ...
 - Publish the "anonymized" or "sanitized" data
 ## Problem: not enough
 - Can match up anonymized data with public sources
 - *De-anonymize* data, associate names to records
 - Really, really hard to think about side information
    - May not even be public at time of data release!
 ## Netflix challenge
 - Database of movie ratings
 - Published: ID number, movie rating, and rating date
 - Attack: from public IMDB ratings, recover names for Netflix data
 ## "Blending in a crowd"
 - Only release records that are similar to others
 - *k-anonymity*: require at least k identical records
 - Other variants: *l-diversity*, *t-closeness*, ...
 ## Problem: composition
 - Repeating k-anonymous releases may lose privacy
 - Privacy protection may fall off a cliff
    - First few queries fine, then suddenly total violation
 - Again, interacts poorly with side-information
 ## Differential privacy
 - Proposed by Dwork, McSherry, Nissim, Smith (2006)
 > A new approach to formulating privacy goals: the risk to one’s privacy, or in
 > general, any type of risk... should not substantially increase as a result of
 > participating in a statistical database.  This is captured by differential
 > privacy.
 ## Basic setting
 - Private data: set of records from individuals
    - Each individual: one record
    - Example: set of medical records
 - Private query: function from database to output
    - Randomized: adds noise to protect privacy
 ## Basic definition
 A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two
 databases $db, db'$ that differ in **one individual's record**, and for every
 subset $S$ of outputs, we have:
 $$
 \Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
 $$
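
The definition added in this hunk can be checked by hand for a very small mechanism. The sketch below is not part of the commit. It assumes a one-record database holding a single bit and uses randomized response (report the true bit with probability $e^\varepsilon / (1 + e^\varepsilon)$, otherwise flip it) as the query $Q$. The function names `truth_probability`, `randomized_response`, `output_distribution`, and `satisfies_dp` are hypothetical, chosen only for illustration. It enumerates every subset $S$ of outputs and tests the inequality directly.

```python
import itertools
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit (assumed mechanism parameter)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng=random):
    """Sample one noisy answer for a one-record database holding `bit`."""
    if rng.random() < truth_probability(epsilon):
        return bit
    return 1 - bit


def output_distribution(bit, epsilon):
    """Exact output distribution of randomized_response for a given bit."""
    p = truth_probability(epsilon)
    return {bit: p, 1 - bit: 1.0 - p}


def satisfies_dp(mechanism_epsilon, claimed_epsilon, delta=0.0, tol=1e-12):
    """Check Pr[Q(db) in S] <= e^eps * Pr[Q(db') in S] + delta for all S.

    The two neighboring databases are the one-record databases {0} and {1}.
    `tol` absorbs floating-point rounding when the bound is tight.
    """
    outputs = [0, 1]
    subsets = [
        s
        for size in range(len(outputs) + 1)
        for s in itertools.combinations(outputs, size)
    ]
    for db, db_prime in [(0, 1), (1, 0)]:
        dist = output_distribution(db, mechanism_epsilon)
        dist_prime = output_distribution(db_prime, mechanism_epsilon)
        for s in subsets:
            lhs = sum(dist[o] for o in s)
            rhs = math.exp(claimed_epsilon) * sum(dist_prime[o] for o in s) + delta
            if lhs > rhs + tol:
                return False
    return True


if __name__ == "__main__":
    print(satisfies_dp(1.0, 1.0))             # bound holds (tightly) at the design epsilon
    print(satisfies_dp(1.0, 0.5))             # a stricter epsilon fails with delta = 0
    print(satisfies_dp(1.0, 0.5, delta=0.3))  # a large enough delta absorbs the gap
```

In this toy setting the inequality is tight for $S = \{0\}$ at the design $\varepsilon$, which is why the check fails as soon as a smaller $\varepsilon$ is claimed with $\delta = 0$. Real queries over many records need an argument over all neighboring databases rather than brute-force enumeration.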