Complete first lecture.

This commit is contained in:
Justin Hsu 2018-09-04 22:31:32 -05:00
parent 89fa5195f8
commit a82541f53a
4 changed files with 57 additions and 2 deletions

@ -8,10 +8,16 @@ date: September 05, 2018
## It's everywhere!
![](images/iot-cameras.png)
## Stuff is totally insecure!
![](images/broken.png)
## It's really difficult!
![](images/netflix.png)
# What topics to cover?
## A really, really vast field
# Defining privacy
## What does privacy mean?
- Many kinds of "privacy breaches"
- Obvious: third party learns your private data
- Retention: you give data, company keeps it forever
- Passive: you don't know your data is collected
## Why is privacy hard?
- Hard to pin down what privacy means!
- Once data is out, can't put it back into the bottle
- A data release that preserves privacy today may violate privacy tomorrow,
  once it is combined with "side information"
- The same data may be used many times, and it often doesn't change
## Hiding private data
- Delete "personally identifiable information"
- Name and age
- Birthday
- Social security number
- ...
- Publish the "anonymized" or "sanitized" data
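A minimal sketch of this "delete the PII, then publish" approach follows. The field names, the `PII_FIELDS` set, and the `sanitize` helper are hypothetical illustrations and are not taken from the lecture.

```python
# Hypothetical PII columns, following the examples on the slide.
PII_FIELDS = {"name", "age", "birthday", "ssn"}

def sanitize(records, pii_fields=PII_FIELDS):
    """Return copies of the records with every PII field deleted."""
    return [{k: v for k, v in r.items() if k not in pii_fields} for r in records]

# Toy record (made-up data).
records = [
    {"name": "Alice", "age": 28, "birthday": "1990-01-02",
     "ssn": "123-45-6789", "zip": "53706", "diagnosis": "flu"},
]
print(sanitize(records))  # [{'zip': '53706', 'diagnosis': 'flu'}]
```

Note that attributes such as `zip` survive sanitization. Leftover attributes like this are exactly what an attacker can match against other data.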
## Problem: not enough
- Can match up anonymized data with public sources
- *De-anonymize* data, associate names to records
- Really, really hard to think about side information
- May not even be public at time of data release!
## Netflix challenge
- Database of movie ratings
- Published: ID number, movie rating, and rating date
- Attack: match public IMDb ratings against the Netflix data to recover names
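The sketch below shows the idea behind this kind of linkage attack in a heavily simplified form. It links each anonymous ID to the public user who shares the most exact (movie, rating, date) triples, provided the overlap meets a threshold. A realistic attack would also have to tolerate approximate dates and ratings. The `link` function, the `min_overlap` threshold, and all of the data are hypothetical.

```python
from collections import defaultdict

def link(anon_rows, public_rows, min_overlap=2):
    """Map anonymous IDs to public names sharing >= min_overlap exact
    (movie, rating, date) triples. Simplified exact-match sketch."""
    anon = defaultdict(set)
    for user_id, movie, rating, date in anon_rows:
        anon[user_id].add((movie, rating, date))
    public = defaultdict(set)
    for name, movie, rating, date in public_rows:
        public[name].add((movie, rating, date))
    matches = {}
    for user_id, triples in anon.items():
        # Pick the public user with the largest overlap in ratings.
        best_name, best_overlap = None, 0
        for name, pub_triples in public.items():
            overlap = len(triples & pub_triples)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_overlap >= min_overlap:
            matches[user_id] = best_name
    return matches

# Made-up "anonymized" release: (ID, movie, rating, date).
anon_rows = [
    (101, "Movie A", 5, "2005-03-01"),
    (101, "Movie B", 2, "2005-03-04"),
    (101, "Movie C", 4, "2005-04-10"),
    (102, "Movie A", 3, "2005-05-01"),
]
# Made-up public ratings: (name, movie, rating, date).
public_rows = [
    ("alice", "Movie A", 5, "2005-03-01"),
    ("alice", "Movie C", 4, "2005-04-10"),
    ("bob", "Movie A", 3, "2005-05-01"),
]
print(link(anon_rows, public_rows))  # {101: 'alice'}
```

Even two matching ratings are enough to attach a name to record 101, and none of the fields involved is "personally identifiable" on its own.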
## "Blending in a crowd"
- Only release records that are similar to others
- *k-anonymity*: every record must match at least k-1 other records on its
  identifying attributes
- Other variants: *l-diversity*, *t-closeness*, ...
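Here is a minimal sketch of checking the k-anonymity condition. It assumes the identifying attributes (often called quasi-identifiers) are given as a list of column names. It also assumes the table has already been generalized, for example by masking ZIP digits and bucketing ages. The table and the `is_k_anonymous` helper are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values that appears
    in the table is shared by at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Made-up generalized table: ZIP digits masked, ages bucketed.
table = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "cancer"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "asthma"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))  # True
print(is_k_anonymous(table, ["zip", "age"], k=3))  # False
```

The check only looks at the identifying columns. Variants such as l-diversity additionally constrain the sensitive values (here, `diagnosis`) within each group.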
## Problem: composition
- Repeating k-anonymous releases may lose privacy
- Privacy protection may fall off a cliff
- First few queries fine, then suddenly total violation
- Again, interacts poorly with side-information
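The following toy sketch illustrates how composition can fail. It uses two hypothetical releases, each 2-anonymous on its own, and assumes a hypothetical person (Alice) appears in both. It also assumes the attacker knows which generalized group Alice falls into in each release. Intersecting the candidate diagnoses from the two releases pins down her value. All data and the `candidate_values` helper are made up for illustration.

```python
def candidate_values(release, quasi_identifiers, group):
    """Sensitive values in the group whose quasi-identifiers equal `group`."""
    return {r["diagnosis"] for r in release
            if all(r[q] == group[q] for q in quasi_identifiers)}

QIDS = ["zip", "age"]

# Two made-up releases; each is 2-anonymous on (zip, age) by itself.
release_1 = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "cancer"},
]
release_2 = [
    {"zip": "5370*", "age": "2*", "diagnosis": "cancer"},
    {"zip": "5370*", "age": "2*", "diagnosis": "asthma"},
]

# Side information: the group Alice falls into in each release.
alice_group_1 = {"zip": "537**", "age": "20-29"}
alice_group_2 = {"zip": "5370*", "age": "2*"}

leaked = (candidate_values(release_1, QIDS, alice_group_1)
          & candidate_values(release_2, QIDS, alice_group_2))
print(leaked)  # {'cancer'}
```

Each release alone leaves two possibilities, but together they leave only one. This is the "fall off a cliff" behavior described on the slide.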
## Differential privacy
- Proposed by Dwork, McSherry, Nissim, Smith (2006)
> A new approach to formulating privacy goals: the risk to one's privacy, or in
> general, any type of risk... should not substantially increase as a result of
> participating in a statistical database. This is captured by differential
> privacy.
## Basic setting
- Private data: set of records from individuals
- Each individual: one record
- Example: set of medical records
- Private query: function from database to output
- Randomized: adds noise to protect privacy
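As one concrete example of a randomized query, here is a minimal sketch of a noisy counting query. It uses the standard Laplace mechanism, which the slides have not introduced yet. Changing one individual's record changes the true count by at most 1, and adding Laplace noise of scale 1/ε to such a count is the textbook construction for ε-differential privacy (δ = 0). The `noisy_count` name and the toy database are hypothetical. Naive floating-point noise sampling like this is known to leak information, so the sketch is illustrative only.

```python
import random

def noisy_count(db, predicate, epsilon, rng=random):
    """Count records satisfying `predicate`, plus Laplace(0, 1/epsilon) noise.

    Changing one record changes the true count by at most 1 (sensitivity 1).
    """
    true_count = sum(1 for record in db if predicate(record))
    # The difference of two i.i.d. Exponential(rate=epsilon) samples
    # is Laplace-distributed with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

# Made-up medical database: one record per individual.
db = [{"diagnosis": "flu"}, {"diagnosis": "cancer"}, {"diagnosis": "flu"}]

def is_flu(record):
    return record["diagnosis"] == "flu"

print(noisy_count(db, is_flu, epsilon=0.5, rng=random.Random(0)))
```

Smaller ε means larger noise and stronger privacy. Each run returns a different answer centered on the true count of 2.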
## Basic definition
A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two
databases $db, db'$ that differ in **one individual's record**, and for every
subset $S$ of outputs, we have:
$$
\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
$$
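To make the definition concrete, the sketch below checks it by brute force on a tiny example. It uses *randomized response*, a standard mechanism not described on these slides: on a one-person database, it reports the person's true bit with probability 3/4 and the flipped bit otherwise. The mechanism is represented by its output distributions on the neighboring databases db = [1] and db' = [0]. The code checks the inequality for every output subset S, in both directions, since the definition quantifies over every pair of neighboring databases. The helper name `satisfies_dp` and the small floating-point tolerance are assumptions of this sketch.

```python
import itertools
import math

def satisfies_dp(p, q, epsilon, delta, tol=1e-12):
    """Check Pr[Q(db) in S] <= e^epsilon * Pr[Q(db') in S] + delta for every
    subset S of outputs, in both directions. p and q map each output to its
    probability under db and db'. Exponential in the number of outputs."""
    outputs = sorted(set(p) | set(q))
    for a, b in ((p, q), (q, p)):
        for size in range(len(outputs) + 1):
            for S in itertools.combinations(outputs, size):
                lhs = sum(a.get(o, 0.0) for o in S)
                rhs = math.exp(epsilon) * sum(b.get(o, 0.0) for o in S) + delta
                if lhs > rhs + tol:
                    return False
    return True

# Randomized response: report the true bit with probability 3/4.
rr_when_1 = {1: 0.75, 0: 0.25}  # output distribution of Q(db),  db  = [1]
rr_when_0 = {1: 0.25, 0: 0.75}  # output distribution of Q(db'), db' = [0]

print(satisfies_dp(rr_when_1, rr_when_0, epsilon=math.log(3), delta=0.0))  # True
print(satisfies_dp(rr_when_1, rr_when_0, epsilon=math.log(2), delta=0.0))  # False
```

The worst case is S = {1}: 3/4 ≤ e^ε · 1/4 holds exactly when e^ε ≥ 3. With ε = ln 2, the same mechanism still satisfies the definition once δ = 1/4, which shows how δ absorbs the leftover probability mass.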