Complete first lecture.
parent 89fa5195f8
commit a82541f53a
BIN
website/docs/resources/slides/images/broken.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 803 KiB |
BIN
website/docs/resources/slides/images/iot-cameras.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 165 KiB |
BIN
website/docs/resources/slides/images/netflix.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 681 KiB |
@@ -8,10 +8,16 @@ date: September 05, 2018
## It's everywhere!

![](images/iot-cameras.png)

## Stuff is totally insecure!

![](images/broken.png)

## It's really difficult!

![](images/netflix.png)

# What topics to cover?

## A really, really vast field
@@ -109,19 +115,68 @@ date: September 05, 2018
# Defining privacy

## What does privacy mean?

- Many meanings of privacy
- Many kinds of "privacy breaches"
    - Obvious: third party learns your private data
    - Retention: you give data, company keeps it forever
    - Passive: you don't know your data is collected

## Why is privacy hard?

- Hard to pin down what privacy means!
- Once data is out, can't put it back into the bottle
- Privacy-preserving data release today may violate privacy tomorrow, combined
  with "side-information"
- Data may be used many times, often doesn't change

## Hiding private data

- Delete "personally identifiable information"
    - Name and age
    - Birthday
    - Social security number
    - ...
- Publish the "anonymized" or "sanitized" data
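
The sanitization step above can be sketched in a few lines. This is only an illustration: the field names in `PII_FIELDS`, the helper `strip_pii`, and the sample records are all hypothetical, since the slides do not fix a schema.

```python
# Hypothetical PII field names; the slides do not specify a schema.
PII_FIELDS = {"name", "age", "birthday", "ssn"}


def strip_pii(records, pii_fields=PII_FIELDS):
    """Return copies of the records with the listed PII fields removed."""
    return [
        {key: value for key, value in record.items() if key not in pii_fields}
        for record in records
    ]


# Made-up example records.
records = [
    {"name": "Alice", "age": 34, "birthday": "1984-02-01",
     "ssn": "000-00-0001", "zip": "53703", "diagnosis": "flu"},
    {"name": "Bob", "age": 41, "birthday": "1977-07-19",
     "ssn": "000-00-0002", "zip": "53706", "diagnosis": "asthma"},
]

sanitized = strip_pii(records)
```

Note that the remaining fields (here `zip` and `diagnosis`) are published unchanged, so anything an attacker can match against them is still exposed.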

## Problem: not enough

- Can match up anonymized data with public sources
    - *De-anonymize* data, associate names to records
- Really, really hard to think about side information
    - May not even be public at time of data release!

## Netflix challenge

- Database of movie ratings
- Published: ID number, movie rating, and rating date
- Attack: from public IMDB ratings, recover names for Netflix data
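
A toy version of this kind of linkage attack, assuming exact matches on (movie, rating, date). The function `link_records`, the `min_overlap` threshold, and all data below are made up for illustration; a realistic attack would also need to tolerate approximate dates and ratings.

```python
from collections import defaultdict


def link_records(anonymized, public, min_overlap=2):
    """Guess a public name for each anonymized ID by counting shared ratings."""
    public_by_name = defaultdict(set)
    for name, movie, rating, date in public:
        public_by_name[name].add((movie, rating, date))

    anon_by_id = defaultdict(set)
    for user_id, movie, rating, date in anonymized:
        anon_by_id[user_id].add((movie, rating, date))

    guesses = {}
    for user_id, ratings in anon_by_id.items():
        best_name, best_overlap = None, 0
        for name, known in public_by_name.items():
            overlap = len(ratings & known)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        # Only report a guess when enough ratings line up.
        if best_overlap >= min_overlap:
            guesses[user_id] = best_name
    return guesses


# "Anonymized" release: (ID number, movie, rating, rating date).
anonymized = [
    (1001, "Movie A", 5, "2005-03-01"),
    (1001, "Movie B", 2, "2005-03-04"),
    (1001, "Movie C", 4, "2005-04-10"),
    (1002, "Movie A", 3, "2005-06-01"),
    (1002, "Movie D", 1, "2005-06-02"),
]

# Public side information: (name, movie, rating, rating date).
public = [
    ("alice", "Movie A", 5, "2005-03-01"),
    ("alice", "Movie B", 2, "2005-03-04"),
    ("bob", "Movie D", 1, "2005-06-02"),
]

guesses = link_records(anonymized, public)
```

Here ID 1001 shares two ratings with `alice` and is linked to her; ID 1002 shares only one rating with `bob`, which falls below the threshold, so no guess is made.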

## "Blending in a crowd"

- Only release records that are similar to others
- *k-anonymity*: require at least k identical records
- Other variants: *l-diversity*, *t-closeness*, ...
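
A minimal check for the k-anonymity condition, assuming "identical" means identical on a chosen set of quasi-identifier columns. The column names, the helper `is_k_anonymous`, and the sample release are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Made-up release with generalized ZIP codes and age ranges.
released = [
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "537**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "537**", "age": "40-49", "diagnosis": "cancer"},
]

two_anonymous = is_k_anonymous(released, ["zip", "age"], k=2)
three_anonymous = is_k_anonymous(released, ["zip", "age"], k=3)
```

Each (zip, age) group above has exactly two records, so the release passes for k = 2 and fails for k = 3.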

## Problem: composition

- Repeating k-anonymous releases may lose privacy
- Privacy protection may fall off a cliff
    - First few queries fine, then suddenly total violation
- Again, interacts poorly with side-information
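
One way to see the composition problem is to intersect two releases. The sketch below assumes the same individual (aged 37) appears in two separately generalized releases and that the attacker knows their age; the data, the age bucketing, and the helper `candidate_values` are all made up.

```python
def candidate_values(release, age_group, sensitive="diagnosis"):
    """Sensitive values of every record in the given age group."""
    return {row[sensitive] for row in release if row["age"] == age_group}


# Release A buckets ages by decade; every age group has 3 records.
release_a = (
    [{"age": "30-39", "diagnosis": d} for d in ("flu", "cancer", "asthma")]
    + [{"age": "40-49", "diagnosis": d} for d in ("flu", "flu", "cold")]
)

# Release B uses different buckets; every age group again has 3 records.
release_b = (
    [{"age": "35-44", "diagnosis": d} for d in ("cancer", "diabetes", "cold")]
    + [{"age": "45-54", "diagnosis": d} for d in ("asthma", "flu", "flu")]
)

# Side information: the target is 37, so they fall in "30-39" and "35-44".
from_a = candidate_values(release_a, "30-39")
from_b = candidate_values(release_b, "35-44")
narrowed = from_a & from_b
```

Each release on its own leaves three candidate diagnoses for the target, but the intersection leaves only one.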

## Differential privacy

- Proposed by Dwork, McSherry, Nissim, Smith (2006)

> A new approach to formulating privacy goals: the risk to one’s privacy, or in
> general, any type of risk... should not substantially increase as a result of
> participating in a statistical database. This is captured by differential
> privacy.

## Basic setting

- Private data: set of records from individuals
    - Each individual: one record
    - Example: set of medical records
- Private query: function from database to output
    - Randomized: adds noise to protect privacy
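
The slides do not say which noise to add. As one possible instance, the sketch below uses Laplace noise on a counting query, a standard choice because changing one record changes a count by at most 1. The name `noisy_count`, the record layout, and the value of `epsilon` are assumptions for illustration.

```python
import random


def noisy_count(db, predicate, epsilon, rng=random):
    """Count the records matching predicate, then add Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for record in db if predicate(record))
    # The difference of two independent exponentials with rate epsilon
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Made-up medical records, one per individual.
db = [
    {"id": 1, "diagnosis": "flu"},
    {"id": 2, "diagnosis": "asthma"},
    {"id": 3, "diagnosis": "flu"},
]

rng = random.Random(0)  # fixed seed only to make the example repeatable
answer = noisy_count(db, lambda r: r["diagnosis"] == "flu", epsilon=1.0, rng=rng)
```

The true count here is 2; each call returns 2 plus fresh noise, and smaller values of `epsilon` give larger noise.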

## Basic definition

A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two
databases $db, db'$ that differ in **one individual's record**, and for every
subset $S$ of outputs, we have:

$$
\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
$$
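
To make the inequality concrete, here is a brute-force check on a tiny example that is not from the slides: randomized response on a one-record database holding a single bit, which reports the true bit with probability 3/4. The helpers `output_distribution` and `satisfies_dp` are hypothetical names; the check enumerates every subset $S$ of the two possible outputs.

```python
import math
from itertools import chain, combinations


def output_distribution(bit, p_truth=0.75):
    """Distribution of the reported bit when the true bit is `bit`."""
    return {bit: p_truth, 1 - bit: 1 - p_truth}


def satisfies_dp(dist_a, dist_b, epsilon, delta=0.0):
    """Check Pr[A in S] <= e^epsilon * Pr[B in S] + delta for every subset S."""
    outputs = sorted(set(dist_a) | set(dist_b))
    subsets = chain.from_iterable(
        combinations(outputs, size) for size in range(len(outputs) + 1)
    )
    for subset in subsets:
        prob_a = sum(dist_a.get(o, 0.0) for o in subset)
        prob_b = sum(dist_b.get(o, 0.0) for o in subset)
        # Small tolerance to absorb floating-point rounding.
        if prob_a > math.exp(epsilon) * prob_b + delta + 1e-12:
            return False
    return True


# The two neighboring databases: the single record is 0 or 1.
d0 = output_distribution(0)
d1 = output_distribution(1)

ok_ln3 = satisfies_dp(d0, d1, math.log(3)) and satisfies_dp(d1, d0, math.log(3))
ok_one = satisfies_dp(d0, d1, 1.0) and satisfies_dp(d1, d0, 1.0)
ok_one_delta = satisfies_dp(d0, d1, 1.0, delta=0.1) and satisfies_dp(d1, d0, 1.0, delta=0.1)
```

The worst-case ratio of output probabilities is 0.75 / 0.25 = 3, so the check passes for $\varepsilon = \ln 3$ with $\delta = 0$, fails for $\varepsilon = 1$ with $\delta = 0$, and passes again for $\varepsilon = 1$ once $\delta = 0.1$ absorbs the gap.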