---
author: Topics in Security and Privacy Technologies (CS 839)
title: Lecture 01
date: September 05, 2018
---

# Security and Privacy

## It's everywhere!

![](images/iot-cameras.png)

## Stuff is totally insecure!

![](images/broken.png)

## It's really difficult!

![](images/netflix.png)

# What topics to cover?

## A really, really vast field

- Things we will not be able to cover:
  - Real-world attacks
  - Computer systems security
  - Defenses and countermeasures
  - Social aspects of security
  - Theoretical cryptography
  - ...

## Theme 1: Formalizing S&P

- Mathematically formalize notions of security
- Rigorously prove security
- Guarantee that certain classes of breaches cannot occur

> Remember: definitions are tricky things!

## Theme 2: Automating S&P

- Use computers to help build more secure systems
- Automatically check security properties
- Search for attacks and vulnerabilities

## Our focus: four modules

1. Differential privacy
2. Applied cryptography
3. Language-based security
4. Adversarial machine learning

# Differential privacy

## A mathematically solid definition of privacy

- Simple and clean formal property
- Satisfied by many algorithms
- Degrades gracefully under composition

# Applied crypto

## Computing in an untrusted world

- Proving you know something without revealing it
- Certifying that you did a computation correctly
- Computing on encrypted data, without decryption
- Computing a joint answer without revealing your data (see the sketch below)
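
A minimal sketch of the last idea, using additive secret sharing to compute a joint sum across three honest-but-curious parties. The modulus, inputs, and variable names are made up for illustration; this is a classroom toy, not a real multi-party computation protocol.

```python
# Toy additive secret sharing over a public prime modulus: three parties learn
# the sum of their private inputs without revealing the inputs themselves.
import secrets

P = 2**61 - 1  # public prime modulus

def share(x, n_parties):
    """Split x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

inputs = [12, 30, 7]              # each party's private value
n = len(inputs)

# Party i sends its j-th share to party j, so each party only ever sees
# individual shares, which look uniformly random on their own.
all_shares = [share(x, n) for x in inputs]
partial_sums = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]

# Publishing the partial sums reveals only the total, not the individual inputs.
print(sum(partial_sums) % P)      # 49
```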

# Language-based security

## Ensure security by construction

- Programming languages for security
- Compiler checks that programs are secure
- Information flow, privacy, cryptography, ...

# Adversarial machine learning

## Manipulating ML systems

- Crafting examples to fool ML systems (see the sketch below)
- Messing with training data
- Extracting training information
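
A minimal sketch of the first idea in the style of the fast gradient sign method, applied to a toy linear classifier. The weights, input, and step size are invented for illustration; real attacks target trained neural networks rather than this toy model.

```python
# Fast-gradient-sign-style perturbation of an input to a linear classifier.
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # hypothetical trained weights
b = 0.1
x = np.array([0.5, -0.3, 0.2])   # an input the model classifies as +1
y = 1.0                          # true label in {-1, +1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic loss on (x, y) is log(1 + exp(-y * (w.x + b))); its gradient with
# respect to the input x is -y * sigmoid(-y * (w.x + b)) * w.
grad_x = -y * sigmoid(-y * (w @ x + b)) * w

# Step a small distance in the direction that increases the loss the fastest.
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad_x)

print("original score:   ", w @ x + b)      # positive: classified as +1
print("adversarial score:", w @ x_adv + b)  # negative: now misclassified
```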

# Tedious course details

## Class format

- Three components:
  1. Paper presentations
  2. Final project
  3. Class participation
- Announcements/schedule/materials: on the [website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs839/)
- Class mailing list: [compsci839-1-f18@lists.wisc.edu](mailto:compsci839-1-f18@lists.wisc.edu)

## Paper presentations

- Sign up to lead a discussion on one paper
- Suggested topics, papers, and schedule are on the website
- Before each presentation:
  - I will send out brief questions
  - Please email me brief answers

> If you want advice, come talk to me!

## Final project

- Work individually or in pairs
- Project details and suggestions on the website
- Key dates:
  - **September 19**: Pick groups and topic
  - **October 15**: Milestone 1
  - **November 14**: Milestone 2
  - **End of class**: Final writeups and presentations

> If you want advice, come talk to me!

## Todos for you

0. Complete the course survey
1. Check out the course website
2. Think about what paper you want to present
3. Brainstorm project topics

# Defining privacy

## What does privacy mean?

- Many kinds of "privacy breaches"
  - Obvious: third party learns your private data
  - Retention: you give data, company keeps it forever
  - Passive: you don't know your data is collected

## Why is privacy hard?

- Hard to pin down what privacy means!
- Once data is out, can't put it back into the bottle
  - Privacy-preserving data release today may violate privacy tomorrow, combined with "side-information"
  - Data may be used many times, often doesn't change

## Hiding private data

- Delete "personally identifiable information"
  - Name and age
  - Birthday
  - Social security number
  - ...
- Publish the "anonymized" or "sanitized" data

## Problem: not enough

- Can match up anonymized data with public sources
  - *De-anonymize* data, associate names to records
- Really, really hard to think about side information
  - May not even be public at time of data release!

## Netflix challenge

- Database of movie ratings
  - Published: ID number, movie rating, and rating date
- Attack: from public IMDB ratings, recover names for Netflix data (see the sketch below)
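
A minimal sketch of such a linkage attack on toy data: match "anonymized" rating records against public profiles by overlapping (movie, rating, date) triples. The records, names, and matching rule are invented for illustration; the real attack is far more careful statistically.

```python
# Toy linkage attack: re-identify "anonymized" rating records using public
# side information, by counting ratings that roughly agree.
from datetime import date

# "Anonymized" release: record ID -> list of (movie, rating, date)
anonymized = {
    1417: [("Movie A", 5, date(2005, 3, 1)), ("Movie B", 2, date(2005, 3, 4))],
    2209: [("Movie A", 1, date(2005, 6, 9)), ("Movie C", 4, date(2005, 7, 2))],
}

# Public side information: name -> list of (movie, rating, date)
public_profiles = {
    "alice": [("Movie A", 5, date(2005, 3, 2)), ("Movie B", 2, date(2005, 3, 4))],
    "bob":   [("Movie C", 4, date(2005, 7, 1))],
}

def overlap(records, profile, date_slack_days=3):
    """Count ratings that agree on movie and score, with dates a few days apart."""
    score = 0
    for movie, rating, d in records:
        for p_movie, p_rating, p_d in profile:
            if movie == p_movie and rating == p_rating and abs((d - p_d).days) <= date_slack_days:
                score += 1
                break
    return score

# Guess the public profile with the highest overlap for each anonymized record.
for rid, records in anonymized.items():
    best = max(public_profiles, key=lambda name: overlap(records, public_profiles[name]))
    print(rid, "->", best, "with overlap", overlap(records, public_profiles[best]))
```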

## "Blending in a crowd"

- Only release records that are similar to others
- *k-anonymity*: require at least k identical records (see the check below)
- Other variants: *l-diversity*, *t-closeness*, ...
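
A minimal sketch of a k-anonymity check on a toy table: every combination of quasi-identifier values must appear in at least k records. The records, column names, and choice of quasi-identifiers are made up for illustration.

```python
# Check whether a table is k-anonymous with respect to its quasi-identifiers.
from collections import Counter

records = [
    {"zip": "53703", "age_range": "20-29", "diagnosis": "flu"},
    {"zip": "53703", "age_range": "20-29", "diagnosis": "cold"},
    {"zip": "53706", "age_range": "30-39", "diagnosis": "flu"},
]

QUASI_IDENTIFIERS = ("zip", "age_range")

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in counts.values())

print(is_k_anonymous(records, QUASI_IDENTIFIERS, 2))  # False: ('53706', '30-39') appears once
```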

## Problem: composition

- Repeating k-anonymous releases may lose privacy
- Privacy protection may fall off a cliff
  - First few queries fine, then suddenly total violation
- Again, interacts poorly with side-information

## Differential privacy

- Proposed by Dwork, McSherry, Nissim, Smith (2006)

> A new approach to formulating privacy goals: the risk to one’s privacy, or in
> general, any type of risk... should not substantially increase as a result of
> participating in a statistical database. This is captured by differential
> privacy.

## Basic setting

- Private data: set of records from individuals
  - Each individual: one record
  - Example: set of medical records
- Private query: function from database to output
  - Randomized: adds noise to protect privacy

## Basic definition

A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two
databases $db, db'$ that differ in **one individual's record**, and for every
subset $S$ of outputs, we have:

$$
\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
$$
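
A minimal sketch of a query satisfying this definition: the Laplace mechanism applied to a counting query, which is $(\varepsilon, 0)$-differentially private. The toy records and predicate are made up for illustration, and `numpy` is assumed to be available.

```python
# Laplace mechanism for a counting query: epsilon-differentially private.
import numpy as np

rng = np.random.default_rng()

def private_count(db, predicate, epsilon):
    """Noisy count of records satisfying `predicate`.

    A counting query changes by at most 1 when one individual's record changes
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    true_count = sum(1 for record in db if predicate(record))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Toy database: one record per individual.
medical_records = [
    {"age": 34, "smoker": True},
    {"age": 51, "smoker": False},
    {"age": 29, "smoker": True},
]

print(private_count(medical_records, lambda r: r["smoker"], epsilon=0.5))
```

Smaller $\varepsilon$ means more noise and a stronger guarantee: the output distribution changes very little when any one record changes.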