---
author: Advanced Topics in Security and Privacy (CS 839)
title: Lecture 01
date: September 05, 2018
---

# Security and Privacy

## It's everywhere!

![](images/iot-cameras.png)

## Stuff is totally insecure!

![](images/broken.png)

## It's really difficult!

![](images/netflix.png)

# What topics to cover?

## A really, really vast field
- Things we will not be able to cover:
    - Real-world attacks
    - Computer systems security
    - Defenses and countermeasures
    - Social aspects of security
    - Theoretical cryptography
    - ...

## Theme 1: Formalizing S&P
- Mathematically formalize notions of security
- Rigorously prove security
- Guarantee that certain breakages can't occur

> Remember: definitions are tricky things!

## Theme 2: Automating S&P
- Use computers to help build more secure systems
- Automatically check security properties
- Search for attacks and vulnerabilities

## Our focus: four modules
1. Differential privacy
2. Applied cryptography
3. Language-based security
4. Adversarial machine learning

# Differential privacy

## A mathematically solid definition of privacy
- Simple and clean formal property
- Satisfied by many algorithms
- Degrades gracefully under composition

# Applied crypto

## Computing in an untrusted world
- Proving you know something without revealing it
- Certifying that you did a computation correctly
- Computing on encrypted data, without decryption
- Jointly computing an answer without revealing your data

# Language-based security

## Ensure security by construction
- Programming languages for security
- Compiler checks that programs are secure
- Information flow, privacy, cryptography, ...

# Adversarial machine learning

## Manipulating ML systems
- Crafting examples to fool ML systems
- Messing with training data
- Extracting training information

# Tedious course details

## Class format
- Three components:
    1. Paper presentations
    2. Final project
    3. Class participation
- Announcements, schedule, and materials: on the [course website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs839/)
- Class mailing list: [compsci839-1-f18@lists.wisc.edu](mailto:compsci839-1-f18@lists.wisc.edu)

## Paper presentations
- Sign up to lead a discussion on one paper
- Suggested topic, papers, and schedule on website
- Before each presentation:
    - I will send out brief questions
    - Please email me brief answers

> If you want advice, come talk to me!

## Final project
- Work individually or in pairs
- Project details and suggestions on website
- Key dates:
    - **September 19**: Pick groups and topic
    - **October 15**: Milestone 1
    - **November 14**: Milestone 2
    - **End of class**: Final writeups and presentations

> If you want advice, come talk to me!

## Todos for you
0. Complete the course survey
1. Check out the course website
2. Think about what paper you want to present
3. Brainstorm project topics

# Defining privacy

## What does privacy mean?
- Many kinds of "privacy breaches"
    - Obvious: third party learns your private data
    - Retention: you give data, company keeps it forever
    - Passive: you don't know your data is collected

## Why is privacy hard?
- Hard to pin down what privacy means!
- Once data is out, can't put it back into the bottle
- A release that looks private today may violate privacy tomorrow, once combined
  with "side information"
- The same data may be used many times, and it often doesn't change

## Hiding private data
- Delete "personally identifiable information"
    - Name and age
    - Birthday
    - Social security number
    - ...
- Publish the "anonymized" or "sanitized" data

## Problem: not enough
- Can match up anonymized data with public sources
- *De-anonymize* data, associate names to records
- Really, really hard to think about side information
    - May not even be public at time of data release!

## Netflix challenge
- Database of movie ratings
- Published: ID number, movie rating, and rating date
- Attack: link with public IMDb ratings to recover names behind Netflix records
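
A toy version of this linkage attack, as a sketch only: all records below are made up, and the rule "a name is linked when exactly one pseudonymous ID shares enough exact (movie, rating, date) triples" is a simplifying assumption of this example. The `link` helper is hypothetical.

```python
# Hypothetical "anonymized" release: pseudonymous ID -> set of
# (movie, rating, date) triples. All data is made up.
anonymized = {
    101: {("Movie A", 5, "2005-03-01"), ("Movie B", 2, "2005-03-04")},
    102: {("Movie A", 3, "2005-03-02"), ("Movie C", 4, "2005-03-09")},
}

# Hypothetical public side information: named user -> rating triples.
public = {
    "alice": {("Movie A", 5, "2005-03-01")},
    "bob": {("Movie C", 4, "2005-03-09"), ("Movie D", 1, "2005-04-01")},
}

def link(anonymized, public, min_overlap=1):
    """Attach a name to a pseudonymous ID when exactly one ID shares at
    least `min_overlap` exact (movie, rating, date) triples with it."""
    matches = {}
    for name, ratings in public.items():
        candidates = [pid for pid, recs in anonymized.items()
                      if len(recs & ratings) >= min_overlap]
        if len(candidates) == 1:  # unique match: record re-identified
            matches[name] = candidates[0]
    return matches

print(link(anonymized, public))  # {'alice': 101, 'bob': 102}
```

No name appears in `anonymized`, yet one overlapping rating is enough to re-identify both users here; exact matching is a simplification made for this sketch.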

## "Blending in a crowd"
- Only release records that are similar to others
- *k-anonymity*: every record must match at least $k-1$ others on its identifying attributes
- Other variants: *l-diversity*, *t-closeness*, ...
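
A minimal k-anonymity check, assuming the usual formulation where "identical" means identical on a chosen set of quasi-identifying attributes (here a coarsened ZIP code and age range). The records and the `is_k_anonymous` helper are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values that occurs
    in `records` is shared by at least k records."""
    groups = Counter(
        tuple(r[attr] for attr in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical generalized medical records; diagnosis is the sensitive field.
records = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "cancer"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
]
qi = ["zip", "age"]
print(is_k_anonymous(records, qi, 2))  # True
print(is_k_anonymous(records, qi, 3))  # False
```

The 30-39 group is 2-anonymous, but both of its records say "flu", so anyone known to be in that group is revealed to have the flu; this homogeneity problem is one motivation for variants like *l-diversity*.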

## Problem: composition
- Multiple k-anonymous releases about the same people may together reveal private data
- Privacy protection may fall off a cliff
    - First few queries fine, then suddenly total violation
- Again, interacts poorly with side information
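
One way composition can fail, sketched as an intersection attack on two hypothetical releases that are each 2-anonymous on their own. The sketch assumes the attacker knows the target appears in both releases and knows which generalized group the target falls into in each; the `candidate_values` helper and all data are made up.

```python
def candidate_values(release, qi_values):
    """Sensitive values in the group matching the target's quasi-identifiers."""
    return {
        r["diagnosis"] for r in release
        if all(r[attr] == v for attr, v in qi_values.items())
    }

# Two hypothetical releases (e.g., from two hospitals), each 2-anonymous.
release_a = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "cancer"},
]
release_b = [
    {"zip": "5370*", "age": "2*", "diagnosis": "cancer"},
    {"zip": "5370*", "age": "2*", "diagnosis": "asthma"},
]

# Side information: the target is 25 and lives in ZIP 53703.
a = candidate_values(release_a, {"zip": "537**", "age": "20-29"})
b = candidate_values(release_b, {"zip": "5370*", "age": "2*"})
print(a & b)  # {'cancer'}
```

Each release alone leaves two candidate diagnoses; intersecting them pins the target down exactly.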

## Differential privacy
- Proposed by Dwork, McSherry, Nissim, Smith (2006)

> A new approach to formulating privacy goals: the risk to one’s privacy, or in
> general, any type of risk... should not substantially increase as a result of
> participating in a statistical database.  This is captured by differential
> privacy.

## Basic setting
- Private data: set of records from individuals
    - Each individual: one record
    - Example: set of medical records
- Private query: function from database to output
    - Randomized: adds noise to protect privacy
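
The slide does not say *how* noise is added. As one illustration, here is a sketch of the Laplace mechanism (a standard technique, not named on this slide) for a counting query over a hypothetical medical database. Changing one individual's record moves the true count by at most 1, so Laplace noise with scale $1/\varepsilon$ gives $\varepsilon$-differential privacy. The field names and helpers are assumptions, and the sketch ignores floating-point subtleties that matter in real deployments.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two independent
    exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(db, predicate, epsilon, rng=random):
    """Randomized counting query: true count plus Laplace(1/epsilon) noise.
    Changing one record shifts the count by at most 1 (sensitivity 1)."""
    true_count = sum(1 for record in db if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical database: one record per individual.
db = [{"name": "alice", "smoker": True},
      {"name": "bob", "smoker": False},
      {"name": "carol", "smoker": True}]
rng = random.Random(0)
print(noisy_count(db, lambda r: r["smoker"], epsilon=0.5, rng=rng))
```

A smaller `epsilon` means a larger noise scale, trading accuracy for stronger privacy.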

## Basic definition
A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two
databases $db, db'$ that differ in **one individual's record**, and for every
subset $S$ of outputs, we have:

$$
\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
$$
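
To see the definition on the smallest possible case, here is a sketch that checks it exhaustively for *randomized response* (a classic mechanism not covered on this slide) on a database holding a single bit. The neighboring databases are `0` and `1`, and the output space $\{0, 1\}$ is small enough to enumerate every subset $S$. The helper names are hypothetical, and the comparison uses a small floating-point tolerance.

```python
import itertools
import math

def rr_distribution(bit, epsilon):
    """Output distribution of randomized response on a one-record database:
    report the true bit w.p. e^eps / (1 + e^eps), otherwise flip it."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1 - p_truth}

def satisfies_dp(dist_a, dist_b, epsilon, delta, tol=1e-12):
    """Check Pr[Q(db) in S] <= e^eps * Pr[Q(db') in S] + delta for every
    subset S of the finite output space, in both directions."""
    outputs = sorted(set(dist_a) | set(dist_b))
    for size in range(len(outputs) + 1):
        for subset in itertools.combinations(outputs, size):
            pa = sum(dist_a.get(o, 0.0) for o in subset)
            pb = sum(dist_b.get(o, 0.0) for o in subset)
            if pa > math.exp(epsilon) * pb + delta + tol:
                return False
            if pb > math.exp(epsilon) * pa + delta + tol:
                return False
    return True

eps = math.log(3)  # report the truth w.p. 3/4
d0, d1 = rr_distribution(0, eps), rr_distribution(1, eps)  # neighbors
print(satisfies_dp(d0, d1, eps, delta=0.0))      # True
print(satisfies_dp(d0, d1, eps / 2, delta=0.0))  # False
```

With truthful-report probability $e^\varepsilon/(1+e^\varepsilon)$, the worst-case ratio (at $S = \{0\}$ or $S = \{1\}$) is exactly $e^\varepsilon$, so the check passes at that $\varepsilon$ with $\delta = 0$ and fails at any smaller $\varepsilon$ unless $\delta$ covers the gap.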