---
author: Security and Privacy in Data Science (CS 763)
title: Course Welcome
date: September 02, 2020
---

# Welcome to Virtual CS 763!

## Norms for virtual class

- Mute yourself when you are not talking
- Recommended (not required): turn on your video
- Use the chat for questions/side discussions

> If you wouldn't do it in a real classroom, you probably shouldn't do it
> virtually.

## Guidelines for discussion

- Basically: **be nice to one another**
- WAIT: Why Am I Talking?
- One mic: one person speaks at a time

## Remote students

- Strongly recommended to attend live lectures
- If you can't (e.g., lecture in the middle of the night):
    - All lectures will be recorded on BBCU: watch them
    - Do **two paper reviews per week** instead of presentation+summary

> Let me know ASAP if you are remote so I can set you up with paper reviews

# Security and Privacy

## It's everywhere!

![](images/iot-cameras.png)

## Stuff is totally insecure!

![](images/broken.png)

# What topics to cover?

## A really, really vast field

- Things we will not be able to cover:
    - Real-world attacks
    - Computer systems security
    - Defenses and countermeasures
    - Social aspects of security
    - Theoretical cryptography
    - ...

## Theme 1: Formalizing S&P

- Mathematically formalize notions of security
- Rigorously prove security
- Guarantee that certain breakages can't occur

> Remember: definitions are tricky things!

## Theme 2: Automating S&P

- Use computers to help build more secure systems
- Automatically check security properties
- Search for attacks and vulnerabilities

## Five modules

1. Differential privacy
2. Adversarial machine learning
3. Cryptography in machine learning
4. Algorithmic fairness
5. PL and verification

## This course is broad!

- Each module could be its own course
- We won't be able to go super deep
- You will probably get lost
- Our goal: broad survey of multiple areas
- Lightning tour, focus on high points

> Hope: find a few things that interest you

## This course is technical!

- Approach each topic from a rigorous point of view
- Parts of "data science" with **provable guarantees**
- This is not a "theory course", but...

. . .

![](images/there-will-be-math.png)

# Differential privacy

##

![](images/privacy.png)

## A mathematical definition of privacy

- Simple and clean formal property
- Satisfied by many algorithms
- Degrades gracefully under composition

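
The composition bullet can be made precise. A standard background fact (sequential composition, stated here without proof; $\varepsilon$ and $\delta$ are the parameters from the formal definition later in the deck): if $Q_1$ is $(\varepsilon_1, \delta_1)$-differentially private and $Q_2$ is $(\varepsilon_2, \delta_2)$-differentially private, then releasing both answers is

$$
(\varepsilon_1 + \varepsilon_2,\; \delta_1 + \delta_2)\text{-differentially private.}
$$

So privacy loss accumulates additively rather than failing all at once.
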
# Adversarial machine learning

##

![](images/aml.jpg)

## Manipulating ML systems

- Crafting examples to fool ML systems
- Messing with training data
- Extracting training information

# Cryptography in machine learning

##

![](images/crypto-ml.png)

## Crypto in data science

- Learning models without raw access to private data
- Collecting analytics data privately, at scale
- Side channels and implementation issues
- Verifiable execution of ML models
- Other topics (e.g., model watermarking)

# Algorithmic fairness

##

![](images/fairness.png)

## When is a program "fair"?

- Individual and group fairness
- Inherent tradeoffs and challenges
- Fairness in unsupervised learning
- Fairness and causal inference

# PL and verification

##

![](images/pl-verif.png)

## Proving correctness

- Programming languages for security and privacy
- Interpreting neural networks and ML models
- Verifying properties of neural networks
- Verifying probabilistic programs

# Tedious course details

## Lecture schedule

- First ten weeks: **lectures MWF**
    - Intensive lectures, get you up to speed
    - I will present once a week
    - You will present twice a week
- Last five weeks: **no lectures**
    - Intensive work on projects
    - I will be available to meet, one-on-one

> You should attend/watch **all** lectures

## Class format

- Three components:
    1. Paper presentations
    2. Presentation summaries
    3. Final project
- Announcements/schedule/materials on the [website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/)
- Discussions/forming groups on [Piazza](https://piazza.com/class/ke3clkclul16hq)

## Paper presentations

- In pairs, lead a discussion on a group of papers
- See website for [detailed instructions](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/assignments/presentations/)
- See website for [schedule of topics](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/schedule/lectures/)
- One week **before** presentation: meet with me
    - Come prepared with draft slides and an outline
    - Run through your outline, I will give feedback

## Presentation summaries

- In pairs, prepare a written summary of another group's presentation
- See website for [detailed instructions](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/assignments/summaries/)
- See website for [schedule of topics](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/schedule/lectures/)
- One week **after** presentation: send me your summary
    - I will work with you to polish the report
    - Writeups will be shared with the class

## Final project

- In groups of 2-3
- See website for [project details](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/assignments/project/)
- Key dates:
    - **October 12**: Milestone 1
    - **November 6**: Milestone 2
    - **End of class**: Final writeups and presentations

## Todos for you

0. Complete the [course survey](https://forms.gle/NWAYMf6ZzV3bFKC46)
1. Explore the [course website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs763/)
2. Think about which lecture you want to present
3. Think about which lecture you want to summarize
4. Form project groups and brainstorm topics

> Sign up for slots and projects [here](https://docs.google.com/spreadsheets/d/1Qiq6RtBiHD6x7t-wPqAykvTDdbbBvZYSMZ9FrKUHKm4/edit?usp=sharing)

## We will move quickly

- First deadline: **next Wednesday, September 9**
    - Form paper and project groups
    - Signup sheet [here](https://docs.google.com/spreadsheets/d/1Qiq6RtBiHD6x7t-wPqAykvTDdbbBvZYSMZ9FrKUHKm4/edit?usp=sharing)
- First slot is soon: **Monday, September 14**
    - Only slot for presenting differential privacy
    - I will help the first group prepare

# Defining privacy

## What does privacy mean?

- Many kinds of "privacy breaches"
    - Obvious: third party learns your private data
    - Retention: you give data, company keeps it forever
    - Passive: you don't know your data is collected

## Why is privacy hard?

- Hard to pin down what privacy means!
- Once data is out, can't put it back into the bottle
- Privacy-preserving data release today may violate privacy tomorrow, combined with "side-information"
- Data may be used many times, often doesn't change

## Hiding private data

- Delete "personally identifiable information"
    - Name and age
    - Birthday
    - Social security number
    - ...
- Publish the "anonymized" or "sanitized" data

## Problem: not enough

- Can match up anonymized data with public sources
- *De-anonymize* data, associate names to records
- Really, really hard to think about side information
    - May not even be public at time of data release!

## Netflix prize

- Database of movie ratings
- Published: ID number, movie rating, and rating date
- Competition: predict which movies each ID will like
- Result:
    - Tons of teams competed
    - Winner: beat Netflix's best by **10%**

> A triumph for machine learning contests!


##

![](images/netflix.png)

## Privacy flaw?

- Attack:
    - Public info on IMDB: names, ratings, dates
    - Reconstruct names for Netflix IDs
- Result:
    - Netflix settled lawsuit ($10 million)
    - Netflix canceled future challenges

## "Blending in a crowd"

- Only release records that are similar to others
- *k-anonymity*: require at least k identical records
- Other variants: *l-diversity*, *t-closeness*, ...

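
As a concrete sketch (not from the course materials), a k-anonymity check just counts how often each combination of quasi-identifier values occurs; the field names below are hypothetical:

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True iff every combination of quasi-identifier values
    appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical "sanitized" medical records
records = [
    {"zip": "53703", "age": "20-29", "diagnosis": "flu"},
    {"zip": "53703", "age": "20-29", "diagnosis": "cold"},
    {"zip": "53706", "age": "30-39", "diagnosis": "flu"},
]

is_k_anonymous(records, ["zip", "age"], 2)  # False: the third record is unique
```

Note that the check says nothing about what an attacker can infer by joining the unique record against outside data, which is exactly the composition problem on the next slide.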
## Problem: composition

- Repeating k-anonymous releases may lose privacy
- Privacy protection may fall off a cliff
    - First few queries fine, then suddenly total violation
- Again, interacts poorly with side-information

# Differential privacy
## Yet another privacy definition

> A new approach to formulating privacy goals: the risk to one’s privacy, or in
> general, any type of risk... should not substantially increase as a result of
> participating in a statistical database. This is captured by differential
> privacy.

- Proposed by Dwork, McSherry, Nissim, Smith (2006)

## Basic setting

- Private data: set of records from individuals
    - Each individual: one record
    - Example: set of medical records
- Private query: function from database to output
    - Randomized: adds noise to protect privacy

## Basic definition

A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two
databases $db, db'$ that differ in **one individual's record**, and for every
subset $S$ of outputs, we have:

$$
\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta
$$

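
The classic way to satisfy this definition for numeric queries is the Laplace mechanism. A minimal sketch (illustrative, not the course's reference implementation): add Laplace noise with scale sensitivity/$\varepsilon$, which gives $(\varepsilon, 0)$-differential privacy for a query whose value changes by at most `sensitivity` when one record changes.

```python
import math
import random

def laplace_mech(db, query, sensitivity, epsilon):
    """Answer query(db) with Laplace(sensitivity/epsilon) noise added."""
    scale = sensitivity / epsilon
    # Sample Laplace noise via the inverse-CDF method
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return query(db) + noise

db = [0, 1, 1, 0, 1]                # one bit per individual
count = lambda d: sum(d)            # counting query: sensitivity 1
noisy = laplace_mech(db, count, sensitivity=1, epsilon=0.5)
```

Changing one individual's bit shifts the true count by at most 1, and the noise distribution's densities at points one apart differ by a factor of at most $e^{\varepsilon}$, which is exactly the bound in the definition above.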
## Basic reading

> Output of program doesn't depend too much on any single person's data

- Property of the algorithm/query/program
    - No: "this data is differentially private"
    - Yes: "this query is differentially private"