---
author: Topics in Security and Privacy Technologies (CS 839)
title: Lecture 01
date: September 05, 2018
---

# Security and Privacy

## It's everywhere!

![](images/iot-cameras.png)

## Stuff is totally insecure!

![](images/broken.png)

## It's really difficult!

![](images/netflix.png)

# What topics to cover?

## A really, really vast field

- Things we will not be able to cover:
    - Real-world attacks
    - Computer systems security
    - Defenses and countermeasures
    - Social aspects of security
    - Theoretical cryptography
    - ...

## Theme 1: Formalizing S&P

- Mathematically formalize notions of security
- Rigorously prove security
- Guarantee that certain breakages can't occur

> Remember: definitions are tricky things!

## Theme 2: Automating S&P

- Use computers to help build more secure systems
- Automatically check security properties
- Search for attacks and vulnerabilities

## Our focus: four modules

1. Differential privacy
2. Applied cryptography
3. Language-based security
4. Adversarial machine learning

# Differential privacy

## A mathematically solid definition of privacy

- Simple and clean formal property
- Satisfied by many algorithms
- Degrades gracefully under composition

# Applied crypto

## Computing in an untrusted world

- Proving you know something without revealing it
- Certifying that you did a computation correctly
- Computing on encrypted data, without decryption
- Computing a joint answer without revealing your data

# Language-based security

## Ensure security by construction

- Programming languages for security
- Compiler checks that programs are secure
- Information flow, privacy, cryptography, ...

# Adversarial machine learning

## Manipulating ML systems

- Crafting examples to fool ML systems
- Messing with training data
- Extracting training information

# Tedious course details

## Class format

- Three components:
    1. Paper presentations
    2. Final project
    3. Class participation
- Announcements/schedule/materials: on the [website](https://pages.cs.wisc.edu/~justhsu/teaching/current/cs839/)
- Class mailing list: [compsci839-1-f18@lists.wisc.edu]()

## Paper presentations

- Sign up to lead a discussion on one paper
- Suggested topics, papers, and schedule on the website
- Before each presentation:
    - I will send out brief questions
    - Please email me brief answers

> If you want advice, come talk to me!

## Final project

- Work individually or in pairs
- Project details and suggestions on the website
- Key dates:
    - **September 19**: Pick groups and topic
    - **October 15**: Milestone 1
    - **November 14**: Milestone 2
    - **End of class**: Final writeups and presentations

> If you want advice, come talk to me!

## Todos for you

0. Complete the course survey
1. Check out the course website
2. Think about what paper you want to present
3. Brainstorm project topics

# Defining privacy

## What does privacy mean?

- Many kinds of "privacy breaches"
- Obvious: a third party learns your private data
- Retention: you give data, the company keeps it forever
- Passive: you don't know your data is collected

## Why is privacy hard?

- Hard to pin down what privacy means!
- Once data is out, it can't be put back into the bottle
- A privacy-preserving data release today may violate privacy tomorrow, combined with "side information"
- Data may be used many times, and often doesn't change

## Hiding private data

- Delete "personally identifiable information"
    - Name and age
    - Birthday
    - Social security number
    - ...
- Publish the "anonymized" or "sanitized" data ## Problem: not enough - Can match up anonymized data with public sources - *De-anonymize* data, associate names to records - Really, really hard to think about side information - May not even be public at time of data release! ## Netflix challenge - Database of movie ratings - Published: ID number, movie rating, and rating date - Attack: from public IMDB ratings, recover names for Netflix data ## "Blending in a crowd" - Only release records that are similar to others - *k-anonymity*: require at least k identical records - Other variants: *l-diversity*, *t-closeness*, ... ## Problem: composition - Repeating k-anonymous releases may lose privacy - Privacy protection may fall off a cliff - First few queries fine, then suddenly total violation - Again, interacts poorly with side-information ## Differential privacy - Proposed by Dwork, McSherry, Nissim, Smith (2006) > A new approach to formulating privacy goals: the risk to one’s privacy, or in > general, any type of risk... should not substantially increase as a result of > participating in a statistical database. This is captured by differential > privacy. ## Basic setting - Private data: set of records from individuals - Each individual: one record - Example: set of medical records - Private query: function from database to output - Randomized: adds noise to protect privacy ## Basic definition A query $Q$ is **$(\varepsilon, \delta)$-differentially private** if for every two databases $db, db'$ that differ in **one individual's record**, and for every subset $S$ of outputs, we have: $$ \Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta $$