Search CORE

2,391 research outputs found

Recommended from our members

Security, Privacy, and Transparency Guarantees for Machine Learning Systems

Author: Lecuyer Mathias
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

Machine learning (ML) is transforming a wide range of applications, promising to bring immense economic and social benefits. However, it also raises substantial security, privacy, and transparency challenges. ML workloads indeed push companies toward aggressive data collection and loose data access policies, placing troves of sensitive user information at risk if the company is hacked. ML also introduces new attack vectors, such as adversarial example attacks, which can completely nullify models’ accuracy under attack. Finally, ML models make complex data-driven decisions, which are opaque to the end-users, and difficult to inspect for programmers. In this dissertation we describe three systems we developed. Each system addresses a dimension of the previous challenges, by combining new practical systems techniques with rigorous theory to achieve a guaranteed level of protection, and make systems easier to understand. First we present Sage, a differentially private ML platform that enforces a meaningful protection semantic for the troves of personal information amassed by today’s companies. Second we describe PixelDP, a defense against adversarial examples that leverages differential privacy theory to provide a guaranteed level of accuracy under attack. Third we introduce Sunlight, a tool to enhance the transparency of opaque targeting services, using rigorous causal inference theory to explain targeting decisions to end-users

Columbia University Academic Commons

Security and Privacy for Big Data: A Systematic Literature Review

Author: Nelson Boel
Olovsson Tomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Big data is currently a hot research topic, with four million hits on Google scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to provide meaningful, proven, privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data, by outlining and providing structure to what research currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately, is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR), where we collect recent papers from top conferences, and categorize them in order to provide an overview of the security and privacy topics present within the context of big data. Within each category we also present a qualitative analysis of papers representative for that specific area. Furthermore, we explore and visualize the relationship between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to discover where further research is required

Chalmers Research

Recommended from our members

New Data Protection Abstractions for Emerging Mobile and Big Data Workloads

Author: Spahn Riley Burns
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020
Field of study

Two recent shifts in computing are challenging the effectiveness of traditional approaches to data protection. Emerging machine learning workloads have complex access patterns and unique leakage characteristics that are not well supported by existing protection approaches. Second, mobile operating systems do not provide sufficient support for fine grained data protection tools forcing users to rely on individual applications to correctly manage and protect data. My thesis is that these emerging workloads have unique characteristics that we can leverage to build new, more effective data protection abstractions. This dissertation presents two new data protection systems for machine learning work-loads and a new system for fine grained data management and protection on mobile devices. First is Sage, a differentially private machine learning platform addressing the two primary challenges of differential privacy: running out of budget and the privacy utility tradeoff. The second system, Pyramid, is the first selective data system. Pyramid leverages count featurization to reduce the amount of data exposed while training classification models by two orders of magnitude. The final system, Pebbles, provides users with logical data objects as a new fine grained data management and protection primitive allowing data management at a higher level of abstraction. Pebbles, leverages high level storage abstractions in mobile operating systems to discover user recognizable application level data objects in unmodified mobile applications

Columbia University Academic Commons

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Author: Inan Huseyin A.
Kumar Girish
Levitan David
Li Xuechen
McAnallen Julia
Shajari Hoda
Sim Robert
Sun Huan
Yue Xiang
Publication venue
Publication date: 18/07/2023
Field of study

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee, such as differential privacy (DP), provides a promising path to mitigating these privacy concerns, but previous approaches in this direction have typically failed to produce synthetic data of high quality. In this work, we show that a simple and practical recipe in the text domain is effective: simply fine-tuning a pretrained generative language model with DP enables the model to generate useful synthetic text with strong privacy protection. Through extensive empirical analyses on both benchmark and private customer data, we demonstrate that our method produces synthetic text that is competitive in terms of utility with its non-private counterpart, meanwhile providing strong protection against potential privacy leakages.Comment: ACL 2023 Main Conference (Honorable Mention

arXiv.org e-Print Archive