Can Two Walk Together: Privacy Enhancing Methods and Preventing Tracking of Users
We present a new concern when collecting data from individuals, arising from
attempts to mitigate privacy leakage across multiple reports: users
participating in the data collection can be tracked via the very mechanisms
added to provide privacy. We present several definitions of untrackable
mechanisms, inspired by the differential privacy framework.
Specifically, we define the trackable parameter as the log of the maximum
ratio between the probability that a set of reports originated from a single
user and the probability that the same set of reports originated from two users
(with the same private value). We explore the implications of this new
definition. We show how differentially private and untrackable mechanisms can
be combined to achieve a bound for the problem of detecting when a certain user
changed their private value.
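To make the definition concrete, here is a small brute-force sketch of the trackable parameter (our own toy illustration, not a mechanism or code from the paper): a "permanent randomized response" mechanism in which each user draws a permanent noisy copy of their private bit and reports it verbatim in every round. The persistent per-user randomness is exactly what makes such a mechanism trackable; the flip probability q and number of reports k are arbitrary choices.

```python
import itertools
import math

q = 0.25  # flip probability for the permanent bit (assumed value)

def p_reports_one_user(reports, b):
    """Probability that a single user with private bit b emits `reports`."""
    total = 0.0
    for bp in (0, 1):                      # marginalize over the permanent bit
        p_bp = (1 - q) if bp == b else q
        if all(r == bp for r in reports):  # user always reports bp verbatim
            total += p_bp
    return total

def p_reports_two_users(reports, b, split):
    """Two users share private bit b; user 1 sends the first `split` reports."""
    return (p_reports_one_user(reports[:split], b) *
            p_reports_one_user(reports[split:], b))

# Trackable parameter: log of the max ratio between the one-user and
# two-user probabilities, over all report sets and all splits.
k = 4
worst, two_user_only_sets = 0.0, False
for reports in itertools.product((0, 1), repeat=k):
    for b in (0, 1):
        p1 = p_reports_one_user(reports, b)
        for split in range(1, k):
            p2 = p_reports_two_users(reports, b, split)
            if p1 > 0 and p2 > 0:
                worst = max(worst, abs(math.log(p1 / p2)))
            elif p1 == 0 and p2 > 0:
                # Mixed report sets are possible for two users but
                # impossible for one: the mechanism is fully trackable.
                two_user_only_sets = True

print(f"worst finite log-ratio for k={k}: {worst:.3f}")
print(f"report sets possible only for two users: {two_user_only_sets}")
```

Note that a memoryless mechanism (fresh randomness per report) would make the two distributions identical and the parameter zero; trackability comes from the persistent state.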
Examining Google's deployed solution for everlasting privacy, we show that
RAPPOR (Erlingsson et al. ACM CCS, 2014) is trackable in our framework for the
parameters presented in their paper.
We analyze a variant of randomized response for collecting statistics on
single bits, Bitwise Everlasting Privacy, that achieves good accuracy and
everlasting privacy while being only reasonably untrackable; specifically, its
trackable parameter grows linearly in the number of reports. For collecting
statistics about data from larger domains (for histograms and heavy hitters)
we present a mechanism that prevents tracking for a limited number of responses.
We also present the concept of Mechanism Chaining, using the output of one
mechanism as the input of another, in the scope of Differential Privacy, and
show that the chaining of an $\varepsilon_1$-LDP mechanism with an
$\varepsilon_2$-LDP mechanism is
$\ln\frac{e^{\varepsilon_1+\varepsilon_2}+1}{e^{\varepsilon_1}+e^{\varepsilon_2}}$-LDP
and that this bound is tight.

Comment: 45 pages, 4 figures. To appear at FORC 2020
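The chaining bound can be sanity-checked numerically for binary randomized response, where feeding one mechanism's output into the other attains the bound exactly (our own quick check; the epsilon values below are arbitrary):

```python
import math

def rr_keep_prob(eps):
    # Binary randomized response: report the true bit with
    # probability e^eps / (1 + e^eps).
    return math.exp(eps) / (1 + math.exp(eps))

def chained_eps(e1, e2):
    # Chain two randomized responses: the bit survives if it is
    # kept twice or flipped twice.
    p1, p2 = rr_keep_prob(e1), rr_keep_prob(e2)
    keep = p1 * p2 + (1 - p1) * (1 - p2)
    return math.log(keep / (1 - keep))

def chaining_bound(e1, e2):
    # The closed-form bound ln((e^(e1+e2) + 1) / (e^e1 + e^e2)).
    return math.log((math.exp(e1 + e2) + 1) /
                    (math.exp(e1) + math.exp(e2)))

for e1, e2 in [(0.5, 1.0), (1.0, 2.0), (3.0, 0.1)]:
    print(f"eps1={e1}, eps2={e2}: chained RR gives "
          f"{chained_eps(e1, e2):.4f}, bound {chaining_bound(e1, e2):.4f}")
```

As expected, the chained mechanism is strictly more private than either component (the bound is below min(eps1, eps2)).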
Privacy-Aware Data Analysis: Recent Developments for Statistics and Machine Learning
Due to technological development, personal data has become easier to collect, store, and analyze. Companies can collect detailed browsing behavior data, health-related data from smartphones and smartwatches, and voice and movement recordings from smart home devices. Analysis of such data can bring numerous advantages to society and to the further development of science and technology. However, given the often sensitive nature of the collected data, people have become increasingly concerned about the data they share and how they interact with new technology.
These concerns have motivated companies and public institutions to provide services and products with privacy guarantees. To address them, many institutions and research communities have adopted the notion of differential privacy, which has emerged as a powerful technique for enabling data analysis while preventing information leakage about individuals. In simple words, differential privacy allows us to use and analyze sensitive data while maintaining privacy guarantees for every individual data point. As a result, numerous private algorithmic tools have been developed for various applications. However, multiple open questions and research areas remain to be explored around differential privacy in machine learning, statistics, and data analysis that the existing literature has not covered.
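As one classic instance of such a tool (standard binary randomized response, not an algorithm from this thesis; the epsilon, sample size, and population below are made up), a collector can estimate a population mean from locally randomized bits and de-bias the result:

```python
import math
import random

def randomized_response(bit, eps):
    # Report the true bit with probability e^eps/(1+e^eps), else flip it.
    keep = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < keep else 1 - bit

def estimate_mean(reports, eps):
    # Invert the known flipping probability so the estimate is unbiased.
    p = math.exp(eps) / (1 + math.exp(eps))
    noisy = sum(reports) / len(reports)
    return (noisy - (1 - p)) / (2 * p - 1)

random.seed(0)
data = [1] * 3000 + [0] * 7000   # made-up population, true mean 0.3
eps = 1.0                        # privacy parameter (arbitrary choice)
reports = [randomized_response(b, eps) for b in data]
est = estimate_mean(reports, eps)
print(f"estimated mean: {est:.3f} (true 0.3)")
```

Each individual's report is deniable, yet the aggregate statistic remains accurate at this sample size.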
In Chapter 1, we provide a brief discussion of the problems and the main contributions that are presented in this thesis. Additionally, we briefly recap the notion of differential privacy with some useful results and algorithms.
In Chapter 2, we study the problem of differentially private change-point detection for unknown distributions. The change-point detection problem seeks to identify distributional changes in streams of data. Non-private tools for change-point detection have been widely applied in several settings. However, in certain applications, such as identifying disease outbreaks based on hospital records or IoT devices detecting home activity, the collected data is highly sensitive, which motivates the study of privacy-preserving tools. Much of the prior work on change-point detection---including the only private algorithms for this problem---requires complete knowledge of the pre-change and post-change distributions. However, this assumption is not realistic for many practical applications of interest. In this chapter, we present differentially private algorithms for solving the change-point problem when the data distributions are unknown to the analyst. Additionally, we study the case when data may be sampled from distributions that change smoothly over time rather than fixed pre-change and post-change distributions. Furthermore, our algorithms can be applied to detect changes in linear trends of such data streams. Finally, we also provide a computational study to empirically validate the performance of our algorithms.
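One generic ingredient behind private selection of a change point can be sketched as report-noisy-max over candidate indices (an illustrative sketch under simplifying assumptions of our own, such as data in [0, 1] and a difference-in-means score; this is not the thesis's algorithm, and a real analysis must calibrate the noise to the score's sensitivity):

```python
import math
import random

def noisy_argmax_changepoint(stream, eps, rng):
    # Score each candidate index k by the absolute difference in means
    # before and after k, add Laplace(1/eps) noise to each score, and
    # return the candidate with the largest noisy score.
    n = len(stream)
    best_k, best_score = None, -math.inf
    for k in range(1, n):
        left = sum(stream[:k]) / k
        right = sum(stream[k:]) / (n - k)
        score = abs(right - left)
        # Difference of two Exp(eps) draws is Laplace with scale 1/eps.
        noise = rng.expovariate(eps) - rng.expovariate(eps)
        if score + noise > best_score:
            best_k, best_score = k, score + noise
    return best_k

rng = random.Random(1)
stream = [0.1] * 50 + [0.9] * 50   # synthetic stream with a change at index 50
print("detected change point:", noisy_argmax_changepoint(stream, 5.0, rng))
```

Note this sketch needs no knowledge of the pre-change and post-change distributions, only a score that reacts to a distributional shift.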
In Chapter 3, we study the problem of learning from imbalanced datasets, in which the classes are not equally represented, through the lens of differential privacy. A widely used method to address imbalanced data is resampling from the minority class instances. However, when confidential or sensitive attributes are present, data replication can lead to privacy leakage, disproportionally affecting the minority class. This challenge motivates the study of privacy-preserving pre-processing techniques for imbalanced learning. In this work, we present a differentially private synthetic minority oversampling technique (DP-SMOTE) which is based on a widely used non-private oversampling method known as SMOTE. Our algorithm generates differentially private synthetic data from the minority class. We demonstrate the impact of our pre-processing technique on the performance and privacy leakage of various classification methods in a detailed computational study.
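For context, the non-private SMOTE step that DP-SMOTE privatizes generates a synthetic point by interpolating between a minority sample and a neighbor (a toy re-implementation of the classic method with a naive nearest-neighbor search; the data and counts are made up, and the thesis's contribution is replacing this generation with a differentially private one):

```python
import random

def smote_point(x, neighbor, rng):
    # Classic SMOTE step: a random point on the segment between a
    # minority sample and one of its neighbors.
    u = rng.random()
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbor)]

def naive_smote(minority, n_new, rng):
    # Toy version: pick a random minority point and interpolate toward
    # its nearest minority neighbor (Euclidean distance, no k-NN index).
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        nn = min((m for m in minority if m is not x),
                 key=lambda m: sum((a - b) ** 2 for a, b in zip(m, x)))
        out.append(smote_point(x, nn, rng))
    return out

rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]  # made-up minority class
synthetic = naive_smote(minority, 4, rng)
print(synthetic)
```

Because each synthetic point is a function of real minority records, naive replication of this kind is exactly where the privacy leakage described above arises.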
In Chapter 4, we focus on the analysis of sensitive data that is generated from online internet activity. Accurately analyzing and modeling online browsing behavior plays a key role in understanding users and technology interactions. Towards this goal, in this chapter, we present an up-to-date measurement study of online browsing behavior. We study both self-reported and observational browsing data and analyze what underlying features can be learned from statistical analysis of this potentially sensitive data. To this end, we empirically address the following questions: (1) Do structural patterns of browsing differ across demographic groups and types of web use? (2) Do people have correct perceptions of their behavior online? (3) Do people change their browsing behavior if they are aware of being observed?
In response to these questions, we find little difference across most demographic groups and website categories, suggesting that these features cannot be inferred solely from clickstream data. We find that users significantly overestimate the time they spend online but have relatively accurate perceptions of how they spend their time online. We find no significant changes in behavior throughout the study, which may indicate that observation had no effect on behavior or that users were consciously aware of being observed throughout the study.
Security Evaluation of Support Vector Machines in Adversarial Environments
Support Vector Machines (SVMs) are among the most popular classification
techniques adopted in security applications like malware detection, intrusion
detection, and spam filtering. However, if SVMs are to be incorporated in
real-world security systems, they must be able to cope with attack patterns
that can either mislead the learning algorithm (poisoning), evade detection
(evasion), or gain information about their internal parameters (privacy
breaches). The main contributions of this chapter are twofold. First, we
introduce a formal general framework for the empirical evaluation of the
security of machine-learning systems. Second, according to our framework, we
demonstrate the feasibility of evasion, poisoning and privacy attacks against
SVMs in real-world security problems. For each attack technique, we evaluate
its impact and discuss whether (and how) it can be countered through an
adversary-aware design of SVMs. Our experiments are easily reproducible thanks
to open-source code that we have made available, together with all the employed
datasets, on a public repository.

Comment: 47 pages, 9 figures; chapter accepted into book 'Support Vector Machine Applications'
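The evasion setting above can be illustrated on the simplest possible case: a minimal-perturbation attack against a linear classifier (our own toy sketch with made-up weights standing in for a trained SVM; evading kernel SVMs, as studied in the chapter, requires gradient-based variants of this idea):

```python
def evade_linear(w, b, x):
    # Minimal L2-norm evasion against a linear classifier
    # sign(w.x + b): move x along -w by its distance to the decision
    # hyperplane, with a 1% overshoot to land on the other side.
    wx = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm2 = sum(wi * wi for wi in w)
    t = 1.01 * wx / norm2
    return [xi - t * wi for wi, xi in zip(w, x)]

w, b = [2.0, -1.0], 0.5     # hypothetical weights of a linear SVM
x = [1.0, 0.3]              # scored 2.2, i.e. classified positive
x_adv = evade_linear(w, b, x)
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print("adversarial score:", adv_score)
```

The attack finds the closest point across the boundary, which is why margin and feature robustness, not just accuracy, matter in adversary-aware design.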