Security Evaluation of Support Vector Machines in Adversarial Environments
Support Vector Machines (SVMs) are among the most popular classification
techniques adopted in security applications like malware detection, intrusion
detection, and spam filtering. However, if SVMs are to be incorporated into
real-world security systems, they must be able to cope with attacks that
mislead the learning algorithm (poisoning), evade detection at test time
(evasion), or extract information about the model's internal parameters
(privacy breaches). The main contributions of this chapter are twofold. First, we
introduce a formal general framework for the empirical evaluation of the
security of machine-learning systems. Second, according to our framework, we
demonstrate the feasibility of evasion, poisoning and privacy attacks against
SVMs in real-world security problems. For each attack technique, we evaluate
its impact and discuss whether (and how) it can be countered through an
adversary-aware design of SVMs. Our experiments are easily reproducible thanks
to open-source code that we have made available, together with all the employed
datasets, on a public repository.
Comment: 47 pages, 9 figures; chapter accepted into book 'Support Vector Machine Applications'
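To make the evasion setting concrete, the following is a minimal sketch of a gradient-based evasion attack on a linear SVM, in the spirit of the attacks the chapter evaluates; the synthetic dataset, step size, and stopping rule are illustrative assumptions rather than the authors' experimental setup.

# Minimal sketch (assumed setup): gradient-based evasion of a linear SVM.
# The synthetic data stand in for a malware/spam feature space.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)  # class 1 = "malicious"

def evade(x, clf, step=0.05, max_iter=200):
    """Shift a detected sample along -w until the decision function
    flips sign, i.e., the sample is classified as benign."""
    w = clf.coef_.ravel()
    x_adv = x.copy()
    for _ in range(max_iter):
        if clf.decision_function(x_adv.reshape(1, -1))[0] < 0:
            break  # evasion succeeded
        x_adv -= step * w / np.linalg.norm(w)  # steepest descent on f(x)
    return x_adv

x0 = X[y == 1][0]  # a sample from the malicious class
print("score before:", clf.decision_function(x0.reshape(1, -1))[0])
print("score after: ", clf.decision_function(evade(x0, clf).reshape(1, -1))[0])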
Rethinking Location Privacy for Unknown Mobility Behaviors
Location Privacy-Preserving Mechanisms (LPPMs) in the literature largely
assume that the users' data available for training fully characterizes their
mobility patterns. Thus, they hardwire this information into their designs and
evaluate their privacy properties on these same data. In this paper, we aim
to understand the impact of this decision on the level of privacy these LPPMs
may offer in real life when the users' mobility data may be different from the
data used in the design phase. Our results show that, in many cases, training
data does not capture users' behavior accurately and, thus, the level of
privacy provided by the LPPM is often overestimated. To address this gap
between theory and practice, we propose to use blank-slate models for LPPM
design. Contrary to the hardwired approach, which assumes users' behavior is
known, blank-slate models learn the users' behavior from the queries sent to
the service provider. We leverage this blank-slate approach to develop a new
family of LPPMs, which we call Profile Estimation-Based LPPMs. Using real data, we
empirically show that our proposal outperforms optimal state-of-the-art
mechanisms designed on sporadic hardwired models. In non-sporadic location
privacy scenarios, our method is better only when usage of the location
privacy service is not continuous. It is our hope that eliminating the need to
bootstrap the mechanisms with training data, and ensuring that the mechanisms
are lightweight and easy to compute, will help foster the integration of
location privacy protections in deployed systems.
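To illustrate the core blank-slate idea in code, here is a minimal sketch that starts from a uniform prior over a discrete location grid and refines the profile estimate from each query; the Dirichlet prior, the one-dimensional grid, and the exponential-style obfuscation below are illustrative assumptions, not the paper's Profile Estimation-Based mechanism.

# Minimal sketch (assumed setup): a "blank-slate" profile estimator that
# starts from a uniform Dirichlet prior over locations and updates it
# from the user's queries, instead of hardwiring a trained profile.
import numpy as np

class BlankSlateLPPM:
    def __init__(self, n_locations, prior_strength=1.0):
        self.alpha = np.full(n_locations, prior_strength)  # uniform prior

    def profile(self):
        """Posterior-mean estimate of the user's mobility profile."""
        return self.alpha / self.alpha.sum()

    def query(self, true_loc, epsilon=1.0, rng=None):
        """Update the estimate, then report an obfuscated location drawn
        from an exponential-style distribution (illustrative mechanism)."""
        rng = rng or np.random.default_rng()
        self.alpha[true_loc] += 1.0  # learn from the query itself
        dist = np.abs(np.arange(len(self.alpha)) - true_loc)
        p = np.exp(-epsilon * dist)
        return rng.choice(len(self.alpha), p=p / p.sum())

lppm = BlankSlateLPPM(n_locations=10)
for loc in [3, 3, 4, 3, 7]:  # the user's (a priori unknown) trace
    reported = lppm.query(loc)
print("estimated profile:", np.round(lppm.profile(), 2))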
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.
Comment: Improved version of a paper accepted by Entropy Journal, Special Issue on Information Theory in Machine Learning and Data Science
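To make the minimax game concrete, here is a minimal sketch of a GAP-style training loop in PyTorch; the network sizes, the penalty weight lam (standing in for the distortion constraint), and the synthetic binary-data model are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch (assumed setup): a GAP-style constrained minimax game,
# with the distortion constraint handled as a penalty term.
import torch
import torch.nn as nn

dim = 8
privatizer = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, dim))
adversary = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))
opt_p = torch.optim.Adam(privatizer.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 5.0  # penalty weight standing in for the distortion constraint

for step in range(2000):
    s = torch.randint(0, 2, (128, 1)).float()           # private variable
    x = s.repeat(1, dim) + 0.5 * torch.randn(128, dim)  # correlated data
    # Adversary step: maximize inference accuracy on sanitized data.
    opt_a.zero_grad()
    bce(adversary(privatizer(x).detach()), s).backward()
    opt_a.step()
    # Privatizer step: keep x_hat close to x (utility) while making the
    # adversary's inference loss as large as possible (privacy).
    x_hat = privatizer(x)
    loss_p = lam * ((x_hat - x) ** 2).mean() - bce(adversary(x_hat), s)
    opt_p.zero_grad()
    loss_p.backward()
    opt_p.step()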
On the 'Semantics' of Differential Privacy: A Bayesian Formulation
Differential privacy is a definition of "privacy" for algorithms that
analyze and publish information about statistical databases. It is often
claimed that differential privacy provides guarantees against adversaries with
arbitrary side information. In this paper, we provide a precise formulation of
these guarantees in terms of the inferences drawn by a Bayesian adversary. We
show that this formulation is satisfied by both "vanilla" differential privacy
as well as a relaxation known as (epsilon,delta)-differential privacy. Our
formulation follows the ideas originally due to Dwork and McSherry [Dwork
2006]. This paper is, to our knowledge, the first place such a formulation
appears explicitly. The analysis of the relaxed definition is new to this
paper, and provides some concrete guidance for setting parameters when using
(epsilon,delta)-differential privacy.
Comment: Older version of this paper was titled: "A Note on Differential Privacy: Defining Resistance to Arbitrary Side Information"
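For reference, the two definitions the abstract contrasts can be stated as follows (a textbook formulation, not this paper's notation): a randomized algorithm M is (epsilon,delta)-differentially private if, for all pairs of databases D and D' differing in a single record, and every measurable set S of outputs,

% (epsilon,delta)-differential privacy; "vanilla" DP is the case delta = 0
\Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] + \delta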
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective
Rapid advances in human genomics are enabling researchers to gain a better
understanding of the role of the genome in our health and well-being,
stimulating hope for more effective and cost-efficient healthcare. However,
this also prompts a number of security and privacy concerns stemming from the
distinctive characteristics of genomic data. To address them, a new research
community has emerged and produced a large number of publications and
initiatives.
In this paper, we rely on a structured methodology to contextualize and
provide a critical analysis of the current knowledge on privacy-enhancing
technologies used for testing, storing, and sharing genomic data, using a
representative sample of the work published in the past decade. We identify and
discuss limitations, technical challenges, and issues faced by the community,
focusing in particular on those that are inherently tied to the nature of the
problem and are harder for the community alone to address. Finally, we report
on the importance and difficulty of the identified challenges based on an
online survey of genome data privacy experts.
Comment: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue