Publishing Microdata with a Robust Privacy Guarantee
Today, the publication of microdata poses a privacy threat. Extensive research has
striven to define the privacy condition that microdata should satisfy before
release, and to devise algorithms that anonymize the data so as to achieve this
condition. Yet no method proposed to date explicitly bounds the percentage of
information an adversary gains, after seeing the published data, about each
sensitive value therein. This paper introduces beta-likeness, an appropriately
robust privacy model for microdata anonymization, along with two anonymization
schemes designed for it, one based on generalization and the other on
perturbation. Our model postulates that an adversary's confidence in the
likelihood of a certain sensitive-attribute (SA) value should not increase, in
relative-difference terms, by more than a predefined threshold. Our techniques
aim to satisfy a given beta threshold with little information loss. We
experimentally demonstrate that (i) our model provides an effective privacy
guarantee in a way that predecessor models cannot, (ii) our generalization
scheme is more effective and efficient than methods that adapt
algorithms for the k-anonymity model, and (iii) our perturbation method
outperforms a baseline approach. Moreover, we discuss in detail the resistance
of our model and methods to attacks proposed in previous research.
Comment: VLDB2012
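To make the guarantee concrete, here is a minimal sketch in Python of the relative-difference condition described above: within every equivalence class, no sensitive value's frequency may exceed its table-wide frequency by more than a factor of (1 + beta). The list-of-lists data representation and the function name are illustrative assumptions, not the paper's formulation.

```python
from collections import Counter

def satisfies_beta_likeness(classes, beta):
    """Check the beta-likeness condition sketched above: in every
    equivalence class, the relative increase of each sensitive value's
    frequency over its table-wide frequency stays within beta.

    `classes`: a list of equivalence classes, each given as a list of
    sensitive-attribute values (an illustrative representation).
    """
    all_values = [v for cls in classes for v in cls]
    overall = {v: c / len(all_values) for v, c in Counter(all_values).items()}

    for cls in classes:
        for v, c in Counter(cls).items():
            q, p = c / len(cls), overall[v]
            # Relative-difference bound: (q - p) / p <= beta.
            if (q - p) / p > beta:
                return False
    return True

# Toy table: two equivalence classes over one sensitive attribute.
classes = [["flu", "flu", "cold"], ["cold", "cold", "flu"]]
print(satisfies_beta_likeness(classes, beta=0.5))  # True: max relative lift is 1/3
```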
GLOVE: towards privacy-preserving publishing of record-level-truthful mobile phone trajectories
Datasets of mobile phone trajectories collected by network operators offer an unprecedented opportunity to discover new knowledge from the activity of populations of millions of individuals. However, publishing such trajectories also raises significant privacy concerns, as they contain personal data in the form of individual movement patterns. Privacy risks induce network operators to enforce restrictive confidentiality agreements on the rare occasions when they grant access to collected trajectories, whereas a less restricted circulation of these data would fuel research and enable reproducibility in many disciplines. In this work, we contribute a building block toward the design of privacy-preserving datasets of mobile phone trajectories that are truthful at the record level. We present GLOVE, an algorithm that implements k-anonymity, hence solving the crucial unicity problem that affects this type of data, while ensuring that the anonymized trajectories correspond to real-life users. GLOVE builds on original insights about the root causes of the undesirable unicity of mobile phone trajectories, and leverages generalization and suppression to remove them. Proof-of-concept validations with large-scale real-world datasets demonstrate that the approach adopted by GLOVE preserves a substantial level of accuracy in the data, higher than that granted by previous methodologies.
This work was supported by the Atracción de Talento Investigador program of the Comunidad de Madrid under Grant No. 2019-T1/TIC-16037 NetSense.
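As a loose illustration of the generalization step mentioned in the abstract, the Python sketch below merges a group of k trajectories into one generalized record shared by all members, coarsening each aligned sample to a time window and a set of cells. The index-based alignment and the fixed grouping are simplifying assumptions of this sketch; GLOVE's actual merging is driven by a generalization-cost criterion described in the paper.

```python
def generalize_group(group):
    """Merge k trajectories, each a list of (timestamp, cell) samples,
    into a single generalized trajectory published for every member.
    Assumes samples are already aligned by index (a simplification)."""
    length = min(len(t) for t in group)
    merged = []
    for i in range(length):
        samples = [t[i] for t in group]
        times = [s[0] for s in samples]
        cells = {s[1] for s in samples}
        # Generalized sample: coarsened time window + set of cells.
        merged.append(((min(times), max(times)), cells))
    return merged

# Two trajectories merged into one 2-anonymous generalized record.
trajectories = [
    [(800, "A1"), (900, "B2")],
    [(810, "A1"), (905, "B3")],
]
print(generalize_group(trajectories))
# [((800, 810), {'A1'}), ((900, 905), {'B2', 'B3'})]
```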
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.
Comment: Improved version of a paper accepted by Entropy Journal, Special Issue on Information Theory in Machine Learning and Data Science
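The minimax game can be illustrated numerically for the binary data model: in the sketch below, a randomized-response privatizer (a hand-rolled stand-in for the paper's learned privatizer, and an assumption of this sketch) chooses a bit-flip probability that minimizes the accuracy of a Bayes-optimal adversary under a distortion budget. The paper's GAN-based training is replaced here by a simple grid search.

```python
import numpy as np

rng = np.random.default_rng(0)

def adversary_accuracy(flip_prob, n=100_000):
    """Empirical accuracy of a Bayes-optimal adversary that knows the
    privatizer's flip probability and guesses the private bit."""
    x = rng.integers(0, 2, size=n)                     # private bits, uniform prior
    y = np.where(rng.random(n) < flip_prob, 1 - x, x)  # sanitized bits
    guess = y if flip_prob < 0.5 else 1 - y            # Bayes-optimal decision rule
    return np.mean(guess == x)

# Privatizer's move: minimize the adversary's accuracy while keeping
# the expected distortion (= flip probability) within budget.
budget = 0.3
candidates = np.linspace(0.0, budget, 31)
best = min(candidates, key=adversary_accuracy)
print(best, adversary_accuracy(best))  # ~0.3 and accuracy ~0.7
```

As expected in this toy setting, the privatizer spends the entire distortion budget: flipping with the maximum allowed probability is the accuracy-minimizing move against a Bayes-optimal adversary.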