24,831 research outputs found
From the Information Bottleneck to the Privacy Funnel
We focus on the privacy-utility trade-off encountered by users who wish to
disclose some information to an analyst, that is correlated with their private
data, in the hope of receiving some utility. We rely on a general privacy
statistical inference framework, under which data is transformed before it is
disclosed, according to a probabilistic privacy mapping. We show that when the
log-loss is introduced in this framework in both the privacy metric and the
distortion metric, the privacy leakage and the utility constraint can be
reduced to the mutual information between private data and disclosed data, and
between non-private data and disclosed data respectively. We justify the
relevance and generality of the privacy metric under the log-loss by proving
that the inference threat under any bounded cost function can be upper-bounded
by an explicit function of the mutual information between private data and
disclosed data. We then show that the privacy-utility tradeoff under the
log-loss can be cast as the non-convex Privacy Funnel optimization, and we
leverage its connection to the Information Bottleneck, to provide a greedy
algorithm that is locally optimal. We evaluate its performance on the US census
dataset
Privacy Against Statistical Inference
We propose a general statistical inference framework to capture the privacy
threat incurred by a user that releases data to a passive but curious
adversary, given utility constraints. We show that applying this general
framework to the setting where the adversary uses the self-information cost
function naturally leads to a non-asymptotic information-theoretic approach for
characterizing the best achievable privacy subject to utility constraints.
Based on these results we introduce two privacy metrics, namely average
information leakage and maximum information leakage. We prove that under both
metrics the resulting design problem of finding the optimal mapping from the
user's data to a privacy-preserving output can be cast as a modified
rate-distortion problem which, in turn, can be formulated as a convex program.
Finally, we compare our framework with differential privacy.Comment: Allerton 2012, 8 page
Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.Comment: Extended version of the paper appearing in SIGMETRICS 201
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.Comment: Improved version of a paper accepted by Entropy Journal, Special
Issue on Information Theory in Machine Learning and Data Scienc
- …