25 research outputs found
Information Extraction Under Privacy Constraints
A privacy-constrained information extraction problem is considered where, for
a pair of correlated discrete random variables $(X,Y)$ governed by a given
joint distribution, an agent observes $Y$ and wants to convey to a potentially
public user as much information about $Y$ as possible without compromising the
amount of information revealed about $X$. To this end, the so-called {\em
rate-privacy function} is introduced to quantify the maximal amount of
information (measured in terms of mutual information) that can be extracted
from $Y$ under a privacy constraint between $X$ and the extracted information,
where privacy is measured using either mutual information or maximal
correlation. Properties of the rate-privacy function are analyzed, and
information-theoretic and estimation-theoretic interpretations of it are
presented for both the mutual information and maximal correlation privacy
measures. It is also shown that the rate-privacy function admits a closed-form
expression for a large family of joint distributions of $(X,Y)$. Finally, the
rate-privacy function under the mutual information privacy measure is
considered for the case where $(X,Y)$ has a joint probability density function,
by studying the problem where the extracted information is a uniform
quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic
behavior of the rate-privacy function is studied as the quantization resolution
grows without bound, and it is observed that not all of the properties of the
rate-privacy function carry over from the discrete to the continuous case.

Comment: 55 pages, 6 figures. Improved the organization and added a detailed
literature review
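For small alphabets, the rate-privacy function can be approximated numerically. The sketch below is an illustrative brute-force search, not the paper's analytical method: the restriction to a binary $Z$, the grid resolution, and the function names are my assumptions. It searches over channels $P_{Z|Y}$ and keeps the best utility $I(Y;Z)$ among those satisfying the leakage constraint $I(X;Z)\le\varepsilon$.

```python
import itertools
import numpy as np

def mutual_information(p_uv):
    """I(U;V) in bits for a joint pmf given as a 2-D numpy array."""
    p_u = p_uv.sum(axis=1, keepdims=True)
    p_v = p_uv.sum(axis=0, keepdims=True)
    mask = p_uv > 0
    return float((p_uv[mask] * np.log2(p_uv[mask] / (p_u @ p_v)[mask])).sum())

def rate_privacy(p_xy, eps, grid=101):
    """Approximate g_eps = max_{P(Z|Y)} I(Y;Z) s.t. I(X;Z) <= eps,
    restricting Z to be binary and grid-searching the channel parameters.
    p_xy: joint pmf of (X, Y), rows indexed by x, columns by y."""
    p_y = p_xy.sum(axis=0)
    p_x_given_y = p_xy / p_y            # column y holds P(X | Y=y)
    best = 0.0
    for a, b in itertools.product(np.linspace(0, 1, grid), repeat=2):
        ch = np.array([[1 - a, a],      # P(Z | Y=0)
                       [1 - b, b]])     # P(Z | Y=1)
        p_yz = p_y[:, None] * ch        # joint pmf of (Y, Z)
        p_xz = p_x_given_y @ p_yz       # joint pmf of (X, Z): X - Y - Z chain
        if mutual_information(p_xz) <= eps + 1e-12:
            best = max(best, mutual_information(p_yz))
    return best

# Doubly symmetric binary source: X is Y passed through a BSC(0.2)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
```

For this source $I(X;Y)\approx 0.278$ bits, so any `eps` above that makes the constraint inactive and the search returns $H(Y)=1$ bit (achieved by $Z=Y$); shrinking `eps` trades utility for privacy.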
Privacy-Aware Guessing Efficiency
We investigate the problem of guessing a discrete random variable $Y$ under a
privacy constraint dictated by another correlated discrete random variable $X$,
where both guessing efficiency and privacy are assessed in terms of the
probability of correct guessing. We define $h(P_{XY}, \varepsilon)$ as the maximum
probability of correctly guessing $Y$ given an auxiliary random variable $Z$,
where the maximization is taken over all $P_{Z|Y}$ ensuring that the
probability of correctly guessing $X$ given $Z$ does not exceed $\varepsilon$. We
show that the map $\varepsilon \mapsto h(P_{XY}, \varepsilon)$ is strictly increasing,
concave, and piecewise linear, which allows us to derive a closed-form
expression for $h(P_{XY}, \varepsilon)$ when $X$ and $Y$ are connected via a
binary-input binary-output channel. For $(X^n, Y^n)$ being pairs of independent
and identically distributed binary random vectors, we similarly define
$h(P_{X^n Y^n}, \varepsilon)$ under the assumption that $Z^n$ is also
a binary vector. Then we obtain a closed-form expression for
$h(P_{X^n Y^n}, \varepsilon)$ for sufficiently large, but nontrivial
values of $\varepsilon$.

Comment: ISIT 201
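The scalar quantity $h(P_{XY},\varepsilon)$ can be estimated by brute force for binary alphabets. The sketch below is illustrative only (the binary-$Z$ restriction, grid search, and names are my assumptions; the paper derives a closed form): it maximizes the MAP guessing probability of $Y$ from $Z$ subject to the guessing probability of $X$ from $Z$ staying below $\varepsilon$.

```python
import itertools
import numpy as np

def p_correct(p_uz):
    """MAP probability of correctly guessing U from Z: sum_z max_u P(u, z)."""
    return float(p_uz.max(axis=0).sum())

def guessing_h(p_xy, eps, grid=101):
    """Approximate h(P_XY, eps) = max_{P(Z|Y)} Pc(Y|Z) s.t. Pc(X|Z) <= eps,
    with Z binary and the channel parameters grid-searched."""
    p_y = p_xy.sum(axis=0)
    p_x_given_y = p_xy / p_y            # column y holds P(X | Y=y)
    best = 0.0
    for a, b in itertools.product(np.linspace(0, 1, grid), repeat=2):
        ch = np.array([[1 - a, a], [1 - b, b]])   # channel Y -> Z
        p_yz = p_y[:, None] * ch
        p_xz = p_x_given_y @ p_yz                 # X - Y - Z Markov chain
        if p_correct(p_xz) <= eps + 1e-12:
            best = max(best, p_correct(p_yz))
    return best

# Doubly symmetric binary source: X is Y passed through a BSC(0.2)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
```

Note that $\varepsilon$ below $\max_x P_X(x)$ is infeasible, since even a blind adversary guesses $X$ correctly with that probability; here the interesting range is $\varepsilon\in[0.5, 0.8]$, and at $\varepsilon=0.8=P_c(X|Y)$ the unconstrained optimum $Z=Y$ becomes feasible.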
Privacy-Aware MMSE Estimation
We investigate the problem of the predictability of a random variable $Y$ under
a privacy constraint dictated by a random variable $X$, correlated with $Y$,
where both predictability and privacy are assessed in terms of the minimum
mean-squared error (MMSE). Given that $X$ and $Y$ are connected via a
binary-input symmetric-output (BISO) channel, we derive the \emph{optimal}
random mapping $P_{Z|Y}$ such that the MMSE of $Y$ given $Z$ is minimized while
the MMSE of $X$ given $Z$ is greater than $\varepsilon\,\mathsf{var}(X)$ for a
given $\varepsilon \geq 0$. We also consider the case where $(X,Y)$ are continuous
and $P_{Z|Y}$ is restricted to be an additive-noise channel.

Comment: 9 pages, 3 figures
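The same brute-force device illustrates the MMSE formulation for a binary pair. This is a sketch under my own assumptions (binary $Z$, grid search, invented names; the paper derives the optimal mapping analytically): minimize $\mathsf{mmse}(Y|Z)$ subject to the privacy constraint $\mathsf{mmse}(X|Z) \ge \varepsilon\,\mathsf{var}(X)$.

```python
import itertools
import numpy as np

def mmse(p_uz, vals=(0.0, 1.0)):
    """E[(U - E[U|Z])^2] for U supported on vals; joint pmf rows u, cols z."""
    v = np.asarray(vals)
    p_z = p_uz.sum(axis=0)
    out = 0.0
    for z in range(p_uz.shape[1]):
        if p_z[z] > 0:
            cond = p_uz[:, z] / p_z[z]            # P(U | Z=z)
            mean = float((v * cond).sum())
            out += p_z[z] * float(((v - mean) ** 2 * cond).sum())
    return float(out)

def private_mmse(p_xy, eps, grid=101):
    """min_{P(Z|Y)} mmse(Y|Z)  s.t.  mmse(X|Z) >= eps * var(X),
    with Z binary and the channel grid-searched (illustrative only)."""
    p_x = p_xy.sum(axis=1)
    var_x = p_x[1] * (1 - p_x[1])                 # variance of a {0,1} variable
    p_y = p_xy.sum(axis=0)
    p_x_given_y = p_xy / p_y
    best = float("inf")
    for a, b in itertools.product(np.linspace(0, 1, grid), repeat=2):
        ch = np.array([[1 - a, a], [1 - b, b]])   # channel Y -> Z
        p_yz = p_y[:, None] * ch
        p_xz = p_x_given_y @ p_yz                 # X - Y - Z Markov chain
        if mmse(p_xz) >= eps * var_x - 1e-12:
            best = min(best, mmse(p_yz))
    return best

# Doubly symmetric binary source: X is Y passed through a BSC(0.2)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
```

At $\varepsilon=0$ the constraint is vacuous and $Z=Y$ drives the MMSE of $Y$ to zero; at $\varepsilon=1$ (perfect privacy) $Z$ must be useless for estimating $X$, which for this symmetric source forces $Z$ to be useless for $Y$ as well, so the minimum equals $\mathsf{var}(Y)=0.25$.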
Almost Perfect Privacy for Additive Gaussian Privacy Filters
We study the maximal mutual information about a random variable $Y$
(representing non-private information) displayed through an additive Gaussian
channel when guaranteeing that only $\varepsilon$ bits of information are leaked
about a random variable $X$ (representing private information) that is
correlated with $Y$. Denoting this quantity by $g(\varepsilon)$, we show that
for perfect privacy, i.e., $\varepsilon = 0$, one has $g(0) = 0$ for any pair of
absolutely continuous random variables, and then derive a second-order
approximation for $g(\varepsilon)$ for small $\varepsilon$. This approximation is
shown to be related to the strong data processing inequality for mutual
information under suitable conditions on the joint distribution $P_{XY}$. Next,
motivated by an operational interpretation of data privacy, we formulate the
privacy-utility tradeoff in the same setup using estimation-theoretic
quantities and obtain explicit bounds for this tradeoff when $\varepsilon$ is
sufficiently small using the approximation formula derived for
$g(\varepsilon)$.

Comment: 20 pages. To appear in Springer-Verlag
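For jointly Gaussian $(X,Y)$ (an illustrative special case, not the general absolutely continuous setting of the paper, and with names of my own choosing), both the leakage $I(X;Z)$ and the utility $I(Y;Z)$ of the additive Gaussian filter $Z = Y + N$ have closed forms, which makes the vanishing-utility behavior at perfect privacy easy to see numerically:

```python
import math

def gaussian_filter_tradeoff(rho, gamma):
    """Z = Y + N with N ~ N(0, gamma), Y ~ N(0, 1), and X standard normal
    with correlation rho to Y. Returns (leakage, utility) in bits, where
    leakage = I(X; Z) and utility = I(Y; Z)."""
    utility = 0.5 * math.log2(1.0 + 1.0 / gamma)        # I(Y; Y+N)
    rho_xz_sq = rho**2 / (1.0 + gamma)                  # squared corr. of (X, Z)
    leakage = -0.5 * math.log2(1.0 - rho_xz_sq)         # I(X; Z) for Gaussians
    return leakage, utility

# Sweep the noise variance: as the leakage is driven to 0, the utility
# is driven to 0 as well, consistent with g(0) = 0.
for gamma in (0.1, 1.0, 10.0, 100.0):
    leak, util = gaussian_filter_tradeoff(rho=0.8, gamma=gamma)
    print(f"gamma={gamma:6.1f}  leakage={leak:.4f}  utility={util:.4f}")
```

The data processing inequality is visible in the numbers: since $X - Y - Z$ is a Markov chain, the leakage never exceeds the utility.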
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.

Comment: Improved version of a paper accepted by Entropy Journal, Special
Issue on Information Theory in Machine Learning and Data Science
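The minimax structure can be made concrete on a toy binary model. The sketch below is illustrative only (the flip-probability mechanism, leakage budget, and names are my assumptions, not the paper's GAN training loop): the privatizer flips the public bit $Y$ with probability $t$, paying distortion $t$, while a MAP adversary tries to infer the private bit $X$ from the released bit.

```python
import numpy as np

def adversary_accuracy(p_xy, t):
    """MAP accuracy of inferring private X from Yhat = Y flipped w.p. t."""
    flip = np.array([[1 - t, t],
                     [t, 1 - t]])        # binary symmetric channel Y -> Yhat
    p_x_yhat = p_xy @ flip               # joint pmf of (X, Yhat)
    return float(p_x_yhat.max(axis=0).sum())

def minimal_distortion(p_xy, leakage_budget, grid=1001):
    """Smallest flip probability t (the distortion paid) for which the MAP
    adversary's accuracy does not exceed leakage_budget; None if no t in
    [0, 0.5] suffices. Accuracy is non-increasing in t on this interval."""
    for t in np.linspace(0.0, 0.5, grid):
        if adversary_accuracy(p_xy, t) <= leakage_budget + 1e-12:
            return float(t)
    return None

# Correlated private/public bits: X is Y passed through a BSC(0.2)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
```

With no privatization the adversary infers $X$ with probability $0.8$; flipping with $t=0.5$ reduces it to the blind-guessing floor of $0.5$, and no budget below that floor is achievable, mirroring the privacy-distortion frontier of the binary data model.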