From the Information Bottleneck to the Privacy Funnel
We focus on the privacy-utility trade-off encountered by users who wish to
disclose to an analyst some information that is correlated with their private
data, in the hope of receiving some utility. We rely on a general statistical
inference framework for privacy, under which data is transformed before it is
disclosed according to a probabilistic privacy mapping. We show that when the
log-loss is introduced in this framework in both the privacy metric and the
distortion metric, the privacy leakage and the utility constraint can be
reduced to the mutual information between private data and disclosed data, and
between non-private data and disclosed data, respectively. We justify the
relevance and generality of the privacy metric under the log-loss by proving
that the inference threat under any bounded cost function can be upper-bounded
by an explicit function of the mutual information between private data and
disclosed data. We then show that the privacy-utility trade-off under the
log-loss can be cast as the non-convex Privacy Funnel optimization, and we
leverage its connection to the Information Bottleneck to provide a greedy
algorithm that is locally optimal. We evaluate its performance on the US
Census dataset.
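
To make the log-loss reduction concrete, here is a minimal numerical sketch (not the paper's algorithm) of the two quantities the Privacy Funnel trades off: the leakage I(X;Z) between private data X and disclosed data Z, and the utility I(Y;Z) between non-private data Y and Z, for a given probabilistic privacy mapping p(Z|Y). The joint distribution and the mapping below are illustrative assumptions.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats for a joint distribution given as a 2-D array."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))

def funnel_objective(p_xy, p_z_given_y):
    """Privacy leakage I(X;Z) and utility I(Y;Z) for a mapping p(Z|Y).

    p_xy:        |X| x |Y| joint distribution of private and useful data.
    p_z_given_y: |Y| x |Z| row-stochastic privacy mapping.
    X -- Y -- Z is a Markov chain, so p(x,z) = sum_y p(x,y) p(z|y).
    """
    p_xz = p_xy @ p_z_given_y                        # joint of private data and release
    p_yz = p_xy.sum(axis=0)[:, None] * p_z_given_y   # joint of useful data and release
    return mutual_information(p_xz), mutual_information(p_yz)

# Toy example (assumed): correlated binary X (private) and Y (useful),
# released through a binary symmetric mapping with flip probability 0.2.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_z_given_y = np.array([[0.8, 0.2],
                        [0.2, 0.8]])
leakage, utility = funnel_objective(p_xy, p_z_given_y)
print(f"I(X;Z) = {leakage:.4f} nats, I(Y;Z) = {utility:.4f} nats")
```

A greedy, agglomerative solver in the Information Bottleneck style would repeatedly evaluate this objective over candidate mergings of Z symbols and keep the merge that lowers I(X;Z) most per unit of I(Y;Z) given up; the sketch above supplies the objective evaluations such a loop needs.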
Privacy Against Statistical Inference
We propose a general statistical inference framework to capture the privacy
threat incurred by a user who releases data to a passive but curious
adversary, given utility constraints. We show that applying this general
framework to the setting where the adversary uses the self-information cost
function naturally leads to a non-asymptotic information-theoretic approach for
characterizing the best achievable privacy subject to utility constraints.
Based on these results we introduce two privacy metrics, namely average
information leakage and maximum information leakage. We prove that under both
metrics the resulting design problem of finding the optimal mapping from the
user's data to a privacy-preserving output can be cast as a modified
rate-distortion problem which, in turn, can be formulated as a convex program.
Finally, we compare our framework with differential privacy.
Comment: Allerton 2012, 8 pages.
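
As a hedged illustration of the two metrics, the sketch below computes the average information leakage I(S;Y) of a release Y about private data S, together with the largest per-output uncertainty reduction H(S) - H(S|Y=y) as a stand-in for maximum information leakage; the paper's exact worst-case definition may differ, and the joint distribution is an assumption.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def information_leakage(p_sy):
    """Average and worst-case leakage of a release Y about private data S.

    p_sy is the |S| x |Y| joint distribution induced by the privacy mapping.
    Average leakage is I(S;Y) = H(S) - H(S|Y); the per-output quantity
    H(S) - H(S|Y=y) is used here as an assumed proxy for worst-case leakage.
    """
    p_s = p_sy.sum(axis=1)
    p_y = p_sy.sum(axis=0)
    h_s = entropy(p_s)
    per_output = np.array([h_s - entropy(p_sy[:, j] / p_y[j])
                           for j in range(p_sy.shape[1])])
    avg = float(np.dot(p_y, per_output))   # equals I(S;Y)
    return avg, float(per_output.max())

# Toy example (assumed): binary secret S observed through a noisy release Y.
p_sy = np.array([[0.35, 0.15],
                 [0.10, 0.40]])
avg, worst = information_leakage(p_sy)
print(f"average leakage I(S;Y) = {avg:.4f} nats, worst per-output = {worst:.4f} nats")
```

Minimizing either quantity over row-stochastic mappings p(Y|S) subject to a distortion budget is the modified rate-distortion problem the abstract refers to; since I(S;Y) is convex in the mapping for fixed p(S), it can be handled by a standard convex solver.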
Distributed Hypothesis Testing with Privacy Constraints
We revisit the distributed hypothesis testing (or hypothesis testing with
communication constraints) problem from the viewpoint of privacy. Instead of
observing the raw data directly, the transmitter observes a sanitized or
randomized version of it. We impose an upper bound on the mutual information
between the raw and randomized data. Under this scenario, the receiver, which
is also provided with side information, is required to make a decision on
whether the null or alternative hypothesis is in effect. We first provide a
general lower bound on the type-II exponent for an arbitrary pair of
hypotheses. Next, we show that if the distribution under the alternative
hypothesis is the product of the marginals of the distribution under the null
(i.e., testing against independence), then the exponent is known exactly.
Moreover, we show that the strong converse property holds. Using ideas from
Euclidean information theory, we also provide an approximate expression for the
exponent when the communication rate is low and the privacy level is high.
Finally, we illustrate our results with a binary and a Gaussian example.
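
The trade-off described here can be sketched numerically. Below, binary raw data X (correlated with side information Y) is sanitized through a symmetric randomization with flip probability delta; the script reports the privacy leakage I(X;Xt) and, as a proxy for the testing-against-independence exponent in the high-communication-rate regime, the mutual information I(Xt;Y). The distributions and the use of I(Xt;Y) as the exponent are illustrative assumptions, not the paper's general rate-dependent result.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats for a 2-D joint distribution."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))

# Assumed binary example: X and side information Y correlated via a BSC(0.1);
# the transmitter releases Xt, a BSC(delta)-randomized version of X.
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])

for delta in (0.0, 0.1, 0.2, 0.3):
    q = np.array([[1 - delta, delta],
                  [delta, 1 - delta]])     # randomization p(xt | x)
    p_x = p_xy.sum(axis=1)
    p_x_xt = p_x[:, None] * q              # joint of raw and randomized data
    p_xt_y = q.T @ p_xy                    # joint of randomized data and side info
    print(f"delta={delta:.1f}: leakage I(X;Xt) = {mutual_information(p_x_xt):.4f} nats, "
          f"exponent proxy I(Xt;Y) = {mutual_information(p_xt_y):.4f} nats")
```

Sweeping delta traces the tension the abstract describes: tightening the bound on I(X;Xt) (stronger privacy) shrinks I(Xt;Y), and with it the achievable type-II error exponent.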