7,849 research outputs found
Quantitative Information Flow and Applications to Differential Privacy
International audienceSecure information flow is the problem of ensuring that the information made publicly available by a computational system does not leak information that should be kept secret. Since it is practically impossible to avoid leakage entirely, in recent years there has been a growing interest in considering the quantitative aspects of information flow, in order to measure and compare the amount of leakage. Information theory is widely regarded as a natural framework to provide firm foundations to quantitative information flow. In this notes we review the two main information-theoretic approaches that have been investigated: the one based on Shannon entropy, and the one based on Rényi min-entropy. Furthermore, we discuss some applications in the area of privacy. In particular, we consider statistical databases and the recently-proposed notion of differential privacy. Using the information-theoretic view, we discuss the bound that differential privacy induces on leakage, and the trade-off between utility and privac
On the relation between Differential Privacy and Quantitative Information Flow
Differential privacy is a notion that has emerged in the community of
statistical databases, as a response to the problem of protecting the privacy
of the database's participants when performing statistical queries. The idea is
that a randomized query satisfies differential privacy if the likelihood of
obtaining a certain answer for a database is not too different from the
likelihood of obtaining the same answer on adjacent databases, i.e. databases
which differ from for only one individual. Information flow is an area of
Security concerned with the problem of controlling the leakage of confidential
information in programs and protocols. Nowadays, one of the most established
approaches to quantify and to reason about leakage is based on the R\'enyi min
entropy version of information theory. In this paper, we analyze critically the
notion of differential privacy in light of the conceptual framework provided by
the R\'enyi min information theory. We show that there is a close relation
between differential privacy and leakage, due to the graph symmetries induced
by the adjacency relation. Furthermore, we consider the utility of the
randomized answer, which measures its expected degree of accuracy. We focus on
certain kinds of utility functions called "binary", which have a close
correspondence with the R\'enyi min mutual information. Again, it turns out
that there can be a tight correspondence between differential privacy and
utility, depending on the symmetries induced by the adjacency relation and by
the query. Depending on these symmetries we can also build an optimal-utility
randomization mechanism while preserving the required level of differential
privacy. Our main contribution is a study of the kind of structures that can be
induced by the adjacency relation and the query, and how to use them to derive
bounds on the leakage and achieve the optimal utility
Differential Privacy versus Quantitative Information Flow
Differential privacy is a notion of privacy that has become very popular in
the database community. Roughly, the idea is that a randomized query mechanism
provides sufficient privacy protection if the ratio between the probabilities
of two different entries to originate a certain answer is bound by e^\epsilon.
In the fields of anonymity and information flow there is a similar concern for
controlling information leakage, i.e. limiting the possibility of inferring the
secret information from the observables. In recent years, researchers have
proposed to quantify the leakage in terms of the information-theoretic notion
of mutual information. There are two main approaches that fall in this
category: One based on Shannon entropy, and one based on R\'enyi's min entropy.
The latter has connection with the so-called Bayes risk, which expresses the
probability of guessing the secret. In this paper, we show how to model the
query system in terms of an information-theoretic channel, and we compare the
notion of differential privacy with that of mutual information. We show that
the notion of differential privacy is strictly stronger, in the sense that it
implies a bound on the mutual information, but not viceversa
Development and Analysis of Deterministic Privacy-Preserving Policies Using Non-Stochastic Information Theory
A deterministic privacy metric using non-stochastic information theory is
developed. Particularly, minimax information is used to construct a measure of
information leakage, which is inversely proportional to the measure of privacy.
Anyone can submit a query to a trusted agent with access to a non-stochastic
uncertain private dataset. Optimal deterministic privacy-preserving policies
for responding to the submitted query are computed by maximizing the measure of
privacy subject to a constraint on the worst-case quality of the response
(i.e., the worst-case difference between the response by the agent and the
output of the query computed on the private dataset). The optimal
privacy-preserving policy is proved to be a piecewise constant function in the
form of a quantization operator applied on the output of the submitted query.
The measure of privacy is also used to analyze the performance of -anonymity
methodology (a popular deterministic mechanism for privacy-preserving release
of datasets using suppression and generalization techniques), proving that it
is in fact not privacy-preserving.Comment: improved introduction and numerical exampl
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large
- …