Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of $n$ people, each represented by a bit-string of length
$d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal
query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional
binary vector $\beta$ specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector restricted
to $S$ agrees with $\beta$.
Privately releasing approximate answers to a set of $k$-way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well-understood: the per-query additive error is known to be at least
$\Omega(\min\{\sqrt{n}, d^{k/2}\})$ and at most
$\tilde{O}(\min\{\sqrt{n}\, d^{1/4}, d^{k/2}\})$. However, no polynomial
time algorithm with error complexity as low as the information theoretic upper
bound is known for small $n$. In this work we present a polynomial time
algorithm that, for any distribution on marginal queries, achieves average
error at most $\tilde{O}(\sqrt{n}\, d^{\lceil k/2 \rceil / 4})$. This error
bound is as good as the best known information theoretic upper bounds for
$k = 2$. This bound is an improvement over previous work on efficiently releasing
marginals when $k$ is small and when error $o(n)$ is desirable. Using private
boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis.
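The marginal query defined at the start of this abstract can be illustrated with a minimal sketch. It computes only the exact, non-private count; it is not the paper's release algorithm and adds no noise. The function name `marginal_count`, its parameter names, and the toy database are assumptions made for illustration.

```python
from typing import Sequence


def marginal_count(
    database: Sequence[Sequence[int]],
    subset: Sequence[int],
    values: Sequence[int],
) -> int:
    """Exact (non-private) answer to one marginal query.

    `subset` lists the attribute indices in the query and `values` gives the
    required bit for each of them. All names here are hypothetical.
    """
    if len(subset) != len(values):
        raise ValueError("subset and values must have the same length")
    # Count the rows whose restriction to `subset` agrees with `values`.
    return sum(
        all(row[i] == v for i, v in zip(subset, values))
        for row in database
    )


# Toy database: 4 people, 3 binary attributes each.
db = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]

# 2-way marginal: attribute 0 equals 1 and attribute 2 equals 1.
print(marginal_count(db, [0, 2], [1, 1]))  # prints 2
```

A differentially private release would perturb or otherwise approximate such counts; how to do so efficiently with low error is the subject of the paper.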
Formal Verification of Differential Privacy for Interactive Systems
Differential privacy is a promising approach to privacy preserving data
analysis with a well-developed theory for functions. Despite recent work on
implementing systems that aim to provide differential privacy, the problem of
formally verifying that these systems have differential privacy has not been
adequately addressed. This paper presents the first results towards automated
verification of source code for differentially private interactive systems. We
develop a formal probabilistic automaton model of differential privacy for
systems by adapting prior work on differential privacy for functions. The main
technical result of the paper is a sound proof technique based on a form of
probabilistic bisimulation relation for proving that a system modeled as a
probabilistic automaton satisfies differential privacy. The novelty lies in the
way we track quantitative privacy leakage bounds using a relation family
instead of a single relation. We illustrate the proof technique on a
representative automaton motivated by PINQ, an implemented system that is
intended to provide differential privacy. To make our proof technique easier to
apply to realistic systems, we prove a form of refinement theorem and apply it
to show that a refinement of the abstract PINQ automaton also satisfies our
differential privacy definition. Finally, we begin the process of automating
our proof technique by providing an algorithm for mechanically checking a
restricted class of relations from the proof technique. Comment: 65 pages with 1 figure
Private Graph Data Release: A Survey
The application of graph analytics to various domains has yielded tremendous
societal and economic benefits in recent years. However, the increasingly
widespread adoption of graph analytics comes with a commensurate increase in
the need to protect private information in graph databases, especially in light
of the many privacy breaches in real-world graph data that was supposed to
protect sensitive information. This paper provides a comprehensive survey of
private graph data release algorithms that seek to achieve the fine balance
between privacy and utility, with a specific focus on provably private
mechanisms. Many of these mechanisms fall under natural extensions of the
Differential Privacy framework to graph data, but we also investigate more
general privacy formulations like Pufferfish Privacy that can deal with the
limitations of Differential Privacy. A wide-ranging survey of the applications
of private graph data release mechanisms to social networks, finance, supply
chain, health and energy is also provided. This survey paper and the taxonomy
it provides should benefit practitioners and researchers alike in the
increasingly important area of private graph data release and analysis.
SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication
Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations. Such a strong notion of privacy is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat regarding differential privacy is that while it has strong guarantees for privacy, privacy comes at a cost of accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, there exists a gap in the literature regarding providing an understanding of the trade-off and how to address it appropriately. Through a systematic literature review (SLR), we investigate the state of the art in accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is twofold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing different dimensions of accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and external work, and deconstruct each algorithm to examine the building blocks separately with the aim of pinpointing which dimension of accuracy improvement each technique/approach is targeting. Hence, this systematization of knowledge (SoK) provides an understanding of the dimensions along which, and how, accuracy improvement can be pursued without sacrificing privacy.
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniques (a) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy. Comment: VLDB201