An Adaptive Mechanism for Accurate Query Answering under Differential Privacy
We propose a novel mechanism for answering sets of counting queries under
differential privacy. Given a workload of counting queries, the mechanism
automatically selects a different set of "strategy" queries to answer
privately, using those answers to derive answers to the workload. The main
algorithm proposed in this paper approximates the optimal strategy for any
workload of linear counting queries. With no cost to the privacy guarantee, the
mechanism improves significantly on prior approaches and achieves near-optimal
error for many workloads, when applied under (ε, δ)-differential
privacy. The result is an adaptive mechanism which can help users achieve good
utility without requiring that they reason carefully about the best formulation
of their task.
Comment: VLDB 2012. arXiv admin note: substantial text overlap with arXiv:1103.136
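The select-a-strategy-then-infer pipeline shared by these mechanisms can be sketched in a few lines. Everything below (the tiny histogram, the hierarchical strategy, plain Laplace noise, and least-squares inference) is an illustrative assumption, not the paper's automatic strategy selection:

```python
import numpy as np

rng = np.random.default_rng(0)

def matrix_mechanism(x, W, A, epsilon):
    """Answer workload W over data vector x by privately answering a
    strategy A, then inferring the workload answers (illustrative sketch)."""
    # L1 sensitivity of the strategy: maximum column L1 norm of A.
    sensitivity = np.abs(A).sum(axis=0).max()
    # Laplace noise calibrated to the strategy, not to the workload.
    noisy = A @ x + rng.laplace(scale=sensitivity / epsilon, size=A.shape[0])
    # Least-squares inference: estimate x from the noisy strategy
    # answers, then derive every workload answer from that estimate.
    x_hat, *_ = np.linalg.lstsq(A, noisy, rcond=None)
    return W @ x_hat

# Toy workload: two range counts and the total over a 4-bin histogram.
x = np.array([3.0, 1.0, 4.0, 2.0])
W = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 1]], dtype=float)
# Hypothetical hierarchical strategy: total, halves, and leaves.
A = np.vstack([np.ones(4), [1, 1, 0, 0], [0, 0, 1, 1], np.eye(4)])
answers = matrix_mechanism(x, W, A, epsilon=1.0)
```

Because noise is added only to the strategy answers, every workload answer is derived from the same private estimate, which is what lets a well-chosen strategy beat answering the workload directly.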
Accurate and Efficient Private Release of Datacubes and Contingency Tables
A central problem in releasing aggregate information about sensitive data is
to do so accurately while providing a privacy guarantee on the output. Recent
work focuses on the class of linear queries, which include basic counting
queries, data cubes, and contingency tables. The goal is to maximize the
utility of the released answers while giving a rigorous privacy guarantee. Most
results follow a common template: pick a "strategy" set of linear queries to
apply to the data, then use the noisy answers to these queries to reconstruct
the queries of interest. This entails either picking a strategy set that is
hoped to be good for the queries, or performing a costly search over the space
of all possible strategies.
In this paper, we propose a new approach that balances accuracy and
efficiency: we show how to improve the accuracy of a given query set by
answering some strategy queries more accurately than others. This leads to an
efficient optimal noise allocation for many popular strategies, including
wavelets, hierarchies, Fourier coefficients and more. For the important case of
marginal queries we show that this strictly improves on previous methods, both
analytically and empirically. Our results also extend to ensuring that the
returned query answers are consistent with an (unknown) data set at minimal
extra cost in terms of time and noise.
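The abstract's key move, answering some strategy queries more accurately than others, can be illustrated with a closed-form error comparison. The prefix-sum workload, the total-plus-counts strategy, and the particular weights below are toy assumptions, and the error formula assumes Laplace noise with least-squares reconstruction:

```python
import numpy as np

def expected_error(W, A, eps):
    """Expected total squared error of the strategy-based mechanism:
    Laplace noise scaled to A's L1 sensitivity, least-squares inference."""
    sens = np.abs(A).sum(axis=0).max()  # max column L1 norm
    b = sens / eps                      # per-query Laplace scale
    return 2 * b**2 * np.linalg.norm(W @ np.linalg.pinv(A), 'fro') ** 2

def weighted(A, w):
    """Scale row i of A by w[i]; a larger weight spends more of the
    noise budget on that strategy query, answering it more accurately."""
    return np.diag(w) @ A

# Workload: the four prefix-sum queries on a 4-bin histogram.
W = np.tril(np.ones((4, 4)))
# Strategy: the total count plus the four individual counts.
A = np.vstack([np.ones(4), np.eye(4)])

uniform = expected_error(W, weighted(A, np.ones(5)), eps=1.0)
# Hypothetical reweighting: down-weight the total, which lowers the
# strategy's sensitivity by more than it hurts reconstruction here.
tuned = expected_error(W, weighted(A, np.array([0.5, 1, 1, 1, 1])), eps=1.0)
```

In this toy instance the non-uniform allocation gives strictly lower expected error than putting equal noise on every strategy query, which is the effect the abstract exploits for marginals and other popular strategies.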
Efficient Batch Query Answering Under Differential Privacy
Differential privacy is a rigorous privacy condition achieved by randomizing
query answers. This paper develops efficient algorithms for answering multiple
queries under differential privacy with low error. We pursue this goal by
advancing a recent approach called the matrix mechanism, which generalizes
standard differentially private mechanisms. This new mechanism works by first
answering a different set of queries (a strategy) and then inferring the
answers to the desired workload of queries. Although a few strategies are known
to work well on specific workloads, finding the strategy which minimizes error
on an arbitrary workload is intractable. We prove a new lower bound on the
optimal error of this mechanism, and we propose an efficient algorithm that
approaches this bound for a wide range of workloads.
Comment: 6 figures, 22 pages
Differentially Private Publication of Sparse Data
The problem of privately releasing data is to provide a version of a dataset
without revealing sensitive information about the individuals who contribute to
the data. The model of differential privacy allows such private release while
providing strong guarantees on the output. A basic mechanism achieves
differential privacy by adding noise to the frequency counts in the contingency
tables (or, a subset of the count data cube) derived from the dataset. However,
when the dataset is sparse in its underlying space, as is the case for most
multi-attribute relations, then the effect of adding noise is to vastly
increase the size of the published data: it implicitly creates a huge number of
dummy data points to mask the true data, making it almost impossible to work
with.
We present techniques to overcome this roadblock and allow efficient private
release of sparse data, while maintaining the guarantees of differential
privacy. Our approach is to release a compact summary of the noisy data.
Generating the noisy data and then summarizing it would still be very costly,
so we show how to shortcut this step, and instead directly generate the summary
from the input data, without materializing the vast intermediate noisy data. We
instantiate this outline for a variety of sampling and filtering methods, and
show how to use the resulting summary for approximate, private, query
answering. Our experimental study shows that this is an effective, practical
solution, with comparable and occasionally improved utility over the costly
materialization approach.
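The "directly generate the summary" shortcut can be sketched for the simplest case of thresholded Laplace counts. The function name, the parameter values, and the rejection-sampling of empty cells are illustrative assumptions rather than the paper's exact instantiation:

```python
import numpy as np

rng = np.random.default_rng(1)

def private_sparse_summary(counts, domain_size, eps, threshold):
    """Release noisy counts above a threshold without materializing
    noise for every empty cell of the domain (illustrative sketch)."""
    b = 1.0 / eps  # Laplace scale for counting queries of sensitivity 1
    summary = {}
    # Occupied cells: add noise explicitly, keep if above the threshold.
    for cell, c in counts.items():
        noisy = c + rng.laplace(scale=b)
        if noisy > threshold:
            summary[cell] = noisy
    # Empty cells: a zero count crosses the threshold with probability
    # p = 0.5 * exp(-threshold / b), so sample how many cross in one
    # shot instead of adding noise to each of the (huge) remainder.
    n_empty = domain_size - len(counts)
    p = 0.5 * np.exp(-threshold / b)
    k = rng.binomial(n_empty, p)
    chosen = set()
    while len(chosen) < k:
        cell = int(rng.integers(domain_size))
        if cell not in counts and cell not in chosen:
            chosen.add(cell)
            # A Laplace draw conditioned on exceeding the threshold is
            # the threshold plus an Exponential(b) tail draw.
            summary[cell] = threshold + rng.exponential(b)
    return summary

# Toy use: three occupied cells in a domain of a million.
data = {7: 20.0, 99: 15.0, 123456: 30.0}
summary = private_sparse_summary(data, 10**6, eps=1.0, threshold=8.0)
```

The summary has on the order of hundreds of entries rather than a million, because the vast majority of empty cells never have noise materialized for them, matching the abstract's point about shortcutting the intermediate noisy data.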