5,297 research outputs found
Fast Private Data Release Algorithms for Sparse Queries
We revisit the problem of accurately answering large classes of statistical
queries while preserving differential privacy. Previous approaches to this
problem have either been very general but have not had run-time polynomial in
the size of the database, have applied only to very limited classes of queries,
or have relaxed the notion of worst-case error guarantees. In this paper we
consider the large class of sparse queries, which take non-zero values on only
polynomially many universe elements. We give efficient query release algorithms
for this class, in both the interactive and the non-interactive setting. Our
algorithms also achieve better accuracy bounds than previous general techniques
do when applied to sparse queries: our bounds are independent of the universe
size. In fact, even the runtime of our interactive mechanism is independent of
the universe size, and so can be implemented in the "infinite universe" model
in which no finite universe need be specified by the data curator
Accurate and Efficient Private Release of Datacubes and Contingency Tables
A central problem in releasing aggregate information about sensitive data is
to do so accurately while providing a privacy guarantee on the output. Recent
work focuses on the class of linear queries, which include basic counting
queries, data cubes, and contingency tables. The goal is to maximize the
utility of their output, while giving a rigorous privacy guarantee. Most
results follow a common template: pick a "strategy" set of linear queries to
apply to the data, then use the noisy answers to these queries to reconstruct
the queries of interest. This entails either picking a strategy set that is
hoped to be good for the queries, or performing a costly search over the space
of all possible strategies.
In this paper, we propose a new approach that balances accuracy and
efficiency: we show how to improve the accuracy of a given query set by
answering some strategy queries more accurately than others. This leads to an
efficient optimal noise allocation for many popular strategies, including
wavelets, hierarchies, Fourier coefficients and more. For the important case of
marginal queries we show that this strictly improves on previous methods, both
analytically and empirically. Our results also extend to ensuring that the
returned query answers are consistent with an (unknown) data set at minimal
extra cost in terms of time and noise
Differential Privacy for the Analyst via Private Equilibrium Computation
We give new mechanisms for answering exponentially many queries from multiple
analysts on a private database, while protecting differential privacy both for
the individuals in the database and for the analysts. That is, our mechanism's
answer to each query is nearly insensitive to changes in the queries asked by
other analysts. Our mechanism is the first to offer differential privacy on the
joint distribution over analysts' answers, providing privacy for data analysts
even if the other data analysts collude or register multiple accounts. In some
settings, we are able to achieve nearly optimal error rates (even compared to
mechanisms which do not offer analyst privacy), and we are able to extend our
techniques to handle non-linear queries. Our analysis is based on a novel view
of the private query-release problem as a two-player zero-sum game, which may
be of independent interest
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large
Nearly Optimal Private Convolution
We study computing the convolution of a private input with a public input
, while satisfying the guarantees of -differential
privacy. Convolution is a fundamental operation, intimately related to Fourier
Transforms. In our setting, the private input may represent a time series of
sensitive events or a histogram of a database of confidential personal
information. Convolution then captures important primitives including linear
filtering, which is an essential tool in time series analysis, and aggregation
queries on projections of the data.
We give a nearly optimal algorithm for computing convolutions while
satisfying -differential privacy. Surprisingly, we follow
the simple strategy of adding independent Laplacian noise to each Fourier
coefficient and bounding the privacy loss using the composition theorem of
Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal
noise to add to each Fourier coefficient using convex programming duality. Our
algorithm is very efficient -- it is essentially no more computationally
expensive than a Fast Fourier Transform.
To prove near optimality, we use the recent discrepancy lowerbounds of
Muthukrishnan and Nikolov and derive a spectral lower bound using a
characterization of discrepancy in terms of determinants
Differentially Private Publication of Sparse Data
The problem of privately releasing data is to provide a version of a dataset
without revealing sensitive information about the individuals who contribute to
the data. The model of differential privacy allows such private release while
providing strong guarantees on the output. A basic mechanism achieves
differential privacy by adding noise to the frequency counts in the contingency
tables (or, a subset of the count data cube) derived from the dataset. However,
when the dataset is sparse in its underlying space, as is the case for most
multi-attribute relations, then the effect of adding noise is to vastly
increase the size of the published data: it implicitly creates a huge number of
dummy data points to mask the true data, making it almost impossible to work
with.
We present techniques to overcome this roadblock and allow efficient private
release of sparse data, while maintaining the guarantees of differential
privacy. Our approach is to release a compact summary of the noisy data.
Generating the noisy data and then summarizing it would still be very costly,
so we show how to shortcut this step, and instead directly generate the summary
from the input data, without materializing the vast intermediate noisy data. We
instantiate this outline for a variety of sampling and filtering methods, and
show how to use the resulting summary for approximate, private, query
answering. Our experimental study shows that this is an effective, practical
solution, with comparable and occasionally improved utility over the costly
materialization approach
Exploiting Metric Structure for Efficient Private Query Release
We consider the problem of privately answering queries defined on databases
which are collections of points belonging to some metric space. We give simple,
computationally efficient algorithms for answering distance queries defined
over an arbitrary metric. Distance queries are specified by points in the
metric space, and ask for the average distance from the query point to the
points contained in the database, according to the specified metric. Our
algorithms run efficiently in the database size and the dimension of the space,
and operate in both the online query release setting, and the offline setting
in which they must in polynomial time generate a fixed data structure which can
answer all queries of interest. This represents one of the first subclasses of
linear queries for which efficient algorithms are known for the private query
release problem, circumventing known hardness results for generic linear
queries
- …