368 research outputs found
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of people, each represented by a bit-string of length
corresponding to the setting of binary attributes. A -way marginal
query is specified by a subset of attributes, and a -dimensional
binary vector specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector restricted
to agrees with .
Privately releasing approximate answers to a set of -way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well-understood: the per-query additive error is known to be at least
and at most
. However, no polynomial
time algorithm with error complexity as low as the information theoretic upper
bound is known for small . In this work we present a polynomial time
algorithm that, for any distribution on marginal queries, achieves average
error at most . This error
bound is as good as the best known information theoretic upper bounds for
. This bound is an improvement over previous work on efficiently releasing
marginals when is small and when error is desirable. Using private
boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis
Accurate and Efficient Private Release of Datacubes and Contingency Tables
A central problem in releasing aggregate information about sensitive data is
to do so accurately while providing a privacy guarantee on the output. Recent
work focuses on the class of linear queries, which include basic counting
queries, data cubes, and contingency tables. The goal is to maximize the
utility of their output, while giving a rigorous privacy guarantee. Most
results follow a common template: pick a "strategy" set of linear queries to
apply to the data, then use the noisy answers to these queries to reconstruct
the queries of interest. This entails either picking a strategy set that is
hoped to be good for the queries, or performing a costly search over the space
of all possible strategies.
In this paper, we propose a new approach that balances accuracy and
efficiency: we show how to improve the accuracy of a given query set by
answering some strategy queries more accurately than others. This leads to an
efficient optimal noise allocation for many popular strategies, including
wavelets, hierarchies, Fourier coefficients and more. For the important case of
marginal queries we show that this strictly improves on previous methods, both
analytically and empirically. Our results also extend to ensuring that the
returned query answers are consistent with an (unknown) data set at minimal
extra cost in terms of time and noise
Nearly Optimal Private Convolution
We study computing the convolution of a private input with a public input
, while satisfying the guarantees of -differential
privacy. Convolution is a fundamental operation, intimately related to Fourier
Transforms. In our setting, the private input may represent a time series of
sensitive events or a histogram of a database of confidential personal
information. Convolution then captures important primitives including linear
filtering, which is an essential tool in time series analysis, and aggregation
queries on projections of the data.
We give a nearly optimal algorithm for computing convolutions while
satisfying -differential privacy. Surprisingly, we follow
the simple strategy of adding independent Laplacian noise to each Fourier
coefficient and bounding the privacy loss using the composition theorem of
Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal
noise to add to each Fourier coefficient using convex programming duality. Our
algorithm is very efficient -- it is essentially no more computationally
expensive than a Fast Fourier Transform.
To prove near optimality, we use the recent discrepancy lowerbounds of
Muthukrishnan and Nikolov and derive a spectral lower bound using a
characterization of discrepancy in terms of determinants
An Adaptive Mechanism for Accurate Query Answering under Differential Privacy
We propose a novel mechanism for answering sets of count- ing queries under
differential privacy. Given a workload of counting queries, the mechanism
automatically selects a different set of "strategy" queries to answer
privately, using those answers to derive answers to the workload. The main
algorithm proposed in this paper approximates the optimal strategy for any
workload of linear counting queries. With no cost to the privacy guarantee, the
mechanism improves significantly on prior approaches and achieves near-optimal
error for many workloads, when applied under (\epsilon, \delta)-differential
privacy. The result is an adaptive mechanism which can help users achieve good
utility without requiring that they reason carefully about the best formulation
of their task.Comment: VLDB2012. arXiv admin note: substantial text overlap with
arXiv:1103.136
Faster Algorithms for Privately Releasing Marginals
We study the problem of releasing -way marginals of a database , while preserving differential privacy. The answer to a -way
marginal query is the fraction of 's records with a given
value in each of a given set of up to columns. Marginal queries enable a
rich class of statistical analyses of a dataset, and designing efficient
algorithms for privately releasing marginal queries has been identified as an
important open problem in private data analysis (cf. Barak et. al., PODS '07).
We give an algorithm that runs in time and releases a
private summary capable of answering any -way marginal query with at most
error on every query as long as . To our
knowledge, ours is the first algorithm capable of privately releasing marginal
queries with non-trivial worst-case accuracy guarantees in time substantially
smaller than the number of -way marginal queries, which is
(for )
Marginal Release Under Local Differential Privacy
Many analysis and machine learning tasks require the availability of marginal
statistics on multidimensional datasets while providing strong privacy
guarantees for the data subjects. Applications for these statistics range from
finding correlations in the data to fitting sophisticated prediction models. In
this paper, we provide a set of algorithms for materializing marginal
statistics under the strong model of local differential privacy. We prove the
first tight theoretical bounds on the accuracy of marginals compiled under each
approach, perform empirical evaluation to confirm these bounds, and evaluate
them for tasks such as modeling and correlation testing. Our results show that
releasing information based on (local) Fourier transformations of the input is
preferable to alternatives based directly on (local) marginals
- …