Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Author: Dwork Cynthia
Nikolov Aleksandar
Talwar Kunal
Publication date: 06/08/2013

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $\beta$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $\Omega(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
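The definition of a marginal query in this abstract can be made concrete with a small, non-private sketch. The function name `marginal_count`, the dictionary encoding of the pair $(S, \beta)$, and the toy database are all assumptions chosen for illustration; they do not come from the paper, and no noise is added here.

```python
from typing import Dict, Sequence


def marginal_count(db: Sequence[str], assignment: Dict[int, int]) -> int:
    """Exact (non-private) answer to one marginal query.

    `assignment` maps each attribute index in S to the bit that beta
    requires there. A record is counted when it agrees on every index.
    """
    return sum(
        all(int(row[i]) == bit for i, bit in assignment.items())
        for row in db
    )


# Toy database (made-up data): n = 5 people, d = 3 binary attributes.
toy_db = ["101", "111", "001", "100", "101"]

# A 2-way marginal with S = {0, 2} and beta = (1, 1).
count = marginal_count(toy_db, {0: 1, 2: 1})
print(count)  # 3
```

A private release would return a perturbed version of this count; how to perturb it with low error is the subject of the abstract.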

arXiv.org e-Print Archive

CiteSeerX

Accurate and Efficient Private Release of Datacubes and Contingency Tables

Author: Cormode Graham
Procopiuc Cecilia M.
Srivastava Divesh
Yaroslavtsev Grigory
Publication date: 25/07/2012

A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to improve the accuracy of a given query set by answering some strategy queries more accurately than others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise.
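The idea of answering some strategy queries more accurately than others can be illustrated in a deliberately simplified setting, which is an assumption of this sketch and not the paper's algorithm: each strategy query has sensitivity 1, query $i$ gets its own Laplace budget $\epsilon_i$ with $\sum_i \epsilon_i = \epsilon$, and a hypothetical positive weight $w_i$ says how strongly the final error depends on query $i$. Minimizing $\sum_i 2 w_i / \epsilon_i^2$ under that constraint gives $\epsilon_i \propto w_i^{1/3}$.

```python
from typing import List


def allocate_budget(weights: List[float], epsilon: float) -> List[float]:
    """Split epsilon across queries to minimize sum_i w_i * 2 / eps_i**2.

    Setting the derivative of the Lagrangian to zero yields
    eps_i proportional to w_i ** (1/3). Weights must be positive.
    """
    roots = [w ** (1.0 / 3.0) for w in weights]
    total = sum(roots)
    return [epsilon * r / total for r in roots]


def weighted_variance(weights: List[float], budgets: List[float]) -> float:
    # Laplace noise with scale 1/eps_i has variance 2 / eps_i**2.
    return sum(w * 2.0 / (e * e) for w, e in zip(weights, budgets))


# Hypothetical weights: the first strategy query matters most.
weights = [8.0, 1.0, 1.0]
epsilon = 1.0

tuned = allocate_budget(weights, epsilon)
uniform = [epsilon / len(weights)] * len(weights)

print(weighted_variance(weights, tuned))
print(weighted_variance(weights, uniform))
```

In this toy instance the non-uniform split gives a lower weighted variance than the uniform split, which is the qualitative effect the abstract describes.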

arXiv.org e-Print Archive

CiteSeerX

Nearly Optimal Private Convolution

Author: A. Bhaskara
A. Blum
C. Dwork
C. Li
C. Li
J. Bolot
J. Thaler
K. Chandrasekaran
L. Lovász
R.M. Gray
T.-H. Hubert Chan
Publication date: 01/01/2013

We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(\epsilon, \delta)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier Transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying $(\epsilon, \delta)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient: it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lower bounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.
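The overall shape of "perturb each Fourier coefficient, then multiply by the public filter's transform" might look like the sketch below. It is only a structural illustration: the per-coefficient noise scales are passed in as a hypothetical `scales` argument, whereas the paper derives them in closed form, and this sketch makes no privacy claim for any particular choice. A naive $O(N^2)$ transform is used to keep the example dependency-free.

```python
import cmath
import random
from typing import List, Sequence


def dft(v: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(v[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as a difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_circular_convolution(x, h, scales, rng):
    """Circular convolution of x and h with noise added in the Fourier domain.

    `scales[k]` is an assumed, caller-supplied noise scale for coefficient k.
    """
    x_hat = dft(x)
    h_hat = dft(h)
    noisy = [xk + complex(laplace(b, rng), laplace(b, rng))
             for xk, b in zip(x_hat, scales)]
    y_hat = [hk * nk for hk, nk in zip(h_hat, noisy)]
    # The noise is not conjugate-symmetric, so keep only the real part.
    return [c.real for c in dft(y_hat, inverse=True)]


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]   # private input (toy values)
h = [1.0, 1.0, 0.0, 0.0]   # public filter (toy values)
exact = noisy_circular_convolution(x, h, [0.0] * 4, rng)
noisy = noisy_circular_convolution(x, h, [0.5] * 4, rng)
```

With all scales set to zero the function reduces to the exact circular convolution, which is a convenient sanity check on the transform code.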

arXiv.org e-Print Archive

Crossref

An Adaptive Mechanism for Accurate Query Answering under Differential Privacy

Author: Li Chao
Miklau Gerome
Publication date: 01/01/2012

We propose a novel mechanism for answering sets of counting queries under differential privacy. Given a workload of counting queries, the mechanism automatically selects a different set of "strategy" queries to answer privately, using those answers to derive answers to the workload. The main algorithm proposed in this paper approximates the optimal strategy for any workload of linear counting queries. With no cost to the privacy guarantee, the mechanism improves significantly on prior approaches and achieves near-optimal error for many workloads, when applied under $(\epsilon, \delta)$-differential privacy. The result is an adaptive mechanism which can help users achieve good utility without requiring that they reason carefully about the best formulation of their task. Comment: VLDB2012. arXiv admin note: substantial text overlap with arXiv:1103.136
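The workload/strategy split described above can be sketched with the simplest possible strategy, the identity (one noisy count per histogram bin). This is an illustration only and not the adaptive mechanism of the paper: the strategy is fixed by hand, pure $\epsilon$-differential privacy with Laplace noise is used, and neighboring databases are assumed to differ by adding or removing one record, so the histogram has $L_1$ sensitivity 1. All names are hypothetical.

```python
import random
from typing import List


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer_workload(hist: List[float], workload: List[List[float]],
                    epsilon: float, rng: random.Random) -> List[float]:
    # Strategy queries: the identity, i.e. one noisy count per bin.
    noisy_bins = [c + laplace(1.0 / epsilon, rng) for c in hist]
    # Each workload answer is a linear combination of strategy answers.
    return [sum(w * b for w, b in zip(row, noisy_bins)) for row in workload]


hist = [10.0, 20.0, 30.0, 40.0]   # toy histogram (made-up counts)
workload = [
    [1, 1, 0, 0],   # range query over bins 0-1
    [0, 0, 1, 1],   # range query over bins 2-3
    [1, 1, 1, 1],   # total count
]
answers = answer_workload(hist, workload, epsilon=1.0, rng=random.Random(0))
```

Because every workload answer is derived from the same noisy strategy answers, the results are mutually consistent: here the noisy total equals the sum of the two noisy range answers. A better-chosen strategy can lower the error, which is the selection problem the abstract addresses.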

arXiv.org e-Print Archive

CiteSeerX

ScholarWorks@UMass Amherst

Faster Algorithms for Privately Releasing Marginals

Author: Thaler Justin
Ullman Jonathan
Vadhan Salil
Publication date: 11/02/2014

We study the problem of releasing $k$-way marginals of a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of $D$'s records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et al., PODS '07). We give an algorithm that runs in time $d^{O(\sqrt{k})}$ and releases a private summary capable of answering any $k$-way marginal query with at most $\pm .01$ error on every query as long as $n \geq d^{O(\sqrt{k})}$. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of $k$-way marginal queries, which is $d^{\Theta(k)}$ (for $k \ll d$).
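To get a feel for how quickly the number of marginal queries grows, the count can be computed directly: choose the columns, then a bit for each chosen column. The function names below are hypothetical helpers written for this illustration.

```python
from math import comb


def num_k_way_marginals(d: int, k: int) -> int:
    """Queries on exactly k of d columns: C(d, k) column sets, 2**k values."""
    return comb(d, k) * 2 ** k


def num_up_to_k_way_marginals(d: int, k: int) -> int:
    """Queries on any set of at most k columns."""
    return sum(num_k_way_marginals(d, j) for j in range(k + 1))


print(num_k_way_marginals(10, 2))        # 180
print(num_up_to_k_way_marginals(10, 2))  # 201
print(num_k_way_marginals(100, 3))       # 1293600
```

Answering every query separately takes time at least proportional to these counts, which is the baseline the abstract's running time is compared against.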

arXiv.org e-Print Archive

Harvard University - DASH

Marginal Release Under Local Differential Privacy

Author: Bassily R.
Chaudhuri A.
Ding B.
Hardt M.
Jurafsky D.
Kairouz P.
Leen T. K.
Narayan A.
Wang T.
Publication date: 08/11/2017

Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
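For readers unfamiliar with the local model, the sketch below estimates a single 1-way marginal with classic binary randomized response: each user flips their own bit with a fixed probability before reporting, and the aggregator debiases the average. This is a textbook baseline chosen for illustration, not one of the algorithms proposed in the paper (in particular it is not the Fourier-based approach), and the function names and simulated data are assumptions.

```python
import math
import random
from typing import List


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit; gives epsilon-local DP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """Run on the user's device: keep the bit or flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_fraction(reports: List[int], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of ones.

    E[report] = f * (2p - 1) + (1 - p), so solve for f.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Simulated population (made-up data): 30% of 20000 users hold a 1.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_fraction(reports, 1.0)
print(round(estimate, 3))
```

The estimate concentrates around the true fraction 0.3 as the number of users grows, at the cost of more noise than a trusted-curator mechanism would need for the same $\epsilon$.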

arXiv.org e-Print Archive

Crossref