Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output
We consider accurately answering smooth queries while preserving differential
privacy. A query is said to be $K$-smooth if it is specified by a function
defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all
bounded. We develop an $\epsilon$-differentially private mechanism for the
class of $K$-smooth queries. The major advantage of the algorithm is that it
outputs a synthetic database, which is appealing in real applications. Our
mechanism achieves provable accuracy guarantees and runs in polynomial time. We
also generalize the mechanism to preserve $(\epsilon, \delta)$-differential
privacy with slightly improved accuracy. Extensive experiments on benchmark
datasets demonstrate that the mechanisms have good accuracy and are efficient.
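To make the privacy model concrete, here is a minimal sketch of $\epsilon$-differentially private release for a single bounded query via the Laplace mechanism. It is only a baseline for one query: the paper's mechanism answers the whole class of $K$-smooth queries at once and outputs a synthetic database, which this sketch does not attempt. The query `f` and all constants below are illustrative assumptions.

```python
import numpy as np

def laplace_release(db, query, sensitivity, epsilon, rng=None):
    """Release query(db) with Laplace noise scaled to sensitivity/epsilon.

    Adding Laplace(sensitivity/epsilon) noise to a query whose value changes
    by at most `sensitivity` when one record changes gives epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    return query(db) + rng.laplace(scale=sensitivity / epsilon)

# Illustrative smooth query: the average over records x in [-1, 1]^d of a
# function with bounded partial derivatives. One record moves the average of
# a [-1, 1]-valued function by at most 2/n, so the sensitivity is 2/n.
rng = np.random.default_rng(0)
n, d = 1000, 3
db = rng.uniform(-1, 1, size=(n, d))
f = lambda data: float(np.mean(np.sin(data.sum(axis=1))))  # smooth, in [-1, 1]
print(laplace_release(db, f, sensitivity=2.0 / n, epsilon=0.5, rng=rng))
```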
Order-Revealing Encryption and the Hardness of Private Learning
An order-revealing encryption scheme gives a public procedure by which two
ciphertexts can be compared to reveal the ordering of their underlying
plaintexts. We show how to use order-revealing encryption to separate
computationally efficient PAC learning from efficient $(\epsilon,
\delta)$-differentially private PAC learning. That is, we construct a concept
class that is efficiently PAC learnable, but for which every efficient learner
fails to be differentially private. This answers a question of Kasiviswanathan
et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing
encryption scheme into one with strongly correct comparison, which enables the
consistent comparison of ciphertexts that are not obtained as the valid
encryption of any message. We believe this construction may be of independent
interest.
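As a toy illustration of the interface (secret-key encryption plus a public comparison on ciphertexts), here is a simplified bit-by-bit construction in the style of Chenette et al.'s practical ORE. It deliberately leaks the index of the first differing bit and is not the scheme built in the paper; treat every detail as an exposition-only assumption, not a secure design.

```python
import hmac, hashlib

def _prf(key: bytes, prefix: str) -> int:
    # PRF of a bit-prefix, reduced mod 3 (HMAC-SHA256 stands in for the PRF).
    return hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] % 3

def encrypt(key: bytes, m: int, nbits: int = 32) -> list:
    bits = [(m >> (nbits - 1 - i)) & 1 for i in range(nbits)]
    # Mask the i-th bit with a PRF of the bits before it, working mod 3.
    return [(_prf(key, "".join(map(str, bits[:i]))) + bits[i]) % 3
            for i in range(nbits)]

def compare(c1: list, c2: list) -> int:
    """Public comparison using only ciphertexts: -1 (<), 0 (=), or 1 (>)."""
    for u, v in zip(c1, c2):
        if u != v:
            # At the first differing position the masks agree, so the mod-3
            # difference reveals which plaintext bit was larger.
            return 1 if (u - v) % 3 == 1 else -1
    return 0

key = b"demo key only"
assert compare(encrypt(key, 17), encrypt(key, 42)) == -1
assert compare(encrypt(key, 42), encrypt(key, 42)) == 0
```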
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset.
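For contrast with the multi-query setting this paper targets, a standard way to solve a *single* convex minimization privately is output perturbation in the style of Chaudhuri et al.: solve exactly, then add noise scaled to the minimizer's sensitivity. The sketch below assumes an L2-regularized objective with a G-Lipschitz per-example loss; all names and constants are illustrative, and nothing here captures the paper's exponentially-many-problems result.

```python
import numpy as np

def output_perturbation(minimize, n, lam, G, epsilon, dim, rng=None):
    """Epsilon-DP release of the minimizer of one regularized convex problem.

    For J(w) = (1/n) * sum_i loss(w; z_i) + (lam/2) * ||w||^2 with each loss
    G-Lipschitz in w, the exact minimizer has L2-sensitivity 2*G/(n*lam), and
    noise with density proportional to exp(-epsilon * ||b|| / s) suffices.
    """
    rng = rng or np.random.default_rng()
    w = minimize()                          # non-private exact minimizer
    s = 2.0 * G / (n * lam)                 # L2-sensitivity of the minimizer
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)  # uniform direction on the sphere
    radius = rng.gamma(shape=dim, scale=s / epsilon)
    return w + radius * direction

# Illustrative single problem: regularized logistic regression with rows
# normalized so ||x_i|| <= 1, hence the logistic loss is 1-Lipschitz (G = 1).
rng = np.random.default_rng(1)
n, dim, lam = 500, 5, 0.1
X = rng.normal(size=(n, dim))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
y = np.sign(X @ rng.normal(size=dim) + 0.1 * rng.normal(size=n))

def minimize():
    w = np.zeros(dim)
    for _ in range(2000):                   # plain gradient descent
        grad = -(X * (y / (1 + np.exp(y * (X @ w))))[:, None]).mean(axis=0) \
               + lam * w
        w -= 0.5 * grad
    return w

print(output_perturbation(minimize, n, lam, G=1.0, epsilon=1.0, dim=dim))
```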
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of $n$ people, each represented by a bit-string of length
$d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal
query is specified by a subset $S$ of $k$ attributes, and a $k$-dimensional
binary vector $\beta$ specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector restricted
to $S$ agrees with $\beta$.
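A $k$-way marginal is straightforward to evaluate non-privately, which the following sketch makes concrete on a toy 0/1 database; the difficulty the paper addresses is releasing many such counts privately and efficiently.

```python
import numpy as np

def marginal_count(D, S, beta):
    """Count rows of the 0/1 database D whose attributes in S equal beta."""
    return int(np.all(D[:, S] == np.asarray(beta), axis=1).sum())

# A 2-way marginal on a toy database with d = 4 attributes:
D = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
print(marginal_count(D, S=[0, 3], beta=[1, 1]))  # rows with x_0=1, x_3=1 -> 2
```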
Privately releasing approximate answers to a set of $k$-way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well understood: the per-query additive error is known to be at least
$\tilde{\Omega}(\min\{\sqrt{n},\, d^{k/2}\})$ and at most
$\tilde{O}(\min\{\sqrt{n}\cdot d^{1/4},\, d^{k/2}\})$. However, no polynomial
time algorithm with error complexity as low as the information theoretic upper
bound is known for small $n$. In this work we present a polynomial time
algorithm that, for any distribution on marginal queries, achieves average
error at most $\tilde{O}(\sqrt{n}\cdot d^{\lceil k/2\rceil/4})$. This error
bound is as good as the best known information theoretic upper bounds for
$k=2$. This bound is an improvement over previous work on efficiently releasing
marginals when $k$ is small and when error $o(n)$ is desirable. Using private
boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis.
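For readers unfamiliar with it, here is a generic Frank-Wolfe sketch: minimize a smooth convex function over a convex body using only a linear minimization oracle, so iterates remain convex combinations of oracle outputs. The L1-ball example is purely illustrative and is not the paper's relaxation.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, steps=500):
    """Minimize a smooth convex f over a convex set via Frank-Wolfe.

    Needs only grad(x) and a linear minimization oracle lmo(g) returning
    argmin over the set of <g, v>; iterates are convex combinations of
    oracle outputs, so they stay feasible without projections.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        v = lmo(grad(x))             # best vertex against the gradient
        gamma = 2.0 / (t + 2.0)      # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * v
    return x

# Illustration: project p onto the L1 ball, min ||x - p||^2 s.t. ||x||_1 <= 1.
# The oracle for the L1 ball returns a signed coordinate vector.
def lmo_l1(g):
    v = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    v[i] = -np.sign(g[i])
    return v

p = np.array([0.9, -0.6, 0.3])
print(frank_wolfe(lambda x: 2.0 * (x - p), lmo_l1, x0=np.zeros(3)))
```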
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is---up to polynomial factors---equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms.
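The statistical query interface also explains why SQ-implementable algorithms privatize easily: each query asks for the mean of a bounded predicate, whose sensitivity is $1/n$, so Laplace noise suffices. A minimal sketch, with an assumed conjunction predicate:

```python
import numpy as np

def sq_oracle(data, predicate, epsilon=None, rng=None):
    """Return the mean of a {0,1}-valued predicate over the records.

    The mean of a bounded predicate over n records has sensitivity 1/n, so
    with epsilon set, Laplace(1/(n*epsilon)) noise makes the answer
    epsilon-differentially private -- the reason algorithms that touch data
    only through statistical queries are straightforward to privatize.
    """
    rng = rng or np.random.default_rng()
    answer = float(np.mean([predicate(x) for x in data]))
    if epsilon is not None:
        answer += rng.laplace(scale=1.0 / (len(data) * epsilon))
    return answer

# Illustrative query: the fraction of records satisfying a Boolean conjunction.
data = np.random.default_rng(0).integers(0, 2, size=(1000, 8))
conj = lambda x: int(x[0] == 1 and x[3] == 1)  # conjunction on attributes 0, 3
print(sq_oracle(data, conj, epsilon=1.0))
```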
Privacy and the Complexity of Simple Queries
As both the scope and scale of data collection increase, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual's data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we ask when accurate *and computationally efficient* data analysis is compatible with differential privacy, our understanding lags far behind.

In this thesis, we make several contributions to understanding the complexity of differentially private data analysis. We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data: a new database that consists of "fake" records but preserves certain statistical properties of the original database. Under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, such as the pairwise correlations between attributes. On the positive side, we design new algorithms for the widely used class of marginal queries that are faster and require less data.

Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: we design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
Dual Query: Practical Private Query Release for High Dimensional Data
We present a practical, differentially private algorithm for answering a
large number of queries on high dimensional datasets. Like all algorithms for
this task, ours necessarily has worst-case complexity exponential in the
dimension of the data. However, our algorithm packages the computationally hard
step into a concisely defined integer program, which can be solved
non-privately using standard solvers. We prove accuracy and privacy theorems
for our algorithm, and then demonstrate experimentally that our algorithm
performs well in practice. For example, our algorithm can efficiently and
accurately answer millions of queries on the Netflix dataset, which has over
17,000 attributes; this is an improvement on the state of the art by multiple
orders of magnitude.
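The following is a loose, non-private sketch of the loop's structure under strong simplifying assumptions: normalized counting queries, a record domain small enough for brute-force search in place of the paper's integer program, and none of the paper's sampling-based privacy accounting or constants.

```python
import numpy as np
from itertools import product

def dual_query_sketch(D, queries, rounds=30, samples=5, eta=0.5, rng=None):
    """Structure of the Dual Query loop; NOT private as written.

    Keeps multiplicative weights over the queries. Each round it samples a
    few queries from that distribution and picks the single record best
    matching their true answers -- the step the paper encodes as an integer
    program for a standard solver; here the 0/1 record domain is searched
    exhaustively. The chosen records form the synthetic database.
    """
    rng = rng or np.random.default_rng()
    domain = np.array(list(product([0, 1], repeat=D.shape[1])))
    true = np.array([q(D) for q in queries])     # true (normalized) answers
    w = np.ones(len(queries))
    synth = []
    for _ in range(rounds):
        idx = rng.choice(len(queries), size=samples, p=w / w.sum())
        errs = [sum(abs(queries[i](r[None, :]) - true[i]) for i in idx)
                for r in domain]
        synth.append(domain[int(np.argmin(errs))])
        # Upweight queries the current synthetic database still answers badly.
        synth_arr = np.array(synth)
        w *= np.exp(eta * np.array([abs(q(synth_arr) - t)
                                    for q, t in zip(queries, true)]))
    return np.array(synth)

# Illustration: all 1-way marginals of a toy 4-attribute database.
rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=(200, 4))
queries = [lambda X, j=j: X[:, j].mean() for j in range(4)]
synth = dual_query_sketch(D, queries, rng=rng)
print(D.mean(axis=0), synth.mean(axis=0), sep="\n")
```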