Strong Hardness of Privacy from Weak Traitor Tracing
A central problem in differential privacy is to accurately answer a large family
$Q$ of statistical queries over a data universe $X$. A statistical query
on a dataset $D \in X^n$ asks "what fraction of the elements of $D$ satisfy a given
predicate $q$ on $X$?" Ignoring computational constraints, it is possible to accurately
answer exponentially many queries on an exponential size universe while satisfying
differential privacy (Blum et al., STOC '08). Dwork et al. (STOC '09) and Boneh and
Zhandry (CRYPTO '14) showed that if both $Q$ and $X$ are of polynomial size,
then there is an efficient differentially private algorithm that
accurately answers all the queries. They also proved that if $Q$ and $X$ are both
exponentially large, then under a plausible assumption, no efficient
algorithm exists.
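To make the basic object concrete, here is a minimal Python sketch (my illustration, not code from the paper) of evaluating a statistical query and answering it with the standard Laplace mechanism, whose noise scale follows from the query's sensitivity of $1/n$.

```python
# Minimal sketch of a statistical query and the Laplace mechanism.
# All names here are illustrative, not from the paper.
import numpy as np

def statistical_query(dataset, predicate):
    """Fraction of the elements of the dataset satisfying the predicate."""
    return sum(predicate(x) for x in dataset) / len(dataset)

def laplace_answer(dataset, predicate, epsilon):
    """epsilon-differentially private answer to a single statistical query.

    Changing one element moves the true answer by at most 1/n, so
    Laplace noise of scale 1/(epsilon * n) suffices for one query.
    """
    n = len(dataset)
    return statistical_query(dataset, predicate) + \
        np.random.laplace(scale=1.0 / (epsilon * n))

# Example: what fraction of records are at least 40?
data = [23, 45, 31, 62, 50, 18, 40, 29]
print(laplace_answer(data, lambda x: x >= 40, epsilon=0.5))
```

Answering many queries this way degrades, since the noise must grow with the number of queries; that tension is exactly what the hardness results below formalize.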
We show that, under the same assumption,
if either the number of queries or the data universe is of
exponential size, then there is no efficient differentially private algorithm
that answers all the queries.
Specifically, we prove that if one-way functions and
indistinguishability obfuscation exist, then:
1) For every $n$, there is a family $Q$ of $\tilde{O}(n^7)$ queries on a data universe $X$ of size $2^d$ such that no $\mathrm{poly}(n,d)$ time differentially private algorithm takes a dataset $D \in X^n$ and outputs accurate answers to every query in $Q$.
2) For every $n$, there is a family $Q$ of $2^d$ queries on a data universe $X$ of size $\tilde{O}(n^7)$ such that no $\mathrm{poly}(n,d)$ time differentially private algorithm takes a dataset $D \in X^n$ and outputs accurate answers to every query in $Q$.
In both cases, the result is nearly quantitatively tight, since there
is an efficient differentially private algorithm that answers
$\tilde{\Theta}(n^2)$ queries on an exponential size data universe,
and one that answers exponentially many queries on a data universe of
size $\tilde{\Theta}(n^2)$.
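The second of these upper bounds is essentially the classical noisy-histogram baseline: when the universe is small, release one noisy count per universe element and answer every query offline from the published histogram. A hedged sketch, under the same illustrative conventions as above:

```python
# Noisy-histogram baseline (a standard technique, sketched from
# memory; not code from the paper). One pass of Laplace noise over
# the universe yields a summary that answers arbitrarily many
# statistical queries with no further privacy cost.
import numpy as np

def noisy_histogram(dataset, universe, epsilon):
    n = len(dataset)
    counts = {u: 0 for u in universe}
    for x in dataset:
        counts[x] += 1
    # Changing one element changes two counts by 1, so scale 2/epsilon
    # gives pure epsilon-differential privacy for the whole histogram.
    return {u: (c + np.random.laplace(scale=2.0 / epsilon)) / n
            for u, c in counts.items()}

def answer_from_histogram(hist, predicate):
    # Any statistical query is a sum of published noisy fractions.
    return sum(frac for u, frac in hist.items() if predicate(u))

hist = noisy_histogram([1, 3, 3, 7, 8, 1, 3], universe=range(10), epsilon=1.0)
print(answer_from_histogram(hist, lambda u: u % 2 == 1))
```

The per-query error grows roughly like $\sqrt{|X|}/(\epsilon n)$, which is why this approach stops working once $|X|$ is much larger than about $n^2$, consistent with the tightness claim above.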
Our proofs build on the connection between hardness results in
differential privacy and traitor-tracing schemes (Dwork et al.,
STOC '09; Ullman, STOC '13). We prove our hardness result for a
polynomial size query set (resp., data universe) by showing that it
follows from the existence of a special type of traitor-tracing scheme
with very short ciphertexts (resp., secret keys), but very weak
security guarantees, and then constructing such a scheme.
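Schematically (my paraphrase of the correspondence, not the papers' notation), the data universe plays the role of the key space and each ciphertext defines a query:

\[
X \approx \{\text{user secret keys}\}, \qquad
Q \approx \{\, q_c : c \text{ a ciphertext} \,\}, \qquad
q_c(sk) := \mathrm{Dec}(sk, c).
\]

An efficient algorithm that privately and accurately answers every $q_c$ on a dataset of keys behaves like a pirate decoder assembled from those keys, so the tracing algorithm must accuse a user whose key is in the dataset; a correct accusation distinguishes neighboring datasets and thus violates differential privacy. Ciphertext length governs $\log |Q|$ and key length governs $\log |X|$, which is why short ciphertexts (resp., short keys) yield hardness for a small query set (resp., small universe).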
Order-Revealing Encryption and the Hardness of Private Learning
An order-revealing encryption scheme gives a public procedure by which two
ciphertexts can be compared to reveal the ordering of their underlying
plaintexts. We show how to use order-revealing encryption to separate
computationally efficient PAC learning from efficient $(\epsilon, \delta)$-differentially private PAC learning. That is, we construct a concept
class that is efficiently PAC learnable, but for which every efficient learner
fails to be differentially private. This answers a question of Kasiviswanathan
et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing
encryption scheme into one with strongly correct comparison, which enables the
consistent comparison of ciphertexts that are not obtained as the valid
encryption of any message. We believe this construction may be of independent
interest.
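For intuition, here is a toy Python sketch of the order-revealing interface. This toy is deliberately insecure, and it is really an order-preserving scheme, the special case where the public comparison is plain numeric order; real ORE constructions allow a richer comparison procedure.

```python
# Toy "order-revealing" encryption: the secret key is a random
# strictly increasing map, and comparison is public. Insecure by
# design; it only illustrates the (keygen, encrypt, compare) interface.
import random

def keygen(domain_size, range_size):
    # Random strictly increasing map from {0..domain_size-1}
    # into {0..range_size-1}: sample distinct values, then sort.
    return sorted(random.sample(range(range_size), domain_size))

def encrypt(key, m):
    return key[m]

def compare(c1, c2):
    # Public procedure on ciphertexts alone: no key required.
    return (c1 > c2) - (c1 < c2)  # -1, 0, or 1

key = keygen(domain_size=16, range_size=10**6)
c5, c9 = encrypt(key, 5), encrypt(key, 9)
print(compare(c5, c9))  # -1: reveals 5 < 9 without the key
```

Note that compare here happily accepts integers that are not encryptions of any message; making that behavior consistent in a real scheme is the "strongly correct comparison" property the transformation above provides.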
Preventing False Discovery in Interactive Data Analysis is Hard
We show that, under a standard hardness assumption, there is no
computationally efficient algorithm that given $n$ samples from an unknown
distribution can give valid answers to $n^{3+o(1)}$ adaptively chosen
statistical queries. A statistical query asks for the expectation of a
predicate over the underlying distribution, and an answer to a statistical
query is valid if it is "close" to the correct expectation over the
distribution.
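The danger of adaptivity is easy to reproduce empirically. The following self-contained simulation (my illustration, not the paper's construction) shows the naive strategy of answering with empirical means failing once a query is chosen based on earlier answers:

```python
# Demo: adaptively chosen statistical queries overfit naive
# empirical answers, even though each individual query is simple.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1000
# n samples from the uniform distribution on {-1,+1}^d;
# every coordinate has true expectation 0.
data = rng.choice([-1, 1], size=(n, d))

# Round 1: d non-adaptive queries, the mean of each coordinate.
empirical_means = data.mean(axis=0)

# Round 2: ONE adaptive query built from the round-1 answers:
# q(x) = [sum_i sign(mean_i) * x_i > 0].
signs = np.sign(empirical_means)
signs[signs == 0] = 1
empirical_answer = (data @ signs > 0).mean()

# By symmetry the true expectation of q is about 1/2, but the
# empirical answer is badly biased: the signs were fit to the
# same sample the query is evaluated on.
print(f"empirical answer: {empirical_answer:.2f} (true value ~ 0.50)")
```

With these parameters the empirical answer is typically around 0.99, an invalid answer after only $d+1$ queries; the theorem above says that, for efficient algorithms, no answering strategy can push the adaptive limit much past $n^3$.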
Our result stands in stark contrast to the well known fact that exponentially
many statistical queries can be answered validly and efficiently if the queries
are chosen non-adaptively (no query may depend on the answers to previous
queries). Moreover, a recent work by Dwork et al. shows how to accurately
answer exponentially many adaptively chosen statistical queries via a
computationally inefficient algorithm; and how to answer a quadratic number of
adaptive queries via a computationally efficient algorithm. The latter result
implies that our result is tight up to a linear factor in $n$.
Conceptually, our result demonstrates that achieving statistical validity
alone can be a source of computational intractability in adaptive settings. For
example, in the modern large collaborative research environment, data analysts
typically choose a particular approach based on previous findings. False
discovery occurs if a research finding is supported by the data but not by the
underlying distribution. While the study of preventing false discovery in
Statistics is decades old, to the best of our knowledge our result is the first
to demonstrate a computational barrier. In particular, our result suggests that
the perceived difficulty of preventing false discovery in today's collaborative
research environment may be inherent.
Hardness of Non-Interactive Differential Privacy from One-Way Functions
A central challenge in differential privacy is to design computationally efficient non-interactive algorithms that can answer large numbers of statistical queries on a sensitive dataset. That is, we would like to design a differentially private algorithm that takes a dataset $D \in X^n$ consisting of some small number $n$ of elements from some large data universe $X$, and efficiently outputs a summary that allows a user to efficiently obtain an answer to any query in some large family $Q$.
Ignoring computational constraints, this problem can be solved even when $X$ and $Q$ are exponentially large and $n$ is just a small polynomial; however, all algorithms with remotely similar guarantees run in exponential time. There have been several results showing that, under the strong assumption of indistinguishability obfuscation (iO), no efficient differentially private algorithm exists when $X$ and $Q$ can be exponentially large. However, there are no strong separations between information-theoretic and computationally efficient differentially private algorithms under any standard complexity assumption.
In this work we show that, if one-way functions exist, there is no general purpose differentially private algorithm that works when $X$ and $Q$ are exponentially large and $n$ is an arbitrary polynomial. In fact, we show that this result holds even if $X$ is just subexponentially large (assuming only polynomially-hard one-way functions). This result solves an open problem posed by Vadhan in his recent survey.
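The simplest concrete instance of this non-interactive model is randomized response on a one-bit universe: the curator runs once, publishes the randomized records as the summary, and queries are answered later from the summary alone. A minimal sketch (standard technique; the names are mine):

```python
# Randomized response as a non-interactive summary (illustrative).
import numpy as np

def curator(bits, epsilon, rng=np.random.default_rng(1)):
    # Runs once over a dataset of bits in {0,1} and outputs a public
    # summary. Each bit is kept with probability p and flipped
    # otherwise, where p/(1-p) = e^epsilon, giving epsilon-DP.
    p = np.exp(epsilon) / (np.exp(epsilon) + 1)
    bits = np.asarray(bits)
    flip = rng.random(len(bits)) >= p
    return np.where(flip, 1 - bits, bits), p

def fraction_of_ones(summary):
    # Answered from the summary alone, with the bias of the
    # randomization inverted: E[noisy mean] = (2p-1)*m + (1-p).
    noisy, p = summary
    return (noisy.mean() - (1 - p)) / (2 * p - 1)

print(fraction_of_ones(curator([1, 0, 1, 1, 0, 1, 0, 1, 1, 1], epsilon=1.0)))
```

The theorem above says the analogous object for exponentially large $X$ and $Q$, a single efficiently computable summary accurate for every query, cannot exist for efficient algorithms, already under the minimal assumption of one-way functions.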
Privacy and the Complexity of Simple Queries
As both the scope and scale of data collection increase, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual's data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we further ask when accurate and computationally efficient data analysis is compatible with differential privacy, our understanding lags far behind.

In this thesis, we make several contributions to understanding the complexity of differentially private data analysis. We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data, a new database that consists of "fake" records but preserves certain statistical properties of the original database: under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, such as the pairwise correlations between the attributes. On the positive side, we design new algorithms for the widely used class of marginal queries that are faster and require less data.

Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many of the existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: we design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
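To make the "extremely simple properties" concrete: a 2-way marginal query asks for the fraction of records with a given pair of attributes both set. A hedged sketch (my illustration, not the thesis's algorithms) of exact and Laplace-noised 2-way marginals on binary data:

```python
# 2-way marginal queries on a 0/1 dataset (n records x d attributes).
import itertools
import numpy as np

def two_way_marginals(data):
    # Marginal for (i, j): fraction of records with both bits set.
    data = np.asarray(data)
    d = data.shape[1]
    return {(i, j): float(np.mean(data[:, i] * data[:, j]))
            for i, j in itertools.combinations(range(d), 2)}

def private_two_way_marginals(data, epsilon, rng=np.random.default_rng()):
    # Each record moves every marginal by at most 1/n, so under basic
    # composition the k = d(d-1)/2 marginals share the privacy budget
    # and each gets Laplace noise of scale k/(epsilon * n).
    data = np.asarray(data)
    n, d = data.shape
    k = d * (d - 1) // 2
    return {q: v + rng.laplace(scale=k / (epsilon * n))
            for q, v in two_way_marginals(data).items()}

data = np.random.default_rng(0).integers(0, 2, size=(200, 4))
print(private_two_way_marginals(data, epsilon=1.0))
```

The synthetic-data barrier above says that even these pairwise statistics cannot all be preserved by an efficient private synthetic-data generator, while the thesis's positive results give faster direct (non-synthetic-data) algorithms for marginals.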