Strong Hardness of Privacy from Weak Traitor Tracing
A central problem in differential privacy is to accurately answer a large family
$Q$ of statistical queries over a data universe $X$. A statistical query
on a dataset $D \in X^n$ asks "what fraction of the elements of $D$ satisfy a given
predicate $q$ on $X$?" Ignoring computational constraints, it is possible to accurately
answer exponentially many queries on an exponential size universe while satisfying
differential privacy (Blum et al., STOC '08). Dwork et al. (STOC '09) and Boneh and
Zhandry (CRYPTO '14) showed that if both $Q$ and $X$ are of polynomial size,
then there is an efficient differentially private algorithm that
accurately answers all the queries. They also proved that if $Q$ and $X$ are both
exponentially large, then under a plausible assumption, no efficient
algorithm exists.
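To make the basic object concrete, here is a minimal Python sketch (my illustration, not code from the paper) of evaluating a statistical query and answering it with the standard Laplace mechanism, whose noise scale follows from the query's sensitivity of $1/n$.

```python
# Minimal sketch of a statistical query and the Laplace mechanism.
# All names here are illustrative, not from the paper.
import numpy as np

def statistical_query(dataset, predicate):
    """Fraction of the elements of the dataset satisfying the predicate."""
    return sum(predicate(x) for x in dataset) / len(dataset)

def laplace_answer(dataset, predicate, epsilon):
    """epsilon-differentially private answer to a single statistical query.

    Changing one element moves the true answer by at most 1/n, so
    Laplace noise of scale 1/(epsilon * n) suffices for one query.
    """
    n = len(dataset)
    return statistical_query(dataset, predicate) + \
        np.random.laplace(scale=1.0 / (epsilon * n))

# Example: what fraction of records are at least 40?
data = [23, 45, 31, 62, 50, 18, 40, 29]
print(laplace_answer(data, lambda x: x >= 40, epsilon=0.5))
```

Answering many queries this way degrades, since the noise must grow with the number of queries; that tension is exactly what the hardness results below formalize.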
We show that, under the same assumption,
if either the number of queries or the data universe is of
exponential size, then there is no efficient differentially private algorithm
that answers all the queries.
Specifically, we prove that if one-way functions and
indistinguishability obfuscation exist, then:
1) For every $n$, there is a family $Q$ of $\tilde{O}(n^7)$ queries on a data universe $X$ of size $2^d$ such that no $\mathrm{poly}(n,d)$ time differentially private algorithm takes a dataset $D \in X^n$ and outputs accurate answers to every query in $Q$.
2) For every $n$, there is a family $Q$ of $2^d$ queries on a data universe $X$ of size $\tilde{O}(n^7)$ such that no $\mathrm{poly}(n,d)$ time differentially private algorithm takes a dataset $D \in X^n$ and outputs accurate answers to every query in $Q$.
In both cases, the result is nearly quantitatively tight, since there
is an efficient differentially private algorithm that answers
$\tilde{\Theta}(n^2)$ queries on an exponential size data universe,
and one that answers exponentially many queries on a data universe of
size $\tilde{\Theta}(n^2)$.
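The second of these upper bounds is essentially the classical noisy-histogram baseline: when the universe is small, release one noisy count per universe element and answer every query offline from the published histogram. A hedged sketch, under the same illustrative conventions as above:

```python
# Noisy-histogram baseline (a standard technique, sketched from
# memory; not code from the paper). One pass of Laplace noise over
# the universe yields a summary that answers arbitrarily many
# statistical queries with no further privacy cost.
import numpy as np

def noisy_histogram(dataset, universe, epsilon):
    n = len(dataset)
    counts = {u: 0 for u in universe}
    for x in dataset:
        counts[x] += 1
    # Changing one element changes two counts by 1, so scale 2/epsilon
    # gives pure epsilon-differential privacy for the whole histogram.
    return {u: (c + np.random.laplace(scale=2.0 / epsilon)) / n
            for u, c in counts.items()}

def answer_from_histogram(hist, predicate):
    # Any statistical query is a sum of published noisy fractions.
    return sum(frac for u, frac in hist.items() if predicate(u))

hist = noisy_histogram([1, 3, 3, 7, 8, 1, 3], universe=range(10), epsilon=1.0)
print(answer_from_histogram(hist, lambda u: u % 2 == 1))
```

The per-query error grows roughly like $\sqrt{|X|}/(\epsilon n)$, which is why this approach stops working once $|X|$ is much larger than about $n^2$, consistent with the tightness claim above.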
Our proofs build on the connection between hardness results in
differential privacy and traitor-tracing schemes (Dwork et al.,
STOC '09; Ullman, STOC '13). We prove our hardness result for a
polynomial size query set (resp., data universe) by showing that it
follows from the existence of a special type of traitor-tracing scheme
with very short ciphertexts (resp., secret keys), but very weak
security guarantees, and then constructing such a scheme.
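Schematically (my paraphrase of the correspondence, not the papers' notation), the data universe plays the role of the key space and each ciphertext defines a query:

\[
X \approx \{\text{user secret keys}\}, \qquad
Q \approx \{\, q_c : c \text{ a ciphertext} \,\}, \qquad
q_c(sk) := \mathrm{Dec}(sk, c).
\]

An efficient algorithm that privately and accurately answers every $q_c$ on a dataset of keys behaves like a pirate decoder assembled from those keys, so the tracing algorithm must accuse a user whose key is in the dataset; a correct accusation distinguishes neighboring datasets and thus violates differential privacy. Ciphertext length governs $\log |Q|$ and key length governs $\log |X|$, which is why short ciphertexts (resp., short keys) yield hardness for a small query set (resp., small universe).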
Order-Revealing Encryption and the Hardness of Private Learning
An order-revealing encryption scheme gives a public procedure by which two
ciphertexts can be compared to reveal the ordering of their underlying
plaintexts. We show how to use order-revealing encryption to separate
computationally efficient PAC learning from efficient $(\epsilon, \delta)$-differentially private PAC learning. That is, we construct a concept
class that is efficiently PAC learnable, but for which every efficient learner
fails to be differentially private. This answers a question of Kasiviswanathan
et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing
encryption scheme into one with strongly correct comparison, which enables the
consistent comparison of ciphertexts that are not obtained as the valid
encryption of any message. We believe this construction may be of independent
interest.
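For intuition, here is a toy Python sketch of the order-revealing interface. This toy is deliberately insecure, and it is really an order-preserving scheme, the special case where the public comparison is plain numeric order; real ORE constructions allow a richer comparison procedure.

```python
# Toy "order-revealing" encryption: the secret key is a random
# strictly increasing map, and comparison is public. Insecure by
# design; it only illustrates the (keygen, encrypt, compare) interface.
import random

def keygen(domain_size, range_size):
    # Random strictly increasing map from {0..domain_size-1}
    # into {0..range_size-1}: sample distinct values, then sort.
    return sorted(random.sample(range(range_size), domain_size))

def encrypt(key, m):
    return key[m]

def compare(c1, c2):
    # Public procedure on ciphertexts alone: no key required.
    return (c1 > c2) - (c1 < c2)  # -1, 0, or 1

key = keygen(domain_size=16, range_size=10**6)
c5, c9 = encrypt(key, 5), encrypt(key, 9)
print(compare(c5, c9))  # -1: reveals 5 < 9 without the key
```

Note that compare here happily accepts integers that are not encryptions of any message; making that behavior consistent in a real scheme is the "strongly correct comparison" property the transformation above provides.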
Preventing False Discovery in Interactive Data Analysis is Hard
We show that, under a standard hardness assumption, there is no
computationally efficient algorithm that given $n$ samples from an unknown
distribution can give valid answers to $n^{3+o(1)}$ adaptively chosen
statistical queries. A statistical query asks for the expectation of a
predicate over the underlying distribution, and an answer to a statistical
query is valid if it is "close" to the correct expectation over the
distribution.
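The danger of adaptivity is easy to reproduce empirically. The following self-contained simulation (my illustration, not the paper's construction) shows the naive strategy of answering with empirical means failing once a query is chosen based on earlier answers:

```python
# Demo: adaptively chosen statistical queries overfit naive
# empirical answers, even though each individual query is simple.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1000
# n samples from the uniform distribution on {-1,+1}^d;
# every coordinate has true expectation 0.
data = rng.choice([-1, 1], size=(n, d))

# Round 1: d non-adaptive queries, the mean of each coordinate.
empirical_means = data.mean(axis=0)

# Round 2: ONE adaptive query built from the round-1 answers:
# q(x) = [sum_i sign(mean_i) * x_i > 0].
signs = np.sign(empirical_means)
signs[signs == 0] = 1
empirical_answer = (data @ signs > 0).mean()

# By symmetry the true expectation of q is about 1/2, but the
# empirical answer is badly biased: the signs were fit to the
# same sample the query is evaluated on.
print(f"empirical answer: {empirical_answer:.2f} (true value ~ 0.50)")
```

With these parameters the empirical answer is typically around 0.99, an invalid answer after only $d+1$ queries; the theorem above says that, for efficient algorithms, no answering strategy can push the adaptive limit much past $n^3$.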
Our result stands in stark contrast to the well known fact that exponentially
many statistical queries can be answered validly and efficiently if the queries
are chosen non-adaptively (no query may depend on the answers to previous
queries). Moreover, a recent work by Dwork et al. shows how to accurately
answer exponentially many adaptively chosen statistical queries via a
computationally inefficient algorithm; and how to answer a quadratic number of
adaptive queries via a computationally efficient algorithm. The latter result
implies that our result is tight up to a linear factor in $n$.
Conceptually, our result demonstrates that achieving statistical validity
alone can be a source of computational intractability in adaptive settings. For
example, in the modern large collaborative research environment, data analysts
typically choose a particular approach based on previous findings. False
discovery occurs if a research finding is supported by the data but not by the
underlying distribution. While the study of preventing false discovery in
Statistics is decades old, to the best of our knowledge our result is the first
to demonstrate a computational barrier. In particular, our result suggests that
the perceived difficulty of preventing false discovery in today's collaborative
research environment may be inherent.
Hardness of Non-Interactive Differential Privacy from One-Way Functions
A central challenge in differential privacy is to design computationally efficient non-interactive algorithms that can answer large numbers of statistical queries on a sensitive dataset. That is, we would like to design a differentially private algorithm that takes a dataset $D \in X^n$ consisting of some small number $n$ of elements from some large data universe $X$, and efficiently outputs a summary that allows a user to efficiently obtain an answer to any query in some large family $Q$.
Ignoring computational constraints, this problem can be solved even when $X$ and $Q$ are exponentially large and $n$ is just a small polynomial; however, all algorithms with remotely similar guarantees run in exponential time. There have been several results showing that, under the strong assumption of indistinguishability obfuscation (iO), no efficient differentially private algorithm exists when $X$ and $Q$ can be exponentially large. However, there are no strong separations between information-theoretic and computationally efficient differentially private algorithms under any standard complexity assumption.
In this work we show that, if one-way functions exist, there is no general purpose differentially private algorithm that works when $X$ and $Q$ are exponentially large and $n$ is an arbitrary polynomial. In fact, we show that this result holds even if $X$ is just subexponentially large (assuming only polynomially-hard one-way functions). This result solves an open problem posed by Vadhan in his recent survey.
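The simplest concrete instance of this non-interactive model is randomized response on a one-bit universe: the curator runs once, publishes the randomized records as the summary, and queries are answered later from the summary alone. A minimal sketch (standard technique; the names are mine):

```python
# Randomized response as a non-interactive summary (illustrative).
import numpy as np

def curator(bits, epsilon, rng=np.random.default_rng(1)):
    # Runs once over a dataset of bits in {0,1} and outputs a public
    # summary. Each bit is kept with probability p and flipped
    # otherwise, where p/(1-p) = e^epsilon, giving epsilon-DP.
    p = np.exp(epsilon) / (np.exp(epsilon) + 1)
    bits = np.asarray(bits)
    flip = rng.random(len(bits)) >= p
    return np.where(flip, 1 - bits, bits), p

def fraction_of_ones(summary):
    # Answered from the summary alone, with the bias of the
    # randomization inverted: E[noisy mean] = (2p-1)*m + (1-p).
    noisy, p = summary
    return (noisy.mean() - (1 - p)) / (2 * p - 1)

print(fraction_of_ones(curator([1, 0, 1, 1, 0, 1, 0, 1, 1, 1], epsilon=1.0)))
```

The theorem above says the analogous object for exponentially large $X$ and $Q$, a single efficiently computable summary accurate for every query, cannot exist for efficient algorithms, already under the minimal assumption of one-way functions.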
Privacy and the Complexity of Simple Queries
As both the scope and scale of data collection increase, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual's data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we further ask when accurate and computationally efficient data analysis is compatible with differential privacy, our understanding lags far behind.

In this thesis, we make several contributions to understanding the complexity of differentially private data analysis. We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data, a new database that consists of "fake" records but preserves certain statistical properties of the original database: under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, such as the pairwise correlations between the attributes. On the positive side, we design new algorithms for the widely used class of marginal queries that are faster and require less data.

Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many of the existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: we design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
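To make the "extremely simple properties" concrete: a 2-way marginal query asks for the fraction of records with a given pair of attributes both set. A hedged sketch (my illustration, not the thesis's algorithms) of exact and Laplace-noised 2-way marginals on binary data:

```python
# 2-way marginal queries on a 0/1 dataset (n records x d attributes).
import itertools
import numpy as np

def two_way_marginals(data):
    # Marginal for (i, j): fraction of records with both bits set.
    data = np.asarray(data)
    d = data.shape[1]
    return {(i, j): float(np.mean(data[:, i] * data[:, j]))
            for i, j in itertools.combinations(range(d), 2)}

def private_two_way_marginals(data, epsilon, rng=np.random.default_rng()):
    # Each record moves every marginal by at most 1/n, so under basic
    # composition the k = d(d-1)/2 marginals share the privacy budget
    # and each gets Laplace noise of scale k/(epsilon * n).
    data = np.asarray(data)
    n, d = data.shape
    k = d * (d - 1) // 2
    return {q: v + rng.laplace(scale=k / (epsilon * n))
            for q, v in two_way_marginals(data).items()}

data = np.random.default_rng(0).integers(0, 2, size=(200, 4))
print(private_two_way_marginals(data, epsilon=1.0))
```

The synthetic-data barrier above says that even these pairwise statistics cannot all be preserved by an efficient private synthetic-data generator, while the thesis's positive results give faster direct (non-synthetic-data) algorithms for marginals.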