Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output
We consider accurately answering smooth queries while preserving differential
privacy. A query is said to be $K$-smooth if it is specified by a function
defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all
bounded. We develop an $\epsilon$-differentially private mechanism for the
class of $K$-smooth queries. The major advantage of the algorithm is that it
outputs a synthetic database, which is appealing in real applications. Our
mechanism achieves provable accuracy guarantees and runs in polynomial time. We
also generalize the mechanism to preserve $(\epsilon, \delta)$-differential
privacy with slightly improved accuracy. Extensive experiments on benchmark
datasets demonstrate that the mechanisms have good accuracy and are efficient.
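To make the privacy model concrete, here is a minimal sketch of $\epsilon$-differentially private release for a single bounded query via the Laplace mechanism. It is only a baseline for one query: the paper's mechanism answers the whole class of $K$-smooth queries at once and outputs a synthetic database, which this sketch does not attempt. The query `f` and all constants below are illustrative assumptions.

```python
import numpy as np

def laplace_release(db, query, sensitivity, epsilon, rng=None):
    """Release query(db) with Laplace noise scaled to sensitivity/epsilon.

    Adding Laplace(sensitivity/epsilon) noise to a query whose value changes
    by at most `sensitivity` when one record changes gives epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    return query(db) + rng.laplace(scale=sensitivity / epsilon)

# Illustrative smooth query: the average over records x in [-1, 1]^d of a
# function with bounded partial derivatives. One record moves the average of
# a [-1, 1]-valued function by at most 2/n, so the sensitivity is 2/n.
rng = np.random.default_rng(0)
n, d = 1000, 3
db = rng.uniform(-1, 1, size=(n, d))
f = lambda data: float(np.mean(np.sin(data.sum(axis=1))))  # smooth, in [-1, 1]
print(laplace_release(db, f, sensitivity=2.0 / n, epsilon=0.5, rng=rng))
```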
Order-Revealing Encryption and the Hardness of Private Learning
An order-revealing encryption scheme gives a public procedure by which two
ciphertexts can be compared to reveal the ordering of their underlying
plaintexts. We show how to use order-revealing encryption to separate
computationally efficient PAC learning from efficient $(\epsilon,
\delta)$-differentially private PAC learning. That is, we construct a concept
class that is efficiently PAC learnable, but for which every efficient learner
fails to be differentially private. This answers a question of Kasiviswanathan
et al. (FOCS '08, SIAM J. Comput. '11).
To prove our result, we give a generic transformation from an order-revealing
encryption scheme into one with strongly correct comparison, which enables the
consistent comparison of ciphertexts that are not obtained as the valid
encryption of any message. We believe this construction may be of independent
interest.
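As a toy illustration of the interface (secret-key encryption plus a public comparison on ciphertexts), here is a simplified bit-by-bit construction in the style of Chenette et al.'s practical ORE. It deliberately leaks the index of the first differing bit and is not the scheme built in the paper; treat every detail as an exposition-only assumption, not a secure design.

```python
import hmac, hashlib

def _prf(key: bytes, prefix: str) -> int:
    # PRF of a bit-prefix, reduced mod 3 (HMAC-SHA256 stands in for the PRF).
    return hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] % 3

def encrypt(key: bytes, m: int, nbits: int = 32) -> list:
    bits = [(m >> (nbits - 1 - i)) & 1 for i in range(nbits)]
    # Mask the i-th bit with a PRF of the bits before it, working mod 3.
    return [(_prf(key, "".join(map(str, bits[:i]))) + bits[i]) % 3
            for i in range(nbits)]

def compare(c1: list, c2: list) -> int:
    """Public comparison using only ciphertexts: -1 (<), 0 (=), or 1 (>)."""
    for u, v in zip(c1, c2):
        if u != v:
            # At the first differing position the masks agree, so the mod-3
            # difference reveals which plaintext bit was larger.
            return 1 if (u - v) % 3 == 1 else -1
    return 0

key = b"demo key only"
assert compare(encrypt(key, 17), encrypt(key, 42)) == -1
assert compare(encrypt(key, 42), encrypt(key, 42)) == 0
```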
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset.
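For contrast with the multi-query setting this paper targets, a standard way to solve a *single* convex minimization privately is output perturbation in the style of Chaudhuri et al.: solve exactly, then add noise scaled to the minimizer's sensitivity. The sketch below assumes an L2-regularized objective with a G-Lipschitz per-example loss; all names and constants are illustrative, and nothing here captures the paper's exponentially-many-problems result.

```python
import numpy as np

def output_perturbation(minimize, n, lam, G, epsilon, dim, rng=None):
    """Epsilon-DP release of the minimizer of one regularized convex problem.

    For J(w) = (1/n) * sum_i loss(w; z_i) + (lam/2) * ||w||^2 with each loss
    G-Lipschitz in w, the exact minimizer has L2-sensitivity 2*G/(n*lam), and
    noise with density proportional to exp(-epsilon * ||b|| / s) suffices.
    """
    rng = rng or np.random.default_rng()
    w = minimize()                          # non-private exact minimizer
    s = 2.0 * G / (n * lam)                 # L2-sensitivity of the minimizer
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)  # uniform direction on the sphere
    radius = rng.gamma(shape=dim, scale=s / epsilon)
    return w + radius * direction

# Illustrative single problem: regularized logistic regression with rows
# normalized so ||x_i|| <= 1, hence the logistic loss is 1-Lipschitz (G = 1).
rng = np.random.default_rng(1)
n, dim, lam = 500, 5, 0.1
X = rng.normal(size=(n, dim))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
y = np.sign(X @ rng.normal(size=dim) + 0.1 * rng.normal(size=n))

def minimize():
    w = np.zeros(dim)
    for _ in range(2000):                   # plain gradient descent
        grad = -(X * (y / (1 + np.exp(y * (X @ w))))[:, None]).mean(axis=0) \
               + lam * w
        w -= 0.5 * grad
    return w

print(output_perturbation(minimize, n, lam, G=1.0, epsilon=1.0, dim=dim))
```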
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of $n$ people, each represented by a bit-string of length
$d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal
query is specified by a subset $S$ of $k$ attributes, and a $k$-dimensional
binary vector $\beta$ specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector restricted
to $S$ agrees with $\beta$.
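A $k$-way marginal is straightforward to evaluate non-privately, which the following sketch makes concrete on a toy 0/1 database; the difficulty the paper addresses is releasing many such counts privately and efficiently.

```python
import numpy as np

def marginal_count(D, S, beta):
    """Count rows of the 0/1 database D whose attributes in S equal beta."""
    return int(np.all(D[:, S] == np.asarray(beta), axis=1).sum())

# A 2-way marginal on a toy database with d = 4 attributes:
D = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
print(marginal_count(D, S=[0, 3], beta=[1, 1]))  # rows with x_0=1, x_3=1 -> 2
```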
Privately releasing approximate answers to a set of $k$-way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well understood: the per-query additive error is known to be at least
$\tilde{\Omega}(\min\{\sqrt{n},\, d^{k/2}\})$ and at most
$\tilde{O}(\min\{\sqrt{n}\cdot d^{1/4},\, d^{k/2}\})$. However, no polynomial
time algorithm with error complexity as low as the information theoretic upper
bound is known for small $n$. In this work we present a polynomial time
algorithm that, for any distribution on marginal queries, achieves average
error at most $\tilde{O}(\sqrt{n}\cdot d^{\lceil k/2\rceil/4})$. This error
bound is as good as the best known information theoretic upper bounds for
$k=2$. This bound is an improvement over previous work on efficiently releasing
marginals when $k$ is small and when error $o(n)$ is desirable. Using private
boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis.
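For readers unfamiliar with it, here is a generic Frank-Wolfe sketch: minimize a smooth convex function over a convex body using only a linear minimization oracle, so iterates remain convex combinations of oracle outputs. The L1-ball example is purely illustrative and is not the paper's relaxation.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, steps=500):
    """Minimize a smooth convex f over a convex set via Frank-Wolfe.

    Needs only grad(x) and a linear minimization oracle lmo(g) returning
    argmin over the set of <g, v>; iterates are convex combinations of
    oracle outputs, so they stay feasible without projections.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        v = lmo(grad(x))             # best vertex against the gradient
        gamma = 2.0 / (t + 2.0)      # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * v
    return x

# Illustration: project p onto the L1 ball, min ||x - p||^2 s.t. ||x||_1 <= 1.
# The oracle for the L1 ball returns a signed coordinate vector.
def lmo_l1(g):
    v = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    v[i] = -np.sign(g[i])
    return v

p = np.array([0.9, -0.6, 0.3])
print(frank_wolfe(lambda x: 2.0 * (x - p), lmo_l1, x0=np.zeros(3)))
```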
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C
on a data set up to small error, but we can only access the data itself using
statistical queries. A trivial solution is to exhaustively ask all queries in
C. Can we do any better?
+ We show that the number of statistical queries necessary and sufficient for
this task is---up to polynomial factors---equal to the agnostic learning
complexity of C in Kearns' statistical query (SQ) model. This gives a complete
answer to the question when running time is not a concern.
+ We then show that the problem can be solved efficiently (allowing arbitrary
error on a small fraction of queries) whenever the answers to C can be
described by a submodular function. This includes many natural concept classes,
such as graph cuts and Boolean disjunctions and conjunctions.
While interesting from a learning theoretic point of view, our main
applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently
releases differentially private answers to all Boolean conjunctions with 1%
average error. This presents significant progress on a key open problem in
privacy-preserving data analysis.
Our first result on the other hand gives unconditional lower bounds on any
differentially private algorithm that admits a (potentially
non-privacy-preserving) implementation using only statistical queries. Not only
our algorithms, but also most known private algorithms can be implemented using
only statistical queries, and hence are constrained by these lower bounds. Our
result therefore isolates the complexity of agnostic learning in the SQ-model
as a new barrier in the design of differentially private algorithms.
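The statistical query interface also explains why SQ-implementable algorithms privatize easily: each query asks for the mean of a bounded predicate, whose sensitivity is $1/n$, so Laplace noise suffices. A minimal sketch, with an assumed conjunction predicate:

```python
import numpy as np

def sq_oracle(data, predicate, epsilon=None, rng=None):
    """Return the mean of a {0,1}-valued predicate over the records.

    The mean of a bounded predicate over n records has sensitivity 1/n, so
    with epsilon set, Laplace(1/(n*epsilon)) noise makes the answer
    epsilon-differentially private -- the reason algorithms that touch data
    only through statistical queries are straightforward to privatize.
    """
    rng = rng or np.random.default_rng()
    answer = float(np.mean([predicate(x) for x in data]))
    if epsilon is not None:
        answer += rng.laplace(scale=1.0 / (len(data) * epsilon))
    return answer

# Illustrative query: the fraction of records satisfying a Boolean conjunction.
data = np.random.default_rng(0).integers(0, 2, size=(1000, 8))
conj = lambda x: int(x[0] == 1 and x[3] == 1)  # conjunction on attributes 0, 3
print(sq_oracle(data, conj, epsilon=1.0))
```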
Privacy and the Complexity of Simple Queries
As both the scope and scale of data collection increase, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual's data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we ask when accurate *and computationally efficient* data analysis is compatible with differential privacy, our understanding lags far behind.

In this thesis, we make several contributions to understanding the complexity of differentially private data analysis. We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data: a new database that consists of "fake" records but preserves certain statistical properties of the original database. Under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, such as the pairwise correlations between attributes. On the positive side, we design new algorithms for the widely used class of marginal queries that are faster and require less data.

Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many existing differentially private algorithms do not guarantee privacy for the data analyst, which would lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: we design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
Dual Query: Practical Private Query Release for High Dimensional Data
We present a practical, differentially private algorithm for answering a
large number of queries on high dimensional datasets. Like all algorithms for
this task, ours necessarily has worst-case complexity exponential in the
dimension of the data. However, our algorithm packages the computationally hard
step into a concisely defined integer program, which can be solved
non-privately using standard solvers. We prove accuracy and privacy theorems
for our algorithm, and then demonstrate experimentally that our algorithm
performs well in practice. For example, our algorithm can efficiently and
accurately answer millions of queries on the Netflix dataset, which has over
17,000 attributes; this is an improvement on the state of the art by multiple
orders of magnitude.
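The following is a loose, non-private sketch of the loop's structure under strong simplifying assumptions: normalized counting queries, a record domain small enough for brute-force search in place of the paper's integer program, and none of the paper's sampling-based privacy accounting or constants.

```python
import numpy as np
from itertools import product

def dual_query_sketch(D, queries, rounds=30, samples=5, eta=0.5, rng=None):
    """Structure of the Dual Query loop; NOT private as written.

    Keeps multiplicative weights over the queries. Each round it samples a
    few queries from that distribution and picks the single record best
    matching their true answers -- the step the paper encodes as an integer
    program for a standard solver; here the 0/1 record domain is searched
    exhaustively. The chosen records form the synthetic database.
    """
    rng = rng or np.random.default_rng()
    domain = np.array(list(product([0, 1], repeat=D.shape[1])))
    true = np.array([q(D) for q in queries])     # true (normalized) answers
    w = np.ones(len(queries))
    synth = []
    for _ in range(rounds):
        idx = rng.choice(len(queries), size=samples, p=w / w.sum())
        errs = [sum(abs(queries[i](r[None, :]) - true[i]) for i in idx)
                for r in domain]
        synth.append(domain[int(np.argmin(errs))])
        # Upweight queries the current synthetic database still answers badly.
        synth_arr = np.array(synth)
        w *= np.exp(eta * np.array([abs(q(synth_arr) - t)
                                    for q, t in zip(queries, true)]))
    return np.array(synth)

# Illustration: all 1-way marginals of a toy 4-attribute database.
rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=(200, 4))
queries = [lambda X, j=j: X[:, j].mean() for j in range(4)]
synth = dual_query_sketch(D, queries, rng=rng)
print(D.mean(axis=0), synth.mean(axis=0), sep="\n")
```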