30 research outputs found
An Improved Private Mechanism for Small Databases
We study the problem of answering a workload of linear queries ,
on a database of size at most drawn from a universe
under the constraint of (approximate) differential privacy.
Nikolov, Talwar, and Zhang~\cite{NTZ} proposed an efficient mechanism that, for
any given and , answers the queries with average error that is
at most a factor polynomial in and
worse than the best possible. Here we improve on this guarantee and give a
mechanism whose competitiveness ratio is at most polynomial in and
, and has no dependence on . Our mechanism
is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in
place of an ad-hoc noise distribution, we use a distribution which is in a
sense optimal for the projection mechanism, and analyze it using convex duality
and the restricted invertibility principle.Comment: To appear in ICALP 2015, Track
Private Multiplicative Weights Beyond Linear Queries
A wide variety of fundamental data analyses in machine learning, such as
linear and logistic regression, require minimizing a convex function defined by
the data. Since the data may contain sensitive information about individuals,
and these analyses can leak that sensitive information, it is important to be
able to solve convex minimization in a privacy-preserving way.
A series of recent results show how to accurately solve a single convex
minimization problem in a differentially private manner. However, the same data
is often analyzed repeatedly, and little is known about solving multiple convex
minimization problems with differential privacy. For simpler data analyses,
such as linear queries, there are remarkable differentially private algorithms
such as the private multiplicative weights mechanism (Hardt and Rothblum, FOCS
2010) that accurately answer exponentially many distinct queries. In this work,
we extend these results to the case of convex minimization and show how to give
accurate and differentially private solutions to *exponentially many* convex
minimization problems on a sensitive dataset
Optimal sequential fingerprinting: Wald vs. Tardos
We study sequential collusion-resistant fingerprinting, where the
fingerprinting code is generated in advance but accusations may be made between
rounds, and show that in this setting both the dynamic Tardos scheme and
schemes building upon Wald's sequential probability ratio test (SPRT) are
asymptotically optimal. We further compare these two approaches to sequential
fingerprinting, highlighting differences between the two schemes. Based on
these differences, we argue that Wald's scheme should in general be preferred
over the dynamic Tardos scheme, even though both schemes have their merits. As
a side result, we derive an optimal sequential group testing method for the
classical model, which can easily be generalized to different group testing
models.Comment: 12 pages, 10 figure
Exploiting Metric Structure for Efficient Private Query Release
We consider the problem of privately answering queries defined on databases
which are collections of points belonging to some metric space. We give simple,
computationally efficient algorithms for answering distance queries defined
over an arbitrary metric. Distance queries are specified by points in the
metric space, and ask for the average distance from the query point to the
points contained in the database, according to the specified metric. Our
algorithms run efficiently in the database size and the dimension of the space,
and operate in both the online query release setting, and the offline setting
in which they must in polynomial time generate a fixed data structure which can
answer all queries of interest. This represents one of the first subclasses of
linear queries for which efficient algorithms are known for the private query
release problem, circumventing known hardness results for generic linear
queries
Hard Communication Channels for Steganography
This paper considers steganography - the concept of hiding the presence of secret messages in legal communications - in the computational setting and its relation to cryptography. Very recently the first (non-polynomial time) steganographic protocol has been shown which, for any communication channel, is provably secure, reliable, and has nearly optimal bandwidth. The security is unconditional, i.e. it does not rely on any unproven complexity-theoretic assumption. This disproves the claim that the existence of one-way functions and access to a communication channel oracle are both necessary and sufficient conditions for the existence of secure steganography in the sense that secure and reliable steganography exists independently of the existence of one-way functions. In this paper, we prove that this equivalence also does not hold in the more realistic setting, where the stegosystem is polynomial time bounded. We prove this by constructing (a) a channel for which secure steganography exists if and only if one-way functions exist and (b) another channel such that secure steganography implies that no one-way functions exist. We therefore show that security-preserving reductions between cryptography and steganography need to be treated very carefully
Preventing False Discovery in Interactive Data Analysis is Hard
We show that, under a standard hardness assumption, there is no
computationally efficient algorithm that given samples from an unknown
distribution can give valid answers to adaptively chosen
statistical queries. A statistical query asks for the expectation of a
predicate over the underlying distribution, and an answer to a statistical
query is valid if it is "close" to the correct expectation over the
distribution.
Our result stands in stark contrast to the well known fact that exponentially
many statistical queries can be answered validly and efficiently if the queries
are chosen non-adaptively (no query may depend on the answers to previous
queries). Moreover, a recent work by Dwork et al. shows how to accurately
answer exponentially many adaptively chosen statistical queries via a
computationally inefficient algorithm; and how to answer a quadratic number of
adaptive queries via a computationally efficient algorithm. The latter result
implies that our result is tight up to a linear factor in
Conceptually, our result demonstrates that achieving statistical validity
alone can be a source of computational intractability in adaptive settings. For
example, in the modern large collaborative research environment, data analysts
typically choose a particular approach based on previous findings. False
discovery occurs if a research finding is supported by the data but not by the
underlying distribution. While the study of preventing false discovery in
Statistics is decades old, to the best of our knowledge our result is the first
to demonstrate a computational barrier. In particular, our result suggests that
the perceived difficulty of preventing false discovery in today's collaborative
research environment may be inherent