Tight Lower Bounds for Differentially Private Selection
A pervasive task in the differential privacy literature is to select the k
items of "highest quality" out of a set of d items, where the quality of each
item depends on a sensitive dataset that must be protected. Variants of this
task arise naturally in fundamental problems like feature selection and
hypothesis testing, and also as subroutines for many sophisticated
differentially private algorithms.
The standard approaches to these tasks---repeated use of the exponential
mechanism or the sparse vector technique---approximately solve this problem
given a dataset of n = O(√k log d) samples. We provide a tight lower
bound for some very simple variants of the private selection problem. Our lower
bound shows that a sample of size n = Ω(√k log d) is required
even to achieve a very minimal accuracy guarantee.
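
To make the baseline concrete, here is a minimal sketch of that standard
approach: the exponential mechanism for a single selection, repeated k times to
peel off the top-k items. The function names and the even per-round budget
split are illustrative assumptions, not the paper's construction; replacing
basic composition with advanced composition is what gives the √k dependence
quoted above.

```python
import numpy as np

def exponential_mechanism(quality, epsilon, sensitivity=1.0, rng=None):
    """Select one index with probability proportional to
    exp(epsilon * quality / (2 * sensitivity)).

    Each entry of `quality` is assumed to change by at most
    `sensitivity` when one record of the underlying dataset changes.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scores = np.asarray(quality, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

def select_top_k(quality, k, epsilon, sensitivity=1.0):
    """Peel off k items by running the exponential mechanism k times,
    splitting the privacy budget evenly across rounds (basic
    composition; advanced composition improves the k/epsilon factor
    to roughly sqrt(k)/epsilon)."""
    remaining = list(range(len(quality)))
    chosen = []
    for _ in range(k):
        scores = [quality[i] for i in remaining]
        idx = exponential_mechanism(scores, epsilon / k, sensitivity)
        chosen.append(remaining.pop(idx))
    return chosen
```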
Our results are based on an extension of the fingerprinting method to sparse
selection problems. Previously, the fingerprinting method has been used to
provide tight lower bounds for answering an entire set of d queries, but
often only some much smaller set of k queries is relevant. Our extension
allows us to prove lower bounds that depend on both the number of relevant
queries and the total number of queries.
Optimal sequential fingerprinting: Wald vs. Tardos
We study sequential collusion-resistant fingerprinting, where the
fingerprinting code is generated in advance but accusations may be made between
rounds, and show that in this setting both the dynamic Tardos scheme and
schemes building upon Wald's sequential probability ratio test (SPRT) are
asymptotically optimal. We further compare these two approaches to sequential
fingerprinting, highlighting differences between the two schemes. Based on
these differences, we argue that Wald's scheme should in general be preferred
over the dynamic Tardos scheme, even though both schemes have their merits. As
a side result, we derive an optimal sequential group testing method for the
classical model, which can easily be generalized to different group testing
models.
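
For reference, the Wald-based schemes build on the classical sequential
probability ratio test. The sketch below is the textbook test in generic form,
assuming a per-round log-likelihood-ratio score supplied by the caller; it is
not the paper's specific score function or thresholds.

```python
import math

def sprt(observations, log_likelihood_ratio, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test.

    alpha: tolerated probability of falsely accepting H1 (type I error)
    beta:  tolerated probability of falsely accepting H0 (type II error)
    Accumulates log(P[x | H1] / P[x | H0]) over rounds and stops as soon
    as the running score leaves Wald's decision interval.
    """
    accept_h1 = math.log((1 - beta) / alpha)
    accept_h0 = math.log(beta / (1 - alpha))
    score = 0.0
    rounds = 0
    for x in observations:
        rounds += 1
        score += log_likelihood_ratio(x)
        if score >= accept_h1:
            return "H1", rounds  # e.g. accuse the user
        if score <= accept_h0:
            return "H0", rounds  # e.g. clear the user
    return "undecided", rounds
```

In a fingerprinting setting, H1 would model a colluding user's symbol
distribution and H0 an innocent user's, with one such test run per user
between rounds.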
The Limits of Post-Selection Generalization
While statistics and machine learning offer numerous methods for ensuring
generalization, these methods often fail in the presence of adaptivity---the
common practice in which the choice of analysis depends on previous
interactions with the same dataset. A recent line of work has introduced
powerful, general-purpose algorithms that ensure post hoc generalization (also
called robust or post-selection generalization), which says that, given the
output of the algorithm, it is hard to find any statistic for which the data
differs significantly from the population it came from.
In this work we show several limitations on the power of algorithms
satisfying post hoc generalization. First, we show a tight lower bound on the
error of any algorithm that satisfies post hoc generalization and answers
adaptively chosen statistical queries, establishing a strong barrier to
progress in post-selection data analysis. Second, we show that post hoc
generalization is not closed under composition, despite many examples of such
algorithms exhibiting strong composition properties.
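
For context, a statistical query asks for the mean of a bounded function over
the population, and the usual noise-addition answer is the Laplace mechanism
applied to the empirical mean, as sketched below. This is only meant to pin
down the query model the lower bound concerns; it is not an algorithm from
the paper.

```python
import numpy as np

def answer_statistical_query(data, phi, epsilon, rng=None):
    """Answer the statistical query q(P) = E[phi(x)] with the empirical
    mean plus Laplace noise. phi must map a record into [0, 1], so the
    empirical mean has sensitivity 1/n under changing one record."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(data)
    empirical = float(np.mean([phi(x) for x in data]))
    return empirical + float(rng.laplace(scale=1.0 / (epsilon * n)))
```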