The Limits of Post-Selection Generalization
While statistics and machine learning offer numerous methods for ensuring
generalization, these methods often fail in the presence of adaptivity---the
common practice in which the choice of analysis depends on previous
interactions with the same dataset. A recent line of work has introduced
powerful, general-purpose algorithms that ensure post hoc generalization (also
called robust or post-selection generalization), which says that, given the
output of the algorithm, it is hard to find any statistic for which the data
differs significantly from the population it came from.
In this work we show several limitations on the power of algorithms
satisfying post hoc generalization. First, we show a tight lower bound on the
error of any algorithm that satisfies post hoc generalization and answers
adaptively chosen statistical queries, showing a strong barrier to progress in
post-selection data analysis. Second, we show that post hoc generalization is
not closed under composition, despite many examples of such algorithms
exhibiting strong composition properties.
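
The adaptivity problem described in this abstract can be made concrete with a small simulation. The sketch below is not taken from the paper; it is a standard textbook-style illustration, and every name and parameter in it (`adaptive_overfit`, `n`, `d`, `fresh`) is an assumption chosen for the example. The data are pure noise, so no classifier can beat 50% accuracy on the population, yet a second-round query chosen from the answers to the first round appears to fit the sample well.

```python
import random


def adaptive_overfit(n=200, d=400, fresh=1000, seed=0):
    """Return (accuracy on the reused sample, accuracy on fresh data).

    Hypothetical illustration: features and labels are independent
    uniform values in {-1, +1}, so the true accuracy of any rule is 0.5.
    """
    rng = random.Random(seed)
    X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]

    # Round 1: d statistical queries, one empirical correlation per feature.
    corr = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]

    # Round 2 (adaptive): the next analysis depends on the previous answers.
    signs = [1 if c >= 0 else -1 for c in corr]

    def predict(x):
        return 1 if sum(s * v for s, v in zip(signs, x)) >= 0 else -1

    sample_acc = sum(predict(X[i]) == y[i] for i in range(n)) / n

    # Evaluate the same rule on fresh draws from the population.
    hits = 0
    for _ in range(fresh):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        label = rng.choice((-1, 1))
        hits += predict(x) == label
    return sample_acc, hits / fresh


if __name__ == "__main__":
    sample_acc, fresh_acc = adaptive_overfit()
    print(f"accuracy on reused sample: {sample_acc:.2f}")
    print(f"accuracy on fresh data:    {fresh_acc:.2f}")
```

With these assumed sizes the accuracy on the reused sample is typically far above one half, while the accuracy on fresh data stays near one half. That gap between the data and the population is the kind of failure that post hoc generalization is meant to rule out.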
Differential Privacy for Sequential Algorithms
We study the differential privacy of sequential statistical inference and
learning algorithms that are characterized by a random termination time. Using
two examples, the sequential probability ratio test and sequential empirical
risk minimization, we show that the number of steps such algorithms execute
before termination can jeopardize the differential privacy of the input data in
a similar fashion to their outputs, and that it is impossible to use the usual
Laplace mechanism to achieve standard differential privacy in these examples.
To remedy this, we propose a notion of weak differential privacy and
demonstrate its equivalence to the standard case for large i.i.d. samples. We
show that using the Laplace mechanism, weak differential privacy can be
achieved for both the sequential probability ratio test and the sequential
empirical risk minimization with proper performance guarantees. Finally, we
provide preliminary experimental results on the Breast Cancer Wisconsin
(Diagnostic) and Landsat Satellite Data Sets from the UCI repository.
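
To illustrate why the termination time matters, the sketch below implements Wald's sequential probability ratio test for a Bernoulli parameter and runs it on two datasets that differ in a single record. It is a minimal example under assumed settings (`p0`, `p1`, `alpha`, `beta` are illustrative choices), not the construction or the weak-privacy mechanism from the paper. The `laplace_mechanism` helper shows only the usual Laplace mechanism, which adds noise of scale sensitivity/epsilon to a numeric statistic.

```python
import math
import random


def sprt_bernoulli(xs, p0=0.4, p1=0.6, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1.

    Returns (decision, number of samples consumed before stopping).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for t, x in enumerate(xs, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", t
        if llr <= lower:
            return "H0", t
    return "undecided", len(xs)


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Usual Laplace mechanism: value plus Laplace(0, sensitivity / epsilon).

    The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    """
    b = sensitivity / epsilon
    return value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


if __name__ == "__main__":
    data = [1] * 20
    neighbor = [0] + [1] * 19            # differs from data in one record
    print(sprt_bernoulli(data))          # ('H1', 8)
    print(sprt_bernoulli(neighbor))      # ('H1', 10)
    rng = random.Random(0)
    print(laplace_mechanism(sum(data) / len(data), 1 / len(data), 1.0, rng))
```

Both runs reach the same decision, but the stopping time moves from 8 to 10 when one record changes, so an observer who sees only the number of steps still learns something about the input.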