The Limits of Post-Selection Generalization
While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity (also called post selection): the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general-purpose algorithms that ensure a property called post hoc generalization (Cummings et al., COLT'16), also known as robust or post-selection generalization, which says that, given the output of the algorithm, it is hard to find any statistic for which the data differs significantly from the population it came from.
In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, establishing a strong barrier to progress in post-selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.
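To make the adaptivity failure concrete, here is a minimal NumPy sketch (our illustration, not code from the paper) of the classic way adaptively chosen statistical queries defeat naive empirical-mean answers: earlier answers are used to construct a final query whose empirical value is far from its population value. All names and parameters are illustrative assumptions.

```python
# Hypothetical sketch: adaptive statistical queries overfitting a dataset
# whose queries are answered naively with empirical means.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1000                      # n samples, d adaptively queried attributes
# Population: each attribute is an unbiased coin in {-1, +1}, so the true
# (population) mean of every query below is 0.
data = rng.choice([-1.0, 1.0], size=(n, d))
label = rng.choice([-1.0, 1.0], size=n)

def answer(query):
    """Answer a statistical query q: X -> [-1, 1] with its empirical mean."""
    return query(data, label).mean()

# Rounds 1..d: query the empirical correlation of each attribute with the label.
corrs = np.array([answer(lambda X, y, j=j: X[:, j] * y) for j in range(d)])

# Final, adaptively chosen query: a majority vote of attributes weighted by the
# signs learned from the previous answers. Its population mean is 0, yet its
# empirical mean is far from 0 -- a false discovery caused by adaptivity.
signs = np.sign(corrs)
overfit = answer(lambda X, y: np.sign(X @ signs) * y)
print(f"empirical mean of adaptive query: {overfit:.2f} (population mean: 0)")
```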
Verification of Neural Networks Local Differential Classification Privacy
Neural networks are susceptible to privacy attacks. To date, no verifier can reason about the privacy of individuals participating in the training set. We propose a new privacy property, called local differential classification privacy (LDCP), extending local robustness to a differential privacy setting suitable for black-box classifiers. Given a neighborhood of inputs, a classifier is LDCP if it classifies all inputs the same regardless of whether it is trained with the full dataset or with any single entry omitted. A naive algorithm is highly impractical because it involves training a very large number of networks and verifying local robustness of the given neighborhood separately for every network. We propose Sphynx, an algorithm that computes, with high probability, an abstraction of all networks from a small set of networks, and verifies LDCP directly on the abstract network. The challenge is twofold: network parameters do not adhere to a known probability distribution, making it difficult to predict an abstraction, and predicting too large an abstraction harms the verification. Our key idea is to transform the parameters into a distribution given by kernel density estimation (KDE), which keeps the over-approximation error small. To verify LDCP, we extend a MILP verifier to analyze an abstract network. Experimental results show that by training only 7% of the networks, Sphynx predicts an abstract network obtaining 93% verification accuracy and reducing the analysis time by x.
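As a hedged sketch of the abstraction step described above (assumed details, not the actual Sphynx implementation): fit a KDE to each parameter across the small sample of trained networks and take an interval capturing most of its probability mass; such per-parameter intervals would define the abstract network handed to the MILP verifier. The function name, mass threshold, and data below are illustrative.

```python
# Hypothetical sketch: interval abstraction of one network parameter from a
# small sample of trained networks, via a kernel density estimate.
import numpy as np
from scipy.stats import gaussian_kde

def interval_abstraction(param_samples, mass=0.99, grid_size=10_000):
    """Given samples of one parameter (one value per trained network),
    return an interval [lo, hi] capturing `mass` of the KDE's probability."""
    kde = gaussian_kde(param_samples)
    lo0, hi0 = param_samples.min(), param_samples.max()
    pad = 0.5 * (hi0 - lo0) + 1e-9          # extend beyond the observed range
    grid = np.linspace(lo0 - pad, hi0 + pad, grid_size)
    density = kde(grid)
    density /= density.sum()                 # normalize on the grid
    cdf = np.cumsum(density)
    lo = grid[np.searchsorted(cdf, (1 - mass) / 2)]
    hi = grid[np.searchsorted(cdf, 1 - (1 - mass) / 2)]
    return lo, hi

# Example: 7 trained networks yield 7 values for one weight; the resulting
# interval would become that weight's range in the abstract network.
weights = np.array([0.31, 0.29, 0.33, 0.30, 0.28, 0.32, 0.31])
print(interval_abstraction(weights))
```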
A necessary and sufficient stability notion for adaptive generalization
We introduce a new notion of the stability of computations, which holds under post-processing and adaptive composition, and show that the notion is both necessary and sufficient to ensure generalization in the face of adaptivity, for any computations that respond to bounded-sensitivity linear queries while providing accuracy with respect to the data sample set. The stability notion is based on quantifying the effect of observing a computation's outputs on the posterior over the data sample elements. We show a separation between this stability notion and previously studied notions.
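As a toy illustration of the posterior-based idea (an assumption-laden sketch, not the paper's formal definition): a mechanism releasing a noisy bounded-sensitivity linear query induces, via Bayes' rule, a posterior over a single element's value; stability corresponds to that posterior staying close to the prior after the output is observed.

```python
# Hypothetical sketch: exact Bayes posterior over one sample element's value
# after observing a Laplace-noised linear query over the dataset.
import numpy as np

def laplace_pdf(z, b):
    return np.exp(-np.abs(z) / b) / (2 * b)

def posterior_over_element(output, other_sum, b=1.0):
    """Posterior P(x_i = v | output) for a uniform prior over v in {0, 1},
    when the mechanism releases sum(x) + Laplace(b) noise."""
    likelihoods = np.array([laplace_pdf(output - (other_sum + v), b)
                            for v in (0, 1)])
    prior = np.array([0.5, 0.5])
    joint = prior * likelihoods
    return joint / joint.sum()

# With little noise (small b) the output pins down x_i (posterior far from
# the prior); with more noise the posterior stays near uniform (stable).
for b in (0.1, 1.0, 10.0):
    print(b, posterior_over_element(output=5.8, other_sum=5, b=b))
```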