Five Proofs of Chernoff's Bound with Applications
We discuss five ways of proving Chernoff's bound and show how they lead to
different extensions of the basic bound.
Comment: 16 pages, no figures. This revision slightly updates the presentation of the IK-method.
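For orientation, one of these routes is the classical moment-generating-function argument; a minimal sketch (our summary, not quoted from the paper):

% Classical MGF route: for independent X_1, ..., X_n, X = \sum_i X_i, and any t > 0,
\begin{align*}
\Pr[X \ge a] = \Pr\!\left[e^{tX} \ge e^{ta}\right]
  &\le e^{-ta}\,\mathbb{E}\!\left[e^{tX}\right] && \text{(Markov's inequality)} \\
  &= e^{-ta}\prod_{i=1}^{n}\mathbb{E}\!\left[e^{tX_i}\right] && \text{(independence)},
\end{align*}
% and minimizing the right-hand side over t > 0 yields Chernoff's bound.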
Generalization Bounds for Uniformly Stable Algorithms
Uniform stability of a learning algorithm is a classical notion of
algorithmic stability introduced to derive high-probability bounds on the
generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss
function with range bounded in $[0,1]$, the generalization error of a
$\gamma$-uniformly stable learning algorithm on $n$ samples is known to be
within $O((\gamma + 1/n)\sqrt{n\log(1/\delta)})$ of the empirical error with
probability at least $1-\delta$. Unfortunately, this bound does not lead to
meaningful generalization bounds in many common settings where $\gamma \geq 1/\sqrt{n}$. At the same time the bound is known to be tight only when $\gamma = O(1/n)$.
We substantially improve generalization bounds for uniformly stable
algorithms without making any additional assumptions. First, we show that the
bound in this setting is $O(\sqrt{(\gamma + 1/n)\log(1/\delta)})$ with
probability at least $1-\delta$. In addition, we prove a tight bound of
$O(\gamma^2 + 1/n)$ on the second moment of the estimation error. The best
previous bound on the second moment is $O(\gamma + 1/n)$. Our proofs are based
on new analysis techniques and our results imply substantially stronger
generalization guarantees for several well-studied algorithms.
Comment: Appeared in Neural Information Processing Systems (NeurIPS), 2018.
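As a concrete illustration of the stability notion in play (our sketch, not code from the paper): the empirical mean of $[0,1]$-valued data is $1/n$-uniformly stable, which a few lines of Python can verify empirically.

import numpy as np

# Minimal sketch (our illustration): the empirical mean of [0,1]-valued data
# is gamma-uniformly stable with gamma = 1/n, since replacing one sample
# changes the output by at most 1/n.
rng = np.random.default_rng(0)
n = 100
S = rng.uniform(0, 1, size=n)

worst_change = 0.0
for i in range(n):
    for replacement in (0.0, 1.0):  # extreme replacements suffice for the mean
        S_prime = S.copy()
        S_prime[i] = replacement
        worst_change = max(worst_change, abs(S.mean() - S_prime.mean()))

print(f"observed stability: {worst_change:.4f}  vs  1/n = {1/n:.4f}")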
Concentration Bounds for High Sensitivity Functions Through Differential Privacy
A new line of work [Dwork et al. STOC 2015], [Hardt and Ullman FOCS 2014],
[Steinke and Ullman COLT 2015], [Bassily et al. STOC 2016] demonstrates how
differential privacy [Dwork et al. TCC 2006] can be used as a mathematical tool
for guaranteeing generalization in adaptive data analysis. Specifically, if a
differentially private analysis is applied on a sample S of i.i.d. examples to
select a low-sensitivity function f, then w.h.p. f(S) is close to its
expectation, although f is being chosen based on the data.
Very recently, Steinke and Ullman observed that these generalization
guarantees can be used for proving concentration bounds in the non-adaptive
setting, where the low-sensitivity function is fixed beforehand. In particular,
they obtain alternative proofs for classical concentration bounds for
low-sensitivity functions, such as the Chernoff bound and McDiarmid's
Inequality.
In this work, we set out to examine the situation for functions with
high sensitivity, for which differential privacy does not imply generalization
guarantees under adaptive analysis. We show that differential privacy can be
used to prove concentration bounds for such functions in the non-adaptive
setting.
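A quick simulation of the non-adaptive phenomenon these proofs target, namely concentration of a low-sensitivity function around its expectation (illustrative only; the parameter choices are ours):

import numpy as np

# Sketch (our illustration): f(S) is the sample mean of n i.i.d. uniform [0,1]
# draws, a function with sensitivity 1/n. We compare its empirical tail to the
# two-sided Hoeffding/Chernoff bound 2*exp(-2*n*t^2).
rng = np.random.default_rng(1)
n, trials = 200, 10_000
deviations = np.abs(rng.uniform(0, 1, size=(trials, n)).mean(axis=1) - 0.5)

for t in (0.02, 0.05, 0.1):
    frac = (deviations > t).mean()
    bound = 2 * np.exp(-2 * n * t**2)
    print(f"Pr[|f(S) - E f| > {t}]: empirical {frac:.4f}, bound {bound:.4f}")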
A New Analysis of Differential Privacy's Generalization Guarantees
We give a new proof of the "transfer theorem" underlying adaptive data
analysis: that any mechanism for answering adaptively chosen statistical
queries that is differentially private and sample-accurate is also accurate
out-of-sample. Our new proof is elementary and gives structural insights that
we expect will be useful elsewhere. We show: 1) that differential privacy
ensures that the expectation of any query on the posterior distribution on
datasets induced by the transcript of the interaction is close to its true
value on the data distribution, and 2) sample accuracy on its own ensures that
any query answer produced by the mechanism is close to its posterior
expectation with high probability. This second claim follows from a thought
experiment in which we imagine that the dataset is resampled from the posterior
distribution after the mechanism has committed to its answers. The transfer
theorem then follows by summing these two bounds, and in particular, avoids the
"monitor argument" used to derive high probability bounds in prior work. An
upshot of our new proof technique is that the concrete bounds we obtain are
substantially better than the best previously known bounds, even though the
improvements are in the constants, rather than the asymptotics (which are known
to be tight). As we show, our new bounds outperform the naive
"sample-splitting" baseline at dramatically smaller dataset sizes compared to
the previous state of the art, bringing techniques from this literature closer
to practicality.
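For a feel of the kind of mechanism this theorem covers, here is a toy differentially private statistical-query mechanism in Python (an illustrative Laplace-mechanism sketch with arbitrary parameters, not the paper's construction):

import numpy as np

# Sketch of a DP statistical-query mechanism (illustrative parameters only):
# each query q maps a sample point into [0,1]; we answer with the empirical
# mean plus Laplace noise, i.e., the textbook Laplace mechanism.
rng = np.random.default_rng(2)
n, epsilon_per_query = 5_000, 0.05
S = rng.normal(size=n)

def answer(q):
    # Sensitivity of the empirical mean of a [0,1]-valued query is 1/n,
    # so noise scale 1/(n * epsilon) gives epsilon-DP per query.
    return np.mean(q(S)) + rng.laplace(scale=1.0 / (n * epsilon_per_query))

# Adaptively chosen queries: the second query depends on the first answer.
a1 = answer(lambda x: (x > 0).astype(float))
a2 = answer(lambda x: (x > a1).astype(float))
print(a1, a2)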
Hypothesis Set Stability and Generalization
We present a study of generalization for data-dependent hypothesis sets. We
give a general learning guarantee for data-dependent hypothesis sets based on a
notion of transductive Rademacher complexity. Our main result is a
generalization bound for data-dependent hypothesis sets expressed in terms of a
notion of hypothesis set stability and a notion of Rademacher complexity for
data-dependent hypothesis sets that we introduce. This bound admits as special
cases both standard Rademacher complexity bounds and algorithm-dependent
uniform stability bounds. We also illustrate the use of these learning bounds
in the analysis of several scenarios.
Comment: Published in NeurIPS 2019. This version is equivalent to the camera-ready version but also includes the supplementary material.
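For reference, the standard empirical Rademacher complexity that these bounds recover as a special case is (standard definition, not quoted from the paper):

% Empirical Rademacher complexity of a hypothesis set H on a sample
% S = (z_1, ..., z_m); the sigma_i are i.i.d. uniform in {-1, +1}.
\widehat{\mathfrak{R}}_S(H) = \mathbb{E}_{\sigma}\!\left[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i\, h(z_i)\right]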
Generalization for Adaptively-chosen Estimators via Stable Median
Datasets are often reused to perform multiple statistical analyses in an
adaptive way, in which each analysis may depend on the outcomes of previous
analyses on the same dataset. Standard statistical guarantees do not account
for these dependencies and little is known about how to provably avoid
overfitting and false discovery in the adaptive setting. We consider a natural
formalization of this problem in which the goal is to design an algorithm that,
given a limited number of i.i.d.~samples from an unknown distribution, can
answer adaptively-chosen queries about that distribution.
We present an algorithm that estimates the expectations of $k$ arbitrary
adaptively-chosen real-valued estimators using a number of samples that scales
as $\sqrt{k}$. The answers given by our algorithm are essentially as accurate
as if fresh samples were used to evaluate each estimator. In contrast, prior
work yields error guarantees that scale with the worst-case sensitivity of each
estimator. We also give a version of our algorithm that can be used to verify
answers to such queries where the sample complexity depends logarithmically on
the number of queries (as in the reusable holdout technique).
Our algorithm is based on a simple approximate median algorithm that
satisfies the strong stability guarantees of differential privacy. Our
techniques provide a new approach for analyzing the generalization guarantees
of differentially private algorithms.
Comment: To appear in Conference on Learning Theory (COLT) 2017.
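A minimal sketch of a differentially private approximate median via the exponential mechanism, the kind of primitive the abstract describes (our illustration; the domain bounds and parameters are hypothetical):

import numpy as np

# Sketch of a DP approximate median on the domain {0, 1, ..., T} using the
# exponential mechanism (illustrative; not the paper's exact construction).
# Utility of candidate t is minus its distance from the median rank, which
# has sensitivity 1 under changing one data element.
rng = np.random.default_rng(3)

def dp_median(data, T, epsilon):
    candidates = np.arange(T + 1)
    below = np.searchsorted(np.sort(data), candidates, side="left")
    utility = -np.abs(below - len(data) / 2)  # closeness to the median rank
    scores = epsilon * utility / 2.0          # exponential mechanism scores
    probs = np.exp(scores - scores.max())
    return rng.choice(candidates, p=probs / probs.sum())

data = rng.integers(40, 60, size=1000)
print("true median:", int(np.median(data)), " DP median:", dp_median(data, 100, 1.0))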
PAC learning with stable and private predictions
We study binary classification algorithms for which the prediction on any
point is not too sensitive to individual examples in the dataset. Specifically,
we consider the notions of uniform stability (Bousquet and Elisseeff, 2001) and
prediction privacy (Dwork and Feldman, 2018). Previous work on these notions
shows how they can be achieved in the standard PAC model via simple aggregation
of models trained on disjoint subsets of data. Unfortunately, this approach
leads to a significant overhead in terms of sample complexity. Here we
demonstrate several general approaches to stable and private prediction that
either eliminate or significantly reduce the overhead. Specifically, we
demonstrate that for any class $C$ of VC dimension $d$ there exists a
$\gamma$-uniformly stable algorithm for learning $C$ with excess error $\alpha$
using $\tilde O(d/(\alpha\gamma) + d/\alpha^2)$ samples. We also show that this
bound is nearly tight. For $\epsilon$-differentially private prediction we give
two new algorithms: one using $\tilde O(d/(\alpha^2\epsilon))$ samples and
another one using $\tilde O(d^2/(\alpha\epsilon) + d/\alpha^2)$ samples. The
best previously known bounds for these problems are $O(d/(\alpha^2\gamma))$ and
$\tilde O(d/(\alpha^3\epsilon))$, respectively.
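The disjoint-subsets aggregation baseline that the abstract says incurs the sample-complexity overhead can be sketched as follows (our toy illustration; the threshold "stump" stands in for an arbitrary PAC learner):

import numpy as np

# Sketch of the aggregation baseline: split the data into k disjoint shards,
# train one model per shard, and predict by majority vote. Any single example
# then affects only one of the k votes, giving stability of order 1/k.
rng = np.random.default_rng(4)

def train_stump(X, y):
    # Toy base learner: best threshold on a 1-d feature (hypothetical choice
    # for this sketch, standing in for an arbitrary PAC learner).
    thresholds = np.unique(X)
    errs = [np.mean((X > t) != y) for t in thresholds]
    t = thresholds[int(np.argmin(errs))]
    return lambda x: (x > t).astype(int)

def aggregate_predictor(X, y, k):
    shards = np.array_split(rng.permutation(len(X)), k)
    models = [train_stump(X[idx], y[idx]) for idx in shards]
    return lambda x: (np.mean([m(x) for m in models], axis=0) > 0.5).astype(int)

X = rng.uniform(0, 1, 2000)
y = (X > 0.6).astype(int)
predict = aggregate_predictor(X, y, k=11)
X_test = np.linspace(0, 1, 1000)
print("test error:", np.mean(predict(X_test) != (X_test > 0.6)))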
High probability generalization bounds for uniformly stable algorithms with nearly optimal rate
Algorithmic stability is a classical approach to understanding and analysis
of the generalization error of learning algorithms. A notable weakness of most
stability-based generalization bounds is that they hold only in expectation.
Generalization with high probability has been established in a landmark paper
of Bousquet and Elisseeff (2002) albeit at the expense of an additional
$\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation
error of any $\gamma$-uniformly stable learning algorithm on $n$ samples and
range in $[0,1]$ is $O(\gamma\sqrt{n\log(1/\delta)} + \sqrt{\log(1/\delta)/n})$ with probability $\geq 1-\delta$. The
$\sqrt{n}$ overhead makes the bound vacuous in the common settings where $\gamma \geq 1/\sqrt{n}$. A stronger bound was recently proved by the authors (Feldman and
Vondrak, 2018) that reduces the overhead to at most $O(n^{1/4})$. Still, both
of these results give optimal generalization bounds only when $\gamma = O(1/n)$.
We prove a nearly tight bound of $O(\gamma\log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n})$ on the estimation error of any $\gamma$-uniformly
stable algorithm. It implies that for algorithms that are uniformly stable with
$\gamma = O(1/\sqrt{n})$, estimation error is essentially the same as the
sampling error. Our result leads to the first high-probability generalization
bounds for multi-pass stochastic gradient descent and regularized ERM for
stochastic convex problems with nearly optimal rate --- resolving open problems
in prior work. Our proof technique is new and we introduce several analysis
tools that might find additional applications.
Comment: This is a follow-up to and has minor text overlap with arXiv:1812.09859; v2: minor revision following acceptance for presentation at COLT 2019.
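Putting the three high-probability bounds side by side (as reconstructed above, suppressing constants):

% Estimation-error bounds for a gamma-uniformly stable algorithm on n samples,
% in order of publication:
\underbrace{O\big(\gamma\sqrt{n\log(1/\delta)} + \sqrt{\log(1/\delta)/n}\big)}_{\text{Bousquet--Elisseeff (2002)}}
\;\rightarrow\;
\underbrace{O\big(\sqrt{(\gamma + 1/n)\log(1/\delta)}\big)}_{\text{Feldman--Vondrak (2018)}}
\;\rightarrow\;
\underbrace{O\big(\gamma\log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n}\big)}_{\text{this paper}}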