PAC-Bayes bounds for stable algorithms with instance-dependent priors
PAC-Bayes bounds have been proposed to obtain risk estimates from a training
sample. In this paper the PAC-Bayes approach is combined with stability of the
hypothesis learned by a Hilbert-space-valued algorithm. The PAC-Bayes setting
is used with a Gaussian prior centered at the expected output. Thus, a novelty
of our paper is the use of priors defined in terms of the data-generating
distribution. Our main result estimates the risk of the randomized algorithm in
terms of the hypothesis stability coefficients. We also provide a new bound for
the SVM classifier, which is compared experimentally to other known bounds.
Ours appears to be the first stability-based bound that evaluates to
non-trivial values.
Comment: 16 pages, discussion of theory and experiments in the main body,
detailed proofs and experimental details in the appendices.
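With a Gaussian prior and a Gaussian posterior of the same isotropic variance, the KL term in a PAC-Bayes bound reduces to a squared distance between the two centers, which is why controlling the distance between the learned hypothesis and the expected output matters. The sketch below assumes a generic PAC-Bayes bound for a loss in [0, 1] (the Maurer-style kl bound relaxed with Pinsker's inequality), not the paper's stability-based result. In the paper the distance to the expected output is controlled through stability coefficients; the helper `estimate_prior_center`, which averages outputs from independent samples, is purely hypothetical and only for illustration.

```python
import math

def estimate_prior_center(outputs):
    """Hypothetical helper (not from the paper): approximate E_S[A(S)] by
    averaging weight vectors A(S_1), ..., A(S_m) from independent samples."""
    m = len(outputs)
    return [sum(coords) / m for coords in zip(*outputs)]

def gaussian_kl_same_variance(mu_q, mu_p, sigma):
    """KL(N(mu_q, sigma^2 I) || N(mu_p, sigma^2 I)) = ||mu_q - mu_p||^2 / (2 sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_q, mu_p))
    return sq_dist / (2.0 * sigma ** 2)

def pac_bayes_pinsker_bound(emp_risk, kl, n, delta):
    """Generic bound for a loss in [0, 1], holding with prob. >= 1 - delta:
    risk(Q) <= emp_risk(Q) + sqrt((KL(Q||P) + ln(2 sqrt(n) / delta)) / (2 n))."""
    return emp_risk + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))

# Usage with made-up numbers (illustrative only).
prior_center = estimate_prior_center([[0.0, 1.0], [1.0, 2.0]])      # [0.5, 1.5]
kl = gaussian_kl_same_variance([1.0, 2.0], prior_center, sigma=1.0)  # 0.25
print(pac_bayes_pinsker_bound(emp_risk=0.1, kl=kl, n=1000, delta=0.05))
```

The closer the learned weights stay to the prior center, the smaller the KL term and the tighter the bound; a stable algorithm is one whose output does not move far from its expectation.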
PAC-Bayes Analysis Beyond the Usual Bounds
We focus on a stochastic learning model where the learner observes a finite
set of training examples and the output of the learning process is a
data-dependent distribution over a space of hypotheses. The learned
data-dependent distribution is then used to make randomized predictions, and
the high-level theme addressed here is guaranteeing the quality of predictions
on examples that were not seen during training, i.e., generalization. In this
setting the unknown quantity of interest is the expected risk of the
data-dependent randomized predictor, for which upper bounds can be derived via
a PAC-Bayes analysis, leading to PAC-Bayes bounds.
Specifically, we present a basic PAC-Bayes inequality for stochastic kernels,
from which one may derive extensions of various known PAC-Bayes bounds as well
as novel bounds. We clarify the role of the requirements of fixed 'data-free'
priors, bounded losses, and i.i.d. data. We highlight that those requirements
were used to upper-bound an exponential moment term, while the basic PAC-Bayes
theorem remains valid without those restrictions. We present three bounds that
illustrate the use of data-dependent priors, including one for the unbounded
square loss.
Comment: In NeurIPS 2020. Version 3 is the final published paper. Note that
this paper is an enhanced version of the short paper with the same title that
was presented at the NeurIPS 2019 Workshop on Machine Learning with
Guarantees. Important update: the PAC-Bayes type inequality for unbounded
loss functions (Section 2.3) is ne
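In typical PAC-Bayes proofs the KL term enters through the change-of-measure (Donsker–Varadhan) inequality $\mathbb{E}_Q[f] \le \mathrm{KL}(Q\|P) + \log \mathbb{E}_P[e^{f}]$, applied for instance to $f(h) = \lambda\,(\text{risk}(h) - \text{empirical risk}(h))$; the remaining $\log \mathbb{E}_P[e^{f}]$ is an exponential moment of the kind the abstract says is bounded using data-free priors, bounded losses, and i.i.d. data. The sketch below is a standard-ingredient illustration under that assumption, not the paper's theorem for stochastic kernels: it checks the inequality on a finite hypothesis space and shows that equality holds for the Gibbs posterior $Q \propto P e^{f}$.

```python
import math

def kl_divergence(q, p):
    """KL(Q || P) for finite distributions (lists of probabilities, P > 0 where Q > 0)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def change_of_measure_gap(q, p, f):
    """RHS minus LHS of E_Q[f] <= KL(Q || P) + log E_P[exp(f)]; never negative."""
    lhs = sum(qi * fi for qi, fi in zip(q, f))
    log_mgf = math.log(sum(pi * math.exp(fi) for pi, fi in zip(p, f)))
    return kl_divergence(q, p) + log_mgf - lhs

def gibbs_posterior(p, f):
    """Q proportional to P * exp(f): the posterior that attains equality."""
    weights = [pi * math.exp(fi) for pi, fi in zip(p, f)]
    z = sum(weights)
    return [w / z for w in weights]

prior = [0.25, 0.25, 0.5]
f = [1.0, -0.5, 0.3]  # e.g. lambda * (risk - empirical risk) for three hypotheses
print(change_of_measure_gap([0.6, 0.1, 0.3], prior, f))            # positive
print(change_of_measure_gap(gibbs_posterior(prior, f), prior, f))  # 0 up to rounding
```

The inequality itself holds for any $P$, $Q$ and $f$; the extra assumptions only enter when one tries to bound the $\log \mathbb{E}_P[e^{f}]$ term.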
PAC-Bayes Unexpected Bernstein Inequality
We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to 0 faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter, it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.
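The $\sqrt{L_n \cdot \mathrm{KL}/n}$ behaviour can be seen numerically in the standard small-kl PAC-Bayes bound (the form usually attributed to Seeger and Maurer), $\mathrm{kl}(L_n \,\|\, L) \le (\mathrm{KL}(Q\|P) + \ln(2\sqrt{n}/\delta))/n$, where $\mathrm{kl}$ is the binary KL divergence. This is a sketch of that baseline, not of the paper's new bound: inverting it by bisection gives a gap of roughly $\mathrm{KL}/n$ when $L_n = 0$ but roughly $\sqrt{L_n \cdot \mathrm{KL}/n}$ when $L_n > 0$. The values of `n`, `delta` and `kl` below are arbitrary assumptions.

```python
import math

def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using 0 log 0 = 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(q, p) + term(1 - q, 1 - p)

def kl_inverse_upper(emp_risk, budget, tol=1e-10):
    """Largest p in [emp_risk, 1) with kl(emp_risk || p) <= budget, by bisection."""
    lo, hi = emp_risk, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(emp_risk, mid) <= budget:
            lo = mid  # mid is still feasible: move the lower end up
        else:
            hi = mid
    return lo

# Arbitrary illustrative values: sample size, confidence level, KL(Q || P).
n, delta, kl = 1000, 0.05, 5.0
budget = (kl + math.log(2 * math.sqrt(n) / delta)) / n
for emp in (0.0, 0.01, 0.2):
    bound = kl_inverse_upper(emp, budget)
    print(emp, round(bound - emp, 4))  # gap grows with the empirical error L_n
```

Because the gap is driven by $L_n$, a term that stays small even when $L_n$ does not vanish, as the abstract describes, can tighten the bound considerably.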