14 research outputs found
PAC-Bayes Un-Expected Bernstein Inequality
We present a new PAC-Bayesian generalization bound. Standard bounds contain a \sqrt{L_n \cdot \mathrm{KL}/n} complexity term which dominates unless L_n, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace L_n by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough n). Theoretically, unlike existing bounds, our new bound can be expected to converge to 0 faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with X^2 taken outside its expectation.
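For orientation, a standard rendering of the bound the abstract alludes to, in the Seeger/Maurer form (this display is background, not quoted from the paper): with probability at least 1 - \delta over an i.i.d. sample of size n, simultaneously for all posteriors \rho and any fixed prior \pi,

\mathrm{kl}\big(\hat{L}_n(\rho) \,\big\|\, L(\rho)\big) \;\le\; \frac{\mathrm{KL}(\rho\|\pi) + \ln(2\sqrt{n}/\delta)}{n},

and relaxing the binary KL divergence on the left-hand side gives

L(\rho) \;\le\; \hat{L}_n(\rho) + \sqrt{\frac{2\hat{L}_n(\rho)\big(\mathrm{KL}(\rho\|\pi) + \ln(2\sqrt{n}/\delta)\big)}{n}} + \frac{2\big(\mathrm{KL}(\rho\|\pi) + \ln(2\sqrt{n}/\delta)\big)}{n},

whose square-root term is the \sqrt{L_n \cdot \mathrm{KL}/n} complexity term that dominates unless the empirical error L_n = \hat{L}_n(\rho) vanishes.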
PAC-Bayesian Bound for the Conditional Value at Risk
Conditional Value at Risk (CVaR) is a family of "coherent risk measures"
which generalize the traditional mathematical expectation. Widely used in
mathematical finance, it is garnering increasing interest in machine learning,
e.g., as an alternate approach to regularization, and as a means for ensuring
fairness. This paper presents a generalization bound for learning algorithms
that minimize the CVaR of the empirical loss. The bound is of PAC-Bayesian type
and is guaranteed to be small when the empirical CVaR is small. We achieve this
by reducing the problem of estimating CVaR to that of merely estimating an
expectation. This then enables us, as a by-product, to obtain concentration
inequalities for CVaR even when the random variable in question is unbounded.
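For concreteness, a minimal sketch (not code from the paper; the function name and the convention that alpha is the confidence level are assumptions) of computing the empirical CVaR that such learning algorithms minimize, via the standard Rockafellar-Uryasev variational form:

import numpy as np

def empirical_cvar(losses, alpha):
    # Rockafellar-Uryasev form: CVaR_alpha(Z) = inf_c { c + E[(Z - c)_+] / (1 - alpha) }.
    # The empirical objective is convex and piecewise linear with kinks at the
    # sample points, so the infimum is attained at one of them.
    z = np.asarray(losses, dtype=float)
    objective = lambda c: c + np.maximum(z - c, 0.0).mean() / (1.0 - alpha)
    return min(objective(c) for c in z)

# With alpha = 0.9 this is roughly the average of the worst 10% of the losses;
# at alpha = 0 it collapses to the ordinary mean, recovering the expectation
# that CVaR generalizes.
rng = np.random.default_rng(0)
print(empirical_cvar(rng.exponential(size=1000), alpha=0.9))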
PAC-Bayes Analysis Beyond the Usual Bounds
We focus on a stochastic learning model where the learner observes a finite
set of training examples and the output of the learning process is a
data-dependent distribution over a space of hypotheses. The learned
data-dependent distribution is then used to make randomized predictions, and
the high-level theme addressed here is guaranteeing the quality of predictions
on examples that were not seen during training, i.e. generalization. In this
setting the unknown quantity of interest is the expected risk of the
data-dependent randomized predictor, for which upper bounds can be derived via
a PAC-Bayes analysis, leading to PAC-Bayes bounds.
Specifically, we present a basic PAC-Bayes inequality for stochastic kernels,
from which one may derive extensions of various known PAC-Bayes bounds as well
as novel bounds. We clarify the role of the requirements of fixed 'data-free'
priors, bounded losses, and i.i.d. data. We highlight that those requirements
were used to upper-bound an exponential moment term, while the basic PAC-Bayes
theorem remains valid without those restrictions. We present three bounds that
illustrate the use of data-dependent priors, including one for the unbounded
square loss.

Comment: In NeurIPS 2020. Version 3 is the final published paper. Note that this paper is an enhanced version of the short paper with the same title that was presented at the NeurIPS 2019 Workshop on Machine Learning with Guarantees. Important update: the PAC-Bayes type inequality for unbounded loss functions (Section 2.3) is ne
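Schematically, the basic inequality referred to here takes the following standard form (a rendering for orientation, not quoted from the paper): for a fixed prior P, a measurable function f(h, S), and \delta \in (0, 1), with probability at least 1 - \delta over the sample S, simultaneously for all posteriors Q,

\mathbb{E}_{h \sim Q}\, f(h, S) \;\le\; \mathrm{KL}(Q \| P) + \ln\frac{1}{\delta} + \ln \mathbb{E}_{h \sim P}\, \mathbb{E}_{S'}\big[e^{f(h, S')}\big].

The change-of-measure step behind this display needs none of the usual restrictions; data-free priors, bounded losses, and i.i.d. data enter only when upper-bounding the exponential moment \mathbb{E}_{h \sim P} \mathbb{E}_{S'}[e^{f(h, S')}], exactly as the abstract states.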
PAC-Bayes Generalisation Bounds for Heavy-Tailed Losses through Supermartingales
While PAC-Bayes is now an established learning framework for light-tailed
losses (\emph{e.g.}, subgaussian or subexponential), its extension to the case
of heavy-tailed losses remains largely uncharted and has attracted a growing
interest in recent years. We contribute PAC-Bayes generalisation bounds for
heavy-tailed losses under the sole assumption of bounded variance of the loss
function. Under that assumption, we extend previous results from
\citet{kuzborskij2019efron}. Our key technical contribution is exploiting an
extension of Markov's inequality for supermartingales. Our proof technique
unifies and extends different PAC-Bayesian frameworks by providing bounds for
unbounded martingales as well as bounds for batch and online learning with
heavy-tailed losses.

Comment: New Section 3 on Online PAC-Bayes
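For orientation, the classical supermartingale form of Markov's inequality is Ville's maximal inequality (stated here as background; the paper works with an extension of this tool): if (M_t)_{t \ge 0} is a nonnegative supermartingale, then for any a > 0,

\Pr\Big(\sup_{t \ge 0} M_t \ge a\Big) \;\le\; \frac{\mathbb{E}[M_0]}{a},

so with M_0 = 1 and a = 1/\delta one obtains bounds that hold uniformly over time, which is what lets a single construction serve both the batch and the online setting.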
A New Family of Generalization Bounds Using Samplewise Evaluated CMI
We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function. This function is upper-bounded in terms of the disintegrated, samplewise, evaluated conditional mutual information (CMI), an information measure that depends on the losses incurred by the selected hypothesis, rather than on the hypothesis itself, as is common in probably approximately correct (PAC)-Bayesian results. We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds. Furthermore, using the evaluated CMI, we derive a samplewise, average version of Seeger's PAC-Bayesian bound, where the convex function is the binary KL divergence. In some scenarios, this novel bound results in a tighter characterization of the population loss of deep neural networks than previous bounds. Finally, we derive high-probability versions of some of these average bounds. We demonstrate the unifying nature of the evaluated CMI bounds by using them to recover average and high-probability generalization bounds for multiclass classification with finite Natarajan dimension.
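Since Seeger-type bounds compare losses through the binary KL divergence kl(p || q) = p ln(p/q) + (1 - p) ln((1 - p)/(1 - q)), using them in practice requires numerically inverting kl in its second argument. A minimal sketch of that step (function names are assumptions, not from the paper):

import math

def binary_kl(p, q):
    # kl(p||q) between Bernoulli(p) and Bernoulli(q), the jointly convex
    # comparator function appearing in Seeger-type bounds.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_inverse(p, c, iters=60):
    # Largest q >= p with kl(p||q) <= c, found by bisection (kl(p||.) is
    # increasing on [p, 1)); this turns kl(train loss || population loss) <= c
    # into an explicit upper bound on the population loss.
    lo, hi = p, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binary_kl(p, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

# Example: a training loss of 0.05 with a bound of c = 0.02 on the binary KL.
print(kl_inverse(0.05, 0.02))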