    PAC-Bayes Un-Expected Bernstein Inequality

    Get PDF
    We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.
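
    For orientation, here is a schematic of the kind of bound the abstract refers to (constants and measurability details omitted; the prior $\pi$, posterior $\rho$, population error $L(\rho)$ and confidence level $\delta$ are standard notation, not spelled out above). A typical "fast-rate" PAC-Bayes bound states that, with probability at least $1-\delta$ over the sample, simultaneously for all posteriors $\rho$,

        \[ L(\rho) \;\lesssim\; L_n(\rho) \;+\; \sqrt{\frac{L_n(\rho)\,\big(\mathrm{KL}(\rho\|\pi) + \ln\frac{1}{\delta}\big)}{n}} \;+\; \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{1}{\delta}}{n}. \]

    The square-root term is the $\sqrt{L_n \cdot \mathrm{KL}/n}$ term mentioned above; the paper's contribution is a bound of the same shape in which $L_n(\rho)$ is replaced by a quantity that can vanish even when the empirical error does not.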

    PAC-Bayesian Bound for the Conditional Value at Risk

    Get PDF
    Conditional Value at Risk (CVaR) is a family of "coherent risk measures" which generalize the traditional mathematical expectation. Widely used in mathematical finance, it is garnering increasing interest in machine learning, e.g., as an alternative approach to regularization, and as a means for ensuring fairness. This paper presents a generalization bound for learning algorithms that minimize the CVaR of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be small when the empirical CVaR is small. We achieve this by reducing the problem of estimating CVaR to that of merely estimating an expectation. This then enables us, as a by-product, to obtain concentration inequalities for CVaR even when the random variable in question is unbounded.
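
    As a concrete illustration of how the empirical CVaR and its reduction to an expectation look in practice, here is a minimal sketch (the level convention with alpha as the tail mass, the function names and the use of NumPy are illustrative assumptions, not taken from the paper):

        import numpy as np

        def empirical_cvar(losses, alpha):
            # Average of the worst ceil(alpha * n) losses; alpha is the tail mass
            # (e.g. alpha = 0.05 averages over the worst 5% of outcomes).
            losses = np.sort(np.asarray(losses, dtype=float))[::-1]
            k = max(1, int(np.ceil(alpha * len(losses))))
            return float(losses[:k].mean())

        def cvar_variational(losses, alpha):
            # Rockafellar-Uryasev form: CVaR_alpha(Z) = inf_t { t + E[(Z - t)_+] / alpha }.
            # For the empirical distribution the objective is piecewise linear in t,
            # so the infimum is attained at one of the observed losses.
            losses = np.asarray(losses, dtype=float)
            candidates = np.unique(losses)
            values = [t + np.mean(np.maximum(losses - t, 0.0)) / alpha for t in candidates]
            return float(min(values))

    The two coincide whenever alpha * n is an integer, and the variational form writes CVaR as an ordinary expectation for each fixed t, which is the spirit of the reduction described in the abstract.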

    PAC-Bayes Analysis Beyond the Usual Bounds

    Get PDF
    We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e., generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed 'data-free' priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.
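
    A minimal sketch of the change-of-measure step on which such analyses rest (standard notation, not quoted from the paper): for a prior $\pi$ over hypotheses, any measurable function $f$ of a hypothesis $h$ and the sample $S$, and any $\delta \in (0,1)$, with probability at least $1-\delta$ over $S$, simultaneously for all posteriors $\rho$,

        \[ \mathbb{E}_{h\sim\rho}\big[f(h,S)\big] \;\le\; \mathrm{KL}(\rho\|\pi) \;+\; \ln\!\Big(\tfrac{1}{\delta}\,\mathbb{E}_{S}\,\mathbb{E}_{h\sim\pi}\big[e^{f(h,S)}\big]\Big). \]

    The logarithm on the right contains the exponential moment term referred to above; data-free priors, bounded losses and i.i.d. data are only needed to bound that term, not for the inequality itself.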

    PAC-Bayes Generalisation Bounds for Heavy-Tailed Losses through Supermartingales

    Full text link
    While PAC-Bayes is now an established learning framework for light-tailed losses (e.g., subgaussian or subexponential), its extension to the case of heavy-tailed losses remains largely uncharted and has attracted a growing interest in recent years. We contribute PAC-Bayes generalisation bounds for heavy-tailed losses under the sole assumption of bounded variance of the loss function. Under that assumption, we extend previous results from Kuzborskij and Szepesvári (2019). Our key technical contribution is exploiting an extension of Markov's inequality for supermartingales. Our proof technique unifies and extends different PAC-Bayesian frameworks by providing bounds for unbounded martingales as well as bounds for batch and online learning with heavy-tailed losses.
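
    The extension of Markov's inequality alluded to here is commonly stated as Ville's maximal inequality (a standard fact, recalled for context rather than quoted from the paper): if $(M_t)_{t\ge 0}$ is a nonnegative supermartingale with $\mathbb{E}[M_0]\le 1$, then for any $\delta \in (0,1)$,

        \[ \Pr\Big(\exists\, t\ge 0:\; M_t \ge \tfrac{1}{\delta}\Big) \;\le\; \delta. \]

    Applying this to an exponential moment process built from the losses is one route to the time-uniform, batch and online bounds described above.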

    A New Family of Generalization Bounds Using Samplewise Evaluated CMI

    Get PDF
    We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function. This function is upper-bounded in terms of the disintegrated, samplewise, evaluated conditional mutual information (CMI), an information measure that depends on the losses incurred by the selected hypothesis, rather than on the hypothesis itself, as is common in probably approximately correct (PAC)-Bayesian results. We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds. Furthermore, using the evaluated CMI, we derive a samplewise, average version of Seeger's PAC-Bayesian bound, where the convex function is the binary KL divergence. In some scenarios, this novel bound results in a tighter characterization of the population loss of deep neural networks than previous bounds. Finally, we derive high-probability versions of some of these average bounds. We demonstrate the unifying nature of the evaluated CMI bounds by using them to recover average and high-probability generalization bounds for multiclass classification with finite Natarajan dimension.
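
    Because the bound singled out above is expressed through the binary KL divergence, a small self-contained sketch of that quantity and of the inversion step used to read off Seeger-type bounds may help (function names and the bisection tolerance are illustrative choices, not from the paper):

        import math

        def binary_kl(p, q):
            # kl(p || q) between Bernoulli(p) and Bernoulli(q), with clamping for stability.
            eps = 1e-12
            p = min(max(p, eps), 1.0 - eps)
            q = min(max(q, eps), 1.0 - eps)
            return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

        def kl_inverse_upper(p_hat, bound, tol=1e-9):
            # Largest q >= p_hat with kl(p_hat || q) <= bound, found by bisection
            # (kl(p_hat || q) is nondecreasing in q on [p_hat, 1)). Seeger-type bounds
            # control kl(empirical loss || population loss), so they are read off
            # through this inverse.
            lo, hi = p_hat, 1.0 - 1e-12
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if binary_kl(p_hat, mid) <= bound:
                    lo = mid
                else:
                    hi = mid
            return lo

    For example, kl_inverse_upper(0.1, 0.05) returns the largest population loss compatible with an empirical loss of 0.1 when the right-hand side of a bound of the form kl(L_n || L) <= bound equals 0.05.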
