129 research outputs found

    Generalisation under gradient descent via deterministic PAC-Bayes

    We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
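    The trajectory-dependent quantity this abstract alludes to can be illustrated with a small numerical sketch. Under toy assumptions of my own (a least-squares loss, a Gaussian initialisation, plain gradient descent; this is not the authors' code), the density of the deterministically trained parameters follows from the change-of-variables formula, and the correction term involves exactly the Hessian of the training objective along the trajectory:

```python
# Hedged sketch: track how the density of a Gaussian initialisation is
# transformed by deterministic gradient descent via change of variables.
# For the GD map  T(w) = w - eta * grad L(w)  the Jacobian is
# I - eta * Hess L(w), so  log p_{t+1}(T(w)) = log p_t(w) - log|det(I - eta * Hess L(w))|.
# The loss, dimensions, and step size are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta, steps, sigma0 = 5, 50, 0.1, 100, 1.0

X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def loss_grad_hess(w):
    r = X @ w - y                        # residuals of a toy least-squares objective
    grad = X.T @ r / n
    hess = X.T @ X / n                   # Hessian (constant for this toy loss)
    return grad, hess

w = rng.normal(scale=sigma0, size=d)     # draw from the initial distribution
log_p = -0.5 * (w @ w) / sigma0**2 - 0.5 * d * np.log(2 * np.pi * sigma0**2)

for _ in range(steps):
    grad, hess = loss_grad_hess(w)
    jac = np.eye(d) - eta * hess         # Jacobian of one GD step
    log_p -= np.linalg.slogdet(jac)[1]   # change-of-variables correction
    w = w - eta * grad                   # deterministic GD update

print("log-density of the trained parameters:", log_p)
```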

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tasks, including transfer learning. We use these tight bounds to better understand the role of model size, equivariance, and the implicit biases of optimization, for generalization in deep learning. Notably, we find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor. We also argue for data-independent bounds in explaining generalization.
    Comment: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-baye
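    A rough sketch of the compression recipe described above, under illustrative assumptions of my own (a random linear subspace, a uniform scalar quantiser, and a simple Occam-style slack term; this is not the authors' released code, and the real pipeline is considerably more refined):

```python
# Hedged sketch: project trained weights onto a low-dimensional random linear
# subspace, quantise the coordinates to a small codebook, and convert the
# resulting description length into an Occam-style complexity term,
#   complexity ~ (bits of the compressed model) * log 2.
# All dimensions, the quantiser, and the bound form are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
D, k, levels, n_train, delta = 10_000, 50, 16, 50_000, 0.05

w = rng.normal(size=D) * 0.1                 # stand-in for trained network weights
P = rng.normal(size=(D, k)) / np.sqrt(D)     # random linear subspace (D -> k)

z = P.T @ w                                  # subspace coordinates of the weights
grid = np.linspace(z.min(), z.max(), levels)
z_q = grid[np.abs(z[:, None] - grid[None, :]).argmin(axis=1)]  # nearest codebook level
w_hat = P @ z_q                              # quantised weights the bound refers to

bits = k * np.log2(levels)                   # naive code length; real codes are shorter
complexity = bits * np.log(2) + np.log(1.0 / delta)
slack = np.sqrt(complexity / (2 * n_train))  # Hoeffding-style generalisation slack

print(f"code length: {bits:.0f} bits, generalisation slack: {slack:.4f}")
```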

    Non-Vacuous Generalisation Bounds for Shallow Neural Networks

    25 pages, 12 figures
    We focus on a specific class of shallow neural networks with a single hidden layer, namely those with L2-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds, they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
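    A minimal sketch of the model class this abstract describes, under my own assumptions about shapes and data (this is not the paper's code, and the McAllester-style slack at the end is a generic stand-in for the specific bound the paper derives):

```python
# Hedged sketch: one hidden layer, inputs normalised to unit L2 norm, and
# either an erf or a GELU activation, followed by a generic PAC-Bayes-style
# bound evaluation with an illustrative KL value.
import numpy as np
from scipy.special import erf

def gelu(x):
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def forward(X, W1, b1, w2, b2, activation=erf):
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # L2-normalise each input
    return activation(X @ W1 + b1) @ w2 + b2           # single hidden layer

# Toy data and weights standing in for an MNIST-sized problem.
rng = np.random.default_rng(0)
n, d, h = 1000, 784, 100
X, y = rng.normal(size=(n, d)), rng.integers(0, 2, size=n) * 2 - 1
W1, b1 = rng.normal(size=(d, h)) * 0.1, np.zeros(h)
w2, b2 = rng.normal(size=h) * 0.1, 0.0

scores = forward(X, W1, b1, w2, b2, activation=gelu)
emp_risk = np.mean(scores * y <= 0)                    # empirical 0-1 error

# Generic McAllester-style slack: sqrt((KL + log(2*sqrt(n)/delta)) / (2n)).
kl, delta = 25.0, 0.05                                 # illustrative KL value
slack = np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))
print(f"empirical risk {emp_risk:.3f}, bound <= {emp_risk + slack:.3f}")
```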