Generalisation under gradient descent via deterministic PAC-Bayes
We establish disintegrated PAC-Bayesian generalisation bounds for models
trained with gradient descent methods or continuous gradient flows. Contrary to
standard practice in the PAC-Bayesian setting, our result applies to
optimisation algorithms that are deterministic, without requiring any
de-randomisation step. Our bounds are fully computable, depending on the
density of the initial distribution and the Hessian of the training objective
over the trajectory. We show that our framework can be applied to a variety of
iterative optimisation algorithms, including stochastic gradient descent (SGD),
momentum-based schemes, and damped Hamiltonian dynamics.
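
As a rough illustration of why Hessian information along the trajectory makes such a bound computable, the following minimal numpy sketch tracks the log-density of the pushforward of the initial distribution under deterministic gradient descent. It uses a toy quadratic objective and an arbitrary step size as stand-ins; it is not the paper's construction or bound, only the change-of-variables bookkeeping the abstract alludes to.

    # Sketch: for a GD step T(theta) = theta - eta * grad L(theta), the Jacobian is
    # I - eta * Hessian L(theta), so the pushforward log-density accumulates a
    # -log|det(I - eta * Hessian)| term per step (toy quadratic; illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    d, eta, steps = 5, 0.1, 50

    A = rng.standard_normal((d, d))
    H = A @ A.T / d + np.eye(d)          # fixed Hessian of a toy quadratic objective

    def grad(theta):                      # gradient of L(theta) = 0.5 * theta^T H theta
        return H @ theta

    theta = rng.standard_normal(d)        # draw from the standard normal initial law
    log_p = -0.5 * theta @ theta - 0.5 * d * np.log(2 * np.pi)  # initial log-density

    for _ in range(steps):
        # change-of-variables term: Jacobian of one GD step is I - eta * H
        log_p -= np.linalg.slogdet(np.eye(d) - eta * H)[1]
        theta = theta - eta * grad(theta)

    print(f"log-density of the GD output under the pushforward law: {log_p:.3f}")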
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
While there has been progress in developing non-vacuous generalization bounds
for deep neural networks, these bounds tend to be uninformative about why deep
learning works. In this paper, we develop a compression approach based on
quantizing neural network parameters in a linear subspace, profoundly improving
on previous results to provide state-of-the-art generalization bounds on a
variety of tasks, including transfer learning. We use these tight bounds to
better understand the role of model size, equivariance, and the implicit biases
of optimization, for generalization in deep learning. Notably, we find large
models can be compressed to a much greater extent than previously known,
encapsulating Occam's razor. We also argue for data-independent bounds in
explaining generalization. NeurIPS 2022; code is available at
https://github.com/activatedgeek/tight-pac-baye
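
A minimal sketch of the "quantize parameters in a linear subspace" idea, assuming a random projection basis, placeholder learned coordinates, and a uniform scalar codebook; the paper's actual subspace construction, codebook, and bit accounting are more refined.

    # Sketch: represent parameters as theta_init + P w, quantize the subspace
    # coordinates w to a small codebook, and count the bits needed to describe them.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k, levels = 10_000, 100, 16               # full dim, subspace dim, codebook size

    theta_init = rng.standard_normal(d) * 0.01   # stand-in for the initialisation
    w = rng.standard_normal(k)                   # stand-in for learned subspace coords
    P = rng.standard_normal((d, k)) / np.sqrt(d) # random linear subspace basis

    # quantize subspace coordinates to a uniform scalar codebook
    codebook = np.linspace(w.min(), w.max(), levels)
    w_q = codebook[np.abs(w[:, None] - codebook[None, :]).argmin(axis=1)]

    theta_compressed = theta_init + P @ w_q      # compressed model parameters

    # crude description length: k symbols at log2(levels) bits, plus the codebook
    bits = k * np.log2(levels) + levels * 32
    print(f"~{bits:.0f} bits to describe the subspace-quantized parameters")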
Non-Vacuous Generalisation Bounds for Shallow Neural Networks
We focus on a specific class of shallow neural networks with a single hidden layer, namely those with L2-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds, they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
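
For concreteness, a minimal numpy/scipy sketch of the network class described above (single hidden layer, erf or GELU activation, L2-normalised inputs); the layer sizes and helper names are illustrative assumptions, not the authors' code.

    # Sketch of a shallow network with erf or GELU activation on L2-normalised inputs.
    import numpy as np
    from scipy.special import erf

    def gelu(z):
        return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))

    def shallow_net(x, W1, b1, W2, b2, activation=erf):
        x = x / np.linalg.norm(x)          # L2-normalise the input
        return W2 @ activation(W1 @ x + b1) + b2

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out = 784, 100, 10   # e.g. MNIST-sized input, 10 classes
    W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
    b1 = np.zeros(d_hidden)
    W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
    b2 = np.zeros(d_out)

    logits = shallow_net(rng.standard_normal(d_in), W1, b1, W2, b2, activation=gelu)
    print(logits.shape)   # (10,)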
- …