Wasserstein PAC-Bayes Learning: A Bridge Between Generalisation and Optimisation
PAC-Bayes learning is an established framework to assess the generalisation
ability of learning algorithms during the training phase. However, it remains
challenging to know whether PAC-Bayes is useful to understand, before training,
why the outputs of well-known algorithms generalise well. We positively answer
this question by expanding the Wasserstein PAC-Bayes framework, briefly
introduced in \cite{amit2022ipm}. We provide new generalisation bounds
exploiting geometric assumptions on the loss function. Using our framework, we
prove, before any training, that the output of an algorithm from
\citet{lambert2022variational} has a strong asymptotic generalisation ability.
More precisely, we show that it is possible to incorporate optimisation results
within a generalisation framework, building a bridge between PAC-Bayes and
optimisation algorithms.
PAC-Bayes Analysis Beyond the Usual Bounds
We focus on a stochastic learning model where the learner observes a finite
set of training examples and the output of the learning process is a
data-dependent distribution over a space of hypotheses. The learned
data-dependent distribution is then used to make randomized predictions, and
the high-level theme addressed here is guaranteeing the quality of predictions
on examples that were not seen during training, i.e. generalization. In this
setting the unknown quantity of interest is the expected risk of the
data-dependent randomized predictor, for which upper bounds can be derived via
a PAC-Bayes analysis, leading to PAC-Bayes bounds.
Specifically, we present a basic PAC-Bayes inequality for stochastic kernels,
from which one may derive extensions of various known PAC-Bayes bounds as well
as novel bounds. We clarify the role of the requirements of fixed 'data-free'
priors, bounded losses, and i.i.d. data. We highlight that those requirements
were used to upper-bound an exponential moment term, while the basic PAC-Bayes
theorem remains valid without those restrictions. We present three bounds that
illustrate the use of data-dependent priors, including one for the unbounded
square loss.
Comment: In NeurIPS 2020. Version 3 is the final published paper. Note that
this paper is an enhanced version of the short paper with the same title that
was presented at the NeurIPS 2019 Workshop on Machine Learning with
Guarantees. Important update: the PAC-Bayes type inequality for unbounded
loss functions (Section 2.3) is ne
Randomized learning and generalization of fair and private classifiers: From PAC-Bayes to stability and differential privacy
We address the problem of randomized learning and generalization of fair and private classifiers. On the one hand, we want to ensure that sensitive information does not unfairly influence the outcome of a classifier. On the other hand, we have to learn from data while preserving the privacy of individual observations. We first address this issue in the PAC-Bayes framework, presenting an approach which trades off and bounds the risk and the fairness of the randomized (Gibbs) classifier. Our new approach is able to handle several different state-of-the-art fairness measures. For this purpose, we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution without actually knowing it. In particular, we define a prior and a posterior which give more weight to functions with good generalization and fairness properties. Furthermore, we show that this randomized classifier possesses useful stability properties, using the theory of algorithmic distribution stability. Finally, we show that the new posterior can be exploited to define a randomized algorithm that is both accurate and fair. Differential privacy theory allows us to derive that this algorithm also has privacy-preserving properties, ensuring our threefold goal of good generalization, fairness, and privacy of the final model.
Auto-tune: PAC-Bayes Optimization over Prior and Posterior for Neural Networks
It is widely recognized that the generalization ability of neural networks
can be greatly enhanced through carefully designing the training procedure. The
current state-of-the-art training approach involves utilizing stochastic
gradient descent (SGD) or Adam optimization algorithms along with a combination
of additional regularization techniques such as weight decay, dropout, or noise
injection. Optimal generalization can only be achieved by tuning a multitude of
hyperparameters through grid search, which can be time-consuming and
necessitates additional validation datasets. To address this issue, we
introduce a practical PAC-Bayes training framework that is nearly tuning-free
and requires no additional regularization while achieving comparable testing
performance to that of SGD/Adam after a complete grid search and with extra
regularizations. Our proposed algorithm demonstrates the remarkable potential
of PAC training to achieve state-of-the-art performance on deep neural networks
with enhanced robustness and interpretability.
Comment: 30 pages, 15 figures, 7 tables
Learning prediction function of prior measures for statistical inverse problems of partial differential equations
In this paper, we view the statistical inverse problems of partial
differential equations (PDEs) as PDE-constrained regression and focus on
learning the prediction function of the prior probability measures. From this
perspective, we propose general generalization bounds for learning
infinite-dimensionally defined prior measures in the style of the probability
approximately correct Bayesian learning theory. The theoretical framework is
rigorously defined on infinite-dimensional separable function space, which
makes the theories intimately connected to the usual infinite-dimensional
Bayesian inverse approach. Inspired by the concept of differential
privacy, a generalized condition (containing the usual Gaussian measures
employed widely in the statistical inverse problems of PDEs) has been proposed,
which allows the learned prior measures to depend on the measured data (the
prediction function with measured data as input and the prior measure as output
can be introduced). After illustrating the general theories, the specific
settings of linear and nonlinear problems are given and can be easily
cast into our general theories to obtain concrete generalization bounds.
Based on the obtained generalization bounds, infinite-dimensionally
well-defined practical algorithms are formulated. Finally, numerical examples
of the backward diffusion and Darcy flow problems are provided to demonstrate
the potential applications of the proposed approach in learning the prediction
function of the prior probability measures.
Comment: 57 pages
Tighter risk certificates for neural networks
This paper presents an empirical study regarding training probabilistic
neural networks using training objectives derived from PAC-Bayes bounds. In the
context of probabilistic neural networks, the output of training is a
probability distribution over network weights. We present two training
objectives, used here for the first time in connection with training neural
networks. These two training objectives are derived from tight PAC-Bayes
bounds. We also re-implement a previously used training objective based on a
classical PAC-Bayes bound, to compare the properties of the predictors learned
using the different training objectives. We compute risk certificates that are
valid on any unseen examples for the learnt predictors. We further experiment
with different types of priors on the weights (both data-free and
data-dependent priors) and neural network architectures. Our experiments on
MNIST and CIFAR-10 show that our training methods produce competitive test set
errors and non-vacuous risk bounds with much tighter values than previous
results in the literature, showing promise not only to guide the learning
algorithm through bounding the risk but also for model selection. These
observations suggest that the methods studied here might be good candidates for
self-certified learning, in the sense of certifying the risk on any unseen data
without the need for data-splitting protocols.
Comment: Preprint under review
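
To make the notion of a risk certificate concrete, the following is a minimal sketch of how a certificate can be computed from the classical PAC-Bayes-kl bound by numerically inverting the binary KL divergence. It is not taken from the paper: the function names (`binary_kl`, `kl_inverse_upper`, `risk_certificate`) and the example numbers are illustrative assumptions, and the sketch assumes a loss in [0, 1], i.i.d. data, and a data-free prior. It also treats the empirical risk of the randomized predictor as known exactly, whereas in practice it is usually estimated by sampling weights.

```python
import math


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def kl_inverse_upper(q, c, tol=1e-9):
    """Upper inverse of the binary KL: sup of p in [q, 1] with binary_kl(q, p) <= c.

    binary_kl(q, p) is increasing in p on [q, 1], so bisection applies.
    The returned value is the upper end of the final bracket (conservative).
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return hi


def risk_certificate(emp_risk, kl_div, n, delta=0.05):
    """Upper bound on the true risk from the PAC-Bayes-kl inequality.

    Uses kl(emp_risk || true_risk) <= (KL(Q||P) + ln(2*sqrt(n)/delta)) / n,
    which holds with probability at least 1 - delta over the sample.
    """
    c = (kl_div + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, c)


# Hypothetical numbers, chosen only for illustration.
cert = risk_certificate(emp_risk=0.1, kl_div=100.0, n=10_000, delta=0.05)
print(f"risk certificate: {cert:.4f}")
```

A smaller KL term or a larger sample size tightens the certificate, which is why the choice of prior (data-free or data-dependent) matters for the reported bounds.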