172 research outputs found

    Wasserstein PAC-Bayes Learning: A Bridge Between Generalisation and Optimisation

    Full text link
    PAC-Bayes learning is an established framework to assess the generalisation ability of learning algorithm during the training phase. However, it remains challenging to know whether PAC-Bayes is useful to understand, before training, why the output of well-known algorithms generalise well. We positively answer this question by expanding the \emph{Wasserstein PAC-Bayes} framework, briefly introduced in \cite{amit2022ipm}. We provide new generalisation bounds exploiting geometric assumptions on the loss function. Using our framework, we prove, before any training, that the output of an algorithm from \citet{lambert2022variational} has a strong asymptotic generalisation ability. More precisely, we show that it is possible to incorporate optimisation results within a generalisation framework, building a bridge between PAC-Bayes and optimisation algorithms

    PAC-Bayes analysis beyond the usual bounds

    Get PDF
    We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed ‘data-free’ priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss

    PAC-Bayes Analysis Beyond the Usual Bounds

    Get PDF
    We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed 'data-free' priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.Comment: In NeurIPS 2020. Version 3 is the final published paper. Note that this paper is an enhanced version of the short paper with the same title that was presented at the NeurIPS 2019 Workshop on Machine Learning with Guarantees. Important update: the PAC-Bayes type inequality for unbounded loss functions (Section 2.3) is ne

    Randomized learning and generalization of fair and private classifiers: From PAC-Bayes to stability and differential privacy

    Get PDF
    We address the problem of randomized learning and generalization of fair and private classifiers. From one side we want to ensure that sensitive information does not unfairly influence the outcome of a classifier. From the other side we have to learn from data while preserving the privacy of individual observations. We initially face this issue in the PAC-Bayes framework presenting an approach which trades off and bounds the risk and the fairness of the randomized (Gibbs) classifier. Our new approach is able to handle several different state-of-the-art fairness measures. For this purpose, we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution without actually knowing it. In particular, we define a prior and a posterior which give more weight to functions with good generalization and fairness properties. Furthermore, we will show that this randomized classifier possesses interesting stability properties using the algorithmic distribution stability theory. Finally, we will show that the new posterior can be exploited to define a randomized accurate and fair algorithm. Differential privacy theory will allow us to derive that the latter algorithm has interesting privacy preserving properties ensuring our threefold goal of good generalization, fairness, and privacy of the final model

    Auto-tune: PAC-Bayes Optimization over Prior and Posterior for Neural Networks

    Full text link
    It is widely recognized that the generalization ability of neural networks can be greatly enhanced through carefully designing the training procedure. The current state-of-the-art training approach involves utilizing stochastic gradient descent (SGD) or Adam optimization algorithms along with a combination of additional regularization techniques such as weight decay, dropout, or noise injection. Optimal generalization can only be achieved by tuning a multitude of hyperparameters through grid search, which can be time-consuming and necessitates additional validation datasets. To address this issue, we introduce a practical PAC-Bayes training framework that is nearly tuning-free and requires no additional regularization while achieving comparable testing performance to that of SGD/Adam after a complete grid search and with extra regularizations. Our proposed algorithm demonstrates the remarkable potential of PAC training to achieve state-of-the-art performance on deep neural networks with enhanced robustness and interpretability.Comment: 30 pages, 15 figures, 7 table

    Learning prediction function of prior measures for statistical inverse problems of partial differential equations

    Full text link
    In this paper, we view the statistical inverse problems of partial differential equations (PDEs) as PDE-constrained regression and focus on learning the prediction function of the prior probability measures. From this perspective, we propose general generalization bounds for learning infinite-dimensionally defined prior measures in the style of the probability approximately correct Bayesian learning theory. The theoretical framework is rigorously defined on infinite-dimensional separable function space, which makes the theories intimately connected to the usual infinite-dimensional Bayesian inverse approach. Inspired by the concept of α\alpha-differential privacy, a generalized condition (containing the usual Gaussian measures employed widely in the statistical inverse problems of PDEs) has been proposed, which allows the learned prior measures to depend on the measured data (the prediction function with measured data as input and the prior measure as output can be introduced). After illustrating the general theories, the specific settings of linear and nonlinear problems have been given and can be easily casted into our general theories to obtain concrete generalization bounds. Based on the obtained generalization bounds, infinite-dimensionally well-defined practical algorithms are formulated. Finally, numerical examples of the backward diffusion and Darcy flow problems are provided to demonstrate the potential applications of the proposed approach in learning the prediction function of the prior probability measures.Comment: 57 page

    Tighter risk certificates for neural networks

    Get PDF
    This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates that are valid on any unseen examples for the learnt predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of certifying the risk on any unseen data without the need for data-splitting protocols.Comment: Preprint under revie
    • …
    corecore