35 research outputs found

    A PAC-Bayesian bound for Lifelong Learning

    Full text link
    Transfer learning has received a lot of attention in the machine learning community over the last years, and several effective algorithms have been developed. However, relatively little is known about their theoretical properties, especially in the setting of lifelong learning, where the goal is to transfer information to tasks for which no data have been observed so far. In this work we study lifelong learning from a theoretical perspective. Our main result is a PAC-Bayesian generalization bound that offers a unified view on existing paradigms for transfer learning, such as the transfer of parameters or the transfer of low-dimensional representations. We also use the bound to derive two principled lifelong learning algorithms, and we show that these yield results comparable with existing methods.Comment: to appear at ICML 201

    PAC-Bayes Analysis Beyond the Usual Bounds

    Get PDF
    We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed 'data-free' priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.Comment: In NeurIPS 2020. Version 3 is the final published paper. Note that this paper is an enhanced version of the short paper with the same title that was presented at the NeurIPS 2019 Workshop on Machine Learning with Guarantees. Important update: the PAC-Bayes type inequality for unbounded loss functions (Section 2.3) is ne

    PAC-Bayes Analysis of Multi-view Learning

    Get PDF
    This paper presents eight PAC-Bayes bounds to analyze the generalization performance of multi-view classifiers. These bounds adopt data dependent Gaussian priors which emphasize classifiers with high view agreements. The center of the prior for the first two bounds is the origin, while the center of the prior for the third and fourth bounds is given by a data dependent vector. An important technique to obtain these bounds is two derived logarithmic determinant inequalities whose difference lies in whether the dimensionality of data is involved. The centers of the fifth and sixth bounds are calculated on a separate subset of the training set. The last two bounds use unlabeled data to represent view agreements and are thus applicable to semi-supervised multi-view learning. We evaluate all the presented multi-view PAC-Bayes bounds on benchmark data and compare them with previous single-view PAC-Bayes bounds. The usefulness and performance of the multi-view bounds are discussed.Comment: 35 page

    Progress in Self-Certified Neural Networks

    Get PDF
    International audienceA learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality with a statistical certificate that is valid on unseen data. Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead not only to accurate predictors, but also to tight risk certificates, bearing promise towards achieving self-certified learning. In this context, learning and certification strategies based on PAC-Bayes bounds are especially attractive due to their ability to leverage all data to learn a posterior and simultaneously certify its risk. In this paper, we assess the progress towards self-certification in probabilistic neural networks learnt by PAC-Bayes inspired objectives. We empirically compare (on 4 classification datasets) classical test set bounds for deterministic predictors and a PAC-Bayes bound for randomised self-certified predictors. We first show that both of these generalisation bounds are not too far from out-of-sample test set errors. We then show that in data starvation regimes, holding out data for the test set bounds adversely affects generalisation performance, while self-certified strategies based on PAC-Bayes bounds do not suffer from this drawback, proving that they might be a suitable choice for the small data regime. We also find that probabilistic neural networks learnt by PAC-Bayes inspired objectives lead to certificates that can be surprisingly competitive with commonly used test set bounds

    Progress in Self-Certified Neural Networks

    Get PDF
    International audienceA learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality with a statistical certificate that is valid on unseen data. Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead not only to accurate predictors, but also to tight risk certificates, bearing promise towards achieving self-certified learning. In this context, learning and certification strategies based on PAC-Bayes bounds are especially attractive due to their ability to leverage all data to learn a posterior and simultaneously certify its risk. In this paper, we assess the progress towards self-certification in probabilistic neural networks learnt by PAC-Bayes inspired objectives. We empirically compare (on 4 classification datasets) classical test set bounds for deterministic predictors and a PAC-Bayes bound for randomised self-certified predictors. We first show that both of these generalisation bounds are not too far from out-of-sample test set errors. We then show that in data starvation regimes, holding out data for the test set bounds adversely affects generalisation performance, while self-certified strategies based on PAC-Bayes bounds do not suffer from this drawback, proving that they might be a suitable choice for the small data regime. We also find that probabilistic neural networks learnt by PAC-Bayes inspired objectives lead to certificates that can be surprisingly competitive with commonly used test set bounds

    Towards practical and provable domain adaptation

    Get PDF
    One of the most central questions in statistical modeling is how well a model will generalize. Absent strong assumptions we find that this question is difficult to answer in a meaningful way. In this work we seek to increase our understanding of the domain adaptation setting through two different lenses. First, we investigate whether tractably computable and tight generalization bounds on the performance of neural network classifiers exist in the current literature. The tightest bounds we find use a portion of the input data to tighten the gap between measured performance and the calculated bound. We present evaluations of four bounds using this tightening method on classifiers applied to image classification tasks: Two bounds from the literature in addition to two of our own construction. Further, we find that for situations lacking domain overlap, the existing literature lacks the tools to achieve tight, tractably computable bounds for the neural network models which we use. We conclude that a new approach might be needed. In the second part we therefore consider a setting where we change our underlying assumptions to ones which might be more plausible. This setting, based on learning using privileged information, is shown to result in consistent learning. We also show empirical gains over comparable methods when our assumptions are likely to hold, both in terms of performance and sample efficiency. In summary, the work set out herein has been a first step towards a better understanding of domain adaptation and how using data and new assumptions can help us further our knowledge about this topic
    corecore