
    Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

    We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this yields a closed-form, differentiable expression for the expected risk, which turns the generalization bound into a tractable training objective. In a series of numerical experiments, the resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from tight (non-vacuous) generalization bounds when compared to competing algorithms that also minimize PAC-Bayes objectives, with both uninformed (data-independent) and informed (data-dependent) priors.
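The abstract does not spell out the closed-form Dirichlet risk, so the sketch below does not reproduce it. Instead, it estimates the same quantity, the expected 0-1 risk of a majority vote whose voter weights are drawn from a Dirichlet distribution, by plain Monte Carlo, which could serve as a sanity check for a closed-form implementation. The function names, the toy ensemble, and the convention that ties count as errors are assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_j, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def mc_stochastic_mv_risk(votes, labels, alpha, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_{theta ~ Dir(alpha)}[empirical 0-1 risk of the theta-weighted vote].

    votes[j][i] in {-1, +1}: prediction of voter j on example i.
    labels[i] in {-1, +1}: true label of example i.
    A zero margin (tie) is counted as an error (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(labels)
    running = 0.0
    for _ in range(n_samples):
        theta = sample_dirichlet(alpha, rng)
        errors = 0
        for i in range(n):
            # Margin of the theta-weighted majority vote on example i.
            margin = labels[i] * sum(t * v[i] for t, v in zip(theta, votes))
            if margin <= 0:
                errors += 1
        running += errors / n
    return running / n_samples


# Hypothetical toy ensemble: voter 0 is always right; voters 1 and 2 each err on two examples.
labels = [1, -1, 1, 1]
votes = [
    [1, -1, 1, 1],
    [1, 1, -1, 1],
    [-1, -1, 1, -1],
]
print(mc_stochastic_mv_risk(votes, labels, alpha=[1.0, 1.0, 1.0]))
print(mc_stochastic_mv_risk(votes, labels, alpha=[100.0, 1.0, 1.0]))
```

With alpha = (1, 1, 1), each weight is marginally Beta(1, 2), and on every toy example exactly one voter is wrong, so the vote errs when that voter's weight reaches 1/2; the true expected risk is therefore 0.25 and the first estimate should land close to it. Concentrating alpha on the reliable voter drives the estimate toward 0.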

    Minimax risk classifiers with 0-1 loss

    Supervised classification techniques use training samples to learn a classification rule with small expected 0-1 loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1 loss and by considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1 loss over general classification rules and provide tight performance guarantees at learning time. We show that MRCs are strongly universally consistent when using feature mappings given by characteristic kernels. The paper also proposes efficient optimization techniques for MRC learning and shows that the presented methods can provide accurate classification together with tight performance guarantees in practice.

    Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes

    PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, particularly for margin classifiers: recent contributions have shown how practical these bounds can be, either for model selection (Ambroladze et al., 2007) or even to directly guide the learning of linear classifiers (Germain et al., 2009). However, in many practical situations the training data exhibit dependencies and the traditional IID assumption does not hold. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first (to the best of our knowledge) PAC-Bayes generalization bounds for classifiers trained on data exhibiting interdependencies. Our approach decomposes a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data by means of fractional covers of the graph. Our bounds are very general, since finding an upper bound on the fractional chromatic number of the dependency graph is sufficient to obtain new PAC-Bayes bounds for specific settings. We show how our results can be used to derive bounds for ranking statistics (such as AUC) and for classifiers trained on data distributed according to a stationary β-mixing process. Along the way, we show how our approach seamlessly handles U-processes. As a side note, we also provide a PAC-Bayes generalization bound for classifiers learned on data from stationary φ-mixing distributions.
    Comment: Long version of the AISTATS 09 paper: http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
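To make "an upper bound on the fractional chromatic number" concrete, the sketch below builds the dependency graph for bipartite ranking, where the training objects are (positive, negative) pairs and two pairs are dependent when they share an example, and colors it greedily. Every proper coloring partitions the pairs into sets of mutually independent pairs, and its number of colors upper-bounds the chromatic number, hence also the fractional chromatic number. The helper names are hypothetical, and greedy coloring is a generic stand-in rather than the construction used in the paper.

```python
from itertools import product


def pair_dependency_graph(positives, negatives):
    """Dependency graph over (positive, negative) training pairs, as used for AUC-type statistics.

    Two pairs are adjacent (dependent) when they share an example.
    Returns the list of pairs and an adjacency list indexed like that list.
    """
    pairs = list(product(positives, negatives))
    adj = [[] for _ in pairs]
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            same_pos = pairs[a][0] == pairs[b][0]
            same_neg = pairs[a][1] == pairs[b][1]
            if same_pos or same_neg:
                adj[a].append(b)
                adj[b].append(a)
    return pairs, adj


def greedy_coloring(adj):
    """Give each vertex the smallest color not already used by a colored neighbor.

    Each color class is an independent set, i.e. a set of mutually independent pairs.
    """
    colors = [-1] * len(adj)
    for v in range(len(adj)):
        used = {colors[u] for u in adj[v] if colors[u] != -1}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


pairs, adj = pair_dependency_graph(["p0", "p1", "p2"], ["n0", "n1"])
colors = greedy_coloring(adj)
n_colors = max(colors) + 1
print(n_colors)  # 4
```

On this 3 × 2 example the greedy order yields 4 colors, whereas the chromatic number of this graph is 3 (it is the line graph of the complete bipartite graph K_{3,2}); any proper coloring still gives a valid, if looser, upper bound.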

    Generalization Error in Deep Learning

    Deep learning models have recently shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, despite their state-of-the-art performance, the source of their generalization ability is still generally unclear. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this article, we provide an overview of the existing theory and bounds characterizing the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.

    PAC-Bayesian Computation

    Risk bounds, also called generalisation bounds in the statistical learning literature, are important objects of study because they give information on the expected error that a predictor may incur on randomly chosen data points. In classical statistical learning, the analyses focus on individual hypotheses, and the aim is to derive risk bounds that are valid for the data-dependent hypothesis output by some learning method. Often, however, such risk bounds are valid uniformly over a hypothesis class, which is a consequence of the methods used to derive them, namely the theory of uniform convergence of empirical processes. This is a source of looseness in these classical bounds, which has led to debates and criticisms and motivated the search for alternative methods to derive tighter bounds. The PAC-Bayes analysis focuses on distributions over hypotheses and on the randomised predictors defined by such distributions. Other prediction schemes can be devised based on a distribution over hypotheses; however, the randomised predictor is a typical starting point. Lifting the analysis to distributions over hypotheses, rather than individual hypotheses, makes sharp analysis tools available, which arguably account for the tightness of PAC-Bayes bounds. Two main uses of PAC-Bayes bounds are (1) risk certification and (2) cost function derivation. The first consists of evaluating numerical risk certificates for the distributions over hypotheses learned by some method, while the second consists of turning a PAC-Bayes bound into a training objective, so as to learn a distribution by minimising the bound. This thesis revisits both uses of PAC-Bayes bounds. We contribute results on certifying the risk of randomised kernel and neural network classifiers, adding evidence that PAC-Bayes bounds succeed at delivering tight certificates.
This thesis proposes “PAC-Bayesian Computation” as a generic name for the class of methods that learn a distribution over hypotheses by minimising a PAC-Bayes bound (i.e. the second use case described above: cost function derivation), and reports an interesting case of PAC-Bayesian Computation leading to self-certified learning: we develop a learning and certification strategy that uses all the available data to produce a predictor together with a tight risk certificate, as demonstrated with randomised neural network classifiers on two benchmark data sets (MNIST, CIFAR-10).
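The risk-certification use case can be illustrated with the standard PAC-Bayes-kl inequality: with probability at least 1 − δ over an i.i.d. sample of size n, kl(L̂_S(Q) ‖ L(Q)) ≤ (KL(Q‖P) + ln(2√n/δ)) / n simultaneously for all posteriors Q. The certificate is the largest risk consistent with that inequality, found by inverting the binary kl by bisection. This is a minimal sketch assuming the empirical risk of Q is known exactly; the abstract does not say whether the thesis uses this exact bound or how it estimates that empirical risk, and the function names are hypothetical.

```python
import math


def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using the convention 0 * log(0) = 0."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep the logarithms finite
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse_upper(q, bound, tol=1e-9):
    """Upper end of {p in [q, 1] : kl(q || p) <= bound}, located by bisection.

    kl(q || p) is increasing in p on [q, 1], so bisection applies. Returning `hi`
    errs on the conservative (larger) side by at most `tol`.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return hi


def pac_bayes_kl_certificate(emp_risk, kl_qp, n, delta=0.05):
    """Risk certificate from the PAC-Bayes-kl inequality: with probability >= 1 - delta,
    kl(emp_risk || risk) <= (KL(Q||P) + ln(2 sqrt(n) / delta)) / n."""
    rhs = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, rhs)
```

Because kl(q ‖ p) ≥ 2(p − q)² (Pinsker's inequality), the inverted certificate is, up to the bisection tolerance, never larger than the square-root form emp_risk + sqrt(rhs / 2).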

    PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

    We develop a coherent framework for the integrative, simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our previous results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.
    Comment: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop

    PAC-Bayesian Learning of Optimization Algorithms

    We apply PAC-Bayes theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC bounds) and an explicit trade-off between a high probability of convergence and a high convergence speed. Even in the limit case, where convergence is guaranteed, our learned optimization algorithms provably outperform related algorithms based on a (deterministic) worst-case analysis. Our results rely on PAC-Bayes bounds for general, unbounded loss functions based on exponential families. By generalizing existing ideas, we reformulate the learning procedure as a one-dimensional minimization problem and study the possibility of finding a global minimum, which enables the algorithmic realization of the learning procedure. As a proof of concept, we learn hyperparameters of standard optimization algorithms to empirically underline our theory.
    Comment: Accepted to AISTATS 202

    Tighter risk certificates for neural networks

    This paper presents an empirical study of training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks; both are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. For the learned predictors, we compute risk certificates that are valid on any unseen examples. We further experiment with different types of priors on the weights (both data-free and data-dependent) and with different neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds that are much tighter than previous results in the literature, showing promise not only for guiding the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of certifying the risk on any unseen data without the need for data-splitting protocols.
    Comment: Preprint under review
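The abstract mentions a training objective based on a classical PAC-Bayes bound. A common form of such a bound, obtained from the PAC-Bayes-kl inequality via Pinsker's inequality, is the empirical loss plus sqrt((KL(Q‖P) + ln(2√n/δ)) / (2n)). The sketch below evaluates it for a factorized Gaussian posterior and prior over the weights, using the closed-form Gaussian KL. We assume, but the abstract does not confirm, that this matches the re-implemented classical objective; the paper's two new objectives are not reproduced, and the function names and default δ are illustrative.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for factorized Gaussians Q = N(mu_q, diag(sigma_q^2)), P = N(mu_p, diag(sigma_p^2))."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        # Per-coordinate closed form: log(sp/sq) + (sq^2 + (mq - mp)^2) / (2 sp^2) - 1/2
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def classic_objective(emp_loss, mu_q, sigma_q, mu_p, sigma_p, n, delta=0.025):
    """McAllester-style bound: emp_loss + sqrt((KL(Q||P) + ln(2 sqrt(n) / delta)) / (2 n)).

    emp_loss is the empirical loss of the randomized predictor, assumed to lie in [0, 1].
    """
    kl = kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)
    return emp_loss + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
```

For training, the same expression would be written in an automatic-differentiation framework so that gradients reach mu_q and sigma_q, with the empirical term estimated on sampled weights.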