    Still no free lunches: the price to pay for tighter PAC-Bayes bounds

    “No free lunch” results state that it is impossible to obtain meaningful bounds on the error of a learning algorithm without prior assumptions and modelling, which may be more or less realistic for a given problem. Some models are “expensive” (strong assumptions, such as sub-Gaussian tails), others are “cheap” (simply finite variance). As is well known, the more you pay, the more you get: in other words, the most expensive models yield the most interesting bounds. Recent advances in robust statistics have investigated procedures to obtain tight bounds while keeping the cost of assumptions minimal. The present paper explores and exhibits the limits of obtaining tight probably approximately correct (PAC)-Bayes bounds in a robust setting for cheap models.
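
    The "price" of an assumption can be made concrete with the textbook deviation bound for a sample mean. The sketch below is our own illustration, not taken from the paper: it compares the two-sided confidence width under a sub-Gaussian assumption (variance proxy sigma**2) with the Chebyshev width available when only the variance sigma**2 is assumed finite. The function names are hypothetical.

```python
import math

def sub_gaussian_width(sigma, n, delta):
    # "Expensive" model: each observation is sub-Gaussian with variance proxy sigma**2.
    # P(|mean - mu| >= t) <= 2 exp(-n t^2 / (2 sigma^2)); solve for t at level delta.
    return sigma * math.sqrt(2.0 * math.log(2.0 / delta) / n)

def chebyshev_width(sigma, n, delta):
    # "Cheap" model: only Var(X) = sigma**2 < infinity is assumed.
    # P(|mean - mu| >= t) <= sigma^2 / (n t^2); solve for t at level delta.
    return sigma / math.sqrt(n * delta)

if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, sub_gaussian_width(1.0, 1000, delta), chebyshev_width(1.0, 1000, delta))
```

    The Chebyshev width grows like 1/sqrt(delta), while the sub-Gaussian width grows only like sqrt(log(1/delta)); that gap is what the stronger assumption buys.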

    Still no free lunches: the price to pay for tighter PAC-Bayes bounds

    “No free lunch” results state that it is impossible to obtain meaningful bounds on the error of a learning algorithm without prior assumptions and modelling. Some models are expensive (strong assumptions, such as sub-Gaussian tails), others are cheap (simply finite variance). As is well known, the more you pay, the more you get: in other words, the most expensive models yield the most interesting bounds. Recent advances in robust statistics have investigated procedures to obtain tight bounds while keeping the cost minimal. The present paper explores and exhibits the limits of obtaining tight PAC-Bayes bounds in a robust setting for cheap models, addressing the question: is PAC-Bayes good value for money?
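
    One well-known robust-statistics procedure that needs only a finite variance is the median-of-means estimator. We do not claim it is the procedure analysed in this paper; the sketch below simply shows how it works. The block count k is a free parameter (it is commonly taken proportional to log(1/delta)), and the Pareto example is our own choice of a heavy-tailed distribution with finite variance.

```python
import random
import statistics

def median_of_means(xs, k):
    # Median-of-means estimate of E[X]: split the sample into k blocks,
    # average each block, and return the median of the k block means.
    # Strided blocks (xs[i::k]) are fine when the data are i.i.d.
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= len(xs)")
    blocks = [xs[i::k] for i in range(k)]
    block_means = [sum(b) / len(b) for b in blocks]
    return statistics.median(block_means)

if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto(alpha=2.5) on [1, inf) has finite variance but heavy tails; its mean is 2.5 / 1.5.
    sample = [rng.paretovariate(2.5) for _ in range(10_000)]
    print("sample mean     :", statistics.fmean(sample))
    print("median of means :", median_of_means(sample, k=30))
    print("true mean       :", 2.5 / 1.5)
```

    A single huge observation can drag the plain sample mean arbitrarily far, but it only corrupts one block mean, which the median then ignores.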

    PAC-Bayesian Theory Meets Bayesian Inference

    We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that minimizing PAC-Bayesian generalization risk bounds amounts to maximizing the Bayesian marginal likelihood. This provides an alternative explanation of the Bayesian Occam's razor criterion, under the assumption that the data are generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored to the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks.
    Comment: Published at NIPS 2016 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference)
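
    The link can be checked numerically on a toy model. The sketch below assumes a finite parameter grid and keeps only the posterior-dependent part of a PAC-Bayes bound, E_rho[empirical NLL] + KL(rho || pi) / n, dropping the confidence and moment terms of the paper's theorem. For the Bayesian posterior this objective equals -(1/n) log Z, where Z is the marginal likelihood, and any other rho (here, the prior itself) gives a larger value. The coin-tossing data are hypothetical.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def bound_objective(rho, log_prior, log_lik, n):
    # E_rho[empirical NLL] + KL(rho || pi) / n, where the empirical
    # negative log-likelihood of theta is -log p(X | theta) / n.
    exp_loss = sum(r * (-ll / n) for r, ll in zip(rho, log_lik))
    kl = sum(r * (math.log(r) - lp) for r, lp in zip(rho, log_prior) if r > 0)
    return exp_loss + kl / n

# Hypothetical toy model: coin with unknown bias theta on a grid,
# uniform prior, n = 10 tosses of which 7 are heads.
thetas = [i / 10 for i in range(1, 10)]
n, heads = 10, 7
log_prior = [math.log(1 / len(thetas))] * len(thetas)
log_lik = [heads * math.log(t) + (n - heads) * math.log(1 - t) for t in thetas]

# log Z = log sum_theta pi(theta) p(X | theta)  (log marginal likelihood)
log_z = logsumexp([lp + ll for lp, ll in zip(log_prior, log_lik)])
# Bayesian posterior: rho*(theta) proportional to pi(theta) p(X | theta)
posterior = [math.exp(lp + ll - log_z) for lp, ll in zip(log_prior, log_lik)]
prior = [math.exp(lp) for lp in log_prior]

print(bound_objective(posterior, log_prior, log_lik, n), -log_z / n)  # equal up to rounding
print(bound_objective(prior, log_prior, log_lik, n))                  # strictly larger
```

    Minimizing the objective over rho therefore selects the Bayesian posterior, and the minimum value is a decreasing function of the marginal likelihood.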

    PAC-Bayes Analysis of Multi-view Learning

    This paper presents eight PAC-Bayes bounds for analyzing the generalization performance of multi-view classifiers. These bounds adopt data-dependent Gaussian priors that emphasize classifiers with high view agreement. The prior for the first two bounds is centered at the origin, while the prior for the third and fourth bounds is centered at a data-dependent vector. A key technique for obtaining these bounds is a pair of derived logarithmic determinant inequalities, which differ in whether the dimensionality of the data is involved. The prior centers for the fifth and sixth bounds are computed on a separate subset of the training set. The last two bounds use unlabeled data to represent view agreement and are thus applicable to semi-supervised multi-view learning. We evaluate all the presented multi-view PAC-Bayes bounds on benchmark data and compare them with previous single-view PAC-Bayes bounds. The usefulness and performance of the multi-view bounds are discussed.
    Comment: 35 pages
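
    Why the prior center matters can be seen in a generic single-view PAC-Bayes-kl bound (in Maurer's form) with Gaussian prior and posterior sharing the identity covariance; this is not one of the paper's multi-view bounds. The KL term is half the squared distance between the centers, so moving the prior center toward the posterior center shrinks it. A data-dependent center must be computed on data not used to evaluate the bound (hence the separate subset mentioned above). All weight vectors and risk values below are hypothetical.

```python
import math

def binary_kl(q, p):
    # kl(q || p) between Bernoulli(q) and Bernoulli(p), using 0 log 0 = 0
    p = min(max(p, 1e-15), 1 - 1e-15)
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

def kl_inverse_upper(q, c, tol=1e-10):
    # Largest p in [q, 1] with kl(q || p) <= c; kl(q || .) increases on [q, 1].
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

def gaussian_kl_same_cov(mu_q, mu_p):
    # KL(N(mu_q, I) || N(mu_p, I)) = ||mu_q - mu_p||^2 / 2
    return 0.5 * sum((a - b) ** 2 for a, b in zip(mu_q, mu_p))

def pac_bayes_kl_bound(emp_gibbs_risk, kl_qp, n, delta):
    # With prob. >= 1 - delta: kl(emp || true) <= (KL(Q || P) + ln(2 sqrt(n) / delta)) / n
    c = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_gibbs_risk, c)

if __name__ == "__main__":
    mu_q = [2.0, -1.0, 0.5]                                  # hypothetical posterior center
    kl_origin = gaussian_kl_same_cov(mu_q, [0.0, 0.0, 0.0])  # prior at the origin
    kl_nearby = gaussian_kl_same_cov(mu_q, [1.8, -0.9, 0.4]) # hypothetical data-dependent center
    for kl_qp in (kl_origin, kl_nearby):
        print(kl_qp, pac_bayes_kl_bound(0.1, kl_qp, n=1000, delta=0.05))
```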

    Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes

    PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, particularly for margin classifiers: recent contributions have shown how practical these bounds can be, either for model selection (Ambroladze et al., 2007) or even for directly guiding the learning of linear classifiers (Germain et al., 2009). However, in many practical situations the training data exhibit dependencies and the traditional IID assumption does not hold. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first (to the best of our knowledge) PAC-Bayes generalization bounds for classifiers trained on data exhibiting interdependencies. Our approach decomposes a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data, thanks to graph fractional covers. Our bounds are very general: being able to upper-bound the fractional chromatic number of the dependency graph is sufficient to obtain new PAC-Bayes bounds for specific settings. We show how our results can be used to derive bounds for ranking statistics (such as AUC) and for classifiers trained on data distributed according to a stationary β-mixing process. Along the way, we show how our approach seamlessly handles U-processes. As a side note, we also provide a PAC-Bayes generalization bound for classifiers learned on data from stationary φ-mixing distributions.
    Comment: Long version of the AISTATS 09 paper: http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
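
    The combinatorial step can be illustrated for bipartite ranking, under our own assumption that each training example is a (positive, negative) index pair and that two pairs are dependent when they share an instance. Any proper coloring of the dependency graph upper-bounds its fractional chromatic number, and the pairs sharing one positive instance form a clique of size n. The cyclic coloring below uses max(m, n) colors, so the fractional chromatic number here is exactly max(m, n). This sketches only the graph part, not the paper's bound; roughly speaking, in bounds of this kind the number of pairs divided by this quantity (here min(m, n)) plays the role of the sample size.

```python
from itertools import product

def dependent(a, b):
    # Pairs (i, j) = (positive index, negative index) are dependent
    # when they share the positive or the negative instance.
    return a != b and (a[0] == b[0] or a[1] == b[1])

def cyclic_coloring(m, n):
    # Color class c holds every pair (i, j) with (i + j) mod max(m, n) == c.
    k = max(m, n)
    classes = [[] for _ in range(k)]
    for i, j in product(range(m), range(n)):
        classes[(i + j) % k].append((i, j))
    return classes

def is_cover_by_independent_sets(classes, m, n):
    # Every pair appears exactly once and no class contains two dependent pairs.
    covered = sorted(p for cls in classes for p in cls)
    if covered != sorted(product(range(m), range(n))):
        return False
    return all(not dependent(a, b) for cls in classes for a in cls for b in cls)

if __name__ == "__main__":
    m, n = 3, 5
    classes = cyclic_coloring(m, n)
    print(is_cover_by_independent_sets(classes, m, n), len(classes))  # True 5
```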

    Tuning the distribution dependent prior in the PAC-Bayes framework based on empirical data

    In this paper we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution. In particular, following Catoni [1], we refine some recent generalisation bounds on the risk of the Gibbs classifier when the prior is defined in terms of the data-generating distribution and the posterior is defined in terms of the observed (empirical) distribution. Moreover, we show that the prior and posterior distributions can be tuned based on the observed samples without worsening the convergence rate of the bounds and with only a marginal impact on their constants.
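
    A toy finite-class example shows why such a prior helps. We assume Gibbs distributions proportional to exp(-gamma * risk): the prior built from the true risks (unknown in practice, which is precisely why bounds of this kind must control the KL term without computing it) and the posterior built from empirical risks. When the empirical risks track the true ones, the KL term is much smaller than against a uniform prior. All risks and the temperature gamma are hypothetical.

```python
import math

def gibbs_distribution(risks, gamma):
    # Q(h) proportional to exp(-gamma * risk(h)) over a finite class (uniform base measure)
    w = [-gamma * r for r in risks]
    m = max(w)
    z = sum(math.exp(x - m) for x in w)
    return [math.exp(x - m) / z for x in w]

def kl(q, p):
    return sum(a * math.log(a / b) for a, b in zip(q, p) if a > 0)

true_risk = [0.10, 0.20, 0.30, 0.40]  # hypothetical; unknown in practice
emp_risk = [0.12, 0.18, 0.33, 0.38]   # hypothetical empirical risks
gamma = 50.0                          # hypothetical temperature

prior = gibbs_distribution(true_risk, gamma)     # distribution-dependent prior
posterior = gibbs_distribution(emp_risk, gamma)  # data-dependent posterior
uniform = [1 / len(true_risk)] * len(true_risk)

print("KL(posterior || distribution-dependent prior):", kl(posterior, prior))
print("KL(posterior || uniform prior):", kl(posterior, uniform))
```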