1,022 research outputs found

PAC-Bayesian Theory Meets Bayesian Inference

Authors: Bach Francis; Germain Pascal; Lacoste Alexandre; Lacoste-Julien Simon
Publication date: 27/05/2016

We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks.

Comment: Published at NIPS 2015 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
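
The link stated in the abstract of "PAC-Bayesian Theory Meets Bayesian Inference" can be illustrated with a standard variational identity. The notation below is an assumption for illustration and does not come from this listing: $\pi$ is a prior and $\rho$ a posterior over parameters $\theta$, there are $n$ i.i.d. samples $(x_i, y_i)$, and $\widehat{L}(\theta) = -\frac{1}{n}\sum_{i=1}^{n} \ln p(y_i \mid x_i, \theta)$ is the empirical negative log-likelihood loss. This is a sketch of the mechanism, not the paper's theorem, which also handles the unbounded loss through a sub-gamma assumption.

```latex
\min_{\rho}\Big\{\, n\,\mathbb{E}_{\theta\sim\rho}\,\widehat{L}(\theta)
      + \mathrm{KL}(\rho \,\|\, \pi) \Big\}
  \;=\; -\ln \mathbb{E}_{\theta\sim\pi}\, e^{-n\widehat{L}(\theta)}
  \;=\; -\ln \int \pi(\theta)\prod_{i=1}^{n} p(y_i \mid x_i, \theta)\, d\theta
  \;=\; -\ln Z ,
\qquad
\rho^{*}(\theta) \;\propto\; \pi(\theta)\, e^{-n\widehat{L}(\theta)} .
```

Under these assumptions the minimizer $\rho^{*}$ is the Bayesian posterior, and the minimum value is the negative log marginal likelihood $-\ln Z$, so making this data-dependent term of a bound small corresponds to making $Z$ large.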

PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

Authors: Auer Peter; Cesa-Bianchi Nicolò; Laviolette François; Peters Jan; Seldin Yevgeny; Shawe-Taylor John
Publication date: 01/01/2011

We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.

Comment: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop

arXiv.org e-Print Archive

An Improvement to the Domain Adaptation Bound in a PAC-Bayesian context

Authors: Germain Pascal; Habrard Amaury; Laviolette François; Morvant Emilie
Publication date: 13/12/2014

This paper provides a theoretical analysis of domain adaptation based on the PAC-Bayesian theory. We improve the previous domain adaptation bound obtained by Germain et al. in two ways. First, we give another generalization bound that is tighter and easier to interpret. Second, we provide a new analysis of the constant term appearing in the bound, which can be of high interest for developing new algorithmic solutions.

Comment: NIPS 2014 Workshop on Transfer and Multi-task learning: Theory Meets Practice, Dec 2014, Montréal, Canada

arXiv.org e-Print Archive

HAL-UJM

A New PAC-Bayesian Perspective on Domain Adaptation

Authors: Germain Pascal; Habrard Amaury; Laviolette François; Morvant Emilie
Publication date: 01/01/2015

We study the issue of PAC-Bayesian domain adaptation: we want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper bound on the target risk where the distributions' divergence, expressed as a ratio, controls the trade-off between a source error measure and the target voters' disagreement. Our bound suggests that one has to focus on regions where the source data is informative. From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithm and perform experiments on real data.

Comment: Published at ICML 201

arXiv.org e-Print Archive

HAL-UJM

INRIA a CCSD electronic archive server

Generalization bounds for averaged classifiers

Authors: Freund Yoav; Mansour Yishay; Schapire Robert E.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2004

We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class.

Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005

arXiv.org e-Print Archive

CiteSeerX
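
The averaging rule described in the abstract of "Generalization bounds for averaged classifiers" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's exact prediction rule: the function name `exp_weighted_vote`, the learning-rate parameter `eta`, and the abstention `threshold` on the normalized vote are all hypothetical choices made here, and hypotheses are assumed to return labels in {-1, +1}.

```python
import math


def exp_weighted_vote(hypotheses, train_x, train_y, x, eta=1.0, threshold=0.2):
    """Return +1 or -1 from an exponentially weighted vote, or None to abstain.

    Hypothetical sketch: each hypothesis is weighted by exp(-eta * errors),
    where errors is its number of training mistakes.
    """
    # Count training mistakes for every hypothesis in the class.
    errors = [
        sum(h(xi) != yi for xi, yi in zip(train_x, train_y))
        for h in hypotheses
    ]
    # Subtract the smallest error count before exponentiating; this rescales
    # all weights by the same factor and avoids underflow.
    min_err = min(errors)
    weights = [math.exp(-eta * (e - min_err)) for e in errors]
    total = sum(weights)
    # Normalized weighted average of the +/-1 predictions, a value in [-1, 1].
    score = sum(w * h(x) for w, h in zip(weights, hypotheses)) / total
    # Abstain when the weighted vote is too close to a tie (assumed rule).
    if abs(score) <= threshold:
        return None
    return 1 if score > 0 else -1


# Toy hypothesis class: threshold classifiers on the real line.
hypotheses = [lambda x, t=t: 1 if x >= t else -1 for t in (0.0, 1.0, 2.0, 3.0)]
train_x = [0.5, 1.5, 2.5, 3.5]
train_y = [-1, -1, 1, 1]

for query in (-1.0, 1.5, 3.5):
    print(query, exp_weighted_vote(hypotheses, train_x, train_y, query))
```

Raising `threshold` makes the sketch abstain more often, which mirrors the trade-off the abstract describes between abstention and the reliability of the predictions that are made.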

A Bayesian Approach for Noisy Matrix Completion: Optimal Rate under General Sampling Distribution

Authors: Alquier Pierre; Mai The Tien
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 21/01/2015

Bayesian methods for low-rank matrix completion with noise have been shown to be very efficient computationally. While the behaviour of penalized minimization methods is well understood both from the theoretical and computational points of view in this problem, the theoretical optimality of Bayesian estimators has not been explored yet. In this paper, we propose a Bayesian estimator for matrix completion under a general sampling distribution. We also provide an oracle inequality for this estimator. This inequality proves that, whatever the rank of the matrix to be estimated, our estimator reaches the minimax-optimal rate of convergence (up to a logarithmic factor). We end the paper with a short simulation study.

arXiv.org e-Print Archive

Research Repository UCD

Random deep neural networks are biased towards simple functions

Authors: De Palma Giacomo; Kiani Bobak Toussi; Lloyd Seth
Publication date: 01/01/2019

We prove that the binary classifiers of bit strings generated by random wide deep neural networks with ReLU activation function are biased towards simple functions. The simplicity is captured by the following two properties. For any given input bit string, the average Hamming distance of the closest input bit string with a different classification is at least sqrt(n / (2π log n)), where n is the length of the string. Moreover, if the bits of the initial string are flipped randomly, the average number of flips required to change the classification grows linearly with n. These results are confirmed by numerical experiments on deep neural networks with two hidden layers, and settle the conjecture stating that random deep neural networks are biased towards simple functions. This conjecture was proposed and numerically explored in [Valle Pérez et al., ICLR 2019] to explain the unreasonably good generalization properties of deep learning algorithms. The probability distribution of the functions generated by random deep neural networks is a good choice for the prior probability distribution in the PAC-Bayesian generalization bounds. Our results constitute a fundamental step forward in the characterization of this distribution, therefore contributing to the understanding of the generalization properties of deep learning algorithms.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna