39 research outputs found

PAC-Bayesian Inequalities for Martingales

Authors: Auer Peter, Cesa-Bianchi Nicolò, Laviolette François, Seldin Yevgeny, Shawe-Taylor John
Publication date: 01/01/2012

We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales, opening the way for its application to importance weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in probability theory and statistics where martingales are encountered. We also present a comparison inequality that bounds the expectation of a convex function of a martingale difference sequence shifted to the [0,1] interval by the expectation of the same function of independent Bernoulli variables. This inequality is applied to derive a tighter analog of Hoeffding-Azuma's inequality.
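For orientation, the classical Hoeffding-Azuma inequality mentioned at the end of the abstract can be evaluated numerically. The sketch below assumes the standard textbook form, P(|S_n| >= t) <= 2 exp(-t^2 / (2 sum_i c_i^2)), for a martingale whose increments satisfy |X_i| <= c_i. It is not the tighter analog derived in the paper, and the function name is illustrative only.

```python
import math


def azuma_hoeffding_bound(t, increment_bounds):
    """Classical two-sided Hoeffding-Azuma tail bound (textbook form).

    Assumes a martingale S_n = X_1 + ... + X_n with |X_i| <= c_i, where
    increment_bounds holds the c_i. Returns an upper bound on
    P(|S_n| >= t), clipped to 1 because it is a probability bound.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    sum_sq = sum(c * c for c in increment_bounds)
    if sum_sq == 0:
        # All increments are zero, so S_n is identically zero.
        return 0.0 if t > 0 else 1.0
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * sum_sq)))


# Hypothetical example: 1000 steps with |X_i| <= 1 and a deviation of 100.
bound = azuma_hoeffding_bound(100.0, [1.0] * 1000)
print(bound)
```

The bound depends on the increments only through the sum of squared ranges, which is what a variance-sensitive or Bernoulli-comparison argument can hope to improve on.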

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano

UCL Discovery

MPG.PuRe

Fast rates in learning with dependent observations

Authors: Alquier Pierre, Wintenberger Olivier
Publication date: 01/01/2012

In this paper we tackle the problem of fast rates in time series forecasting from a statistical learning perspective. In a series of papers (e.g. Meir 2000, Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main tools used in learning theory with iid observations can be extended to the prediction of time series. The main message of these papers is that, given a family of predictors, we are able to build a new predictor that predicts the series as well as the best predictor in the family, up to a remainder of order $1/\sqrt{n}$. It is known that this rate cannot be improved in general. In this paper, we show that in the particular case of the least square loss, and under a strong assumption on the time series (phi-mixing), the remainder is actually of order $1/n$. Thus, the optimal rate for iid variables (see e.g. Tsybakov 2003) and individual sequences (see \cite{lugosi}) is, for the first time, achieved for uniformly mixing processes. We also show that our method is optimal for aggregating sparse linear combinations of predictors.
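To make the gap between the two rates concrete, the sketch below computes how many observations are needed before a remainder of order 1/sqrt(n) or 1/n falls below a target. It ignores all constants and logarithmic factors, which the abstract does not state, so the numbers indicate scaling only; the function name and rate labels are illustrative.

```python
import math


def samples_for_remainder(eps, rate):
    """Smallest n such that the remainder is at most eps, constants ignored.

    rate="slow" models a remainder of 1/sqrt(n); rate="fast" models 1/n.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if rate == "slow":
        return math.ceil(1.0 / (eps * eps))
    if rate == "fast":
        return math.ceil(1.0 / eps)
    raise ValueError("rate must be 'slow' or 'fast'")


# Hypothetical targets: compare the sample sizes the two rates imply.
for eps in (0.5, 0.125, 0.03125):
    print(eps, samples_for_remainder(eps, "slow"), samples_for_remainder(eps, "fast"))
```

Under the slow rate the required sample size grows quadratically in 1/eps, under the fast rate only linearly.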

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Hal-Diderot

HAL-Polytechnique

PAC-Bayesian Theory Meets Bayesian Inference

Authors: Bach Francis, Germain Pascal, Lacoste Alexandre, Lacoste-Julien Simon
Publication date: 27/05/2016

We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks. Comment: Published at NIPS 2015 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference)
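The link described above can be illustrated on a finite hypothesis set through a standard variational identity: the objective "expected total negative log-likelihood under rho plus KL(rho || prior)" is minimized by the Bayesian posterior, and its minimum equals the negative log marginal likelihood. The sketch below checks this numerically. It assumes that a KL-regularized objective of this shape stands in for the bound being minimized; it is not the paper's theorem, and the prior and loss values are hypothetical.

```python
import math


def kl_divergence(rho, prior):
    """KL(rho || prior) for two discrete distributions on the same support."""
    return sum(r * math.log(r / p) for r, p in zip(rho, prior) if r > 0)


def objective(rho, prior, nll):
    """Expected total negative log-likelihood under rho plus KL(rho || prior)."""
    expected_nll = sum(r * l for r, l in zip(rho, nll))
    return expected_nll + kl_divergence(rho, prior)


def bayes_posterior(prior, nll):
    """Posterior proportional to prior * exp(-nll), and the marginal likelihood."""
    weights = [p * math.exp(-l) for p, l in zip(prior, nll)]
    evidence = sum(weights)  # marginal likelihood of the data
    return [w / evidence for w in weights], evidence


# Hypothetical prior over three hypotheses and their total NLL on the data.
prior = [0.5, 0.3, 0.2]
nll = [4.0, 2.5, 3.0]

posterior, evidence = bayes_posterior(prior, nll)
at_posterior = objective(posterior, prior, nll)
at_prior = objective(prior, prior, nll)

print(at_posterior, -math.log(evidence), at_prior)
```

The first two printed values agree up to floating-point error, and any other choice of rho, such as the prior itself, gives a larger objective.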

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

PAC-Bayes-empirical-Bernstein inequality

Authors: Seldin Yevgeny, Tolstikhin Ilya
Publication venue: Neural Information Processing Systems (NIPS) Foundation
Publication date: 01/01/2013

We present the PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on a combination of the PAC-Bayesian bounding technique with the Empirical Bernstein bound. It makes it possible to take advantage of small empirical variance and is especially useful in regression. We show that when the empirical variance is significantly smaller than the empirical loss, the PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than the PAC-Bayes-kl inequality of Seeger (2002), and otherwise it is comparable. The PAC-Bayes-Empirical-Bernstein inequality is an interesting example of the application of the PAC-Bayesian bounding technique to self-bounding functions. We provide an empirical comparison of the PAC-Bayes-Empirical-Bernstein inequality with the PAC-Bayes-kl inequality on a synthetic example and several UCI datasets.
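The PAC-Bayes-kl inequality used as the baseline above bounds the binary KL divergence between the empirical and the true loss, so turning it into an explicit loss bound requires a numerical inversion. The sketch below does that by bisection. It assumes the commonly cited right-hand side (KL(rho || pi) + ln(2 sqrt(n) / delta)) / n, which is not taken from this listing and differs slightly from Seeger's original constant. It does not implement the empirical Bernstein bound itself, and all names and input values are illustrative.

```python
import math


def binary_kl(p, q):
    """kl(p || q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_inverse_upper(emp_loss, budget, tol=1e-10):
    """Largest q in [emp_loss, 1] with kl(emp_loss || q) <= budget (bisection)."""
    lo, hi = emp_loss, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_kl(emp_loss, mid) <= budget:
            lo = mid  # mid still satisfies the constraint
        else:
            hi = mid
    return lo


def pac_bayes_kl_bound(emp_loss, kl_rho_pi, n, delta):
    """Upper bound on the expected loss under the assumed PAC-Bayes-kl form."""
    budget = (kl_rho_pi + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_loss, budget)


# Hypothetical inputs: empirical loss 0.1, KL(rho || pi) = 5, n = 1000, delta = 0.05.
print(pac_bayes_kl_bound(0.1, 5.0, 1000, 0.05))
```

Bisection works here because kl(p || q) is increasing in q for q >= p, so the set of admissible q is an interval starting at the empirical loss.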

Queensland University of Technology ePrints Archive