PAC-Bayesian Inequalities for Martingales
We present a set of high-probability inequalities that control the
concentration of weighted averages of multiple (possibly uncountably many)
simultaneously evolving and interdependent martingales. Our results extend the
PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales,
opening the way for its application to importance weighted sampling,
reinforcement learning, and other interactive learning domains, as well as many
other domains in probability theory and statistics, where martingales are
encountered.
We also present a comparison inequality that bounds the expectation of a
convex function of a martingale difference sequence shifted to the [0,1]
interval by the expectation of the same function of independent Bernoulli
variables. This inequality is applied to derive a tighter analog of
Hoeffding-Azuma's inequality.
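
For orientation, the classical Hoeffding-Azuma tail bound that the abstract refers to can be sketched numerically. This is a minimal illustration of the standard inequality only, not of the tighter analog derived in the paper. The function names, the toy martingale, and the parameter values are assumptions chosen for the example.

```python
import math
import random

def azuma_hoeffding_tail(t, bounds):
    """Classical bound P(S_n >= t) <= exp(-t^2 / (2 * sum c_i^2)),
    valid when the i-th martingale difference satisfies |X_i| <= c_i."""
    return math.exp(-t * t / (2.0 * sum(c * c for c in bounds)))

def sample_martingale(n, rng):
    """Sum of n differences with |X_i| <= 1 whose scale depends on the past
    (a hypothetical toy process, used only to illustrate dependence)."""
    s = 0.0
    for _ in range(n):
        scale = 1.0 if s <= 0 else 0.5        # step size depends on history
        s += scale * rng.choice((-1.0, 1.0))  # conditional mean is zero
    return s

rng = random.Random(0)
n, t, trials = 100, 20.0, 5000
empirical = sum(sample_martingale(n, rng) >= t for _ in range(trials)) / trials
bound = azuma_hoeffding_tail(t, [1.0] * n)
print(f"empirical tail: {empirical:.4f}, Azuma-Hoeffding bound: {bound:.4f}")
```

The empirical tail frequency should sit below the bound, typically with a wide margin. That slack is the kind of looseness a tighter inequality aims to reduce.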
Fast rates in learning with dependent observations
In this paper we tackle the problem of fast rates in time series forecasting
from a statistical learning perspective. In a series of papers (e.g. Meir 2000,
Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main
tools used in learning theory with iid observations can be extended to the
prediction of time series. The main message of these papers is that, given a
family of predictors, we are able to build a new predictor that predicts the
series as well as the best predictor in the family, up to a remainder of order
$1/\sqrt{n}$. It is known that this rate cannot be improved in general. In this
paper, we show that in the particular case of the least square loss, and under
a strong assumption on the time series (phi-mixing) the remainder is actually
of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov
2003, and individual sequences, see \cite{lugosi} is, for the first time,
achieved for uniformly mixing processes. We also show that our method is
optimal for aggregating sparse linear combinations of predictors.
PAC-Bayesian Theory Meets Bayesian Inference
We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the
Bayesian marginal likelihood. That is, for the negative log-likelihood loss
function, we show that the minimization of PAC-Bayesian generalization risk
bounds maximizes the Bayesian marginal likelihood. This provides an alternative
explanation to the Bayesian Occam's razor criteria, under the assumption that
the data is generated by an i.i.d. distribution. Moreover, as the negative
log-likelihood is an unbounded loss function, we motivate and propose a
PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that
our approach is sound on classical Bayesian linear regression tasks.
Comment: Published at NIPS 2015
(http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference)
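
The link between the bound and the marginal likelihood can be checked numerically on a finite hypothesis set. The sketch below relies on the standard variational identity: the objective E_rho[total NLL] + KL(rho || prior) is minimized by the Gibbs posterior rho(h) proportional to prior(h) * exp(-NLL(h)), and its minimum equals the negative log marginal likelihood. The three-hypothesis prior and the NLL values are made-up numbers for illustration, and the function names are hypothetical.

```python
import math

def gibbs_posterior(prior, nll_sums):
    """Return rho(h) proportional to prior(h) * exp(-nll_sums[h]) together
    with the log normalizer, which is the log marginal likelihood."""
    log_w = [math.log(p) - s for p, s in zip(prior, nll_sums)]
    m = max(log_w)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in log_w))
    rho = [math.exp(v - log_z) for v in log_w]
    return rho, log_z

def objective(rho, prior, nll_sums):
    """E_rho[total NLL] + KL(rho || prior)."""
    expected = sum(r * s for r, s in zip(rho, nll_sums))
    kl = sum(r * math.log(r / p) for r, p in zip(rho, prior) if r > 0)
    return expected + kl

# Hypothetical example: total negative log-likelihood of the data under
# each of three hypotheses, and a prior over those hypotheses.
prior = [0.5, 0.3, 0.2]
nll_sums = [12.0, 9.5, 11.0]

rho, log_evidence = gibbs_posterior(prior, nll_sums)
gap = objective(rho, prior, nll_sums) + log_evidence
print(f"objective at Gibbs posterior + log evidence = {gap:.2e}")
```

Any other distribution over the hypotheses, for example the prior itself, gives an objective value at least as large as the negative log evidence.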
PAC-Bayes-empirical-Bernstein inequality
We present the PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on a combination of the PAC-Bayesian bounding technique with the Empirical Bernstein bound. It makes it possible to take advantage of small empirical variance and is especially useful in regression. We show that when the empirical variance is significantly smaller than the empirical loss, the PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than the PAC-Bayes-kl inequality of Seeger (2002), and otherwise it is comparable. The PAC-Bayes-Empirical-Bernstein inequality is an interesting example of the application of the PAC-Bayesian bounding technique to self-bounding functions. We provide an empirical comparison of the PAC-Bayes-Empirical-Bernstein inequality with the PAC-Bayes-kl inequality on a synthetic example and several UCI datasets.
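
As a reference point for the PAC-Bayes-kl baseline mentioned above, the following sketch inverts the binary kl divergence by bisection to turn an empirical loss into an upper bound on the expected loss. It assumes the commonly quoted form kl(empirical || expected) <= (KL(rho || pi) + ln(2 sqrt(n) / delta)) / n. Other statements of the inequality use ln((n + 1) / delta) instead, so the constant is an assumption. The function names and the numbers in the usage lines are hypothetical, and the sketch does not implement the PAC-Bayes-Empirical-Bernstein inequality itself.

```python
import math

def binary_kl(p, q):
    """kl(p || q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def kl_upper_inverse(p_hat, budget, tol=1e-10):
    """Largest q in [p_hat, 1] with kl(p_hat || q) <= budget, by bisection.
    Works because kl(p_hat || q) is increasing in q for q >= p_hat."""
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_bound(emp_loss, kl_div, n, delta):
    """Upper bound on the expected loss under the assumed PAC-Bayes-kl form."""
    budget = (kl_div + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_upper_inverse(emp_loss, budget)

# Hypothetical numbers: empirical loss 0.1, KL(rho || pi) = 2, n = 1000.
bound = pac_bayes_kl_bound(emp_loss=0.1, kl_div=2.0, n=1000, delta=0.05)
print(f"PAC-Bayes-kl upper bound on the expected loss: {bound:.4f}")
```

Because this bound depends on the data only through the empirical loss, it cannot exploit a small empirical variance, which is the gap the abstract's inequality is designed to address.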