Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a
generalization error approaching zero in probability for any concept class with
finite VC-dimension (IID processes are the simplest example). A mixture of
learnable processes need not be learnable itself, and certainly its
generalization error need not decay at the same rate. In this paper, we argue
that it is natural in predictive PAC to condition not on the past observations
but on the mixture component of the sample path. This definition not only
matches what a realistic learner might demand, but also allows us to sidestep
several otherwise grave problems in learning from dependent data. In
particular, we give a novel PAC generalization bound for mixtures of learnable
processes with a generalization error that is not worse than that of each
mixture component. We also provide a characterization of mixtures of absolutely
regular ($\beta$-mixing) processes, of independent probability-theoretic
interest.
Comment: 9 pages, accepted in NIPS 2013
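To fix notation (introduced here for illustration, not quoted from the paper): call a process $\mu$ learnable when, for every concept class $\mathcal{F}$ of finite VC dimension,
\[
  \sup_{f \in \mathcal{F}} \left| \frac{1}{n}\sum_{i=1}^{n} f(X_i) - \mathbb{E}_{\mu}[f] \right| \xrightarrow{\;P\;} 0,
\]
and write a mixture as $\mu = \int \mu_\theta \, \pi(\mathrm{d}\theta)$. The proposal is to require such a guarantee conditionally on the mixture component $\theta$ of the observed sample path, rather than unconditionally under $\mu$.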
Fast rates in learning with dependent observations
In this paper we tackle the problem of fast rates in time series forecasting
from a statistical learning perspective. In a series of papers (e.g. Meir 2000,
Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main
tools used in learning theory with iid observations can be extended to the
prediction of time series. The main message of these papers is that, given a
family of predictors, we are able to build a new predictor that predicts the
series as well as the best predictor in the family, up to a remainder of order
. It is known that this rate cannot be improved in general. In this
paper, we show that in the particular case of the least square loss, and under
a strong assumption on the time series ($\phi$-mixing), the remainder is actually
of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov
2003, and individual sequences, see e.g. Lugosi, is, for the first time,
achieved for uniformly mixing processes. We also show that our method is
optimal for aggregating sparse linear combinations of predictors.
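To make the rate improvement concrete (a schematic with notation chosen here, not quoted from the paper: $K$ candidate predictors $f_1,\dots,f_K$, quadratic risk $R$, aggregate $\hat{f}$), the oracle inequality sharpens from
\[
  R(\hat{f}) \;\le\; \min_{1\le j\le K} R(f_j) + O\!\Big(\sqrt{\tfrac{\log K}{n}}\Big)
  \qquad\text{to}\qquad
  R(\hat{f}) \;\le\; \min_{1\le j\le K} R(f_j) + O\!\Big(\tfrac{\log K}{n}\Big)
\]
under the least square loss and $\phi$-mixing, matching the fast rate known for iid data and individual sequences.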
Limit theorems for a class of identically distributed random variables
A new type of stochastic dependence for a sequence of random variables is
introduced and studied. Precisely, (X_n)_{n\geq 1} is said to be conditionally
identically distributed (c.i.d.), with respect to a filtration (G_n)_{n\geq 0},
if it is adapted to (G_n)_{n\geq 0} and, for each n\geq 0, (X_k)_{k>n} is
identically distributed given the past G_n. In case G_0={\varnothing,\Omega}
and G_n=\sigma(X_1,...,X_n), a result of Kallenberg implies that (X_n)_{n\geq
1} is exchangeable if and only if it is stationary and c.i.d. After giving some
natural examples of nonexchangeable c.i.d. sequences, it is shown that
(X_n)_{n\geq 1} is exchangeable if and only if (X_{\tau(n)})_{n\geq 1} is
c.i.d. for any finite permutation \tau of {1,2,...}, and that the distribution
of a c.i.d. sequence agrees with an exchangeable law on a certain
sub-\sigma-field. Moreover, (1/n)\sum_{k=1}^n X_k converges a.s. and in L^1
whenever (X_n)_{n\geq 1} is (real-valued) c.i.d. and
E[|X_1|]<\infty. As to the CLT, three types of random centering are considered.
One such centering, significant in Bayesian prediction and discrete time
filtering, is E[X_{n+1}| G_n]. For each centering, convergence in distribution
of the corresponding empirical process is analyzed under uniform distance.
Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Probability
(http://www.imstat.org/aop/) at http://dx.doi.org/10.1214/00911790400000067
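As a reading aid (the bounded test function f is introduced here for illustration), the c.i.d. condition says that (X_n)_{n\geq 1} is c.i.d. with respect to (G_n)_{n\geq 0} precisely when
\[
  \mathbb{E}\big[f(X_k) \mid \mathcal{G}_n\big] \;=\; \mathbb{E}\big[f(X_{n+1}) \mid \mathcal{G}_n\big] \quad \text{a.s.}
  \qquad \text{for all } k > n \ge 0 \text{ and all bounded measurable } f,
\]
that is, given the past G_n, every future observation has the same conditional distribution as the next one.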
Model selection for weakly dependent time series forecasting
Observing a stationary time series, we propose a two-step procedure for the
prediction of the next value of the time series. The first step follows the
machine learning theory paradigm and consists in determining a set of possible
predictors as randomized estimators in (possibly numerous) different predictive
models. The second step follows the model selection paradigm and consists in
choosing one predictor with good properties among all the predictors of the
first step. We study our procedure for two different types of observations:
causal Bernoulli shifts and bounded weakly dependent processes. In both cases,
we give oracle inequalities: the risk of the chosen predictor is close to the
best prediction risk in all predictive models that we consider. We apply our
procedure to predictive models such as linear predictors, neural network
predictors and non-parametric autoregressive predictors.
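The following is a minimal sketch of such a two-step scheme, assuming plain least-squares autoregressions of several lag orders as the candidate models and a held-out block for the selection step; the randomized (Gibbs-type) estimators and the weak dependence analysis of the paper are not reproduced here.

import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit; coef = (intercept, b_1, ..., b_p)."""
    y = x[p:]
    lags = [x[p - k:len(x) - k] for k in range(1, p + 1)]  # column k holds x_{t-k}
    X = np.column_stack([np.ones(len(y))] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def one_step(coef, history, p):
    """Predict the next value from the last p observations."""
    feats = np.concatenate(([1.0], history[::-1][:p]))  # most recent lag first
    return float(coef @ feats)

def two_step_forecast(x, lag_orders=(1, 2, 3), n_valid=100):
    """Step 1: fit one candidate predictor per predictive model (here: per
    lag order) on the training block. Step 2: select the candidate with the
    smallest empirical quadratic risk on the validation block, then forecast."""
    x = np.asarray(x, dtype=float)
    split = len(x) - n_valid
    candidates = {p: fit_ar(x[:split], p) for p in lag_orders}
    risks = {
        p: np.mean([(one_step(c, x[:t], p) - x[t]) ** 2
                    for t in range(split, len(x))])
        for p, c in candidates.items()
    }
    best = min(risks, key=risks.get)
    return one_step(candidates[best], x, best), best

# Example on a synthetic AR(2) series.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] - 0.2 * x[t - 2] + rng.normal()
forecast, chosen_lag = two_step_forecast(x)
print(f"next-value forecast {forecast:.3f} using AR({chosen_lag})")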
Prediction of time series by statistical learning: general losses and fast rates
We establish rates of convergence in time series forecasting using the
statistical learning approach based on oracle inequalities. A series of papers
extends the oracle inequalities obtained for iid observations to time series
under weak dependence conditions. Given a family of predictors and $n$
observations, oracle inequalities state that a predictor forecasts the series
as well as the best predictor in the family up to a remainder term $\Delta_n$.
Using the PAC-Bayesian approach, we establish under weak dependence conditions
oracle inequalities with optimal rates of convergence. We extend previous
results for the absolute loss function to any Lipschitz loss function with
rates of order $\sqrt{c(\Theta)/n}$, where $c(\Theta)$ measures the
complexity of the model. We apply the method for quantile loss functions to
forecast the French GDP. Under additional conditions on the loss functions
(satisfied by the quadratic loss function) and on the time series, we refine
the rates of convergence to $c(\Theta)/n$. We achieve for the
first time these fast rates for uniformly mixing processes. These rates are
known to be optimal in the iid case and for individual sequences. In
particular, we generalize the results of Dalalyan and Tsybakov on sparse
regression estimation to the case of autoregression.
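Schematically (a generic PAC-Bayesian bound shape with notation chosen here, not the paper's exact statement): with a prior $\pi$ on the model $\Theta$ and posteriors $\rho$, such oracle inequalities read, with probability at least $1-\delta$,
\[
  R(\hat{f}) \;\le\; \inf_{\rho}\left\{ \int_\Theta R(f_\theta)\,\rho(\mathrm{d}\theta)
  \;+\; \frac{\mathcal{K}(\rho,\pi) + \log(1/\delta)}{\lambda_n} \right\},
\]
where $\mathcal{K}$ is the Kullback-Leibler divergence and the admissible inverse temperature $\lambda_n$ drives the rate: $\lambda_n \asymp \sqrt{n}$ yields the $\sqrt{c(\Theta)/n}$ regime for Lipschitz losses, while $\lambda_n \asymp n$ yields the fast $c(\Theta)/n$ regime.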