In this paper we tackle the problem of fast rates in time series forecasting
from a statistical learning perspective. In a series of papers (e.g. Meir 2000,
Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main
tools used in learning theory with iid observations can be extended to the
prediction of time series. The main message of these papers is that, given a
family of predictors, we are able to build a new predictor that predicts the
series as well as the best predictor in the family, up to a remainder of order
1/√n. It is known that this rate cannot be improved in general. In this
paper, we show that in the particular case of the least squares loss, and under
a strong assumption on the time series (phi-mixing), the remainder is actually
of order 1/n. Thus, the optimal rate for iid variables (see e.g. Tsybakov
2003) and for individual sequences (see \cite{lugosi}) is, for the first time,
achieved for uniformly mixing processes. We also show that our method is
optimal for aggregating sparse linear combinations of predictors.