Density estimation with quadratic loss: a confidence intervals method
In a previous article, a least squares regression estimation procedure was proposed: first, we consider a family of functions and study the properties of an estimator in every unidimensional model defined by one of these functions; we then show how to aggregate these estimators. The purpose of this paper is to extend this method to the case of density estimation. We first give a general overview of the method, adapted to the density estimation problem. We then show that this leads to adaptive estimators, meaning that the estimator reaches the best possible rate of convergence (up to a logarithmic factor). Finally, we show some ways to improve and generalize the method.
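As a minimal illustration of the aggregation step described above, the sketch below estimates the quadratic risk of each candidate density up to a constant and forms an exponentially weighted mixture. The family of Gaussian location models, the grid of means, and the temperature $\lambda = n$ are assumptions made for the demo, not the paper's construction.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=1.0, size=500)    # observed sample

# Candidate one-dimensional models: densities N(mu, 1) for mu on a grid.
mus = np.linspace(-3.0, 3.0, 61)

def quadratic_risk(mu, X, sigma=1.0):
    """Estimate of ||f||^2 - 2*E[f(X)], i.e. the quadratic risk of the
    candidate f = N(mu, sigma^2) up to a constant independent of f."""
    norm_f_sq = 1.0 / (2.0 * sigma * np.sqrt(np.pi))   # squared L2 norm of a Gaussian
    return norm_f_sq - 2.0 * norm.pdf(X, loc=mu, scale=sigma).mean()

risks = np.array([quadratic_risk(mu, X) for mu in mus])

# Exponential-weight aggregation; the choice lambda = n is illustrative.
lam = len(X)
w = np.exp(-lam * (risks - risks.min()))
w /= w.sum()

def aggregated_density(x):
    """Weighted mixture of the candidate densities."""
    x = np.atleast_1d(x)
    return np.sum(w[:, None] * norm.pdf(x[None, :], loc=mus[:, None]), axis=0)

print(mus[np.argmax(w)])   # should be close to the true location 1.0
```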
PAC-Bayesian bounds for sparse regression estimation with exponential weights
We consider the sparse regression model where the number of parameters $p$ is larger than the sample size $n$. The difficulty when considering high-dimensional problems is to propose estimators achieving a good compromise between statistical and computational performance. The BIC estimator, for instance, performs well from the statistical point of view \cite{BTW07} but can only be computed for values of $p$ of at most a few tens. The Lasso estimator is the solution of a convex minimization problem, hence computable for large values of $p$. However, stringent conditions on the design are required to establish fast rates of convergence for this estimator. Dalalyan and Tsybakov \cite{arnak} propose a method achieving a good compromise between the statistical and computational aspects of the problem. Their estimator can be computed for reasonably large $p$ and satisfies nice statistical properties under weak assumptions on the design. However, \cite{arnak} proposes sparsity oracle inequalities in expectation for the empirical excess risk only. In this paper, we propose an aggregation procedure similar to that of \cite{arnak} but with improved statistical performance. Our main theoretical result is a sparsity oracle inequality in probability, for the true excess risk, for a version of the exponential weight estimator. We also propose an MCMC method to compute our estimator for reasonably large values of $p$.
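A minimal sketch of such an exponentially weighted estimator computed by MCMC, assuming synthetic data, a heavy-tailed product prior, and a random-walk Metropolis sampler with single-coordinate proposals; all tuning choices below are illustrative rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                       # more parameters than observations
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.normal(size=n)

lam, tau = 2.0 * n, 0.1              # inverse temperature and prior scale (assumed)

def log_weight(beta):
    """log of the unnormalized exponential weight: -lam * empirical risk,
    plus a heavy-tailed log-prior that favors sparse vectors."""
    risk = np.mean((y - X @ beta) ** 2)
    log_prior = -2.0 * np.sum(np.log(1.0 + np.abs(beta) / tau))
    return -lam * risk + log_prior

# Metropolis-Hastings with single-coordinate proposals; the aggregated
# estimator is approximated by the average of the post-burn-in samples.
beta = np.zeros(p)
cur = log_weight(beta)
avg = np.zeros(p)
n_iter, burn = 20000, 5000
for t in range(n_iter):
    j = rng.integers(p)
    prop = beta.copy()
    prop[j] += 0.1 * rng.normal()
    new = log_weight(prop)
    if np.log(rng.random()) < new - cur:
        beta, cur = prop, new
    if t >= burn:
        avg += beta
avg /= (n_iter - burn)
print(np.round(avg[:5], 2))          # the first 3 coordinates should stand out
```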
Fast rates in learning with dependent observations
In this paper we tackle the problem of fast rates in time series forecasting from a statistical learning perspective. In a series of papers (e.g. Meir 2000, Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main tools used in learning theory with iid observations can be extended to the prediction of time series. The main message of these papers is that, given a family of predictors, we are able to build a new predictor that predicts the series as well as the best predictor in the family, up to a remainder of order $1/\sqrt{n}$. It is known that this rate cannot be improved in general. In this paper, we show that in the particular case of the least squares loss, and under a strong assumption on the time series ($\phi$-mixing), the remainder is actually of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov 2003, and for individual sequences, see \cite{lugosi}, is, for the first time, achieved for uniformly mixing processes. We also show that our method is optimal for aggregating sparse linear combinations of predictors.
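The kind of aggregation guarantee discussed here can be illustrated with the classical exponentially weighted average forecaster on a toy series. The AR(1) data, the family of one-parameter "experts", and the learning rate $\eta$ below are assumptions for the demo, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
x = np.zeros(T)
for t in range(1, T):                      # toy AR(1) series
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.3)

coefs = np.linspace(-1.0, 1.0, 21)         # expert k forecasts a_k * X_{t-1}
eta = 2.0                                  # learning rate (illustrative)
cum_loss = np.zeros_like(coefs)
agg_loss = 0.0
for t in range(1, T):
    # exponential weights based on each expert's cumulative past loss
    w = np.exp(-eta * (cum_loss - cum_loss.min()))
    w /= w.sum()
    forecast = np.dot(w, coefs * x[t - 1])       # aggregated prediction
    agg_loss += (forecast - x[t]) ** 2
    cum_loss += (coefs * x[t - 1] - x[t]) ** 2   # each expert's squared loss

# average loss of the aggregate vs. the best expert in hindsight
print(agg_loss / (T - 1), cum_loss.min() / (T - 1))
```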
Sparsity considerations for dependent observations
The aim of this paper is to provide a comprehensive introduction to the study of L1-penalized estimators in the context of dependent observations. We define a general $\ell_1$-penalized estimator for solving problems of stochastic optimization. This estimator turns out to be the LASSO in the regression estimation setting. Powerful theoretical guarantees on the statistical performance of the LASSO were provided in recent papers; however, they usually only deal with the iid case. Here, we study our estimator under various dependence assumptions.
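In the regression setting, the estimator can be sketched with scikit-learn's Lasso on dependent data. The AR(1) covariates, AR(1) noise, and penalty level alpha below are all invented for the demo.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 50
# Dependent design: the covariate vector follows an AR(1) process over time.
X = np.zeros((n, p))
for t in range(1, n):
    X[t] = 0.7 * X[t - 1] + rng.normal(size=p)
beta_true = np.zeros(p)
beta_true[[0, 3, 7]] = [1.0, -2.0, 1.5]
# Dependent noise as well (AR(1)).
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + rng.normal(scale=0.5)
y = X @ beta_true + eps

lasso = Lasso(alpha=0.1).fit(X, y)     # the l1-penalized least-squares estimator
print(np.flatnonzero(lasso.coef_))     # support recovery despite the dependence
```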
Model selection for weakly dependent time series forecasting
Observing a stationary time series, we propose a two-step procedure for the prediction of its next value. The first step follows the machine learning theory paradigm and consists in determining a set of possible predictors, given as randomized estimators in (possibly numerous) different predictive models. The second step follows the model selection paradigm and consists in choosing one predictor with good properties among all the predictors of the first step. We study our procedure for two different types of observations: causal Bernoulli shifts and bounded weakly dependent processes. In both cases, we give oracle inequalities: the risk of the chosen predictor is close to the best prediction risk over all the predictive models that we consider. We apply our procedure to predictive models such as linear predictors, neural network predictors and non-parametric autoregressive predictors.
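A stripped-down sketch of the two-step logic, under two simplifying assumptions: ordinary least squares stands in for the randomized estimators of step 1, and a simple holdout risk stands in for the paper's selection criterion. One predictor is fitted per model AR(d), then the one with the smallest empirical prediction risk is kept.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 400
x = np.zeros(T)
for t in range(2, T):                        # toy AR(2) data
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.3)

def lagged(s, d):
    """Design of lagged values (s[t-1], ..., s[t-d]) and targets s[t]."""
    Z = np.column_stack([s[d - j - 1: len(s) - j - 1] for j in range(d)])
    return Z, s[d:]

train, valid = x[:300], x[300:]

# Step 1: one predictor per predictive model (here AR(d), d = 1..5).
predictors = {}
for d in range(1, 6):
    Z, ytr = lagged(train, d)
    predictors[d] = np.linalg.lstsq(Z, ytr, rcond=None)[0]

# Step 2: model selection -- keep the predictor with the smallest
# empirical prediction risk on held-out data.
risks = {}
for d, theta in predictors.items():
    Zv, yv = lagged(valid, d)
    risks[d] = np.mean((Zv @ theta - yv) ** 2)
best = min(risks, key=risks.get)
print(best, risks)                           # expect best == 2
```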
An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization
The aim of this paper is to provide some theoretical understanding of quasi-Bayesian aggregation methods for non-negative matrix factorization. We derive an oracle inequality for an aggregated estimator. This result holds for a very general class of prior distributions and shows how the prior affects the rate of convergence.

Comment: This is the corrected version of the published paper P. Alquier, B. Guedj, An Oracle Inequality for Quasi-Bayesian Non-negative Matrix Factorization, Mathematical Methods of Statistics, 2017, vol. 26, no. 1, pp. 55-67. Since then, Arnak Dalalyan (ENSAE) found a mistake in the proofs. We fixed the mistake at the price of a slightly different logarithmic term in the bound.
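A crude sketch of a quasi-Bayesian NMF aggregate, assuming a Gaussian quasi-likelihood $\exp(-\lambda \|Y - UV\|_F^2)$, independent Exp(1) priors on the nonnegative entries, and single-entry random-walk Metropolis moves. All of these choices are illustrative; the paper's framework covers far more general priors and does not prescribe this sampler.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 15, 12, 2
U0 = rng.gamma(2.0, 1.0, size=(m, k))
V0 = rng.gamma(2.0, 1.0, size=(k, n))
Y = U0 @ V0 + 0.1 * rng.normal(size=(m, n))    # noisy low-rank matrix

lam = 2.0                                      # inverse temperature (assumed)

def log_quasi_post(U, V):
    """log of the unnormalized quasi-posterior: Gaussian quasi-likelihood
    times Exp(1) priors, -inf outside the nonnegative orthant."""
    if (U < 0).any() or (V < 0).any():
        return -np.inf
    return -lam * np.sum((Y - U @ V) ** 2) - U.sum() - V.sum()

U, V = np.ones((m, k)), np.ones((k, n))
cur = log_quasi_post(U, V)
mean_M, kept = np.zeros((m, n)), 0
for t in range(40000):
    # random-walk proposal on one entry of U or of V
    if rng.random() < 0.5:
        prop = U.copy()
        prop[rng.integers(m), rng.integers(k)] += 0.1 * rng.normal()
        new = log_quasi_post(prop, V)
        if np.log(rng.random()) < new - cur:
            U, cur = prop, new
    else:
        prop = V.copy()
        prop[rng.integers(k), rng.integers(n)] += 0.1 * rng.normal()
        new = log_quasi_post(U, prop)
        if np.log(rng.random()) < new - cur:
            V, cur = prop, new
    if t >= 10000:                             # average after burn-in
        mean_M += U @ V
        kept += 1
mean_M /= kept
print(np.mean((mean_M - U0 @ V0) ** 2))        # error of the aggregated factorization
```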
Transductive versions of the LASSO and the Dantzig Selector
We consider the linear regression problem, where the number of covariates $p$ is possibly larger than the number of observations $n$, under sparsity assumptions. On the one hand, several methods have been successfully proposed to perform this task, for example the LASSO or the Dantzig Selector. On the other hand, consider new values of the covariates $x_{n+1}, \dots, x_{n+m}$. If one wants to estimate the corresponding $y$'s, one should think of a specific estimator devoted to this task, referred to by Vapnik as a "transductive" estimator. This estimator may differ from an estimator designed for the more general task "estimate the regression function on the whole domain". In this work, we propose a generalized version of both the LASSO and the Dantzig Selector, based on geometrical remarks about the LASSO in previous works. The "usual" LASSO and Dantzig Selector, as well as new estimators interpreted as transductive versions of the LASSO, appear as special cases. These estimators are interesting at least from a theoretical point of view: we can give theoretical guarantees for these estimators under hypotheses that are relaxed versions of the hypotheses required in the papers about the "usual" LASSO. These estimators can also be efficiently computed, with results comparable to those of the LASSO.
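A small sketch of the transductive setting on synthetic data. The rescaling by the standard deviations of the full design (labeled plus new points) is only one transductive-flavoured illustration; it is not the estimator defined in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, m, p = 60, 40, 150                    # n labeled points, m new points, p >> n
X = rng.normal(size=(n, p))
X_new = rng.normal(size=(m, p))          # covariates where predictions are wanted
beta_true = np.zeros(p)
beta_true[:4] = [1.5, -1.0, 2.0, -0.5]
y = X @ beta_true + 0.3 * rng.normal(size=n)

# Plain (inductive) LASSO fit on the labeled data, then prediction at X_new.
lasso = Lasso(alpha=0.05).fit(X, y)
y_new_hat = lasso.predict(X_new)

# Transductive-flavoured variant: rescale each covariate using labeled AND
# new points, so the penalty reflects the geometry of the full design.
scale = np.concatenate([X, X_new]).std(axis=0)
lasso_t = Lasso(alpha=0.05).fit(X / scale, y)
y_new_hat_t = lasso_t.predict(X_new / scale)
print(np.corrcoef(y_new_hat, y_new_hat_t)[0, 1])
```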
- …