745 research outputs found
Sparsity oracle inequalities for the Lasso
This paper studies oracle properties of $\ell_1$-penalized least squares in a
nonparametric regression setting with random design. We show that the penalized
least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in
terms of the number of non-zero components of the oracle vector. The results
are valid even when the dimension of the model is (much) larger than the sample
size and the regression matrix is not positive definite. They can be applied to
high-dimensional linear regression, to nonparametric adaptive regression
estimation and to the problem of aggregation of arbitrary estimators.
Comment: Published at http://dx.doi.org/10.1214/07-EJS008 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
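Schematically, the estimator in question and the shape of a sparsity oracle inequality can be written as follows (a generic form in standard notation, assumed here for illustration rather than quoted from the paper):

\[
\hat\theta \in \operatorname*{arg\,min}_{\theta\in\mathbb{R}^M}
\Bigl\{ \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - f_\theta(X_i)\bigr)^2
        + \lambda\sum_{j=1}^{M}|\theta_j| \Bigr\},
\qquad f_\theta = \sum_{j=1}^{M}\theta_j f_j,
\]
and a sparsity oracle inequality bounds the risk of $f_{\hat\theta}$ as
\[
\|f_{\hat\theta} - f\|^2 \;\le\; C\,\inf_{\theta}
\Bigl\{ \|f_\theta - f\|^2 + \frac{M(\theta)\log M}{n} \Bigr\},
\]
where $M(\theta)$ is the number of non-zero components of $\theta$, so the bound is driven by the sparsity of the oracle vector rather than by the ambient dimension $M$.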
Exponential Screening and optimal rates of sparse estimation
In high-dimensional linear regression, the goal pursued here is to estimate
an unknown regression function using linear combinations of a suitable set of
covariates. One of the key assumptions for the success of any statistical
procedure in this setup is to assume that the linear combination is sparse in
some sense, for example, that it involves only few covariates. We consider a
general, not necessarily linear, regression with Gaussian noise and study a
related question that is to find a linear combination of approximating
functions, which is at the same time sparse and has small mean squared error
(MSE). We introduce a new estimation procedure, called Exponential Screening,
that shows remarkable adaptation properties. It adapts to the linear
combination that optimally balances MSE and sparsity, whether the latter is
measured in terms of the number of non-zero entries in the combination
($\ell_0$ norm) or in terms of the global weight of the combination ($\ell_1$
norm). The power of this adaptation result is illustrated by showing that
Exponential Screening solves optimally and simultaneously all the problems of
aggregation in Gaussian regression that have been discussed in the literature.
Moreover, we show that the performance of the Exponential Screening estimator
cannot be improved in a minimax sense, even if the optimal sparsity is known in
advance. The theoretical and numerical superiority of Exponential Screening
compared to state-of-the-art sparse procedures is also discussed.
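The underlying mechanism, exponential-weights aggregation of least-squares fits over sparsity patterns, can be sketched in a few lines of Python. This is a toy illustration only: the brute-force enumeration, the temperature $4\sigma^2$ with assumed known noise variance, and the simplified sparsity prior are my choices, not the paper's exact construction.

import itertools
import numpy as np

def exponential_weights_toy(X, y, sigma2, max_support=3):
    # Toy exponential-weights aggregate over least-squares fits on all
    # supports of size <= max_support (brute force; infeasible for large M).
    n, M = X.shape
    supports = [S for k in range(max_support + 1)
                for S in itertools.combinations(range(M), k)]
    thetas, logw = [], []
    for S in supports:
        theta = np.zeros(M)
        if S:
            sol, *_ = np.linalg.lstsq(X[:, list(S)], y, rcond=None)
            theta[list(S)] = sol
        rss = float(np.sum((y - X @ theta) ** 2))
        k = len(S)
        # Weight ~ exp(-RSS / (4*sigma2)) times a prior that penalizes
        # support size roughly like exp(-2k log(M/k)) -- simplified here.
        log_prior = -2.0 * k * np.log(max(M / max(k, 1), 2.0))
        logw.append(-rss / (4.0 * sigma2) + log_prior)
        thetas.append(theta)
    logw = np.array(logw)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return np.array(thetas).T @ w  # aggregated coefficient vector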
Quasi-Likelihood and/or Robust Estimation in High Dimensions
We consider the theory for the high-dimensional generalized linear model with
the Lasso. After a short review of theoretical results in the literature, we
present an extension of the oracle results to the case of quasi-likelihood
loss. We prove bounds for the prediction error and $\ell_1$-error. The results
are derived under fourth moment conditions on the error distribution. The case
of robust loss is also given. We moreover show that under an irrepresentable
condition, the $\ell_1$-penalized quasi-likelihood estimator has no false
positives.
Comment: Published at http://dx.doi.org/10.1214/12-STS397 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
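For concreteness, here is a minimal sketch of an $\ell_1$-penalized GLM fit (logistic loss) in scikit-learn; it illustrates the general estimator class but is not the paper's quasi-likelihood or robust procedure, which would replace the logistic deviance with the corresponding loss.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 100, 500                       # high-dimensional: p >> n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0                        # sparse truth with 3 active covariates
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(int)

# C is the inverse penalty strength: smaller C = heavier l1 penalty.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("selected covariates:", np.flatnonzero(model.coef_))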
Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter $\beta$ of the EWA is larger than or
equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions $\phi_1,\dots,\phi_M$. We allow $M$ to be much larger
than the sample size $n$ but we assume that the true regression function can be
well approximated by a sparse linear combination of the functions $\phi_j$. Under
this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number of aggregated functions $M$ can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings.
Comment: Short version published in COLT 2009.
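A minimal unadjusted Langevin sketch for this kind of computation is below. The pseudo-posterior it samples has the EWA as its mean; the Cauchy-type heavy-tailed prior, fixed step size, and tuning constants are my assumptions, standing in for the paper's exact prior and step-size schedule.

import numpy as np

def lmc_ewa(X, y, beta, tau=1.0, h=1e-4, n_iter=20000, burn=5000, seed=0):
    # Unadjusted Langevin Monte-Carlo targeting
    #   pi(theta) ~ exp(-||y - X theta||^2 / beta) * prior(theta).
    rng = np.random.default_rng(seed)
    n, M = X.shape
    theta = np.zeros(M)
    running_sum = np.zeros(M)
    for t in range(n_iter):
        # Gradient of the log-likelihood term -||y - X theta||^2 / beta.
        grad = (2.0 / beta) * X.T @ (y - X @ theta)
        # Gradient of a heavy-tailed (Cauchy-type) log-prior, an assumption:
        #   log prior ~ -2 * sum_j log(tau^2 + theta_j^2).
        grad += -4.0 * theta / (tau**2 + theta**2)
        # Langevin update: drift along the gradient plus Gaussian noise.
        theta = theta + h * grad + np.sqrt(2.0 * h) * rng.standard_normal(M)
        if t >= burn:
            running_sum += theta
    return running_sum / (n_iter - burn)  # posterior-mean estimate = EWA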
PAC-Bayesian bounds for sparse regression estimation with exponential weights
We consider the sparse regression model where the number of parameters $p$ is
larger than the sample size $n$. The difficulty when considering
high-dimensional problems is to propose estimators achieving a good compromise
between statistical and computational performances. The BIC estimator for
instance performs well from the statistical point of view \cite{BTW07} but can
only be computed for values of $p$ of at most a few tens. The Lasso estimator
is the solution of a convex minimization problem, hence computable for large
values of $p$. However, stringent conditions on the design are required to establish
fast rates of convergence for this estimator. Dalalyan and Tsybakov
\cite{arnak} propose a method achieving a good compromise between the
statistical and computational aspects of the problem. Their estimator can be
computed for reasonably large $p$ and satisfies nice statistical properties
under weak assumptions on the design. However, \cite{arnak} proposes sparsity
oracle inequalities in expectation for the empirical excess risk only. In this
paper, we propose an aggregation procedure similar to that of \cite{arnak} but
with improved statistical performances. Our main theoretical result is a
sparsity oracle inequality in probability for the true excess risk for a
version of the exponential weights estimator. We also propose an MCMC method to
compute our estimator for reasonably large values of $p$.
Comment: 19 pages.
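A generic sketch of the MCMC idea, random-walk Metropolis targeting an exponential-weights pseudo-posterior, is given below; the target, proposal, and tuning are illustrative assumptions of mine, not the paper's specific sampler.

import numpy as np

def metropolis_ewa(log_target, p, step=0.1, n_iter=50000, burn=10000, seed=0):
    # Random-walk Metropolis; averaging post-burn-in draws approximates the
    # mean of pi(theta) ~ exp(log_target(theta)), i.e. the aggregate.
    rng = np.random.default_rng(seed)
    theta = np.zeros(p)
    lt = log_target(theta)
    total = np.zeros(p)
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(p)  # random-walk proposal
        lt_prop = log_target(prop)
        if np.log(rng.random()) < lt_prop - lt:       # accept/reject step
            theta, lt = prop, lt_prop
        if t >= burn:
            total += theta
    return total / (n_iter - burn)

# Example target with squared loss and a Laplace-type prior (both assumptions):
# log_target = lambda th: -np.sum((y - X @ th)**2)/lam - np.sum(np.abs(th))/tau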
Sparsity considerations for dependent observations
The aim of this paper is to provide a comprehensive introduction to the
study of $\ell_1$-penalized estimators in the context of dependent observations. We
define a general $\ell_1$-penalized estimator for solving problems of
stochastic optimization. This estimator turns out to be the LASSO in the
regression estimation setting. Powerful theoretical guarantees on the
statistical performance of the LASSO were provided in recent papers; however,
they usually only deal with the i.i.d. case. Here, we study our estimator under
various dependence assumptions.
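Schematically (in notation assumed here, following the abstract's description), the general estimator is

\[
\hat\theta \in \operatorname*{arg\,min}_{\theta\in\mathbb{R}^p}
\Bigl\{ \frac{1}{n}\sum_{i=1}^{n} g(\theta, Z_i) + \lambda\|\theta\|_1 \Bigr\},
\]

which reduces to the LASSO when $Z_i = (X_i, Y_i)$ and $g(\theta, Z_i) = (Y_i - X_i^\top\theta)^2$.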
- …