Least squares after model selection in high-dimensional sparse models
In this article we study post-model selection estimators that apply ordinary
least squares (OLS) to the model selected by first-step penalized estimators,
typically Lasso. It is well known that Lasso can estimate the nonparametric
regression function at nearly the oracle rate, and is thus hard to improve
upon. We show that the OLS post-Lasso estimator performs at least as well as
Lasso in terms of the rate of convergence, and has the advantage of a smaller
bias. Remarkably, this performance occurs even if the Lasso-based model
selection "fails" in the sense of missing some components of the "true"
regression model. By the "true" model, we mean the best s-dimensional
approximation to the nonparametric regression function chosen by the oracle.
Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso,
in the sense of a strictly faster rate of convergence, if the Lasso-based model
selection correctly includes all components of the "true" model as a subset and
also achieves sufficient sparsity. In the extreme case, when Lasso perfectly
selects the "true" model, the OLS post-Lasso estimator becomes the oracle
estimator. An important ingredient in our analysis is a new sparsity bound on
the dimension of the model selected by Lasso, which guarantees that this
dimension is at most of the same order as the dimension of the "true" model.
Our rate results are nonasymptotic and hold in both parametric and
nonparametric models. Moreover, our analysis is not limited to the Lasso
estimator acting as a selector in the first step, but also applies to any other
estimator, for example, various forms of thresholded Lasso, with good rates and
good sparsity properties. Our analysis covers both traditional thresholding and
a new practical, data-driven thresholding scheme that induces additional
sparsity subject to maintaining a certain goodness of fit. The latter scheme
has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it
dominates those procedures as well as traditional thresholding in a wide
variety of experiments.
Comment: Published at http://dx.doi.org/10.3150/11-BEJ410 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
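To make the two-step procedure concrete, here is a minimal sketch of OLS post-Lasso in Python with scikit-learn; the simulated design and the penalty level `alpha` are illustrative assumptions, not the choices analyzed in the paper.

```python
# Minimal OLS post-Lasso sketch: fit Lasso, read off the selected support,
# then refit unpenalized least squares on that support only.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                       # n observations, p regressors, s-sparse signal (illustrative)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)

# Step 1: Lasso as a model selector (alpha is an illustrative choice, not the paper's rule).
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Step 2: ordinary least squares restricted to the selected regressors,
# which removes the shrinkage bias of the Lasso coefficients.
post_lasso_coef = np.zeros(p)
if support.size:                            # refit only if Lasso kept at least one regressor
    post_lasso_coef[support] = LinearRegression().fit(X[:, support], y).coef_
```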
Post-l1-penalized estimators in high-dimensional linear regression models
In this paper we study post-penalized estimators which apply ordinary, unpenalized linear regression to the model selected by first-step penalized estimators, typically LASSO. It is well known that LASSO can estimate the regression function at nearly the oracle rate, and is thus hard to improve upon. We show that post-LASSO performs at least as well as LASSO in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the LASSO-based model selection 'fails' in the sense of missing some components of the 'true' regression model. By the 'true' model we mean here the best s-dimensional approximation to the regression function chosen by the oracle. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the 'true' model as a subset and also achieves a sufficient sparsity. In the extreme case, when LASSO perfectly selects the 'true' model, the post-LASSO estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by LASSO which guarantees that this dimension is at most of the same order as the dimension of the 'true' model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the LASSO estimator in the first step, but also applies to other estimators, for example, the trimmed LASSO, Dantzig selector, or any other estimator with good rates and good sparsity. Our analysis covers both traditional trimming and a new practical, completely data-driven trimming scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit. The latter scheme has theoretical guarantees similar to those of LASSO or post-LASSO, but it dominates these procedures as well as traditional trimming in a wide variety of experiments.
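The "traditional trimming" mentioned above can be sketched as thresholding the Lasso coefficients at a fixed cutoff and refitting unpenalized least squares on the survivors; the cutoff and penalty level below are placeholders, not the values prescribed by the theory.

```python
# Sketch of traditional trimming: threshold the Lasso coefficients at a fixed
# cutoff to induce extra sparsity, then refit OLS on the surviving regressors.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def thresholded_post_lasso(X, y, alpha=0.1, cutoff=0.05):
    """Fit Lasso, drop coefficients with |coef| <= cutoff, refit OLS on the rest.
    alpha and cutoff are illustrative tuning values."""
    lasso = Lasso(alpha=alpha).fit(X, y)
    support = np.flatnonzero(np.abs(lasso.coef_) > cutoff)
    coef = np.zeros(X.shape[1])
    if support.size:                        # refit only if something survived trimming
        coef[support] = LinearRegression().fit(X[:, support], y).coef_
    return coef, support
```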
An l1-Oracle Inequality for the Lasso
The Lasso has attracted the attention of many authors in recent years. While
many efforts have been made to prove that the Lasso behaves like a variable
selection procedure at the price of strong (though unavoidable) assumptions on
the geometric structure of these variables, much less attention has been paid
to the analysis of the performance of the Lasso as a regularization algorithm.
Our first purpose here is to provide a conceptually very simple result in this
direction. We shall prove that, provided that the regularization parameter is
properly chosen, the Lasso works almost as well as the deterministic Lasso.
This result requires no assumption at all, either on the structure of
the variables or on the regression function. Our second purpose is to
introduce a new estimator particularly adapted to deal with infinite countable
dictionaries. This estimator is constructed as an l0-penalized estimator among
a sequence of Lasso estimators associated with a dyadic sequence of growing
truncated dictionaries. The selection procedure automatically chooses the best
level of truncation of the dictionary so as to make the best tradeoff between
approximation, l1-regularization and sparsity. From a theoretical point of
view, we shall provide an oracle inequality satisfied by this selected Lasso
estimator. The oracle inequalities established for the Lasso and the selected
Lasso estimators shall enable us to derive rates of convergence on a wide class
of functions, showing that these estimators perform at least as well as greedy
algorithms. Besides, we shall prove that the rates of convergence achieved by
the selected Lasso estimator are optimal in the orthonormal case by bounding
from below the minimax risk on some Besov bodies. Finally, the performance of
the Lasso for infinite uncountable dictionaries will be studied in the specific
framework of neural networks. All
the oracle inequalities presented in this paper are obtained via the
application of a single general theorem of model selection among a collection
of nonlinear models which is a direct consequence of the Gaussian concentration
inequality. The key idea that enables us to apply this general theorem is to
see l1-regularization as a model selection procedure among l1-balls.
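A rough sketch of the selected Lasso construction, assuming an ordered dictionary stored as the columns of a matrix: fit a Lasso on each dyadic truncation of the dictionary and pick the truncation level by a penalized criterion. The complexity penalty used below is a placeholder, not the paper's exact l0-type criterion.

```python
# Selected Lasso sketch over dyadic truncations of an ordered dictionary.
import numpy as np
from sklearn.linear_model import Lasso

def selected_lasso(D, y, alpha=0.1, max_level=8):
    """D: n x p matrix whose columns are the ordered dictionary elements."""
    n, p = D.shape
    best = None
    for k in range(1, max_level + 1):
        m = min(2 ** k, p)                           # dyadic truncation: first 2^k columns
        fit = Lasso(alpha=alpha).fit(D[:, :m], y)
        rss = np.sum((y - fit.predict(D[:, :m])) ** 2)
        crit = rss / n + alpha * np.log(m) / n       # placeholder complexity penalty
        if best is None or crit < best[0]:
            best = (crit, m, fit)
    return best                                      # (criterion value, truncation size, fitted Lasso)
```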
Least Squares After Model Selection in High-dimensional Sparse Models
http://arxiv.org/abs/1001.0188
We study post-model selection estimators which apply ordinary least squares (OLS) to the model selected by first-step penalized estimators. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection "fails" in the sense of missing some components of the "true" regression model. By the "true" model we mean here the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, i.e. achieve a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the "true" model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the "true" model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso which guarantees that this dimension is at most of the same order as the dimension of the "true" model. Moreover, our analysis is not limited to the Lasso estimator acting as selector in the first step, but also applies to any other estimator, for example various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces maximal sparsity subject to maintaining a certain goodness of fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates these procedures in a wide variety of experiments.
National Science Foundation (U.S.)
Least Squares after Model Selection in High-Dimensional Sparse Models
Note: new title. Former title: Post-ℓ1-Penalized Estimators in High-Dimensional Linear Regression Models. First version submitted March 29, 2010; original date January 4, 2009; this revision June 14, 2011.
In this paper we study post-model selection estimators which apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection “fails” in the sense of missing some components of the “true” regression model. By the “true” model we mean here the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the “true” model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the “true” model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso which guarantees that this dimension is at most of the same order as the dimension of the “true” model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the Lasso estimator acting as selector in the first step, but also applies to any other estimator, for example various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces maximal sparsity subject to maintaining a certain goodness of fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates these procedures as well as traditional thresholding in a wide variety of experiments.
Inference for High-Dimensional Sparse Econometric Models
This article is about estimation and inference methods for high dimensional
sparse (HDS) regression models in econometrics. High dimensional sparse models
arise in situations where many regressors (or series terms) are available and
the regression function is well-approximated by a parsimonious, yet unknown set
of regressors. The latter condition makes it possible to estimate the entire
regression function effectively by searching for approximately the right set of
regressors. We discuss methods for identifying this set of regressors and
estimating their coefficients based on ℓ1-penalization and describe key
theoretical results. In order to capture realistic practical situations, we
expressly allow for imperfect selection of regressors and study the impact of
this imperfect selection on estimation and inference results. We focus the main
part of the article on the use of HDS models and methods in the instrumental
variables model and the partially linear model. We present a set of novel
inference results for these models and illustrate their use with applications
to returns to schooling and growth regression.
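One concrete instance of ℓ1-based selection that is designed to be robust to imperfect selection is double selection in a partially linear model, sketched below; the penalty levels are illustrative, not the theoretically prescribed ones, and the sketch omits the refinements developed in the article.

```python
# Hedged sketch of double-selection inference in a partially linear model
# y = d*theta + g(z) + e: select controls by Lasso in both the outcome and the
# treatment equations, then run OLS of y on d and the union of selected controls.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def double_selection(y, d, Z, alpha=0.1):
    sel_y = np.flatnonzero(Lasso(alpha=alpha).fit(Z, y).coef_)   # controls predicting the outcome
    sel_d = np.flatnonzero(Lasso(alpha=alpha).fit(Z, d).coef_)   # controls predicting the regressor of interest
    union = np.union1d(sel_y, sel_d)
    W = np.column_stack([d, Z[:, union]]) if union.size else d.reshape(-1, 1)
    ols = LinearRegression().fit(W, y)
    return ols.coef_[0]                                          # estimate of theta
```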
Inference in Additively Separable Models With a High-Dimensional Set of Conditioning Variables
This paper studies nonparametric series estimation and inference for the
effect of a single variable of interest x on an outcome y in the presence of
potentially high-dimensional conditioning variables z. The context is an
additively separable model E[y|x, z] = g0(x) + h0(z). The model is
high-dimensional in the sense that the series of approximating functions for
h0(z) can have more terms than the sample size, thereby allowing z to have
potentially very many measured characteristics. The model is required to be
approximately sparse: h0(z) can be approximated using only a small subset of
series terms whose identities are unknown. This paper proposes an estimation
and inference method for g0(x) called Post-Nonparametric Double Selection which
is a generalization of Post-Double Selection. Standard rates of convergence and
asymptotic normality for the estimator are shown to hold uniformly over a large
class of sparse data generating processes. A simulation study illustrates
finite sample estimation properties of the proposed estimator and coverage
properties of the corresponding confidence intervals. Finally, an empirical
application to college admissions policy demonstrates the practical
implementation of the proposed method.
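A schematic sketch of the double-selection idea in the additively separable setting, under the assumption of a simple polynomial series for g0(x); the bases, penalty level, and plain least-squares refit are illustrative simplifications, not the exact Post-Nonparametric Double Selection procedure.

```python
# Hedged sketch for E[y|x,z] = g0(x) + h0(z): approximate g0 with a few series
# terms in x, select z-terms by Lasso both for the outcome and for each x-term,
# then refit least squares on the union of selected terms.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def post_np_double_selection(y, x, Z, degree=3, alpha=0.1):
    Px = np.column_stack([x ** k for k in range(1, degree + 1)])   # polynomial series for g0(x) (illustrative basis)
    selected = set(np.flatnonzero(Lasso(alpha=alpha).fit(Z, y).coef_))
    for j in range(Px.shape[1]):                                   # selection step for each x-series term
        selected |= set(np.flatnonzero(Lasso(alpha=alpha).fit(Z, Px[:, j]).coef_))
    cols = sorted(selected)
    W = np.column_stack([Px, Z[:, cols]]) if cols else Px
    fit = LinearRegression().fit(W, y)
    return fit.coef_[:Px.shape[1]]             # coefficients on the x-series terms (approximation to g0)
```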