
    Least squares after model selection in high-dimensional sparse models

    In this article we study post-model selection estimators that apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection "fails" in the sense of missing some components of the "true" regression model. By the "true" model, we mean the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the "true" model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the "true" model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso, which guarantees that this dimension is at most of the same order as the dimension of the "true" model. Our rate results are nonasymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the Lasso estimator acting as a selector in the first step, but also applies to any other estimator, for example, various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces additional sparsity subject to maintaining a certain goodness of fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates those procedures as well as traditional thresholding in a wide variety of experiments.
    Comment: Published at http://dx.doi.org/10.3150/11-BEJ410 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
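
    The two-step procedure described above is easy to illustrate. Below is a minimal sketch, assuming a simulated sparse design and a cross-validated penalty level (the paper's own penalty choice and regularity conditions are not reproduced here): Lasso selects a support, and ordinary least squares is refit on that support only, removing the shrinkage bias of the first step.

```python
# Minimal sketch of OLS post-Lasso: Lasso as a model selector, then an
# unpenalized OLS refit on the selected support. The simulated design and
# the cross-validated penalty are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p, s = 100, 200, 5                      # n observations, p regressors, s-sparse signal
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)

# Step 1: Lasso selects a support.
lasso = LassoCV(cv=5).fit(X, y)
support = np.flatnonzero(lasso.coef_ != 0)

# Step 2: unpenalized OLS on the selected regressors removes the Lasso shrinkage bias.
ols_post_lasso = LinearRegression().fit(X[:, support], y)

print("selected model size:", support.size)
print("post-Lasso coefficients on the support:", ols_post_lasso.coef_)
```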

    Post-ℓ1-penalized estimators in high-dimensional linear regression models

    In this paper we study post-penalized estimators that apply ordinary, unpenalized linear regression to the model selected by first-step penalized estimators, typically LASSO. It is well known that LASSO can estimate the regression function at nearly the oracle rate, and is thus hard to improve upon. We show that post-LASSO performs at least as well as LASSO in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the LASSO-based model selection 'fails' in the sense of missing some components of the 'true' regression model. By the 'true' model we mean here the best s-dimensional approximation to the regression function chosen by the oracle. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the 'true' model as a subset and also achieves sufficient sparsity. In the extreme case, when LASSO perfectly selects the 'true' model, the post-LASSO estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by LASSO, which guarantees that this dimension is at most of the same order as the dimension of the 'true' model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the LASSO estimator in the first step, but also applies to other estimators, for example the trimmed LASSO, the Dantzig selector, or any other estimator with good rates and good sparsity properties. Our analysis covers both traditional trimming and a new practical, completely data-driven trimming scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit. The latter scheme has theoretical guarantees similar to those of LASSO or post-LASSO, but it dominates these procedures, as well as traditional trimming, in a wide variety of experiments.
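
    The "completely data-driven trimming scheme" mentioned at the end of the abstract can be read as: keep discarding the smallest selected coefficients as long as a refitted OLS does not lose too much in-sample fit. The sketch below is one plausible rendering of that idea under assumptions of my own; the tolerance gamma and the stopping rule are illustrative, not the paper's exact criterion.

```python
# Hedged sketch of goodness-of-fit-based trimming of the Lasso support.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns."""
    fit = LinearRegression().fit(X[:, cols], y)
    return float(np.sum((y - fit.predict(X[:, cols])) ** 2))


def trimmed_support(X, y, gamma=0.05):
    """Trim the Lasso support while the OLS fit stays within (1 + gamma) of the untrimmed fit."""
    lasso = LassoCV(cv=5).fit(X, y)
    support = sorted(np.flatnonzero(lasso.coef_ != 0),
                     key=lambda j: abs(lasso.coef_[j]))   # smallest coefficients first
    if not support:
        return []
    base = rss(X, y, support)                             # fit of the untrimmed post-Lasso OLS
    while len(support) > 1 and rss(X, y, support[1:]) <= (1 + gamma) * base:
        support = support[1:]                             # drop the currently smallest coefficient
    return sorted(support)
```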

    An ℓ1-Oracle Inequality for the Lasso

    The Lasso has attracted the attention of many authors in recent years. While considerable effort has gone into proving that the Lasso behaves like a variable selection procedure, at the price of strong (though unavoidable) assumptions on the geometric structure of the variables, much less attention has been paid to analyzing the performance of the Lasso as a regularization algorithm. Our first purpose here is to provide a conceptually very simple result in this direction. We prove that, provided the regularization parameter is properly chosen, the Lasso works almost as well as the deterministic Lasso. This result requires no assumptions at all, neither on the structure of the variables nor on the regression function. Our second purpose is to introduce a new estimator particularly suited to infinite countable dictionaries. This estimator is constructed as an ℓ0-penalized estimator among a sequence of Lasso estimators associated with a dyadic sequence of growing truncated dictionaries. The selection procedure automatically chooses the best level of truncation of the dictionary so as to make the best tradeoff between approximation, ℓ1-regularization and sparsity. From a theoretical point of view, we provide an oracle inequality satisfied by this selected Lasso estimator. The oracle inequalities established for the Lasso and the selected Lasso estimators enable us to derive rates of convergence on a wide class of functions, showing that these estimators perform at least as well as greedy algorithms. In addition, we prove that the rates of convergence achieved by the selected Lasso estimator are optimal in the orthonormal case, by bounding the minimax risk on some Besov bodies from below. Finally, some theoretical results about the performance of the Lasso for infinite uncountable dictionaries are studied in the specific framework of neural networks. All the oracle inequalities presented in this paper are obtained via a single general theorem on model selection among a collection of nonlinear models, which is a direct consequence of the Gaussian concentration inequality. The key idea that enables us to apply this general theorem is to view ℓ1-regularization as a model selection procedure among ℓ1-balls.
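
    As a rough illustration of the "selected Lasso" construction, the sketch below fits a Lasso on dyadic truncations of a dictionary (its first 2, 4, 8, ... columns) and picks a truncation level by a penalized criterion. The penalty used here is only a placeholder proportional to the log of the truncation level; the paper's ℓ0-type penalty and its calibration are not taken from the source, and the fixed alpha is likewise an assumption.

```python
# Hedged sketch: select among Lasso fits on dyadic truncations of a dictionary.
import numpy as np
from sklearn.linear_model import Lasso


def selected_lasso(D, y, alpha=0.1, kappa=1.0):
    """Pick among Lasso fits on dyadic truncations of the dictionary D (columns)."""
    n, p = D.shape
    best_crit, best_m, best_fit = np.inf, None, None
    m = 2
    while m <= p:
        fit = Lasso(alpha=alpha).fit(D[:, :m], y)         # Lasso on the first m columns
        resid = float(np.sum((y - fit.predict(D[:, :m])) ** 2))
        crit = resid / n + kappa * np.log(m) / n          # placeholder complexity penalty
        if crit < best_crit:
            best_crit, best_m, best_fit = crit, m, fit
        m *= 2                                            # next dyadic truncation level
    return best_m, best_fit
```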

    Least Squares After Model Selection in High-dimensional Sparse Models

    http://arxiv.org/abs/1001.0188
    We study post-model selection estimators which apply ordinary least squares (OLS) to the model selected by first-step penalized estimators. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection "fails" in the sense of missing some components of the "true" regression model. By the "true" model we mean here the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the "true" model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the "true" model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso which guarantees that this dimension is at most of the same order as the dimension of the "true" model. Moreover, our analysis is not limited to the Lasso estimator acting as selector in the first step, but also applies to any other estimator, for example various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates these procedures in a wide variety of experiments.
    National Science Foundation (U.S.)

    Least Squares after Model Selection in High-Dimensional Sparse Models

    Note: new title. Former title: Post-ℓ1-Penalized Estimators in High-Dimensional Linear Regression Models. First version submitted March 29, 2010; original date January 4, 2009; this revision June 14, 2011.
    In this paper we study post-model selection estimators which apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the non-parametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection “fails” in the sense of missing some components of the “true” regression model. By the “true” model we mean here the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, the OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the “true” model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the “true” model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso which guarantees that this dimension is at most of the same order as the dimension of the “true” model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the Lasso estimator acting as selector in the first step, but also applies to any other estimator, for example various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates these procedures as well as traditional thresholding in a wide variety of experiments.

    Inference for High-Dimensional Sparse Econometric Models

    This article is about estimation and inference methods for high-dimensional sparse (HDS) regression models in econometrics. High-dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well approximated by a parsimonious, yet unknown, set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on ℓ1-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression.
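
    One of the two settings emphasized above is the instrumental variables model with many candidate instruments. A hedged sketch of the basic idea follows: Lasso selects a handful of instruments, an unpenalized first stage is refit on the selection, and 2SLS with a single endogenous regressor reduces to a simple ratio. The simulated design, the cross-validated tuning, and the omission of the article's robustness refinements are all assumptions made for illustration.

```python
# Hedged sketch: 2SLS with Lasso-selected instruments in a toy design.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(1)
n, p_z = 200, 100
Z = rng.standard_normal((n, p_z))             # many candidate instruments
pi = np.zeros(p_z)
pi[:3] = 1.0                                  # only three instruments are relevant
v = rng.standard_normal(n)                    # first-stage error, source of endogeneity
d = Z @ pi + v                                # endogenous regressor
y = 0.5 * d + v + rng.standard_normal(n)      # structural equation with true effect 0.5

# First stage: Lasso picks instruments, then an unpenalized refit gives fitted values.
sel = np.flatnonzero(LassoCV(cv=5).fit(Z, d).coef_ != 0)
d_hat = LinearRegression().fit(Z[:, sel], d).predict(Z[:, sel])

# Second stage: 2SLS estimate (d_hat'd)^{-1} d_hat'y, which equals OLS of y on d_hat here.
beta_iv = float(d_hat @ y) / float(d_hat @ d)
print("instruments selected:", sel.size, "IV estimate of the effect:", round(beta_iv, 3))
```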

    Inference in Additively Separable Models With a High-Dimensional Set of Conditioning Variables

    This paper studies nonparametric series estimation and inference for the effect of a single variable of interest x on an outcome y in the presence of potentially high-dimensional conditioning variables z. The context is an additively separable model E[y | x, z] = g0(x) + h0(z). The model is high-dimensional in the sense that the series of approximating functions for h0(z) can have more terms than the sample size, thereby allowing z to have potentially very many measured characteristics. The model is required to be approximately sparse: h0(z) can be approximated using only a small subset of series terms whose identities are unknown. This paper proposes an estimation and inference method for g0(x), called Post-Nonparametric Double Selection, which is a generalization of Post-Double Selection. Standard rates of convergence and asymptotic normality for the estimator are shown to hold uniformly over a large class of sparse data-generating processes. A simulation study illustrates the finite-sample estimation properties of the proposed estimator and the coverage properties of the corresponding confidence intervals. Finally, an empirical application to college admissions policy demonstrates the practical implementation of the proposed method.
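
    Post-Double Selection, which the proposed method generalizes, is straightforward to sketch: select the conditioning variables that predict the outcome, select those that predict the variable of interest, and refit OLS on the union. The code below illustrates only this base step for a scalar x; the series expansion of g0(x) and the paper's uniformity results are not reproduced, and the simulated design and cross-validated tuning are assumptions for illustration.

```python
# Hedged sketch of plain Post-Double Selection with a scalar variable of interest.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(2)
n, p = 300, 150
Z = rng.standard_normal((n, p))              # high-dimensional conditioning variables
x = Z[:, 0] + rng.standard_normal(n)         # variable of interest, confounded by Z[:, 0]
y = 1.0 * x + 2.0 * Z[:, 0] + rng.standard_normal(n)   # true effect of x is 1.0

sel_y = np.flatnonzero(LassoCV(cv=5).fit(Z, y).coef_ != 0)   # controls predicting y
sel_x = np.flatnonzero(LassoCV(cv=5).fit(Z, x).coef_ != 0)   # controls predicting x
union = np.union1d(sel_y, sel_x)                             # take the union of both selections

# Final step: OLS of y on x plus the union of selected controls.
W = np.column_stack([x, Z[:, union]])
fit = LinearRegression().fit(W, y)
print("controls kept:", union.size, "estimated effect of x:", round(fit.coef_[0], 3))
```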