Pivotal estimation in high-dimensional regression via linear programming
We propose a new method of estimation in the high-dimensional linear regression
model. It allows for very weak distributional assumptions, including
heteroscedasticity, and does not require knowledge of the variance of the
random errors. The method is based on linear programming only, so that its
numerical implementation is faster than for previously known techniques using
conic programs, and it allows one to deal with higher-dimensional models. We
provide upper bounds for the estimation and prediction errors of the proposed
estimator, showing that it achieves the same rate as in the more restrictive
situation of fixed design and i.i.d. Gaussian errors with known variance.
Following Gautier and Tsybakov (2011), we obtain the results under weaker
sensitivity assumptions than the restricted eigenvalue or assimilated
conditions.
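The linear-programming angle can be made concrete with a closely related estimator: the Dantzig-selector-type program minimize $\|\beta\|_1$ subject to $\|X^\top(y - X\beta)\|_\infty \le \lambda$, which becomes an LP after bounding $|\beta_j|$ by auxiliary variables. Below is a minimal sketch in Python using scipy.optimize.linprog; the fixed penalty level lam and the variable names are illustrative assumptions, and the paper's actual self-tuned, heteroscedasticity-robust program adds further structure.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_lp(X, y, lam):
    """Dantzig-selector-type estimate via linear programming:
    minimize ||beta||_1  subject to  ||X'(y - X beta)||_inf <= lam.
    Decision variables are z = [beta, u] with |beta_j| <= u_j."""
    n, p = X.shape
    G, b = X.T @ X, X.T @ y
    c = np.concatenate([np.zeros(p), np.ones(p)])  # minimize sum(u) = ||beta||_1
    I, Z = np.eye(p), np.zeros((p, p))
    A_ub = np.block([
        [ I, -I],   #  beta - u <= 0
        [-I, -I],   # -beta - u <= 0
        [ G,  Z],   #  X'X beta <= X'y + lam
        [-G,  Z],   # -X'X beta <= -(X'y) + lam
    ])
    b_ub = np.concatenate([np.zeros(2 * p), b + lam, lam - b])
    bounds = [(None, None)] * p + [(0, None)] * p  # beta free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]
```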
Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming
We propose a pivotal method for estimating high-dimensional sparse linear
regression models, where the overall number of regressors $p$ is large,
possibly much larger than the sample size $n$, but only $s$ regressors are
significant. The method is a modification of the lasso, called the square-root
lasso. The method is pivotal in that it neither relies on knowledge of the
standard deviation $\sigma$ nor does it need to pre-estimate $\sigma$.
Moreover, the method does not rely on normality or sub-Gaussianity of the
noise. It achieves near-oracle performance, attaining the convergence rate
$\sigma\sqrt{(s/n)\log p}$ in the prediction norm, and thus matching the
performance of the lasso with known $\sigma$. These performance results are
valid for both Gaussian and non-Gaussian errors, under some mild moment
restrictions. We formulate the square-root lasso as a solution to a convex
conic programming problem, which allows us to implement the estimator using
efficient algorithmic methods, such as interior-point and first-order methods.
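To make the conic formulation tangible, here is a minimal sketch of the square-root lasso objective written with the cvxpy modeling library (using cvxpy is an assumption of this sketch, not a tool named in the paper). The $\ell_2$ loss term, taken without squaring, is exactly what makes the problem a second-order cone program and removes $\sigma$ from the oracle penalty level.

```python
import numpy as np
import cvxpy as cp

def sqrt_lasso(X, y, lam):
    """Square-root lasso:
    minimize ||y - X beta||_2 / sqrt(n) + (lam / n) * ||beta||_1.
    Unlike the lasso, the loss enters through its square root, so the
    oracle choice of lam does not depend on the noise level sigma."""
    n, p = X.shape
    beta = cp.Variable(p)
    obj = cp.norm(y - X @ beta, 2) / np.sqrt(n) + (lam / n) * cp.norm(beta, 1)
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value
```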
Inference for High-Dimensional Sparse Econometric Models
This article is about estimation and inference methods for high-dimensional
sparse (HDS) regression models in econometrics. High-dimensional sparse models
arise in situations where many regressors (or series terms) are available and
the regression function is well-approximated by a parsimonious, yet unknown set
of regressors. The latter condition makes it possible to estimate the entire
regression function effectively by searching for approximately the right set of
regressors. We discuss methods for identifying this set of regressors and
estimating their coefficients based on $\ell_1$-penalization and describe key
theoretical results. In order to capture realistic practical situations, we
expressly allow for imperfect selection of regressors and study the impact of
this imperfect selection on estimation and inference results. We focus the main
part of the article on the use of HDS models and methods in the instrumental
variables model and the partially linear model. We present a set of novel
inference results for these models and illustrate their use with applications
to returns to schooling and growth regression.
Pivotal estimation via square-root Lasso in nonparametric regression
We propose a self-tuning $\sqrt{\text{Lasso}}$ method that simultaneously
resolves three important practical problems in high-dimensional regression
analysis, namely it handles the unknown scale, heteroscedasticity and (drastic)
non-Gaussianity of the noise. In addition, our analysis allows for badly
behaved designs, for example, perfectly collinear regressors, and generates
sharp bounds even in extreme cases, such as the infinite variance case and the
noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds
for $\sqrt{\text{Lasso}}$, including prediction norm rate and sparsity. Our
analysis is based on new impact factors that are tailored for bounding the
prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely
on moderate deviation theory for self-normalized sums to achieve Gaussian-like
results under weak conditions. Moreover, we derive bounds on the performance of
ordinary least squares (ols) applied to the model selected by
$\sqrt{\text{Lasso}}$, accounting for possible misspecification of the selected
model. Under mild conditions, the rate of convergence of ols post
$\sqrt{\text{Lasso}}$ is as good as $\sqrt{\text{Lasso}}$'s rate. As an
application, we consider the use of $\sqrt{\text{Lasso}}$ and ols post
$\sqrt{\text{Lasso}}$ as estimators of nuisance parameters in a generic
semiparametric problem (nonlinear moment condition or $Z$-problem), resulting
in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators
of the main parameters.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1204 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
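The ols post $\sqrt{\text{Lasso}}$ step admits a very short sketch: refit unpenalized least squares on the support selected in the first stage, which removes the shrinkage bias on the retained coefficients. In the sketch below, the tolerance tol used to read off the support is an illustrative assumption.

```python
import numpy as np

def ols_post_selection(X, y, beta_first_stage, tol=1e-8):
    """Refit ordinary least squares on the support selected by a
    first-stage estimator such as sqrt-lasso."""
    support = np.flatnonzero(np.abs(beta_first_stage) > tol)
    beta = np.zeros(X.shape[1])
    if support.size:
        beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return beta
```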
Exact Post Model Selection Inference for Marginal Screening
We develop a framework for post model selection inference, via marginal
screening, in linear regression. At the core of this framework is a result that
characterizes the exact distribution of linear functions of the response $y$,
conditional on the model being selected (the "condition on selection" framework).
This allows us to construct valid confidence intervals and hypothesis tests for
regression coefficients that account for the selection procedure. In contrast
to recent work in high-dimensional statistics, our results are exact
(non-asymptotic) and require no eigenvalue-like assumptions on the design
matrix $X$. Furthermore, the computational cost of marginal regression,
constructing confidence intervals and hypothesis testing is negligible compared
to the cost of linear regression, thus making our methods particularly suitable
for extremely large datasets. Although we focus on marginal screening to
illustrate the applicability of the condition on selection framework, this
framework is much more broadly applicable. We show how to apply the proposed
framework to several other selection procedures including orthogonal matching
pursuit, non-negative least squares, and marginal screening+Lasso.
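For intuition, the screening step itself is elementary, and the structural fact the framework exploits is that the event of selecting a particular set of variables (with particular signs) can be written as linear inequalities in $y$, which yields a truncated Gaussian law for linear functions of $y$. A minimal sketch of the selection step only (the top-$k$ rule and names are illustrative; the conditional inference machinery is not reproduced here):

```python
import numpy as np

def marginal_screening(X, y, k):
    """Select the k regressors most correlated with the response.
    Conditional on this selection event, linear functions of y follow
    a truncated Gaussian distribution, enabling exact inference."""
    scores = X.T @ y
    S = np.argsort(-np.abs(scores))[:k]  # top-k by |x_j' y|
    return S, np.sign(scores[S])
```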
L1-Penalized quantile regression in high-dimensional sparse models
We consider median regression and, more generally, quantile regression in high-dimensional sparse models. In these models the overall number of regressors $p$ is very large, possibly larger than the sample size $n$, but only $s$ of these regressors have non-zero impact on the conditional quantile of the response variable, where $s$ grows more slowly than $n$. Since in this case ordinary quantile regression is not consistent, we consider quantile regression penalized by the $\ell_1$-norm of the coefficients (L1-QR). First, we show that L1-QR is consistent at the rate $\sqrt{(s/n)\log p}$, which is close to the oracle rate $\sqrt{s/n}$ achievable when the minimal true model is known. The overall number of regressors $p$ affects the rate only through the $\log p$ factor, thus allowing nearly exponential growth in the number of zero-impact regressors. The rate result holds under relatively weak conditions, requiring that $s/n$ converges to zero at a super-logarithmic speed and that the regularization parameter satisfies certain theoretical constraints. Second, we propose a pivotal, data-driven choice of the regularization parameter and show that it satisfies these theoretical constraints. Third, we show that L1-QR correctly selects the true minimal model as a valid submodel when the non-zero coefficients of the true model are well separated from zero. We also show that the number of non-zero coefficients in L1-QR is of the same stochastic order as $s$, the number of non-zero coefficients in the minimal true model. Fourth, we analyze the rate of convergence of a two-step estimator that applies ordinary quantile regression to the selected model. Fifth, we evaluate the performance of L1-QR in a Monte Carlo experiment and provide an application to the analysis of international economic growth.
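As a hedged illustration of the estimator itself (not of the paper's pivotal tuning rule), scikit-learn's QuantileRegressor fits an $\ell_1$-penalized quantile regression of exactly this form via linear programming; the penalty level alpha below is a placeholder, not the data-driven choice proposed in the paper.

```python
from sklearn.linear_model import QuantileRegressor

def l1_qr(X, y, tau=0.5, alpha=0.1):
    """L1-QR: quantile regression at quantile tau with an L1 penalty
    on the coefficients; alpha plays the role of lambda / n."""
    model = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs")
    return model.fit(X, y).coef_
```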