Inference on Treatment Effects After Selection Amongst High-Dimensional Controls
We propose robust methods for inference on the effect of a treatment variable
on a scalar outcome in the presence of very many controls. Our setting is a
partially linear model with possibly non-Gaussian and heteroscedastic
disturbances. Our analysis allows the number of controls to be much larger than
the sample size. To make informative inference feasible, we require the model
to be approximately sparse; that is, we require that the effect of confounding
factors can be controlled for up to a small approximation error by conditioning
on a relatively small number of controls whose identities are unknown. The
latter condition makes it possible to estimate the treatment effect by
selecting approximately the right set of controls. We develop a novel
estimation and uniformly valid inference method for the treatment effect in
this setting, called the "post-double-selection" method. Our results apply to
Lasso-type methods used for covariate selection as well as to any other model
selection method that is able to find a sparse model with good approximation
properties.
The main attractive feature of our method is that it allows for imperfect
selection of the controls and provides confidence intervals that are valid
uniformly across a large class of models. In contrast, standard post-model
selection estimators fail to provide uniform inference even in simple cases
with a small, fixed number of controls. Thus our method resolves the problem of
uniform inference after model selection for a large, interesting class of
models. We illustrate the use of the developed methods with numerical
simulations and an application to the effect of abortion on crime rates.
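The selection-then-regression idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lasso_cd` and `post_double_selection`, the simulated data, and the fixed penalty levels `lam_y` and `lam_d` are all hypothetical, and the data-driven penalty choice and heteroscedasticity-robust standard errors needed for actual inference are omitted.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent for
    (1/2n)*||y - X b||^2 + lam*||b||_1 (a textbook solver, assumed here)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]       # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]       # add the updated coordinate back
    return b

def post_double_selection(y, d, X, lam_y, lam_d):
    """Select controls that predict y OR d, then run OLS of y on d
    plus the union of selected controls."""
    y = y - y.mean()
    d = d - d.mean()
    X = X - X.mean(axis=0)
    sel_y = np.flatnonzero(lasso_cd(X, y, lam_y))   # controls predicting the outcome
    sel_d = np.flatnonzero(lasso_cd(X, d, lam_d))   # controls predicting the treatment
    union = np.union1d(sel_y, sel_d)
    Z = np.column_stack([d, X[:, union]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], union

# Simulated example (assumed design): true treatment effect is 1.0.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)
y = 1.0 * d + X[:, 0] + 0.5 * X[:, 2] + rng.standard_normal(n)

alpha_hat, controls = post_double_selection(y, d, X, lam_y=0.3, lam_d=0.2)
print(alpha_hat, controls)
```

Taking the union of the two selected sets is what guards against dropping a control that strongly predicts the treatment but only weakly predicts the outcome.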
On the Computational Complexity of MCMC-based Estimators in Large Samples
In this paper we examine the implications of the statistical large sample
theory for the computational complexity of Bayesian and quasi-Bayesian
estimation carried out using Metropolis random walks. Our analysis is motivated
by the Laplace-Bernstein-Von Mises central limit theorem, which states that in
large samples the posterior or quasi-posterior approaches a normal density.
Using the conditions required for the central limit theorem to hold, we
establish polynomial bounds on the computational complexity of general
Metropolis random walks methods in large samples. Our analysis covers cases
where the underlying log-likelihood or extremum criterion function is possibly
non-concave, discontinuous, and with increasing parameter dimension. However,
the central limit theorem restricts the deviations from continuity and
log-concavity of the log-likelihood or extremum criterion function in a very
specific manner.
Under minimal assumptions required for the central limit theorem to hold
under the increasing parameter dimension, we show that the Metropolis algorithm
is theoretically efficient even for the canonical Gaussian walk which is
studied in detail. Specifically, we show that the running time of the algorithm
in large samples is bounded in probability by a polynomial in the parameter
dimension $d$, and, in particular, is of stochastic order $d^2$ in the leading
cases after the burn-in period. We then give applications to exponential
families, curved exponential families, and Z-estimation of increasing
dimension.
Comment: 36 pages, 2 figures
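For readers unfamiliar with the sampler analysed in this abstract, a random-walk Metropolis algorithm with Gaussian proposals can be sketched as below. This is a generic textbook version using only the standard library; the function name `metropolis_random_walk`, the standard normal target, and the step size are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def metropolis_random_walk(log_density, x0, step, n_draws, burn_in, seed=0):
    """Random-walk Metropolis with isotropic Gaussian proposals.
    Returns the post-burn-in draws and the overall acceptance rate."""
    rng = random.Random(seed)
    x = list(x0)
    logp = log_density(x)
    draws, accepted = [], 0
    total = burn_in + n_draws
    for t in range(total):
        proposal = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal)/p(x));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
        if t >= burn_in:
            draws.append(list(x))
    return draws, accepted / total

# Assumed target: standard normal in dimension 3, the limiting shape
# suggested by the central limit theorem mentioned in the abstract.
dim = 3
log_normal = lambda x: -0.5 * sum(v * v for v in x)
draws, acc_rate = metropolis_random_walk(
    log_normal, [0.0] * dim, step=2.4 / math.sqrt(dim),
    n_draws=20000, burn_in=1000)

mean0 = sum(dr[0] for dr in draws) / len(draws)
var0 = sum((dr[0] - mean0) ** 2 for dr in draws) / len(draws)
print(acc_rate, mean0, var0)
```

The step size scaled by one over the square root of the dimension is a common heuristic, not a recommendation from the paper.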
Least squares after model selection in high-dimensional sparse models
In this article we study post-model selection estimators that apply ordinary
least squares (OLS) to the model selected by first-step penalized estimators,
typically Lasso. It is well known that Lasso can estimate the nonparametric
regression function at nearly the oracle rate, and is thus hard to improve
upon. We show that the OLS post-Lasso estimator performs at least as well as
Lasso in terms of the rate of convergence, and has the advantage of a smaller
bias. Remarkably, this performance occurs even if the Lasso-based model
selection "fails" in the sense of missing some components of the "true"
regression model. By the "true" model, we mean the best s-dimensional
approximation to the nonparametric regression function chosen by the oracle.
Furthermore, OLS post-Lasso estimator can perform strictly better than Lasso,
in the sense of a strictly faster rate of convergence, if the Lasso-based model
selection correctly includes all components of the "true" model as a subset and
also achieves sufficient sparsity. In the extreme case, when Lasso perfectly
selects the "true" model, the OLS post-Lasso estimator becomes the oracle
estimator. An important ingredient in our analysis is a new sparsity bound on
the dimension of the model selected by Lasso, which guarantees that this
dimension is at most of the same order as the dimension of the "true" model.
Our rate results are nonasymptotic and hold in both parametric and
nonparametric models. Moreover, our analysis is not limited to the Lasso
estimator acting as a selector in the first step, but also applies to any other
estimator, for example, various forms of thresholded Lasso, with good rates and
good sparsity properties. Our analysis covers both traditional thresholding and
a new practical, data-driven thresholding scheme that induces additional
sparsity subject to maintaining a certain goodness of fit. The latter scheme
has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it
dominates those procedures as well as traditional thresholding in a wide
variety of experiments.
Comment: Published at http://dx.doi.org/10.3150/11-BEJ410 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
Post-l1-penalized estimators in high-dimensional linear regression models
In this paper we study post-penalized estimators which apply ordinary, unpenalized linear regression to the model selected by first-step penalized estimators, typically LASSO. It is well known that LASSO can estimate the regression function at nearly the oracle rate, and is thus hard to improve upon. We show that post-LASSO performs at least as well as LASSO in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the LASSO-based model selection 'fails' in the sense of missing some components of the 'true' regression model. By the 'true' model we mean here the best s-dimensional approximation to the regression function chosen by the oracle. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the 'true' model as a subset and also achieves sufficient sparsity. In the extreme case, when LASSO perfectly selects the 'true' model, the post-LASSO estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by LASSO, which guarantees that this dimension is at most of the same order as the dimension of the 'true' model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the LASSO estimator in the first step, but also applies to other estimators, for example, the trimmed LASSO, Dantzig selector, or any other estimator with good rates and good sparsity. Our analysis covers both traditional trimming and a new practical, completely data-driven trimming scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit.
The latter scheme has theoretical guarantees similar to those of LASSO or post-LASSO, but it dominates these procedures as well as traditional trimming in a wide variety of experiments.
On the computational complexity of MCMC-based estimators in large samples
In this paper we examine the implications of the statistical large sample theory for the computational complexity of Bayesian and quasi-Bayesian estimation carried out using Metropolis random walks. Our analysis is motivated by the Laplace-Bernstein-Von Mises central limit theorem, which states that in large samples the posterior or quasi-posterior approaches a normal density. Using this observation, we establish polynomial bounds on the computational complexity of general Metropolis random walks methods in large samples. Our analysis covers cases where the underlying log-likelihood or extremum criterion function is possibly nonconcave, discontinuous, and of increasing dimension. However, the central limit theorem restricts the deviations from continuity and log-concavity of the log-likelihood or extremum criterion function in a very specific manner.
Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks
Quantile regression is an increasingly important empirical tool in economics
and other sciences for analyzing the impact of a set of regressors on the
conditional distribution of an outcome. Extremal quantile regression, or
quantile regression applied to the tails, is of interest in many economic and
financial applications, such as conditional value-at-risk, production
efficiency, and adjustment bands in (S,s) models. In this paper we provide
feasible inference tools for extremal conditional quantile models that rely
upon extreme value approximations to the distribution of self-normalized
quantile regression statistics. The methods are simple to implement and can be
of independent interest even in the non-regression case. We illustrate the
results with two empirical examples analyzing extreme fluctuations of a stock
return and extremely low percentiles of live infants' birthweights in the range
between 250 and 1500 grams.
Comment: 41 pages, 9 figures
L1-Penalized quantile regression in high-dimensional sparse models
We consider median regression and, more generally, quantile regression in high-dimensional sparse models. In these models the overall number of regressors p is very large, possibly larger than the sample size n, but only s of these regressors have non-zero impact on the conditional quantile of the response variable, where s grows slower than n. Since in this case the ordinary quantile regression is not consistent, we consider quantile regression penalized by the L1-norm of coefficients (L1-QR). First, we show that L1-QR is consistent at the rate of the square root of (s/n) log p, which is close to the oracle rate of the square root of (s/n), achievable when the minimal true model is known. The overall number of regressors p affects the rate only through the log p factor, thus allowing nearly exponential growth in the number of zero-impact regressors. The rate result holds under relatively weak conditions, requiring that s/n converges to zero at a super-logarithmic speed and that the regularization parameter satisfies certain theoretical constraints. Second, we propose a pivotal, data-driven choice of the regularization parameter and show that it satisfies these theoretical constraints. Third, we show that L1-QR correctly selects the true minimal model as a valid submodel, when the non-zero coefficients of the true model are well separated from zero. We also show that the number of non-zero coefficients in L1-QR is of the same stochastic order as s, the number of non-zero coefficients in the minimal true model. Fourth, we analyze the rate of convergence of a two-step estimator that applies ordinary quantile regression to the selected model. Fifth, we evaluate the performance of L1-QR in a Monte-Carlo experiment, and provide an application to the analysis of international economic growth.
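In symbols, the estimator and rates described in this abstract can be written roughly as follows. This is a generic form given for orientation only: the symbols $\tau$, $\rho_\tau$, $\lambda$ and $\beta_0$ are notation assumed here, and the exact scaling of the penalty and any coefficient-specific penalty loadings used in the paper may differ.

```latex
% Generic L1-penalized quantile regression at quantile index \tau (assumed notation)
\hat\beta \in \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{n} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i'\beta\right)
  + \lambda \lVert \beta \rVert_1,
\qquad
\rho_\tau(u) = u\left(\tau - 1\{u < 0\}\right).

% Rates as stated in the abstract: L1-QR versus the oracle
\lVert \hat\beta - \beta_0 \rVert \lesssim \sqrt{\frac{s}{n}\,\log p}
\quad\text{versus}\quad
\sqrt{\frac{s}{n}} .
```

Median regression corresponds to $\tau = 1/2$, where $\rho_\tau(u) = |u|/2$.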