Inference for High-Dimensional Sparse Econometric Models
This article is about estimation and inference methods for high dimensional
sparse (HDS) regression models in econometrics. High dimensional sparse models
arise in situations where many regressors (or series terms) are available and
the regression function is well-approximated by a parsimonious, yet unknown set
of regressors. The latter condition makes it possible to estimate the entire
regression function effectively by searching for approximately the right set of
regressors. We discuss methods for identifying this set of regressors and
estimating their coefficients based on ℓ1-penalization and describe key
theoretical results. In order to capture realistic practical situations, we
expressly allow for imperfect selection of regressors and study the impact of
this imperfect selection on estimation and inference results. We focus the main
part of the article on the use of HDS models and methods in the instrumental
variables model and the partially linear model. We present a set of novel
inference results for these models and illustrate their use with applications
to returns to schooling and growth regression.
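The selection-then-refit idea the abstract describes can be sketched in a few lines. The following is a minimal illustration (not the paper's procedure: the fixed penalty level 0.1 and the simulated design are placeholders for the data-driven choices the paper develops): an ℓ1-penalized regression identifies an approximate support, and ordinary least squares refit on that support (post-Lasso) removes the shrinkage bias of the penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 50, 3                  # many regressors, few relevant
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                        # sparse "true" coefficients
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: l1-penalized regression selects an approximate set of regressors.
lasso = Lasso(alpha=0.1).fit(X, y)    # penalty level is illustrative only
support = np.flatnonzero(lasso.coef_)

# Step 2: refit by OLS on the selected regressors (post-Lasso),
# which undoes the shrinkage the l1 penalty imposes on retained coefficients.
coef_post, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
```

With a strong signal like this one, the selected support contains the true regressors and the refitted coefficients are close to the truth; the paper's results quantify what happens when selection is imperfect.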
Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach
Here we present an expository, general analysis of valid post-selection or
post-regularization inference about a low-dimensional target parameter,
α, in the presence of a very high-dimensional nuisance parameter, η,
which is estimated using modern selection or regularization methods.
Our analysis relies on high-level, easy-to-interpret conditions that allow one
to clearly see the structures needed for achieving valid post-regularization
inference. Simple, readily verifiable sufficient conditions are provided for a
class of affine-quadratic models. We focus our discussion on estimation and
inference procedures based on using the empirical analog of theoretical
equations, M(α, η) = 0, which identify α. Within this structure,
we show that setting up such equations in a manner such that the
orthogonality/immunization condition ∂η M(α, η) = 0 at
the true parameter values is satisfied, coupled with plausible conditions on
the smoothness of M and the quality of the estimator η̂, guarantees
that inference on the main parameter α based on testing or point
estimation methods discussed below will be regular despite selection or
regularization biases occurring in estimation of η. In particular, the
estimator of α will often be uniformly consistent at the root-n rate
and uniformly asymptotically normal even though estimators η̂ of η will
generally not be asymptotically linear and regular. The uniformity holds over
large classes of models that do not impose highly implausible "beta-min"
conditions. We also show that inference can be carried out by inverting tests
formed from Neyman's C(α) (orthogonal score) statistics.
Comment: 47 pages
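A standard instance of such an orthogonalized system (a textbook example, not taken from this abstract) is the partialled-out score for the partially linear model y = dα + g(x) + ε:

```latex
M(\alpha, \eta) \;=\; \mathrm{E}\!\left[\big(y - \ell(x) - \alpha\,(d - m(x))\big)\,\big(d - m(x)\big)\right] \;=\; 0,
\qquad \eta = (\ell, m),\quad \ell(x) = \mathrm{E}[y \mid x],\quad m(x) = \mathrm{E}[d \mid x].
```

Because E[d − m(x) | x] = 0 and E[y − ℓ(x) | x] = 0, the derivative of M with respect to η vanishes at the true (ℓ, m): first-order errors in estimating the nuisance functions do not perturb the estimating equation for α, which is exactly the orthogonality/immunization condition the abstract refers to.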
High-Dimensional Metrics in R
The package High-dimensional Metrics (\Rpackage{hdm}) is an evolving
collection of statistical methods for estimation and quantification of
uncertainty in high-dimensional approximately sparse models. It focuses on
providing confidence intervals and significance testing for (possibly many)
low-dimensional subcomponents of the high-dimensional parameter vector.
Efficient estimators and uniformly valid confidence intervals for regression
coefficients on target variables (e.g., treatment or policy variable) in a
high-dimensional approximately sparse regression model, for average treatment
effect (ATE) and average treatment effect for the treated (ATET), as well as
extensions of these parameters to the endogenous setting are provided. Theory
grounded, data-driven methods for selecting the penalization parameter in Lasso
regressions under heteroscedastic and non-Gaussian errors are implemented.
Moreover, joint/simultaneous confidence intervals for regression coefficients
of a high-dimensional sparse regression are implemented, including a joint
significance test for Lasso regression. Data sets which have been used in the
literature and might be useful for classroom demonstration and for testing new
estimators are included. \R and the package \Rpackage{hdm} are open-source
software projects and can be freely downloaded from CRAN:
\texttt{http://cran.r-project.org}.
Comment: 34 pages; vignette for the R package hdm, available at
http://cran.r-project.org/web/packages/hdm/ and
http://r-forge.r-project.org/R/?group_id=2084 (development version)
A lava attack on the recovery of sums of dense and sparse signals
Common high-dimensional methods for prediction rely on having either a sparse
signal model, a model in which most parameters are zero and there are a small
number of non-zero parameters that are large in magnitude, or a dense signal
model, a model with no large parameters and very many small non-zero
parameters. We consider a generalization of these two basic models, termed here
a "sparse+dense" model, in which the signal is given by the sum of a sparse
signal and a dense signal. Such a structure poses problems for traditional
sparse estimators, such as the lasso, and for traditional dense estimation
methods, such as ridge estimation. We propose a new penalization-based method,
called lava, which is computationally efficient. With suitable choices of
penalty parameters, the proposed method strictly dominates both lasso and
ridge. We derive analytic expressions for the finite-sample risk function of
the lava estimator in the Gaussian sequence model. We also provide a deviation
bound for the prediction risk in the Gaussian regression model with fixed
design. In both cases, we provide Stein's unbiased estimator for lava's
prediction risk. A simulation example compares the performance of lava to
lasso, ridge, and elastic net in a regression example using feasible,
data-dependent penalty parameters and illustrates lava's improved performance
relative to these benchmarks.
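In the Gaussian sequence model the lava estimate has a simple closed form, which can be sketched coordinate by coordinate. This is a minimal illustration under one plausible normalization of the penalty, (z − b − d)² + λ1|b| + λ2 d² (the paper's constants may differ): profiling out the dense part d reduces the problem to soft-thresholding for the sparse part b.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator: the lasso solution in one coordinate."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lava_seq(z, lam1, lam2):
    """Lava estimate in the Gaussian sequence model, coordinatewise.

    Solves min_{b,d} (z - b - d)^2 + lam1*|b| + lam2*d^2.
    For fixed b the optimal dense part is d = (z - b)/(1 + lam2); substituting
    it back leaves a scaled soft-thresholding problem for the sparse part b.
    (Penalty normalization here is illustrative, not the paper's.)
    """
    b = soft(z, lam1 * (1.0 + lam2) / (2.0 * lam2))  # sparse component
    d = (z - b) / (1.0 + lam2)                        # dense, ridge-like component
    return b + d
```

The two benchmark estimators appear as limits: as λ2 → ∞ the dense part is shut off and lava reduces to the lasso, while as λ1 → ∞ the sparse part is shut off and lava reduces to ridge, consistent with the dominance claims in the abstract.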
Estimation of treatment effects with high-dimensional controls
We propose methods for inference on the average effect of a treatment on a scalar outcome in the presence of very many controls. Our setting is a partially linear regression model containing the treatment/policy variable and a large number p of controls or series terms, with p that is possibly much larger than the sample size n, but where only s << n unknown controls or series terms are needed to approximate the regression function accurately. The latter sparsity condition makes it possible to estimate the entire regression function as well as the average treatment effect by selecting approximately the right set of controls using Lasso and related methods. We develop estimation and inference methods for the average treatment effect in this setting, proposing a novel "post double selection" method that provides attractive inferential and estimation properties. In our analysis, in order to cover realistic applications, we expressly allow for imperfect selection of the controls and account for the impact of selection errors on estimation and inference. In order to cover typical applications in economics, we employ selection methods designed to deal with non-Gaussian and heteroscedastic disturbances. We illustrate the use of the new methods with numerical simulations and an application to the effect of abortion on crime rates.
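The post-double-selection recipe can be sketched in a few lines. The following is an illustrative simulation, not the paper's implementation: it uses a fixed penalty level rather than the heteroscedasticity-robust, data-driven choice the paper develops. The key point is the union step: a control is kept if it is selected in either the outcome equation or the treatment equation, which protects the final OLS step against omitted-variable bias from a single imperfect selection.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 500, 100
X = rng.standard_normal((n, p))                      # many potential controls
d = X[:, 0] + rng.standard_normal(n)                 # treatment depends on a control
y = 1.0 * d + X[:, 0] + X[:, 1] + rng.standard_normal(n)  # true effect = 1.0

# Step 1: Lasso of the outcome y on the controls.
sel_y = np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_)
# Step 2: Lasso of the treatment d on the controls.
sel_d = np.flatnonzero(Lasso(alpha=0.1).fit(X, d).coef_)

# Step 3: OLS of y on d plus the UNION of controls selected in either step.
union = np.union1d(sel_y, sel_d)
Z = np.column_stack([d, X[:, union]])
alpha_hat = np.linalg.lstsq(Z, y, rcond=None)[0][0]  # estimated treatment effect
```

Here the treatment equation flags X[:, 0] even if the outcome Lasso were to miss it, so the final regression controls for all confounders and alpha_hat is close to the true effect of 1.0.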