Post-Selection Inference for Generalized Linear Models with Many Controls
This paper considers generalized linear models in the presence of many
controls. We lay out a general methodology to estimate an effect of interest
based on the construction of an instrument that immunizes against model
selection mistakes, and apply it to the logistic binary choice model.
More specifically, we propose new methods for estimating and constructing
confidence regions for a regression parameter of primary interest, namely the
coefficient on the regressor of interest, such as a treatment variable
or a policy variable. Using sparsity assumptions, these methods estimate this
parameter at the root-$n$ rate even when the total number $p$ of other
regressors, called controls, exceeds the sample size $n$. The sparsity
assumption means that there is a small subset of $s$ controls which suffices to
accurately approximate the nuisance part of the regression function.
Importantly, the estimators and the resulting confidence regions are valid
uniformly over $s$-sparse models satisfying a growth condition on $s$, $p$ and
$n$, together with other technical conditions. These procedures do not rely on
traditional consistent model selection arguments for their validity. In fact,
they are robust with respect to moderate mistakes in variable selection. Under
suitable conditions, the estimators are semi-parametrically efficient in the
sense of attaining the semi-parametric efficiency bounds for the class of
models considered in this paper.
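The central idea above is an estimating equation whose instrument makes the estimated effect insensitive, to first order, to mistakes in the nuisance (control) part of the model. The paper develops this for the logistic model; the minimal sketch below instead assumes a much simpler linear model with a single control, a hypothetical data-generating process, and hypothetical function names, purely to show the immunization effect. It compares a naive estimate that uses the treatment $d$ itself as the instrument with one that uses the residualized treatment $d - \hat\gamma x$, when the control coefficient $\hat\beta$ is deliberately misestimated.

```python
import random
from typing import List, Tuple


def simulate(n: int, alpha: float, beta: float, gamma: float,
             seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical DGP: y = alpha*d + beta*x + eps, d = gamma*x + v, all shocks N(0, 1)."""
    rng = random.Random(seed)
    ys, ds, xs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        d = gamma * x + rng.gauss(0.0, 1.0)
        y = alpha * d + beta * x + rng.gauss(0.0, 1.0)
        ys.append(y)
        ds.append(d)
        xs.append(x)
    return ys, ds, xs


def naive_estimate(ys, ds, xs, beta_hat):
    """Solve sum_i d_i * (y_i - a*d_i - beta_hat*x_i) = 0 for a (instrument = d)."""
    num = sum(d * (y - beta_hat * x) for y, d, x in zip(ys, ds, xs))
    den = sum(d * d for d in ds)
    return num / den


def orthogonal_estimate(ys, ds, xs, beta_hat, gamma_hat):
    """Solve sum_i z_i * (y_i - a*d_i - beta_hat*x_i) = 0 with z_i = d_i - gamma_hat*x_i."""
    zs = [d - gamma_hat * x for d, x in zip(ds, xs)]
    num = sum(z * (y - beta_hat * x) for z, y, x in zip(zs, ys, xs))
    den = sum(z * d for z, d in zip(zs, ds))
    return num / den


if __name__ == "__main__":
    ys, ds, xs = simulate(2000, alpha=1.0, beta=1.0, gamma=1.0)
    # beta_hat = 1.5 mimics a control coefficient distorted by imperfect selection.
    print("naive:     ", naive_estimate(ys, ds, xs, beta_hat=1.5))
    print("orthogonal:", orthogonal_estimate(ys, ds, xs, beta_hat=1.5, gamma_hat=1.0))
```

With $\hat\beta = 1.5$ instead of the true value 1.0, the naive estimate is pulled toward about 0.75 in expectation under this DGP, while the residualized-instrument estimate stays close to 1.0. The protection is only first-order: it relies on $\hat\gamma$ being accurate, and errors in the two nuisance estimates affect the result through their product.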
Robustness in sparse linear models: relative efficiency based on robust approximate message passing
Understanding efficiency in high dimensional linear models is a longstanding
problem of interest. Classical work with smaller dimensional problems dating
back to Huber and Bickel has illustrated the benefits of efficient loss
functions. When the number of parameters $p$ is of the same order as the sample
size $n$, an efficiency pattern different from the one of Huber
was recently established. In this work, we consider the effects of model
selection on the estimation efficiency of penalized methods. In particular, we
explore whether sparsity results in new efficiency patterns when $p$ is larger than $n$. In
the interest of deriving the asymptotic mean squared error for regularized
M-estimators, we use the powerful framework of approximate message passing. We
propose a novel, robust and sparse approximate message passing algorithm
(RAMP) that is adaptive to the error distribution. Our algorithm accommodates many
non-quadratic and non-differentiable loss functions. We derive its asymptotic
mean squared error and show its convergence, while allowing $p$, $n$ and the
sparsity level $s$ to grow jointly at suitable relative rates. We identify new
patterns of relative efficiency regarding a number of penalized estimators
when $p$ is much larger than $n$. We show that the classical information bound
is no longer reachable, even for light-tailed error distributions. We show
that the penalized least absolute deviation estimator dominates the penalized
least squares estimator in cases of heavy-tailed distributions. We observe
this pattern for all choices of the number $s$ of non-zero parameters that we
consider. In non-penalized problems, the opposite pattern holds. Therefore, we
discover that the presence of model selection significantly changes the
efficiency patterns.
Comment: 49 pages, 10 figures
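The abstract contrasts penalized least squares with penalized least absolute deviation (LAD) and stresses non-differentiable losses. AMP-style algorithms for regularized M-estimation commonly handle such losses through scalar proximal maps; the minimal sketch below shows those maps for the squared, absolute, and Huber losses. The function names and the Huber parameter `c` are illustrative assumptions, and this is not the RAMP algorithm itself, whose exact update rules are given in the paper. Note that the proximal map of the $\ell_1$ penalty $b|x|$ is the same soft-thresholding as `prox_absolute`.

```python
def prox_squared(z: float, b: float) -> float:
    """argmin_x 0.5*(x - z)**2 + b*0.5*x**2  (squared loss, as in least squares)."""
    return z / (1.0 + b)


def prox_absolute(z: float, b: float) -> float:
    """argmin_x 0.5*(x - z)**2 + b*|x|  (absolute loss, as in LAD): soft-thresholding."""
    if z > b:
        return z - b
    if z < -b:
        return z + b
    return 0.0


def prox_huber(z: float, b: float, c: float) -> float:
    """argmin_x 0.5*(x - z)**2 + b*huber_c(x), where
    huber_c(x) = x**2/2 if |x| <= c, else c*|x| - c**2/2."""
    if abs(z) <= c * (1.0 + b):
        # Quadratic region: same shrinkage as the squared loss.
        return z / (1.0 + b)
    # Linear region: shift toward zero by a constant, like the absolute loss.
    return z - b * c * (1.0 if z > 0 else -1.0)


if __name__ == "__main__":
    for z in (0.5, 3.0, 30.0):
        print(z, prox_squared(z, 1.0), prox_absolute(z, 1.0), prox_huber(z, 1.0, 1.0))
```

The quantity `z - prox(z)` acts like an effective score: for the absolute and Huber losses it is bounded by `b` and `b*c` respectively, whereas for the squared loss it grows linearly in `z`. This bounded influence of large residuals is one informal way to see why LAD-type losses are less sensitive to heavy-tailed errors.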
Inference for High-Dimensional Sparse Econometric Models
This article is about estimation and inference methods for high dimensional
sparse (HDS) regression models in econometrics. High dimensional sparse models
arise in situations where many regressors (or series terms) are available and
the regression function is well-approximated by a parsimonious, yet unknown set
of regressors. The latter condition makes it possible to estimate the entire
regression function effectively by searching for approximately the right set of
regressors. We discuss methods for identifying this set of regressors and
estimating their coefficients based on $\ell_1$-penalization and describe key
theoretical results. In order to capture realistic practical situations, we
expressly allow for imperfect selection of regressors and study the impact of
this imperfect selection on estimation and inference results. We focus the main
part of the article on the use of HDS models and methods in the instrumental
variables model and the partially linear model. We present a set of novel
inference results for these models and illustrate their use with applications
to returns to schooling and growth regressions.
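The abstract refers to identifying the relevant regressors and estimating their coefficients via $\ell_1$-penalization. As a rough illustration, here is a minimal pure-Python Lasso solved by cyclic coordinate descent with soft-thresholding. The objective scaling $\frac{1}{2n}\|y - Xb\|_2^2 + \lambda\|b\|_1$, the function names, and the fixed penalty `lam` are assumptions made for this sketch; the article's own choice of penalty level is not reproduced here.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Scalar solution of the l1-penalized least-squares subproblem."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_sweeps: int = 100) -> List[float]:
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residual y - X b, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


if __name__ == "__main__":
    # Orthogonal +/-1 design: y = 2*col0 + 0.1*col1, col2 irrelevant.
    X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
    y = [2.1, 1.9, -1.9, -2.1]
    print(lasso_cd(X, y, lam=0.5))
    print(lasso_cd(X, y, lam=0.05))
```

In this toy orthogonal design, `lam=0.5` keeps only the first regressor and shrinks its coefficient from 2.0 to 1.5, while `lam=0.05` also admits the weak second regressor. The shrinkage visible here is why refitting least squares on the selected regressors is a common follow-up in this literature, though the abstract itself does not spell out that step.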