19,434 research outputs found

    Post-Selection Inference for Generalized Linear Models with Many Controls

    This paper considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunizes against model selection mistakes, and apply it to the case of the logistic binary choice model. More specifically, we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest $\alpha_0$, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. These methods allow $\alpha_0$ to be estimated at the root-$n$ rate when the total number $p$ of other regressors, called controls, potentially exceeds the sample size $n$, using sparsity assumptions. The sparsity assumption means that there is a subset of $s < n$ controls which suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and the resulting confidence regions are valid uniformly over $s$-sparse models satisfying $s^2 \log^2 p = o(n)$ and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity. In fact, they are robust with respect to moderate model selection mistakes in variable selection. Under suitable conditions, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models in this paper.

    Robustness in sparse linear models: relative efficiency based on robust approximate message passing

    Understanding efficiency in high dimensional linear models is a longstanding problem of interest. Classical work with smaller dimensional problems dating back to Huber and Bickel has illustrated the benefits of efficient loss functions. When the number of parameters $p$ is of the same order as the sample size $n$, $p \approx n$, an efficiency pattern different from the one of Huber was recently established. In this work, we consider the effects of model selection on the estimation efficiency of penalized methods. In particular, we explore whether sparsity results in new efficiency patterns when $p > n$. In the interest of deriving the asymptotic mean squared error for regularized M-estimators, we use the powerful framework of approximate message passing. We propose a novel, robust and sparse approximate message passing algorithm (RAMP) that is adaptive to the error distribution. Our algorithm includes many non-quadratic and non-differentiable loss functions. We derive its asymptotic mean squared error and show its convergence, while allowing $p, n, s \to \infty$, with $n/p \in (0,1)$ and $n/s \in (1,\infty)$. We identify new patterns of relative efficiency regarding a number of penalized M-estimators when $p$ is much larger than $n$. We show that the classical information bound is no longer reachable, even for light-tailed error distributions. We show that the penalized least absolute deviation estimator dominates the penalized least square estimator in cases of heavy-tailed distributions. We observe this pattern for all choices of the number of non-zero parameters $s$, both $s \leq n$ and $s \approx n$. In non-penalized problems where $s = p \approx n$, the opposite regime holds. Therefore, we discover that the presence of model selection significantly changes the efficiency patterns. Comment: 49 pages, 10 figures

    Inference for High-Dimensional Sparse Econometric Models

    This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on $\ell_1$-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression.