Post-Selection Inference for Generalized Linear Models with Many Controls

Authors: Belloni, Alexandre; Chernozhukov, Victor; Wei, Ying
Publication date: 21/03/2016

This paper considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunizes against model selection mistakes, and we apply it to the logistic binary choice model. More specifically, we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest $\alpha_0$, the coefficient on the regressor of interest, such as a treatment or policy variable. These methods allow one to estimate $\alpha_0$ at the root-$n$ rate when the total number $p$ of other regressors, called controls, potentially exceeds the sample size $n$, using sparsity assumptions. The sparsity assumption means that there is a subset of $s < n$ controls which suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and the resulting confidence regions are valid uniformly over $s$-sparse models satisfying $s^2 \log^2 p = o(n)$ and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity; in fact, they are robust to moderate mistakes in variable selection. Under suitable conditions, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models considered in this paper.

Sources: arXiv.org e-Print Archive; FigShare
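The growth condition $s^2 \log^2 p = o(n)$ can be made concrete with a small calculation. The sketch below assumes natural logarithms and a hypothetical design path in which $s$ stays fixed at 5 while $p = n^2$, so the number of controls exceeds the sample size; the helper name `sparsity_rate_ratio` is illustrative and not from the paper. A small ratio at any single sample size does not verify an asymptotic $o(n)$ condition; the sketch only shows the ratio shrinking along this path.

```python
import math


def sparsity_rate_ratio(n: int, p: int, s: int) -> float:
    """Return s^2 * (log p)^2 / n.

    The condition s^2 log^2 p = o(n) asks this ratio to vanish as n grows.
    Natural logarithms are assumed here.
    """
    return (s ** 2) * (math.log(p) ** 2) / n


# Hypothetical design path: s fixed at 5 while p = n**2 outgrows n.
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, round(sparsity_rate_ratio(n, n**2, 5), 4))
```

Because $\log p$ grows only logarithmically, the ratio eventually falls even when $p$ grows polynomially faster than $n$. If instead $s$ grew like $\sqrt{n}$, the ratio would equal $\log^2 p$ and the condition would fail.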

Robustness in sparse linear models: relative efficiency based on robust approximate message passing

Author: Bradic, Jelena
Publication date: 30/07/2015

Understanding efficiency in high-dimensional linear models is a longstanding problem of interest. Classical work on low-dimensional problems, dating back to Huber and Bickel, has illustrated the benefits of efficient loss functions. When the number of parameters $p$ is of the same order as the sample size $n$ ($p \approx n$), an efficiency pattern different from Huber's was recently established. In this work, we consider the effects of model selection on the estimation efficiency of penalized methods. In particular, we explore whether sparsity results in new efficiency patterns when $p > n$. To derive the asymptotic mean squared error of regularized M-estimators, we use the powerful framework of approximate message passing. We propose a novel, robust and sparse approximate message passing algorithm (RAMP) that is adaptive to the error distribution. Our algorithm accommodates many non-quadratic and non-differentiable loss functions. We derive its asymptotic mean squared error and show its convergence, while allowing $p, n, s \to \infty$ with $n/p \in (0,1)$ and $n/s \in (1,\infty)$. We identify new patterns of relative efficiency for a number of penalized $M$-estimators when $p$ is much larger than $n$. We show that the classical information bound is no longer reachable, even for light-tailed error distributions. We show that the penalized least absolute deviation estimator dominates the penalized least squares estimator in the case of heavy-tailed distributions. We observe this pattern for all choices of the number of non-zero parameters $s$, both $s \leq n$ and $s \approx n$. In non-penalized problems where $s = p \approx n$, the opposite regime holds. Therefore, the presence of model selection significantly changes the efficiency patterns.

Comment: 49 pages, 10 figures

Sources: arXiv.org e-Print Archive; Ezid; eScholarship - University of California
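For contrast with the high-dimensional results, the classical fixed-dimensional baseline attributed to Huber can be computed directly. In that setting, least squares has asymptotic variance $\sigma^2$ (the error variance) and least absolute deviation has asymptotic variance $1/(4 f(0)^2)$, where $f(0)$ is the error density at its median, so the relative efficiency of LAD to LS is $4 f(0)^2 \sigma^2$. The sketch below evaluates this for three error laws chosen for illustration; it does not reproduce the paper's RAMP analysis or its $p > n$ efficiency patterns, and the helper names are hypothetical.

```python
import math


def are_lad_vs_ls(variance: float, density_at_median: float) -> float:
    """Classical fixed-p asymptotic relative efficiency of LAD to LS.

    Var(LS) / Var(LAD) = sigma^2 / (1 / (4 f(0)^2)); values above 1 favour LAD.
    """
    return variance * 4.0 * density_at_median ** 2


def student_t_density_at_zero(nu: float) -> float:
    """Density of a Student-t distribution with nu degrees of freedom at 0."""
    return math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))


# (variance, density at the median) for each error distribution.
errors = {
    "normal": (1.0, 1.0 / math.sqrt(2 * math.pi)),
    "laplace(b=1)": (2.0, 0.5),
    "student-t(3)": (3.0, student_t_density_at_zero(3.0)),
}
for name, (var, f0) in errors.items():
    print(f"{name:>13}: ARE(LAD vs LS) = {are_lad_vs_ls(var, f0):.3f}")
```

Under normal errors LAD retains only $2/\pi$ of the efficiency of LS, while the heavier-tailed Laplace and $t_3$ laws favour LAD; the abstract's point is that penalization and $p > n$ alter this classical picture.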

Inference for High-Dimensional Sparse Econometric Models

Authors: Belloni, Alexandre; Chernozhukov, Victor; Hansen, Christian
Publication date: 01/01/2011

This article is about estimation and inference methods for high-dimensional sparse (HDS) regression models in econometrics. HDS models arise in situations where many regressors (or series terms) are available and the regression function is well approximated by a parsimonious, yet unknown, set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on $\ell_1$-penalization, and describe key theoretical results. To capture realistic practical situations, we explicitly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. The main part of the article focuses on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression.

Sources: arXiv.org e-Print Archive; CiteSeerX
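To illustrate why inference on a target coefficient in the partially linear model $y = \alpha d + g(x) + \varepsilon$ can tolerate imperfect selection, here is a simplified "partialling-out" sketch: regress both $y$ and $d$ on the controls with an $\ell_1$-penalized fit, then regress the $y$-residuals on the $d$-residuals. This is an assumption-laden illustration, not the authors' procedure: it uses plain Lasso residuals (no post-Lasso refit), a hand-picked penalty `lam=0.25` rather than a data-driven one, simulated Gaussian data, and hypothetical helper names (`lasso_residuals`, `partialling_out`). Coordinate descent minimizes $\frac{1}{2n}\lVert y - Xb\rVert^2 + \lambda\lVert b\rVert_1$.

```python
import random


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def center_columns(X):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def lasso_residuals(X, y, lam, n_iter=100, tol=1e-8):
    """Coordinate descent Lasso on centered data; returns residuals y - Xb."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if scale[j] == 0.0:
                continue
            # (1/n) x_j' (partial residual that excludes coordinate j).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return resid


def partialling_out(y, d, X, lam):
    """Regress Lasso residuals of y on Lasso residuals of d."""
    Xc = center_columns(X)
    ry = lasso_residuals(Xc, center(y), lam)
    rd = lasso_residuals(Xc, center(d), lam)
    return sum(a * b for a, b in zip(rd, ry)) / sum(b * b for b in rd)


# Simulated design (hypothetical): d depends on x0, x1; y on d, x0, x2.
random.seed(0)
n, p, alpha = 200, 50, 1.0
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [row[0] + row[1] + random.gauss(0, 1) for row in X]
y = [alpha * di + row[0] + row[2] + random.gauss(0, 1) for di, row in zip(d, X)]
alpha_hat = partialling_out(y, d, X, lam=0.25)
print(round(alpha_hat, 3))
```

The design choice that matters is residualizing $d$ as well as $y$: shrinkage bias in the two Lasso fits enters both residual series and largely cancels in the final ratio, so moderate selection mistakes do not translate directly into bias in $\hat\alpha$.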