
Machine Learning for Set-Identified Linear Models

Abstract

This paper provides estimation and inference methods for an identified set where the selection among a very large number of covariates is based on modern machine learning tools. I characterize the boundary of the identified set (i.e., its support function) using a semiparametric moment condition. Combining Neyman-orthogonality and sample-splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the support function and propose a weighted bootstrap procedure to conduct inference about the identified set. I provide a general method to construct a Neyman-orthogonal moment condition for the support function. Applying my method to Lee (2008)'s endogenous selection model, I provide the asymptotic theory for the sharp (i.e., the tightest possible) bounds on the Average Treatment Effect in the presence of high-dimensional covariates. Furthermore, I relax the conventional monotonicity assumption and allow the sign of the treatment effect on selection (e.g., employment) to be determined by covariates. Using the JobCorps data set with very rich baseline characteristics, I substantially tighten the bounds on the JobCorps effect on wages under the weakened monotonicity assumption.
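As a point of reference for the bounds discussed above, the sketch below implements the baseline Lee (2008) trimming bounds without covariates. It is only a minimal illustration of the trimming idea: the estimator in the paper is substantially more general, conditioning on high-dimensional ML-selected covariates, using a Neyman-orthogonal moment condition for the support function, and combining cross-fitting with a weighted bootstrap. The variable names (y, d, s) and the function lee_bounds are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of Lee (2008) trimming bounds on the ATE for the
# always-selected population, WITHOUT covariates. Assumes treatment
# weakly increases selection (the conventional monotonicity assumption
# that the paper relaxes).
import numpy as np

def lee_bounds(y, d, s):
    """Lee (2008) bounds on E[Y(1) - Y(0) | always selected].

    y : outcome (e.g., log wage), meaningful only where s == 1
    d : binary treatment indicator
    s : binary selection indicator (e.g., employment)
    """
    # Selection rates by treatment arm
    p1 = s[d == 1].mean()
    p0 = s[d == 0].mean()
    trim = p0 / p1  # share of treated-and-selected that are always-selected

    y1 = y[(d == 1) & (s == 1)]
    y0 = y[(d == 0) & (s == 1)]

    # Lower bound: keep the lowest `trim` share of treated selected outcomes;
    # upper bound: keep the highest `trim` share.
    lo_cut = np.quantile(y1, trim)
    hi_cut = np.quantile(y1, 1 - trim)

    lower = y1[y1 <= lo_cut].mean() - y0.mean()
    upper = y1[y1 >= hi_cut].mean() - y0.mean()
    return lower, upper

# Example usage with simulated data (purely illustrative):
rng = np.random.default_rng(0)
d = rng.integers(0, 2, 5000)
s = (rng.random(5000) < 0.5 + 0.2 * d).astype(int)   # treatment raises employment
y = rng.normal(1.0 + 0.3 * d, 1.0, 5000) * s          # wages observed only if employed
print(lee_bounds(y, d, s))
```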
