3,373 research outputs found

    Broadcasting and the Antitrust Laws

    Get PDF

    The Case Against Secret Evidence

    Get PDF

    Inference for High-Dimensional Sparse Econometric Models

    Full text link
    This article is about estimation and inference methods for high-dimensional sparse (HDS) regression models in econometrics. High-dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown, set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on $\ell_1$-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression.
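
    As a hedged illustration of the $\ell_1$-penalization step described above, the sketch below fits a post-Lasso regression in R with the hdm package (covered later in this list); the data-generating process, tuning defaults, and the rlasso() interface are assumptions for illustration, not the article's empirical application.

        # Sparse setting: many available regressors, few of them relevant.
        library(hdm)

        set.seed(1)
        n <- 100; p <- 500; s <- 5
        X <- matrix(rnorm(n * p), n, p)
        beta <- c(rep(2, s), rep(0, p - s))      # parsimonious true coefficient vector
        y <- X %*% beta + rnorm(n)

        fit <- rlasso(X, y, post = TRUE)         # post-Lasso: OLS refit on the selected set
        which(coef(fit)[-1] != 0)                # the approximately right set of regressors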

    Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach

    Full text link
    Here we present an expository, general analysis of valid post-selection or post-regularization inference about a low-dimensional target parameter, $\alpha$, in the presence of a very high-dimensional nuisance parameter, $\eta$, which is estimated using modern selection or regularization methods. Our analysis relies on high-level, easy-to-interpret conditions that allow one to see clearly the structures needed for achieving valid post-regularization inference. Simple, readily verifiable sufficient conditions are provided for a class of affine-quadratic models. We focus our discussion on estimation and inference procedures based on using the empirical analog of theoretical equations $M(\alpha, \eta) = 0$ which identify $\alpha$. Within this structure, we show that setting up such equations so that the orthogonality/immunization condition $\partial_\eta M(\alpha, \eta) = 0$ holds at the true parameter values, coupled with plausible conditions on the smoothness of $M$ and the quality of the estimator $\hat\eta$, guarantees that inference on the main parameter $\alpha$ based on the testing or point-estimation methods discussed below will be regular despite selection or regularization biases occurring in the estimation of $\eta$. In particular, the estimator of $\alpha$ will often be uniformly consistent at the root-$n$ rate and uniformly asymptotically normal, even though the estimators $\hat\eta$ will generally not be asymptotically linear and regular. The uniformity holds over large classes of models that do not impose highly implausible "beta-min" conditions. We also show that inference can be carried out by inverting tests formed from Neyman's $C(\alpha)$ (orthogonal score) statistics.
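
    The orthogonality condition $\partial_\eta M(\alpha, \eta) = 0$ is what makes simple partialling-out constructions work: when both the outcome and the target regressor are residualized against the high-dimensional nuisance part, first-order errors in the nuisance estimates drop out of the estimating equation. The sketch below illustrates this in a partially linear model; it is a minimal illustration assuming hdm's rlasso() and its predict() method, not the paper's general affine-quadratic framework.

        # Partially linear model y = d*alpha + g(x) + noise, with alpha the target.
        library(hdm)

        set.seed(1)
        n <- 200; p <- 300
        x <- matrix(rnorm(n * p), n, p)
        d <- x[, 1] + rnorm(n)                         # treatment depends on the controls
        y <- 0.5 * d + x[, 1] + rnorm(n)               # true alpha = 0.5

        ry <- y - predict(rlasso(x, y), newdata = x)   # residualize y against x
        rd <- d - predict(rlasso(x, d), newdata = x)   # residualize d against x
        summary(lm(ry ~ rd - 1))                       # regular, root-n inference on alpha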

    High-Dimensional Metrics in R

    Full text link
    The package High-dimensional Metrics (hdm) is an evolving collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals are provided for regression coefficients on target variables (e.g., a treatment or policy variable) in a high-dimensional approximately sparse regression model, for the average treatment effect (ATE) and the average treatment effect on the treated (ATET), as well as for extensions of these parameters to the endogenous setting. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for the regression coefficients of a high-dimensional sparse regression are implemented, including a joint significance test for Lasso regression. Data sets which have been used in the literature and which may be useful for classroom demonstration and for testing new estimators are included. R and the package hdm are open-source software projects and can be freely downloaded from CRAN: http://cran.r-project.org.
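
    As a usage sketch, the calls below exercise the two interfaces the description emphasizes, joint inference on several coefficients and treatment-effect estimation, using the 401(k) pension data shipped with the package. Variable names and function signatures follow the package vignette as we understand it and should be checked against the current documentation.

        library(hdm)
        data(pension)                            # 401(k) data bundled with hdm

        y <- pension$tw                          # total wealth
        d <- pension$p401                        # 401(k) participation
        X <- model.matrix(~ -1 + age + inc + educ + fsize + marr + twoearn,
                          data = pension)

        eff <- rlassoEffects(x = cbind(d, X), y = y, index = 1:3)
        confint(eff, joint = TRUE)               # simultaneous confidence intervals

        ate <- rlassoATE(X, d, y)                # average treatment effect (ATE)
        summary(ate)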

    A lava attack on the recovery of sums of dense and sparse signals

    Full text link
    Common high-dimensional methods for prediction rely on having either a sparse signal model, in which most parameters are zero and a small number of non-zero parameters are large in magnitude, or a dense signal model, with no large parameters and very many small non-zero parameters. We consider a generalization of these two basic models, termed here a "sparse+dense" model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein's unbiased estimator for lava's prediction risk. A simulation example compares the performance of lava to lasso, ridge, and elastic net in a regression setting using feasible, data-dependent penalty parameters and illustrates lava's improved performance relative to these benchmarks.
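
    In the Gaussian sequence model the lava estimator has a simple closed form: profiling out the dense (ridge-penalized) part reduces the problem to soft-thresholding, and the fitted signal is the sum of a thresholded sparse part and a ridge-shrunk dense part. The sketch below follows from the penalized least-squares problem implied by the abstract under one common penalty normalization; take it as an illustration rather than the paper's exact tuning.

        # lava in the sequence model y_i = mu_i + noise, solving per coordinate
        #   min over (beta, delta) of (y - beta - delta)^2 + lambda2*delta^2 + lambda1*|beta|
        soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

        lava_seq <- function(y, lambda1, lambda2) {
          k     <- lambda2 / (1 + lambda2)       # quadratic weight after profiling out delta
          beta  <- soft(y, lambda1 / (2 * k))    # sparse part: soft-thresholding
          delta <- (y - beta) / (1 + lambda2)    # dense part: ridge shrinkage
          beta + delta                           # lava fit = sparse + dense
        }

        set.seed(1)
        mu <- c(rep(4, 5), rnorm(95, sd = 0.3))  # a few spikes plus a dense small signal
        y  <- mu + rnorm(100)
        mean((lava_seq(y, 2, 1) - mu)^2)         # realized risk in one draw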

    Estimation of treatment effects with high-dimensional controls

    Get PDF
    We propose methods for inference on the average effect of a treatment on a scalar outcome in the presence of very many controls. Our setting is a partially linear regression model containing the treatment/policy variable and a large number p of controls or series terms, where p is possibly much larger than the sample size n, but only s << n unknown controls or series terms are needed to approximate the regression function accurately. The latter sparsity condition makes it possible to estimate the entire regression function, as well as the average treatment effect, by selecting approximately the right set of controls using Lasso and related methods. We develop estimation and inference methods for the average treatment effect in this setting, proposing a novel "post-double-selection" method that provides attractive inferential and estimation properties. In our analysis, in order to cover realistic applications, we expressly allow for imperfect selection of the controls and account for the impact of selection errors on estimation and inference. In order to cover typical applications in economics, we employ selection methods designed to deal with non-Gaussian and heteroscedastic disturbances. We illustrate the use of the new methods with numerical simulations and an application to the effect of abortion on crime rates.
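
    The post-double-selection recipe is short enough to write out directly: select the controls that predict the outcome, select the controls that predict the treatment, and run OLS of the outcome on the treatment plus the union of the two selected sets; the union is what protects inference against moderate selection mistakes in either step. The sketch below is illustrative (simulated data, default tuning); hdm's rlassoEffect(..., method = "double selection") packages the same recipe.

        library(hdm)

        set.seed(1)
        n <- 200; p <- 300
        x <- matrix(rnorm(n * p), n, p)
        d <- x[, 1] + x[, 2] + rnorm(n)               # treatment equation
        y <- 1 * d + x[, 1] + x[, 3] + rnorm(n)       # outcome equation, true effect = 1

        sel_y <- which(coef(rlasso(x, y))[-1] != 0)   # controls that predict y
        sel_d <- which(coef(rlasso(x, d))[-1] != 0)   # controls that predict d
        keep  <- union(sel_y, sel_d)                  # union of the two selected sets
        summary(lm(y ~ d + x[, keep, drop = FALSE]))  # inference on the coefficient of d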
