2,443 research outputs found

    The Augmented Synthetic Control Method

    Full text link
    The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in panel data settings. The "synthetic control" is a weighted average of control units that balances the treated unit's pre-treatment outcomes as closely as possible. A critical feature of the original proposal is to use SCM only when the fit on pre-treatment outcomes is excellent. We propose Augmented SCM as an extension of SCM to settings where such pre-treatment fit is infeasible. Analogous to bias correction for inexact matching, Augmented SCM uses an outcome model to estimate the bias due to imperfect pre-treatment fit and then de-biases the original SCM estimate. Our main proposal, which uses ridge regression as the outcome model, directly controls pre-treatment fit while minimizing extrapolation from the convex hull. This estimator can also be expressed as a solution to a modified synthetic controls problem that allows negative weights on some donor units. We bound the estimation error of this approach under different data generating processes, including a linear factor model, and show how regularization helps to avoid over-fitting to noise. We demonstrate gains from Augmented SCM with extensive simulation studies and apply this framework to estimate the impact of the 2012 Kansas tax cuts on economic growth. We implement the proposed method in the new augsynth R package

    Post-Selection Inference for Generalized Linear Models with Many Controls

    Full text link
    This paper considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunize against model selection mistakes and apply it to the case of logistic binary choice model. More specifically we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest α0\alpha_0, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. These methods allow to estimate α0\alpha_0 at the root-nn rate when the total number pp of other regressors, called controls, potentially exceed the sample size nn using sparsity assumptions. The sparsity assumption means that there is a subset of s<ns<n controls which suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and these resulting confidence regions are valid uniformly over ss-sparse models satisfying s2log2p=o(n)s^2\log^2 p = o(n) and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity. In fact, they are robust with respect to moderate model selection mistakes in variable selection. Under suitable conditions, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models in this paper

    Beyond Support in Two-Stage Variable Selection

    Full text link
    Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables and the second stage operates on this set of candidate variables, to improve estimation accuracy or to assess the uncertainty associated to the selection of variables. We advocate that more information can be conveyed from the first stage to the second one: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give two examples of procedures that can benefit from the proposed transfer of information, in estimation and inference problems respectively. Extensive simulations demonstrate that this transfer is particularly efficient when each stage operates on distinct subsamples. This separation plays a crucial role for the computation of calibrated p-values, allowing to control the False Discovery Rate. In this setup, the proposed transfer results in sensitivity gains ranging from 50% to 100% compared to state-of-the-art
    corecore