The Augmented Synthetic Control Method
The synthetic control method (SCM) is a popular approach for estimating the
impact of a treatment on a single unit in panel data settings. The "synthetic
control" is a weighted average of control units that balances the treated
unit's pre-treatment outcomes as closely as possible. A critical feature of the
original proposal is to use SCM only when the fit on pre-treatment outcomes is
excellent. We propose Augmented SCM as an extension of SCM to settings where
such pre-treatment fit is infeasible. Analogous to bias correction for inexact
matching, Augmented SCM uses an outcome model to estimate the bias due to
imperfect pre-treatment fit and then de-biases the original SCM estimate. Our
main proposal, which uses ridge regression as the outcome model, directly
controls pre-treatment fit while minimizing extrapolation from the convex hull.
This estimator can also be expressed as a solution to a modified synthetic
controls problem that allows negative weights on some donor units. We bound the
estimation error of this approach under different data generating processes,
including a linear factor model, and show how regularization helps to avoid
over-fitting to noise. We demonstrate gains from Augmented SCM with extensive
simulation studies and apply this framework to estimate the impact of the 2012
Kansas tax cuts on economic growth. We implement the proposed method in the new
augsynth R package.
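
The bias-correction step described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the augsynth implementation. It assumes the SCM weights `w` were already computed elsewhere, that outcomes are already centered (so the ridge model has no intercept), and that there is a single post-treatment period. All function and variable names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_coefficients(X0, y0, lam):
    """Ridge fit of control post-period outcomes on pre-period outcomes:
    eta = (X0'X0 + lam * I)^{-1} X0'y0."""
    t = len(X0[0])
    gram = [[sum(row[a] * row[b] for row in X0) + (lam if a == b else 0.0)
             for b in range(t)] for a in range(t)]
    rhs = [sum(row[a] * y for row, y in zip(X0, y0)) for a in range(t)]
    return solve_linear(gram, rhs)


def augmented_scm_estimate(X0, x1, y0, w, lam):
    """Return (plain SCM estimate, ridge-augmented estimate) of the treated
    unit's untreated post-period outcome."""
    scm = sum(wi * yi for wi, yi in zip(w, y0))
    eta = ridge_coefficients(X0, y0, lam)
    t = len(x1)
    # Pre-treatment imbalance left over by the SCM weights.
    synth_pre = [sum(wi * row[a] for wi, row in zip(w, X0)) for a in range(t)]
    imbalance = [x1[a] - synth_pre[a] for a in range(t)]
    # The outcome model prices the remaining imbalance; add it as a correction.
    correction = sum(d * e for d, e in zip(imbalance, eta))
    return scm, scm + correction


# Toy data (hypothetical): 3 donor units, 2 pre-treatment periods.
X0 = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
y0 = [2.0, 2.0, 4.0]

# Treated unit inside the convex hull: exact fit, so no correction is applied.
scm_in, aug_in = augmented_scm_estimate(X0, [1.5, 1.5], y0, [0.5, 0.5, 0.0], lam=1.0)

# Treated unit outside the convex hull: the ridge term extrapolates.
scm_out, aug_out = augmented_scm_estimate(X0, [4.0, 4.0], y0, [0.0, 0.0, 1.0], lam=0.0)
```

When the pre-treatment fit is exact the correction vanishes and the two estimates coincide. A very large `lam` shrinks the ridge coefficients toward zero and likewise recovers the plain SCM estimate, which is one way to read the abstract's remark about limiting extrapolation.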
Post-Selection Inference for Generalized Linear Models with Many Controls
This paper considers generalized linear models in the presence of many
controls. We lay out a general methodology to estimate an effect of interest
based on the construction of an instrument that immunizes against model
selection mistakes and apply it to the case of the logistic binary choice model.
More specifically, we propose new methods for estimating and constructing
confidence regions for a regression parameter of primary interest $\alpha_0$, a
parameter in front of the regressor of interest, such as the treatment variable
or a policy variable. These methods allow estimation of $\alpha_0$ at the
root-$n$ rate when the total number $p$ of other regressors, called controls,
potentially exceeds the sample size $n$, using sparsity assumptions. The sparsity
assumption means that there is a subset of $s < n$ controls which suffices to
accurately approximate the nuisance part of the regression function.
Importantly, the estimators and the resulting confidence regions are valid
uniformly over $s$-sparse models satisfying $s^2 \log^2 p = o(n)$ and other
technical conditions. These procedures do not rely on traditional consistent
model selection arguments for their validity. In fact, they are robust with
respect to moderate model selection mistakes in variable selection. Under
suitable conditions, the estimators are semi-parametrically efficient in the
sense of attaining the semi-parametric efficiency bounds for the class of
models in this paper.
Beyond Support in Two-Stage Variable Selection
Numerous variable selection methods rely on a two-stage procedure, where a
sparsity-inducing penalty is used in the first stage to predict the support,
which is then conveyed to the second stage for estimation or inference
purposes. In this framework, the first stage screens variables to find a set of
possibly relevant variables and the second stage operates on this set of
candidate variables, to improve estimation accuracy or to assess the
uncertainty associated with the selection of variables. We advocate that more
information can be conveyed from the first stage to the second one: we use the
magnitude of the coefficients estimated in the first stage to define an
adaptive penalty that is applied at the second stage. We give two examples of
procedures that can benefit from the proposed transfer of information, in
estimation and inference problems respectively. Extensive simulations
demonstrate that this transfer is particularly efficient when each stage
operates on distinct subsamples. This separation plays a crucial role for the
computation of calibrated p-values, making it possible to control the False Discovery
Rate. In this setup, the proposed transfer results in sensitivity gains ranging
from 50% to 100% compared to state-of-the-art
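
One way to picture the transfer of coefficient magnitudes between stages is the sketch below. It is an illustrative instantiation only, and the specific choices are assumptions rather than details taken from the abstract: a Lasso first stage fitted by coordinate descent on one subsample, followed by a ridge second stage on a distinct subsample whose per-variable penalty is inversely proportional to the first-stage magnitude. All names and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def adaptive_ridge(X, y, first_stage, mu):
    """Second stage: minimize ||y - X_S b||^2 + mu * sum_j b_j^2 / |beta1_j|
    over the first-stage support S. Variables with a zero first-stage
    coefficient receive an infinite penalty, i.e. they are dropped."""
    support = [j for j, b in enumerate(first_stage) if b != 0.0]
    k = len(support)
    gram = [[sum(row[support[a]] * row[support[b]] for row in X)
             + (mu / abs(first_stage[support[a]]) if a == b else 0.0)
             for b in range(k)] for a in range(k)]
    rhs = [sum(row[support[a]] * yi for row, yi in zip(X, y)) for a in range(k)]
    coef = solve_linear(gram, rhs) if k else []
    return support, coef


# Toy orthogonal design (hypothetical), reused for two distinct subsamples.
design = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y_first = [2.1, 1.9, -1.9, -2.1]   # subsample A: strong signal on variable 0
y_second = [2.0, 2.0, -2.0, -2.0]  # subsample B: used only in stage two

first_stage = lasso_cd(design, y_first, lam=0.5)
support, coef = adaptive_ridge(design, y_second, first_stage, mu=1.0)
```

Fitting the two stages on separate subsamples mirrors the separation that the abstract credits with enabling calibrated p-values. The p-value computation itself is not shown here.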