1,008 research outputs found

    The Augmented Synthetic Control Method

    The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in panel data settings. The "synthetic control" is a weighted average of control units that balances the treated unit's pre-treatment outcomes as closely as possible. A critical feature of the original proposal is to use SCM only when the fit on pre-treatment outcomes is excellent. We propose Augmented SCM as an extension of SCM to settings where such pre-treatment fit is infeasible. Analogous to bias correction for inexact matching, Augmented SCM uses an outcome model to estimate the bias due to imperfect pre-treatment fit and then de-biases the original SCM estimate. Our main proposal, which uses ridge regression as the outcome model, directly controls pre-treatment fit while minimizing extrapolation from the convex hull. This estimator can also be expressed as a solution to a modified synthetic controls problem that allows negative weights on some donor units. We bound the estimation error of this approach under different data generating processes, including a linear factor model, and show how regularization helps to avoid over-fitting to noise. We demonstrate gains from Augmented SCM with extensive simulation studies and apply this framework to estimate the impact of the 2012 Kansas tax cuts on economic growth. We implement the proposed method in the new augsynth R package.
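    To make the augmentation step concrete, the sketch below computes a ridge-augmented synthetic control estimate: simplex-constrained SCM weights are fit to the treated unit's pre-treatment path, and a ridge outcome model corrects for whatever pre-treatment imbalance remains. The variable names and the plain least-squares SCM objective are illustrative assumptions; this is a minimal sketch, not the augsynth implementation.

```python
# A minimal sketch of ridge-augmented SCM (illustrative, not the augsynth package).
import numpy as np
from scipy.optimize import minimize

def scm_weights(X0, x1):
    """Simplex-constrained weights fitting the treated unit's pre-period path.

    X0: (n_donors, T_pre) donor pre-treatment outcomes.
    x1: (T_pre,) treated unit's pre-treatment outcomes.
    """
    n = X0.shape[0]
    loss = lambda w: np.sum((x1 - w @ X0) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1},)
    res = minimize(loss, np.full(n, 1.0 / n), bounds=[(0, 1)] * n, constraints=cons)
    return res.x

def ridge_augmented_scm(X0, x1, y0_post, lam=1.0):
    """SCM estimate of the counterfactual, de-biased with a ridge outcome model.

    y0_post: (n_donors,) donor outcomes in a post-treatment period.
    """
    w = scm_weights(X0, x1)
    # Ridge regression of donor post-period outcomes on their pre-period outcomes.
    eta = np.linalg.solve(X0.T @ X0 + lam * np.eye(X0.shape[1]), X0.T @ y0_post)
    # SCM estimate plus a correction for the remaining pre-treatment imbalance.
    return w @ y0_post + (x1 - w @ X0) @ eta
```

    The treatment effect estimate is then the treated unit's observed post-period outcome minus this estimated counterfactual.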

    Using Balancing Weights to Target the Treatment Effect on the Treated when Overlap is Poor

    Inverse probability weights are commonly used in epidemiology to estimate causal effects in observational studies. With inverse probability weighting estimators, researchers can typically target either the average treatment effect or the average treatment effect on the treated. However, when overlap between the treated and control groups is poor, inverse probability weighting can produce extreme weights that result in biased estimates and large variances. One alternative to inverse probability weights is overlap weights, which target the population with the most overlap on observed characteristics. While estimates based on overlap weights produce less bias in such contexts, the causal estimand can be difficult to interpret. Another alternative is balancing weights, which directly target imbalances during the estimation process. Here, we explore whether balancing weights allow analysts to target the average treatment effect on the treated in cases where inverse probability weights are biased due to poor overlap. We conduct three simulation studies and an empirical application. We find that in many cases, balancing weights still allow the analyst to target the average treatment effect on the treated even when overlap is poor. We show that while overlap weights remain a key tool for estimating causal effects, more familiar estimands can be targeted by using balancing weights instead of inverse probability weights.
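    The sketch below illustrates one way to construct ATT-targeted balancing weights: entropy-balancing weights on control units, chosen so that their weighted covariate means match the treated group's means. The function name and the dual-based solver are illustrative assumptions (and exact balance is assumed feasible), not the specific estimators studied in the paper.

```python
# A minimal sketch of ATT-targeted balancing weights via entropy balancing.
import numpy as np
from scipy.optimize import minimize

def entropy_balancing_weights(X_control, X_treated):
    """Weights on controls whose weighted covariate means match the treated means.

    X_control: (n0, p) covariates of control units; X_treated: (n1, p) of treated.
    Assumes the treated means lie inside the convex hull of control covariates.
    """
    target = X_treated.mean(axis=0)
    Z = X_control - target                              # centred at treated means
    dual = lambda lam: np.log(np.exp(Z @ lam).sum())    # convex dual objective
    res = minimize(dual, np.zeros(Z.shape[1]), method="BFGS")
    w = np.exp(Z @ res.x)
    return w / w.sum()                                  # sums to one over controls

# ATT estimate: mean treated outcome minus balancing-weighted control mean, e.g.
# att = y_treated.mean() - entropy_balancing_weights(Xc, Xt) @ y_control
```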

    Locally Testable Codes and Cayley Graphs

    We give two new characterizations of ($\mathbb{F}_2$-linear) locally testable error-correcting codes in terms of Cayley graphs over $\mathbb{F}_2^h$: (1) A locally testable code is equivalent to a Cayley graph over $\mathbb{F}_2^h$ whose set of generators is significantly larger than $h$ and has no short linear dependencies, but yields a shortest-path metric that embeds into $\ell_1$ with constant distortion. This extends and gives a converse to a result of Khot and Naor (2006), which showed that codes with large dual distance imply Cayley graphs that have no low-distortion embeddings into $\ell_1$. (2) A locally testable code is equivalent to a Cayley graph over $\mathbb{F}_2^h$ that has significantly more than $h$ eigenvalues near 1, which have no short linear dependencies among them and which "explain" all of the large eigenvalues. This extends and gives a converse to a recent construction of Barak et al. (2012), which showed that locally testable codes imply Cayley graphs that are small-set expanders but have many large eigenvalues. Comment: 22 pages
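    The correspondence itself is theoretical, but the spectral side is easy to compute directly: for a Cayley graph over $\mathbb{F}_2^h$ the characters diagonalize the adjacency matrix, so each $x \in \mathbb{F}_2^h$ yields the normalized eigenvalue $\frac{1}{|S|}\sum_{s \in S}(-1)^{\langle x, s\rangle}$. The small sketch below (illustrative names, not code from the paper) enumerates this spectrum for a given generator set.

```python
# A small sketch of the character-based spectrum of a Cayley graph over F_2^h.
import itertools
import numpy as np

def cayley_spectrum(generators):
    """Normalized eigenvalues of Cay(F_2^h, S).

    generators: (m, h) 0/1 array, one generator per row. Each x in F_2^h gives
    the eigenvalue (1/|S|) * sum_{s in S} (-1)^{<x, s>}.
    """
    S = np.asarray(generators)
    h = S.shape[1]
    xs = np.array(list(itertools.product([0, 1], repeat=h)))  # all of F_2^h
    signs = (-1) ** (xs @ S.T % 2)                             # character values
    return signs.mean(axis=1)                                  # normalized spectrum

# Example: the standard basis vectors generate the h-dimensional hypercube.
print(cayley_spectrum(np.eye(3, dtype=int)))
```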

    Using Multiple Outcomes to Improve the Synthetic Control Method

    When there are multiple outcome series of interest, synthetic control analyses typically proceed by estimating separate weights for each outcome. In this paper, we instead propose estimating a common set of weights across outcomes, by balancing either a vector of all outcomes or an index or average of them. Under a low-rank factor model, we show that these approaches lead to lower bias bounds than separate weights, and that averaging leads to further gains as the number of outcomes grows. We illustrate this via simulation and in a re-analysis of the impact of the Flint water crisis on educational outcomes. Comment: 36 pages, 6 figures
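    As a rough illustration of the averaging approach, the sketch below standardizes each outcome series, averages the standardized pre-treatment paths, and then fits a single set of simplex-constrained weights. The names and the standardization choice are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of one common set of SCM weights across several outcomes.
import numpy as np
from scipy.optimize import minimize

def common_scm_weights(donor_outcomes, treated_outcomes):
    """donor_outcomes: list of (n_donors, T_pre) arrays, one per outcome series.
    treated_outcomes: list of (T_pre,) arrays for the treated unit.
    All series are assumed to share the same pre-treatment length T_pre.
    """
    X_std, x_std = [], []
    for X0, x1 in zip(donor_outcomes, treated_outcomes):
        s = np.concatenate([X0.ravel(), x1]).std()   # put outcomes on one scale
        X_std.append(X0 / s)
        x_std.append(x1 / s)
    X0 = np.mean(X_std, axis=0)                      # averaged donor paths
    x1 = np.mean(x_std, axis=0)                      # averaged treated path

    n = X0.shape[0]
    loss = lambda w: np.sum((x1 - w @ X0) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1},)
    res = minimize(loss, np.full(n, 1.0 / n), bounds=[(0, 1)] * n, constraints=cons)
    return res.x                                     # one weight vector, reused for every outcome
```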

    Policy Learning with Asymmetric Counterfactual Utilities

    Data-driven decision making plays an important role even in high-stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. For example, the Hippocratic principle to "do no harm" implies that the cost of causing death to a patient who would otherwise survive without treatment is greater than the cost of forgoing life-saving treatment. We consider optimal policy learning with asymmetric counterfactual utility functions of this form, which depend on the joint set of potential outcomes. We show that asymmetric counterfactual utilities lead to an unidentifiable expected utility function, and so we first partially identify it. Drawing on statistical decision theory, we then derive minimax decision rules by minimizing the maximum expected utility loss relative to different alternative policies. We show that one can learn minimax-loss decision rules from observed data by solving intermediate classification problems, and establish that the finite-sample excess expected utility loss of this procedure is bounded by the regret of these intermediate classifiers. We apply this conceptual framework and methodology to the decision about whether or not to use right heart catheterization for patients with possible pulmonary hypertension.
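    A small numerical sketch of the identification issue: with a binary survival outcome, the "harm" probability P(Y(1)=0, Y(0)=1) is only bounded (Fréchet-Hoeffding) by the identifiable marginals, so the expected utility of a treat-everyone policy under an asymmetric utility is an interval rather than a point. The utility values below are hypothetical, and the utility is simplified to reward benefit events and penalize harm events only; this is not the paper's estimator.

```python
# A sketch of partial identification under an asymmetric counterfactual utility.
def harm_probability_bounds(p1, p0):
    """Frechet-Hoeffding bounds on P(Y(1)=0, Y(0)=1), i.e. harmed by treatment.

    p1 = P(Y(1)=1), p0 = P(Y(0)=1), with Y = 1 meaning survival.
    """
    return max(0.0, p0 - p1), min(1.0 - p1, p0)

def treat_all_utility_bounds(p1, p0, benefit=1.0, harm_cost=5.0):
    """Bounds on expected utility of treating everyone when causing an
    otherwise-avoidable death costs harm_cost while a life saved yields benefit."""
    lo, hi = harm_probability_bounds(p1, p0)
    # P(benefit event Y(1)=1, Y(0)=0) = p1 - p0 + harm, so utility is linear in harm.
    utility = lambda harm: benefit * (p1 - p0 + harm) - harm_cost * harm
    values = (utility(lo), utility(hi))
    return min(values), max(values)

# Example: the identified marginals alone cannot even sign the expected utility.
print(treat_all_utility_bounds(p1=0.8, p0=0.6))   # roughly (-0.6, 0.2)
```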

    Aggregation-fragmentation-diffusion model for trail dynamics

    We investigate statistical properties of trails formed by a random process incorporating aggregation, fragmentation, and diffusion. In this stochastic process, which takes place in one spatial dimension, two neighboring trails may combine to form a larger one, and a single trail may split into two. In addition, trails move diffusively. The model is defined by two parameters, which quantify the fragmentation rate and the fragment size. In the long-time limit, the system reaches a steady state, and our focus is the limiting distribution of trail weights. We find that the density of trail weights has the power-law form P(w) ~ w^{-γ} for small weight w. We obtain the exponent γ analytically and find that it varies continuously with the two model parameters. The exponent γ can be positive or negative, so that in one range of parameters small-weight trails are abundant and in the complementary range they are rare.
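    A rough Monte Carlo sketch of dynamics in this spirit is given below: trails live on a ring of lattice sites, whole trails hop to a neighboring site and merge (diffusion plus aggregation), and with some rate a fragment of fixed size splits off onto a neighbor. The lattice size, rates, and fragment size are illustrative choices, not the paper's exact parameterization.

```python
# A minimal Monte Carlo sketch of aggregation-fragmentation-diffusion of trail weights.
import numpy as np

rng = np.random.default_rng(0)
L = 200                       # ring of L sites, at most one trail per site
weights = np.ones(L)          # trail weight at each site
frag_rate, frag_size = 0.1, 0.5

for _ in range(200_000):
    i = rng.integers(L)
    if weights[i] == 0:
        continue
    if rng.random() < frag_rate:
        # Fragmentation: split off a piece onto a random neighbor, which absorbs it.
        piece = min(frag_size, weights[i])
        weights[i] -= piece
        weights[(i + rng.choice([-1, 1])) % L] += piece
    else:
        # Diffusion: the whole trail hops to a neighbor and aggregates with it.
        j = (i + rng.choice([-1, 1])) % L
        weights[j] += weights[i]
        weights[i] = 0.0

# Histogram of nonzero trail weights approximates the steady-state density P(w).
hist, edges = np.histogram(weights[weights > 0], bins=50)
```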

    Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

    Algorithmic and data-driven decisions and recommendations are commonly used in high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are common but difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that, compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors. Comment: 40 pages, 19 figures
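    As a schematic of the chance-constrained step, the sketch below treats the policy as a vector of deviation probabilities over discrete subgroups and solves a linear program: maximize posterior expected gain subject to a cap on the average posterior probability of doing harm. The subgroup structure, simulated posterior draws, and the risk cap delta are all illustrative assumptions, not the paper's ACRisk formulation or data.

```python
# A small sketch of chance-constrained policy learning over discrete subgroups.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
G = 8                                      # number of subgroups
share = np.full(G, 1.0 / G)                # subgroup population shares
# draws[s, g]: posterior draw s of the effect of deviating from the status quo in subgroup g.
draws = rng.normal(loc=np.linspace(-0.2, 0.4, G), scale=0.3, size=(1000, G))

gain = draws.mean(axis=0)                  # posterior mean gain per subgroup
risk = (draws < 0).mean(axis=0)            # posterior Pr(deviation is harmful)
delta = 0.10                               # cap on the average conditional risk

# Decision variable p_g in [0, 1]: probability of deviating in subgroup g.
res = linprog(c=-(share * gain),           # linprog minimizes, so negate the gain
              A_ub=[share * risk], b_ub=[delta],
              bounds=[(0, 1)] * G)
policy = res.x                             # mostly 0/1, fractional only where the cap binds
```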