
    Combining multiple observational data sources to estimate causal effects

    The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimating causal effects by combining big main data, in which some confounders are unmeasured, with smaller validation data that contain supplementary information on those confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow us to construct consistent estimators of causal effects, whereas the big main data in general yield only error-prone estimators. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiency while preserving the consistency of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators.
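
    A minimal sketch of the combination idea, assuming a control-variates style adjustment (the function and variable names below are hypothetical, not the paper's notation): the same error-prone estimator is computed on both data sets, and its discrepancy on the validation data is used to adjust the consistent validation-only estimate.

        import numpy as np

        def combine_estimates(tau_val, tau_ep_val, tau_ep_main, boot_val, boot_ep_val):
            # tau_val: consistent estimate from the validation data (confounders observed)
            # tau_ep_val / tau_ep_main: the same error-prone estimator applied to the
            #   validation data and to the big main data, respectively
            # boot_val, boot_ep_val: bootstrap replicates of the two validation estimates
            cov = np.cov(boot_val, boot_ep_val)
            gamma = cov[0, 1] / cov[1, 1]  # variance-minimising combination coefficient
            return tau_val - gamma * (tau_ep_val - tau_ep_main)

    Because the two error-prone estimates converge to the same limit, the correction term vanishes asymptotically, so the combined estimator stays consistent while the correlation between the validation-data estimates reduces its variance.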

    Data-driven Algorithms for Dimension Reduction in Causal Inference

    In observational studies, the causal effect of a treatment may be confounded by variables that are related to both the treatment and the outcome of interest. To identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is based primarily on subject-matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect the bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness, the algorithms search for minimal subsets of the covariate vector. Based, for example, on the framework of sufficient dimension reduction or on kernel smoothing, the algorithms perform a backward elimination procedure that assesses the significance of each covariate. Their performance is evaluated in simulations, and an application using data from the Swedish Childhood Diabetes Register is also presented.
    Comment: 27 pages, 2 figures, 11 tables
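
    For orientation, a backward elimination loop of this general shape is sketched below. This is only a rough stand-in: the paper's significance assessments are based on sufficient dimension reduction or kernel smoothing, whereas the sketch uses a simple parametric logistic test.

        import numpy as np
        import statsmodels.api as sm

        def backward_eliminate(X, t, alpha=0.05):
            # Repeatedly drop the covariate least associated with treatment t,
            # stopping once every remaining covariate is significant at level alpha.
            cols = list(range(X.shape[1]))
            while len(cols) > 1:
                fit = sm.Logit(t, sm.add_constant(X[:, cols])).fit(disp=0)
                pvals = np.asarray(fit.pvalues)[1:]   # skip the intercept
                worst = int(np.argmax(pvals))
                if pvals[worst] < alpha:
                    break                             # all remaining covariates significant
                cols.pop(worst)
            return cols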

    Propensity Score Analysis with Matching Weights

    Propensity score analysis is one of the most widely used methods for studying causal treatment effects in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features, including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, a clearly identified target population with an optimal sampling property, and no need to choose a matching algorithm or caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of inverse probability weighting methods, but the computation remains stable when the propensity scores approach 0 or 1. An augmented version of the matching weight estimator is developed that has the doubly robust property, i.e., the estimator is consistent if either the outcome model or the propensity score model is correct. In numerical studies, the proposed methods demonstrated better performance than many widely used propensity score analysis methods, such as stratification by quintiles, matching on propensity scores, and inverse probability weighting.
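
    The matching weight has a closed form, min{e, 1 - e} divided by the probability of the observed treatment, which is why the computation stays stable as the propensity score e approaches 0 or 1. A minimal sketch with illustrative function names (the balance tests and the augmented estimator from the paper are not shown):

        import numpy as np

        def matching_weights(ps, z):
            # ps: estimated propensity scores in (0, 1); z: 0/1 treatment indicator
            return np.minimum(ps, 1 - ps) / np.where(z == 1, ps, 1 - ps)

        def mw_effect(y, z, ps):
            # Weighted difference in means over the matching-weight population
            w = matching_weights(ps, z)
            return (np.average(y[z == 1], weights=w[z == 1])
                    - np.average(y[z == 0], weights=w[z == 0]))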

    A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

    We often seek to estimate the impact of an exposure that is naturally occurring or randomly assigned at the cluster level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a nonparametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e., contagion) and the influence of one individual's covariates on another's outcome (i.e., covariate interference). The second TMLE is developed under a causal sub-model assuming that the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting; incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
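
    For orientation, a stripped-down individual-level TMLE for a binary outcome is sketched below; the paper's cluster-level estimators add hierarchical structure and interference between individuals that this sketch ignores, and the model choices here are placeholders.

        import numpy as np
        import statsmodels.api as sm
        from scipy.special import expit, logit
        from sklearn.linear_model import LogisticRegression

        def tmle_ate(y, a, W):
            # Initial outcome regression Q(a, W) and treatment mechanism g(W)
            Q = LogisticRegression(max_iter=1000).fit(np.column_stack([a, W]), y)
            g = np.clip(LogisticRegression(max_iter=1000).fit(W, a)
                        .predict_proba(W)[:, 1], 0.01, 0.99)
            Q1 = np.clip(Q.predict_proba(np.column_stack([np.ones_like(a), W]))[:, 1], 0.01, 0.99)
            Q0 = np.clip(Q.predict_proba(np.column_stack([np.zeros_like(a), W]))[:, 1], 0.01, 0.99)
            Qa = np.where(a == 1, Q1, Q0)
            # Targeting step: logistic fluctuation along the "clever covariate"
            H = a / g - (1 - a) / (1 - g)
            eps = sm.GLM(y, H.reshape(-1, 1), offset=logit(Qa),
                         family=sm.families.Binomial()).fit().params[0]
            # Updated counterfactual predictions and the average treatment effect
            Q1s = expit(logit(Q1) + eps / g)
            Q0s = expit(logit(Q0) - eps / (1 - g))
            return np.mean(Q1s - Q0s)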

    Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection

    We consider two recent suggestions for how to perform an empirically motivated Monte Carlo study to help select a treatment effect estimator under unconfoundedness. We show theoretically that neither is likely to be informative except under restrictive conditions that are unlikely to be satisfied in many contexts. To test empirical relevance, we also apply the approaches to a real-world setting where estimator performance is known. Both approaches are worse than random at selecting estimators that minimise absolute bias; they do better at selecting estimators that minimise mean squared error. However, using a simple bootstrap is at least as good and often better. For now, researchers would be best advised to use a range of estimators and compare estimates for robustness.
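
    The bootstrap alternative can be sketched roughly as follows, under the assumption that each candidate estimator is a function of a resampled data array (the names are hypothetical, and ranking by bootstrap spread is only a proxy for the paper's exact comparison):

        import numpy as np

        def bootstrap_rank(data, estimators, B=200, seed=0):
            # Score each estimator by the spread of its bootstrap replicates
            # around its full-sample estimate, then rank from best to worst.
            rng = np.random.default_rng(seed)
            n = len(data)
            scores = {}
            for name, est in estimators.items():
                point = est(data)
                reps = [est(data[rng.integers(0, n, size=n)]) for _ in range(B)]
                scores[name] = float(np.mean((np.asarray(reps) - point) ** 2))
            return sorted(scores, key=scores.get)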

    Entropy balancing is doubly robust

    Covariate balance is a conventional key diagnostic for methods that estimate causal effects from observational studies. Recently, there has been emerging interest in directly incorporating covariate balance into the estimation. We study a recently proposed entropy maximization method called Entropy Balancing (EB), which exactly matches the covariate moments of the different experimental groups in its optimization problem. We show that EB is doubly robust with respect to linear outcome regression and logistic propensity score regression, and that it reaches the asymptotic semiparametric variance bound when both regressions are correctly specified. This is surprising because there is no attempt to model the outcome or the treatment assignment in the original proposal of EB. Our theoretical results and simulations suggest that EB is a very appealing alternative to conventional weighting estimators that estimate the propensity score by maximum likelihood.
    Comment: 23 pages, 6 figures, Journal of Causal Inference 2017
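
    The EB optimization problem has a convenient convex dual: the weights are proportional to exp(-lambda'x), with lambda chosen so that the weighted control moments exactly match the treated moments. A minimal sketch, assuming SciPy (function names are illustrative):

        import numpy as np
        from scipy.optimize import minimize
        from scipy.special import logsumexp

        def entropy_balance(X0, target):
            # X0: control-group covariates (n0 x p); target: treated-group means (p,)
            # Dual of: minimise sum w*log(w) s.t. sum(w) = 1 and X0.T @ w = target
            def dual(lam):
                return logsumexp(-X0 @ lam) + lam @ target
            lam = minimize(dual, np.zeros(X0.shape[1]), method="BFGS").x
            w = np.exp(-X0 @ lam - logsumexp(-X0 @ lam))  # normalised weights
            return w

    At the dual optimum the first-order condition is exactly moment matching, i.e. the weighted control means equal the target, which is the exact-balance property described in the abstract.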