Combining multiple observational data sources to estimate causal effects
The era of big data has witnessed an increasing availability of multiple data
sources for statistical analyses. We consider estimation of causal effects
combining big main data with unmeasured confounders and smaller validation data
with supplementary information on these confounders. Under the unconfoundedness
assumption with completely observed confounders, the smaller validation data
allow for constructing consistent estimators for causal effects, but the big
main data can only give error-prone estimators in general. However, by
leveraging the information in the big main data in a principled way, we can
improve estimation efficiency while preserving the consistency of the initial
estimators based solely on the validation data. Our framework applies
to asymptotically normal estimators, including the commonly-used regression
imputation, weighting, and matching estimators, and does not require a correct
specification of the model relating the unmeasured confounders to the observed
variables. We also propose appropriate bootstrap procedures, which makes our
method straightforward to implement using software routines for existing
estimators
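As an illustration only, the combination idea can be sketched as a control-variate adjustment: correct the consistent validation-only estimate by the discrepancy between an error-prone estimator computed on the validation data and on the main data. The function below is our minimal sketch, not the paper's estimator; the scalar coefficient and the bootstrap-based way of estimating it are simplifying assumptions.

```python
import numpy as np

def combine_estimates(tau_val, theta_val, theta_main, boot_taus, boot_thetas):
    """Hypothetical control-variate style combination of a consistent
    validation-only estimate (tau_val) with an error-prone estimator
    computed on the validation data (theta_val) and, almost noiselessly,
    on the big main data (theta_main)."""
    boot_taus = np.asarray(boot_taus)      # bootstrap replicates of tau_val
    boot_thetas = np.asarray(boot_thetas)  # bootstrap replicates of theta_val
    # Regression coefficient minimising the adjusted estimator's variance.
    cov = np.cov(boot_taus, boot_thetas)
    gamma = cov[0, 1] / cov[1, 1]
    # The adjustment removes estimable noise while keeping consistency,
    # since theta_val - theta_main converges to zero.
    return tau_val - gamma * (theta_val - theta_main)
```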
Data-driven Algorithms for Dimension Reduction in Causal Inference
In observational studies, the causal effect of a treatment may be confounded
with variables that are related to both the treatment and the outcome of
interest. In order to identify a causal effect, such studies often rely on the
unconfoundedness assumption, i.e., that all confounding variables are observed.
The choice of covariates to control for, which is primarily based on subject
matter knowledge, may result in a large covariate vector in the attempt to
ensure that unconfoundedness holds. However, including redundant covariates can
affect the bias and efficiency of nonparametric causal effect estimators, e.g., due
to the curse of dimensionality. Data-driven algorithms for the selection of
sufficient covariate subsets are investigated. Under the assumption of
unconfoundedness, the algorithms search for minimal subsets of the covariate
vector. Based on, e.g., the framework of sufficient dimension reduction or
kernel smoothing, the algorithms perform a backward elimination procedure that
assesses the significance of each covariate. Their performance is evaluated in
simulations and an application using data from the Swedish Childhood Diabetes
Register is also presented.
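As a hedged sketch only (the paper's procedures are built on sufficient dimension reduction and kernel smoothing, not the linear model used here), backward elimination of covariates might look like the following; the function name and significance level are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Drop the least significant covariate, one at a time, until every
    remaining covariate is significant at level alpha. X: (n, p) covariate
    matrix; y: the outcome (or the treatment) used to judge relevance."""
    cols = list(range(X.shape[1]))
    while cols:
        design = sm.add_constant(X[:, cols])
        pvals = np.asarray(sm.OLS(y, design).fit().pvalues)[1:]  # skip intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break  # all remaining covariates are significant; stop
        cols.pop(worst)
    return cols  # indices of the retained covariate subset
```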
Propensity Score Analysis with Matching Weights
Propensity score analysis is one of the most widely used methods for
studying the causal treatment effect in observational studies. This paper
studies treatment effect estimation with the method of matching weights. This
method resembles propensity score matching but offers a number of new features
including efficient estimation, rigorous variance calculation, simple
asymptotics, statistical tests of balance, clearly identified target population
with an optimal sampling property, and no need to choose a matching algorithm or
caliper size. In addition, we propose the mirror histogram as a useful tool for
graphically displaying balance. The method also shares some features of the
inverse probability weighting methods, but the computation remains stable when
the propensity scores approach 0 or 1. An augmented version of the matching
weight estimator is developed that has the doubly robust property, i.e., the
estimator is consistent if either the outcome model or the propensity score
model is correct. In the numerical studies, the proposed methods demonstrated
better performance than many widely used propensity score analysis methods such
as stratification by quintiles, matching with propensity scores, and inverse
probability weighting.
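For concreteness, a minimal sketch of the point estimate, assuming the matching weight min(e, 1-e) divided by the probability of the treatment actually received; the paper's variance calculation, balance tests, and augmented doubly robust estimator are omitted.

```python
import numpy as np

def matching_weight_ate(y, z, e):
    """Treatment effect estimate in the matching-weight target population.
    y: outcomes; z: binary treatment indicator; e: estimated propensity
    scores in (0, 1). The weight is bounded above by 1, so the computation
    stays stable as e approaches 0 or 1, unlike inverse probability weights."""
    y, z, e = map(np.asarray, (y, z, e))
    w = np.minimum(e, 1 - e) / np.where(z == 1, e, 1 - e)
    mu1 = np.sum(w * z * y) / np.sum(w * z)              # weighted treated mean
    mu0 = np.sum(w * (1 - z) * y) / np.sum(w * (1 - z))  # weighted control mean
    return mu1 - mu0
```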
A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure
We often seek to estimate the impact of an exposure naturally occurring or
randomly assigned at the cluster level. For example, the literature on
neighborhood determinants of health continues to grow. Likewise, community
randomized trials are used to learn about real-world implementation,
sustainability, and population effects of interventions with proven
individual-level efficacy. In these settings, individual-level outcomes are
correlated due to shared cluster-level factors, including the exposure, as well
as social or biological interactions between individuals. To flexibly and
efficiently estimate the effect of a cluster-level exposure, we present two
targeted maximum likelihood estimators (TMLEs). The first TMLE is developed
under a non-parametric causal model, which allows for arbitrary interactions
between individuals within a cluster. These interactions include direct
transmission of the outcome (i.e., contagion) and influence of one individual's
covariates on another's outcome (i.e., covariate interference). The second TMLE
is developed under a causal sub-model assuming the cluster-level and
individual-specific covariates are sufficient to control for confounding.
Simulations compare the alternative estimators and illustrate the potential
gains from pairing individual-level risk factors and outcomes during
estimation, while avoiding unwarranted assumptions. Our results suggest that
estimation under the sub-model can result in bias and misleading inference in
an observational setting. Incorporating working assumptions during estimation
is more robust than assuming they hold in the underlying causal model. We
illustrate our approach with an application to HIV prevention and treatment.
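For orientation only, the sketch below shows the standard single-level TMLE targeting step for a binary outcome and binary exposure; the paper's cluster-level TMLEs build on this idea but differ in their causal models and estimation details, so this is a generic stand-in, not the proposed estimators.

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import logit, expit

def tmle_ate(y, z, q1, q0, g):
    """Generic TMLE targeting step for the average treatment effect.
    y: binary outcomes; z: binary treatment; q1, q0: initial outcome
    predictions under treatment/control; g: estimated propensity scores
    (all predictions strictly between 0 and 1)."""
    qz = np.where(z == 1, q1, q0)
    # "Clever covariate" from the efficient influence function.
    h = z / g - (1 - z) / (1 - g)
    # Fluctuate the initial fit along h: logistic regression of y on h
    # with offset logit(qz); epsilon is the fitted coefficient.
    eps = sm.GLM(y, h.reshape(-1, 1), family=sm.families.Binomial(),
                 offset=logit(qz)).fit().params[0]
    # Updated counterfactual predictions under treatment and control.
    q1_star = expit(logit(q1) + eps / g)
    q0_star = expit(logit(q0) - eps / (1 - g))
    return np.mean(q1_star - q0_star)
```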
Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection
We consider two recent suggestions for how to perform an empirically
motivated Monte Carlo study to help select a treatment effect estimator under
unconfoundedness. We show theoretically that neither is likely to be
informative except under restrictive conditions that are unlikely to be
satisfied in many contexts. To test empirical relevance, we also apply the
approaches to a real-world setting where estimator performance is known. Both
approaches are worse than random at selecting estimators that minimise
absolute bias. They do better when selecting estimators that minimise mean
squared error. However, using a simple bootstrap is at least as good and often
better. For now, researchers would be best advised to use a range of estimators
and compare estimates for robustness.
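One simple reading of the bootstrap suggestion (our illustrative construction, not necessarily the paper's exact procedure) is to rank candidate estimators by the spread of their bootstrap replicates around their own full-sample estimates, a proxy for mean squared error:

```python
import numpy as np

def bootstrap_rank(data, estimators, n_boot=500, seed=0):
    """Rank candidate treatment effect estimators by a bootstrap proxy for
    mean squared error. data: numpy array of observations (rows);
    estimators: dict mapping a name to a callable taking a data array."""
    rng = np.random.default_rng(seed)
    n = len(data)
    full = {name: est(data) for name, est in estimators.items()}
    reps = {name: [] for name in estimators}
    for _ in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]  # resample rows
        for name, est in estimators.items():
            reps[name].append(est(sample))
    # Spread of replicates around the full-sample estimate, per estimator.
    proxy_mse = {name: np.mean((np.asarray(r) - full[name]) ** 2)
                 for name, r in reps.items()}
    return sorted(proxy_mse, key=proxy_mse.get)  # smallest proxy MSE first
```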
Entropy balancing is doubly robust
Covariate balance is a conventional key diagnostic for methods that estimate
causal effects from observational studies. Recently, there has been emerging
interest in directly incorporating covariate balance into the
estimation. We study a recently proposed entropy maximization method called
Entropy Balancing (EB), which exactly matches the covariate moments for the
different experimental groups in its optimization problem. We show EB is doubly
robust with respect to linear outcome regression and logistic propensity score
regression, and it reaches the asymptotic semiparametric variance bound when
both regressions are correctly specified. This is surprising to us because
there is no attempt to model the outcome or the treatment assignment in the
original proposal of EB. Our theoretical results and simulations suggest that
EB is a very appealing alternative to the conventional weighting estimators
that estimate the propensity score by maximum likelihood.
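As a minimal sketch of the optimization the abstract refers to, entropy balancing can be solved through an unconstrained dual problem: weights on the control units are proportional to exp(lambda' x_i), with lambda chosen so the weighted control moments equal the treated means. The implementation below is illustrative and assumes exact matching of first moments only.

```python
import numpy as np
from scipy.optimize import minimize

def entropy_balance(X0, x1_mean):
    """Maximum-entropy weights on control units (rows of X0) that match the
    treated covariate means x1_mean exactly, via the log-sum-exp dual."""
    Xc = X0 - x1_mean  # center control covariates at the treated means

    def dual(lam):
        # Log-sum-exp dual objective; its gradient is the weighted covariate
        # imbalance, so the minimiser yields exact moment matching.
        v = Xc @ lam
        m = v.max()
        return m + np.log(np.exp(v - m).sum())

    res = minimize(dual, np.zeros(X0.shape[1]), method="BFGS")
    v = Xc @ res.x
    w = np.exp(v - v.max())
    return w / w.sum()  # normalised weights with X0.T @ w ~= x1_mean
```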