Separators and Adjustment Sets in Causal Graphs: Complete Criteria and an Algorithmic Framework
Principled reasoning about the identifiability of causal effects from
non-experimental data is an important application of graphical causal models.
This paper focuses on effects that are identifiable by covariate adjustment, a
commonly used estimation approach. We present an algorithmic framework for
efficiently testing, constructing, and enumerating m-separators in ancestral
graphs (AGs), a class of graphical causal models that can represent uncertainty
about the presence of latent confounders. Furthermore, we prove a reduction
from causal effect identification by covariate adjustment to m-separation in
a subgraph for directed acyclic graphs (DAGs) and maximal ancestral graphs
(MAGs). Jointly, these results yield constructive criteria that characterize
all adjustment sets as well as all minimal and minimum adjustment sets for
identification of a desired causal effect with multivariate exposures and
outcomes in the presence of latent confounding. Our results extend several
existing solutions for special cases of these problems. Our efficient
algorithms allowed us to empirically quantify the identifiability gap between
covariate adjustment and the do-calculus in random DAGs and MAGs, covering a
wide range of scenarios. Implementations of our algorithms are provided in the
R package dagitty.
Comment: 52 pages, 20 figures, 12 tables
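For the special case of DAGs, separation (d-separation, the DAG analogue of m-separation) can be tested by the classical moralization criterion: Z d-separates X from Y if and only if X and Y are disconnected after restricting to the ancestors of X, Y, and Z, moralizing, and deleting Z. The sketch below is a minimal pure-Python illustration of that criterion, not the paper's algorithm or the dagitty implementation; the function name `d_separated` and the parents-map representation are our own choices.

```python
def d_separated(dag, X, Y, Z):
    """Test whether Z d-separates X from Y in a DAG, via the
    moralization criterion.  `dag` maps each node to its set of
    parents.  X, Y, Z are disjoint sets of nodes."""
    # 1. restrict to the ancestors of X | Y | Z (inclusive)
    anc = set()
    stack = list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(dag.get(v, set()))
    # 2. moralize: undirected parent-child edges, plus edges
    #    "marrying" every pair of parents of a common child
    adj = {v: set() for v in anc}
    for child in anc:
        parents = [p for p in dag.get(child, set()) if p in anc]
        for p in parents:
            adj[child].add(p); adj[p].add(child)
        for i, p in enumerate(parents):
            for q in parents[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 3. delete Z and check whether X can still reach Y
    seen, stack = set(Z), [x for x in X if x not in Z]
    while stack:
        v = stack.pop()
        if v in Y:
            return False        # an open path exists
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return True

# toy confounding DAG: X <- Z -> Y  (given as a parents map)
dag = {"X": {"Z"}, "Y": {"Z"}, "Z": set()}
print(d_separated(dag, {"X"}, {"Y"}, {"Z"}))   # True: adjusting for Z blocks the path
print(d_separated(dag, {"X"}, {"Y"}, set()))   # False: X and Y are confounded
```

The same moralization idea underlies reachability-based separation tests; m-separation in AGs and MAGs requires handling bidirected and undirected edges and is not covered by this DAG-only sketch.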
Estimation of Direct and Indirect Causal Effects in Longitudinal Studies
The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and of the component that is not mediated by that intermediate variable (the direct effect of the treatment), is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no unmeasured confounders, Robins & Greenland (1992) and Pearl (2000) develop two identifiability results for direct and indirect causal effects. They define an individual direct effect as the counterfactual effect of a treatment on an outcome when the intermediate variable is set at the value it would have had if the individual had not been treated, and the population direct effect as the mean of these individual counterfactual direct effects. The identifiability result developed by Robins & Greenland (1992) relies on an additional ``No-Interaction Assumption'', while the identifiability result developed by Pearl (2000) relies on a particular assumption about conditional independence in the population being sampled. Both assumptions are considered very restrictive. As a result, estimation of direct and indirect effects has been considered infeasible in many settings. We show that the identifiability result of Pearl (2000) also holds under a new conditional independence assumption, which states that, within strata of baseline covariates, the individual direct effect at a fixed level of the intermediate variable is independent of the no-treatment counterfactual intermediate variable. We argue that our assumption is typically less restrictive than both the assumption of Pearl (2000) and the ``No-Interaction Assumption'' of Robins & Greenland (1992).
We also generalize the current definition of the direct (and indirect) effect of a treatment as the population mean of individual counterfactual direct (and indirect) effects to 1) a general parameter of the population distribution of individual counterfactual direct (and indirect) effects, and 2) the change of a general parameter of the population distribution of the appropriate counterfactual treatment-specific outcome. Subsequently, we generalize our identifiability result for the mean to identifiability results for these generally defined direct effects. We also discuss methods for modelling, testing, and estimation, and we illustrate our results throughout using an example drawn from the treatment of HIV infection.
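Under conditional-independence assumptions of the kind discussed above, the population direct and indirect effects are identified from the observed-data distribution by the standard mediation formula: E[Y_{a, M_{a'}}] is obtained by averaging E[Y | A=a, M=m] over the distribution of M given A=a'. The toy numbers below are hypothetical, chosen only to make the arithmetic visible; this illustrates the generic formula, not the paper's estimation methods.

```python
# Hypothetical binary treatment A and mediator M.
# P(M=1 | A=a) and E[Y | A=a, M=m], assumed known here.
p_m = {0: 0.3, 1: 0.7}               # P(M=1 | A=a)
e_y = {(0, 0): 1.0, (0, 1): 2.0,     # E[Y | A=a, M=m]
       (1, 0): 1.5, (1, 1): 2.5}

def mean_y(a_y, a_m):
    """E[Y_{a_y, M_{a_m}}]: outcome under treatment a_y with the
    mediator drawn from its distribution under treatment a_m."""
    pm1 = p_m[a_m]
    return (1 - pm1) * e_y[(a_y, 0)] + pm1 * e_y[(a_y, 1)]

total = mean_y(1, 1) - mean_y(0, 0)   # total effect       = 0.9
nde   = mean_y(1, 0) - mean_y(0, 0)   # natural direct     = 0.5
nie   = mean_y(1, 1) - mean_y(1, 0)   # natural indirect   = 0.4
# the total effect decomposes: 0.9 = 0.5 + 0.4
```

The decomposition total = direct + indirect holds exactly in this counterfactual formulation, without any no-interaction assumption on the outcome model.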
A Primer on Causality in Data Science
Many questions in Data Science are fundamentally causal in that our objective
is to learn the effect of some exposure, randomized or not, on an outcome
interest. Even studies that are seemingly non-causal, such as those with the
goal of prediction or prevalence estimation, have causal elements, including
differential censoring or measurement. As a result, we, as Data Scientists,
need to consider the underlying causal mechanisms that gave rise to the data,
rather than simply the pattern or association observed in those data. In this
work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to
provide an introduction to some key concepts in causal inference. Similar to
other causal frameworks, the steps of the Roadmap include clearly stating the
scientific question, defining the causal model, translating the scientific
question into a causal parameter, assessing the assumptions needed to express
the causal parameter as a statistical estimand, implementing statistical
estimators, including parametric and semi-parametric methods, and interpreting
our findings. We believe that using such a framework in Data Science will
help to ensure that our statistical analyses are guided by the scientific
question driving our research, while avoiding over-interpreting our results. We
focus on the effect of an exposure occurring at a single time point and
highlight the use of targeted maximum likelihood estimation (TMLE) with Super
Learner.
Comment: 26 pages (with references); 4 figures
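As a deliberately simplified illustration of the Roadmap's estimand and estimation steps for a single-time-point exposure, the sketch below estimates an average treatment effect by plain G-computation with a saturated (stratum-mean) outcome model on one binary covariate. This is a toy stand-in, not TMLE or Super Learner; the simulated data-generating process and its true effect of 1.0 are our own assumptions.

```python
import random
from collections import defaultdict

random.seed(0)

# hypothetical data-generating process:
# W ~ Bern(0.5); A ~ Bern(0.3 + 0.4*W); Y = 1.0*A + 2.0*W + N(0, 1)
data = []
for _ in range(20000):
    w = int(random.random() < 0.5)
    a = int(random.random() < 0.3 + 0.4 * w)
    y = 1.0 * a + 2.0 * w + random.gauss(0, 1)
    data.append((w, a, y))

# G-computation, saturated outcome model:
# Qbar(a, w) = sample mean of Y in stratum (A=a, W=w)
sums, counts = defaultdict(float), defaultdict(int)
for w, a, y in data:
    sums[(a, w)] += y
    counts[(a, w)] += 1
qbar = {k: sums[k] / counts[k] for k in counts}

# standardize the stratum-specific contrasts over the marginal of W
pw1 = sum(w for w, _, _ in data) / len(data)
ate = sum((qbar[(1, w)] - qbar[(0, w)]) * p
          for w, p in [(0, 1 - pw1), (1, pw1)])
# true average treatment effect is 1.0; the estimate should be close
```

TMLE would add a targeting step that updates the initial outcome fit using the estimated propensity score; the plug-in average above is only the first stage of that workflow.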
Identifiability of Subgroup Causal Effects in Randomized Experiments with Nonignorable Missing Covariates
Although randomized experiments are widely regarded as the gold standard for
estimating causal effects, missing data of the pretreatment covariates makes it
challenging to estimate the subgroup causal effects. When the missing data
mechanism of the covariates is nonignorable, the parameters of interest are
generally not point identified, and we can only obtain bounds for the
parameters of interest, which may be too wide for practical use. In some real
cases, we have prior knowledge that some restrictions may be plausible. We show
the identifiability of the causal effects and joint distributions for four
interpretable missing data mechanisms, and evaluate the performance of the
statistical inference via simulation studies. One application of our methods to
a real data set from a randomized clinical trial shows that one of the
nonignorable missing data mechanisms fits better than the ignorable missing
data mechanism, and the results conform to the study's original expert
opinions. We also illustrate the potential applications of our methods to
observational studies using a data set from a job-training program.
Comment: Statistics in Medicine (2014)
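To make concrete why unrestricted bounds can be too wide for practical use: with a completely unrestricted missingness mechanism, the mean of a binary variable is only partially identified, and the worst-case (Manski-style) bounds have width equal to the missingness rate. The numbers below are hypothetical, and this shows only the generic bounding idea, not the paper's four interpretable restricted mechanisms.

```python
# worst-case bounds on E[Y] for a binary Y in {0, 1} when the
# missingness mechanism is left completely unrestricted
n_obs, n_miss = 80, 20        # hypothetical sample: 20% missing
mean_obs = 0.6                # observed mean among respondents
p_obs = n_obs / (n_obs + n_miss)

lower = p_obs * mean_obs + (1 - p_obs) * 0.0   # all missing Y = 0
upper = p_obs * mean_obs + (1 - p_obs) * 1.0   # all missing Y = 1
# lower ~ 0.48, upper ~ 0.68: the width equals the missingness rate 0.2
```

Restrictions such as the interpretable mechanisms studied in the paper shrink (or collapse) this interval, which is what makes point identification possible.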