A General Algorithm for Deciding Transportability of Experimental Results
Generalizing empirical findings to new environments, settings, or populations
is essential in most scientific explorations. This article treats a particular
problem of generalizability, called "transportability", defined as a license to
transfer information learned in experimental studies to a different population,
on which only observational studies can be conducted. Given a set of
assumptions concerning commonalities and differences between the two
populations, Pearl and Bareinboim (2011) derived sufficient conditions that
permit such transfer to take place. This article summarizes their findings and
supplements them with an effective procedure for deciding when and how
transportability is feasible. It establishes a necessary and sufficient
condition for deciding when causal effects in the target population are
estimable from both the statistical information available and the causal
information transferred from the experiments. The article further provides a
complete algorithm for computing the transport formula, that is, a way of
combining observational and experimental information to synthesize a bias-free
estimate of the desired causal relation. Finally, the article examines the
differences between transportability and other variants of generalizability.
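As a rough illustration of what a transport formula can look like, the sketch below covers only the simplest textbook case, not the general algorithm the abstract describes. It assumes the two populations differ solely in the distribution of an observed covariate Z, so that the target effect is P*(y | do(x)) = sum over z of P(y | do(x), z) P*(z). The function name, the covariate levels, and all numbers are hypothetical.

```python
def transport_effect(p_y_do_x_given_z, p_star_z):
    """Reweight source-population experimental results by the target's covariate distribution.

    p_y_do_x_given_z: maps each level z to P(y | do(x), z), estimated from the
                      experiment in the source population.
    p_star_z:         maps each level z to P*(z), estimated from observational
                      data in the target population.
    Assumes Z alone accounts for the difference between the two populations.
    """
    if set(p_y_do_x_given_z) != set(p_star_z):
        raise ValueError("both inputs must cover the same levels of Z")
    if abs(sum(p_star_z.values()) - 1.0) > 1e-9:
        raise ValueError("P*(z) must sum to 1")
    # Transport formula for this special case: sum_z P(y | do(x), z) * P*(z)
    return sum(p_y_do_x_given_z[z] * p_star_z[z] for z in p_star_z)


# Hypothetical numbers: z-specific effects from the source experiment,
# and the age distribution observed in the target population.
source_experiment = {"young": 0.30, "old": 0.60}
target_age_distribution = {"young": 0.80, "old": 0.20}

effect = transport_effect(source_experiment, target_age_distribution)
print(f"Transported P*(y | do(x)) = {effect:.2f}")
```

The point of the sketch is only that experimental quantities from one population and observational quantities from another are combined in a single expression; deciding whether such a combination is licensed in a given causal diagram is the question the article addresses.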
Surrogate Outcomes and Transportability
Identification of causal effects is one of the most fundamental tasks of
causal inference. We consider an identifiability problem where some
experimental and observational data are available but neither data alone is
sufficient for the identification of the causal effect of interest. Instead of
the outcome of interest, surrogate outcomes are measured in the experiments.
This problem is a generalization of identifiability using surrogate experiments
and we label it as surrogate outcome identifiability. We show that the concept
of transportability provides a sufficient criterion for determining surrogate
outcome identifiability for a large class of queries.
Comment: This is the version published in the International Journal of
Approximate Reasonin
A Primer on Causality in Data Science
Many questions in Data Science are fundamentally causal in that our objective
is to learn the effect of some exposure, randomized or not, on an outcome of
interest. Even studies that are seemingly non-causal, such as those with the
goal of prediction or prevalence estimation, have causal elements, including
differential censoring or measurement. As a result, we, as Data Scientists,
need to consider the underlying causal mechanisms that gave rise to the data,
rather than simply the pattern or association observed in those data. In this
work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to
provide an introduction to some key concepts in causal inference. Similar to
other causal frameworks, the steps of the Roadmap include clearly stating the
scientific question, defining the causal model, translating the scientific
question into a causal parameter, assessing the assumptions needed to express
the causal parameter as a statistical estimand, implementing statistical
estimators including parametric and semi-parametric methods, and interpreting
our findings. We believe that using such a framework in Data Science will
help to ensure that our statistical analyses are guided by the scientific
question driving our research, while avoiding over-interpreting our results. We
focus on the effect of an exposure occurring at a single time point and
highlight the use of targeted maximum likelihood estimation (TMLE) with Super
Learner.
Comment: 26 pages (with references); 4 figure
Causal Inference and Data-Fusion in Econometrics
Learning about cause and effect is arguably the main goal in applied
econometrics. In practice, the validity of these causal inferences is
contingent on a number of critical assumptions regarding the type of data that
has been collected and the substantive knowledge that is available. For
instance, unobserved confounding factors threaten the internal validity of
estimates, data availability is often limited to non-random, selection-biased
samples, causal effects need to be learned from surrogate experiments with
imperfect compliance, and causal knowledge has to be extrapolated across
structurally heterogeneous populations. A powerful causal inference framework
is required to tackle these challenges, which plague most data analysis to
varying degrees. Building on the structural approach to causality introduced by
Haavelmo (1943) and the graph-theoretic framework proposed by Pearl (1995), the
artificial intelligence (AI) literature has developed a wide array of
techniques for causal learning that make it possible to leverage information from various
imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016).
In this paper, we discuss recent advances in this literature that have the
potential to contribute to econometric methodology along three dimensions.
First, they provide a unified and comprehensive framework for causal inference,
in which the aforementioned problems can be addressed in full generality.
Second, due to their origin in AI, they come together with sound, efficient,
and complete algorithmic criteria for automatization of the corresponding
identification task. And third, because of the nonparametric description of
structural models that graph-theoretic approaches build on, they combine the
strengths of both structural econometrics and the potential outcomes
framework, and thus offer a perfect middle ground between these two competing
literature streams.
Comment: Abstract change
Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach
Peer reviewe