Experiment Selection for Causal Discovery
Randomized controlled experiments are often described as the most reliable tool available to scientists
for discovering causal relationships among quantities of interest. However, it is often unclear
how many and which different experiments are needed to identify the full (possibly cyclic) causal
structure among some given (possibly causally insufficient) set of variables. Recent results in the
causal discovery literature have explored various identifiability criteria that depend on the assumptions
one is able to make about the underlying causal process, but these criteria are not directly
constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions
already exist in the combinatorics literature, albeit under terminology that is unfamiliar to
most of the causal discovery community. In this paper we translate the theoretical results and apply
them to the concrete problem of experiment selection. For a variety of settings we give explicit
constructions of the optimal set of experiments and adapt some of the general combinatorics results
to answer questions relating to the problem of experiment selection.
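As a rough illustration of the kind of combinatorial construction this abstract alludes to, the sketch below builds a separating system from binary codes. It assumes, purely for illustration, that the criterion to satisfy is that every pair of variables is split by at least one experiment (one variable intervened on, the other only observed). The abstract does not state this criterion, the function names are hypothetical, and the construction is not claimed to be the paper's own.

```python
from itertools import combinations


def binary_separating_experiments(n):
    """Return intervention sets over variables 0..n-1 (illustrative sketch).

    Variable v is given the binary code of its index; experiment i
    intervenes on every variable whose i-th bit is 1. Distinct indices
    differ in at least one bit, so each pair is split by some experiment.
    """
    k = (n - 1).bit_length() if n > 1 else 0  # ceil(log2(n)) experiments
    return [{v for v in range(n) if (v >> i) & 1} for i in range(k)]


def separates_all_pairs(experiments, n):
    """Check the assumed criterion: every unordered pair {x, y} has an
    experiment that intervenes on exactly one of x and y."""
    return all(
        any((x in e) != (y in e) for e in experiments)
        for x, y in combinations(range(n), 2)
    )


experiments = binary_separating_experiments(6)
print(len(experiments), separates_all_pairs(experiments, 6))
```

With six variables this uses three experiments, since three bits are enough to give each variable a distinct code. Stricter criteria, such as requiring each ordered pair to be split in a particular direction, would call for a different construction.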
Causal discovery of linear cyclic models from multiple experimental data sets with overlapping variables
Much scientific data is collected in randomized experiments that intervene on some variables of interest and observe others. Quite often, a given phenomenon is investigated in several studies, and different sets of variables are involved in each study. In this article we consider the problem of integrating such knowledge, inferring as much as possible about the underlying causal structure over the union of observed variables from such overlapping experimental or passive observational data sets. We do not assume acyclicity or joint causal sufficiency of the underlying data-generating model, but we do restrict the causal relationships to be linear and use only second-order statistics of the data. We derive conditions for full model identifiability in the most generic case, and provide novel techniques for incorporating an assumption of faithfulness to aid in inference. In each case we seek to establish what is and what is not determined by the data at hand.