Identifiability of Causal Graphs using Functional Models
This work addresses the following question: Under what assumptions on the
data generating process can one infer the causal graph from the joint
distribution? The approach taken by conditional independence-based causal
discovery methods is based on two assumptions: the Markov condition and
faithfulness. It has been shown that under these assumptions the causal graph
can be identified up to Markov equivalence (some arrows remain undirected)
using methods like the PC algorithm. In this work we propose an alternative by
defining Identifiable Functional Model Classes (IFMOCs). As our main theorem we
prove that if the data generating process belongs to an IFMOC, one can identify
the complete causal graph. To the best of our knowledge this is the first
identifiability result of this kind that is not limited to linear functional
relationships. We discuss how the IFMOC assumption and the Markov and
faithfulness assumptions relate to each other and explain why we believe that
the IFMOC assumption can be tested more easily on given data. We further
provide a practical algorithm that recovers the causal graph from finitely many
samples; experiments on simulated data support the theoretical findings.
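A common functional-model approach of this kind fits a regression in each candidate direction and checks whether the residuals are independent of the putative cause. The sketch below is a minimal toy illustration of that idea, not the paper's algorithm: it uses polynomial regression and a crude correlation-based dependence proxy, where a real implementation would use nonparametric regression and a kernel independence test such as HSIC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy additive-noise pair: y = f(x) + independent noise, with f(x) = x^3.
x = rng.uniform(-2, 2, 500)
y = x ** 3 + rng.normal(0.0, 0.5, 500)

def fit_residuals(cause, effect, deg=5):
    """Polynomial regression of effect on cause; return the residuals."""
    coef = np.polyfit(cause, effect, deg)
    return effect - np.polyval(coef, cause)

def dependence(a, resid):
    """Crude dependence proxy: largest absolute correlation between
    |residuals| and simple features of the putative cause.  A real test
    would use a kernel independence measure such as HSIC."""
    feats = [a, a ** 2, np.abs(a)]
    return max(abs(np.corrcoef(f, np.abs(resid))[0, 1]) for f in feats)

score_xy = dependence(x, fit_residuals(x, y))  # residuals ~ independent of x
score_yx = dependence(y, fit_residuals(y, x))  # backward residuals depend on y
print("x -> y" if score_xy < score_yx else "y -> x")
```

In the causal direction the residuals reduce to the independent noise, so the dependence score is small; in the anticausal direction the residuals retain structure from the input distribution, and the score is larger.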
Justifying Information-Geometric Causal Inference
Information Geometric Causal Inference (IGCI) is a new approach to
distinguish between cause and effect for two variables. It is based on an
independence assumption between input distribution and causal mechanism that
can be phrased in terms of orthogonality in information space. We describe two
intuitive reinterpretations of this approach that make IGCI more accessible to
a broader audience.
Moreover, we show that the described independence is related to the
hypothesis that unsupervised learning and semi-supervised learning only work
for predicting the cause from the effect and not vice versa.
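The IGCI literature includes a simple slope-based estimator: after rescaling both variables to [0, 1], the mean log slope of the mechanism is estimated from consecutive sorted data points, and the direction with the smaller score is inferred as causal. The following is a minimal sketch of that estimator on a toy deterministic pair (uniform input, nonlinear monotone mechanism), not a faithful reproduction of any particular implementation:

```python
import numpy as np

def igci_slope(x, y):
    """Slope-based IGCI score after normalizing both variables to [0, 1].
    Under IGCI's independence assumption, the score is smaller in the
    causal direction."""
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    idx = np.argsort(x)
    dx = np.diff(x[idx])
    dy = np.diff(y[idx])
    keep = (dx != 0) & (dy != 0)  # skip ties to avoid log(0) / division by 0
    return np.mean(np.log(np.abs(dy[keep] / dx[keep])))

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 2000)
y = x ** 3                      # deterministic nonlinear mechanism
c_xy = igci_slope(x, y)
c_yx = igci_slope(y, x)
print("x -> y" if c_xy < c_yx else "y -> x")
```

For an invertible deterministic mechanism the two scores are (approximately) negatives of each other, so the comparison amounts to checking the sign of the forward score.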
Two Optimal Strategies for Active Learning of Causal Models from Interventional Data
From observational data alone, a causal DAG is only identifiable up to Markov
equivalence. Interventional data generally improves identifiability; however,
the gain of an intervention strongly depends on the intervention target, that
is, the intervened variables. We present active learning (that is, optimal
experimental design) strategies that calculate optimal interventions for two
different learning goals. The first one is a greedy approach using
single-vertex interventions that maximizes the number of edges that can be
oriented after each intervention. The second one yields in polynomial time a
minimum set of targets of arbitrary size that guarantees full identifiability.
This second approach proves a conjecture of Eberhardt (2008) on the number
of intervention targets of unbounded size that is sufficient and, in the worst
case, necessary for full identifiability. In a simulation study, we compare our
two active learning approaches to random interventions and an existing
approach, and analyze the influence of estimation errors on the overall
performance of active learning.
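The greedy single-vertex strategy can be illustrated with a heavily simplified sketch. Here an intervention on a vertex is assumed to orient exactly the edges incident to it; the paper's actual algorithm additionally propagates orientations (e.g. via Meek's rules) and scores interventions within the Markov equivalence class, so this is an assumption-laden toy version:

```python
def greedy_interventions(skeleton_edges):
    """Pick single-vertex interventions greedily: in each round, intervene
    on the vertex incident to the most still-unoriented edges.
    Simplification: intervening on v is taken to orient exactly the edges
    touching v, with no propagation of further orientations."""
    unoriented = {frozenset(e) for e in skeleton_edges}
    order = []
    while unoriented:
        counts = {}
        for e in unoriented:
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)  # vertex covering the most edges
        order.append(best)
        unoriented = {e for e in unoriented if best not in e}
    return order

# Skeleton: 0-1, 1-2, 1-3, 3-4, 4-5
print(greedy_interventions([(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]))  # prints [1, 4]
```

On this skeleton the hub vertex 1 is chosen first (it touches three unoriented edges), after which a single further intervention on vertex 4 covers the remaining edges.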