On Deducing Conditional Independence from d-Separation in Causal Graphs with Feedback (Research Note)
Pearl and Dechter (1996) claimed that the d-separation criterion for
conditional independence in acyclic causal networks also applies to networks of
discrete variables that have feedback cycles, provided that the variables of
the system are uniquely determined by the random disturbances. I show by
example that this is not true in general. Some condition stronger than
uniqueness is needed, such as the existence of a causal dynamics guaranteed to
lead to the unique solution.
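For reference, the d-separation criterion that the note discusses is well defined for acyclic graphs, where it can be implemented as a reachability check over the graph (the Bayes-ball style algorithm). The sketch below is that standard acyclic check, not the cyclic extension the note refutes; the function and variable names are illustrative.

```python
from collections import deque

def d_separated(parents, xs, ys, zs):
    """Check whether node sets xs and ys are d-separated given zs
    in a DAG described by a parents mapping {node: [parent, ...]}."""
    # Build the children map from the parents map.
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    # Collect the ancestors of zs (including zs themselves); a collider
    # is active exactly when it is in this set.
    anc = set()
    stack = list(zs)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])
    zs = set(zs)
    # Reachability over (direction, node) states: 'up' means we arrived
    # from a child, 'down' means we arrived from a parent.
    seen = set()
    queue = deque(("up", x) for x in xs)
    while queue:
        d, n = queue.popleft()
        if (d, n) in seen:
            continue
        seen.add((d, n))
        if n not in zs and n in ys:
            return False  # an active trail reaches ys
        if d == "up" and n not in zs:
            queue.extend(("up", p) for p in parents[n])
            queue.extend(("down", c) for c in children[n])
        elif d == "down":
            if n not in zs:
                queue.extend(("down", c) for c in children[n])
            if n in anc:  # collider activated by conditioning on zs
                queue.extend(("up", p) for p in parents[n])
    return True

# Example: collider A -> C <- B with descendant C -> D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
```

On this example, A and B are d-separated marginally, but conditioning on the collider C, or on its descendant D, opens the path between them.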
Discovering Cyclic Causal Models with Latent Variables: A General SAT-Based Procedure
We present a very general approach to learning the structure of causal models
based on d-separation constraints, obtained from any given set of overlapping
passive observational or experimental data sets. The procedure allows for both
directed cycles (feedback loops) and the presence of latent variables. Our
approach is based on a logical representation of causal pathways, which permits
the integration of quite general background knowledge, and inference is
performed using a Boolean satisfiability (SAT) solver. The procedure is
complete in that it exhausts the available information on whether any given
edge can be determined to be present or absent, and returns "unknown"
otherwise. Many existing constraint-based causal discovery algorithms can be
seen as special cases, tailored to circumstances in which one or more
restricting assumptions apply. Simulations illustrate the effect of these
assumptions on discovery and how the present algorithm scales.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013).
Causal Discovery for Relational Domains: Representation, Reasoning, and Learning
Many domains increasingly record and analyze massive observational data sets of growing complexity. A commonly made claim is that these data sets hold potential to transform their corresponding domains by providing previously unknown or unexpected explanations and enabling informed decision-making. However, only knowledge of the underlying causal generative process, as opposed to knowledge of associational patterns, can support such tasks.
Most methods for traditional causal discovery—the development of algorithms that learn causal structure from observational data—are restricted to representations that require limiting assumptions on the form of the data. Causal discovery has almost exclusively been applied to directed graphical models of propositional data that assume a single type of entity with independence among instances. However, most real-world domains are characterized by systems that involve complex interactions among multiple types of entities. Many state-of-the-art methods in statistics and machine learning that address such complex systems focus on learning associational models, and they are oftentimes mistakenly interpreted as causal. The intersection between causal discovery and machine learning in complex systems is small.
The primary objective of this thesis is to extend causal discovery to such complex systems. Specifically, I formalize a relational representation and model that can express the causal and probabilistic dependencies among the attributes of interacting, heterogeneous entities. I show that the traditional method for reasoning about statistical independence from model structure fails to accurately derive conditional independence facts from relational models. I introduce a new theory—relational d-separation—and a novel, lifted representation—the abstract ground graph—that supports a sound, complete, and computationally efficient method for algorithmically deriving conditional independencies from probabilistic models of relational data. The abstract ground graph representation also presents causal implications that enable the detection of causal direction for bivariate relational dependencies without parametric assumptions. I leverage these implications and the theoretical framework of relational d-separation to develop a sound and complete algorithm—the relational causal discovery (RCD) algorithm—that learns causal structure from relational data
Establishing Markov Equivalence in Cyclic Directed Graphs
We present a new, efficient procedure to establish Markov equivalence between
directed graphs that may or may not contain cycles under the
\textit{d}-separation criterion. It is based on the Cyclic Equivalence Theorem
(CET) in the seminal works on cyclic models by Thomas Richardson in the mid
'90s, but now rephrased from an ancestral perspective. The resulting
characterization leads to a procedure for establishing Markov equivalence
between graphs that no longer requires tests for d-separation, leading to a
significantly reduced algorithmic complexity. The conceptually simplified
characterization may help to reinvigorate theoretical research towards sound
and complete cyclic discovery in the presence of latent confounders. This
version includes a correction to rule (iv) in Theorem 1, and the subsequent
adjustment in part 2 of Algorithm 2.
Comment: Correction to original version published at UAI-2023. Includes additional experimental results and extended proof details in supplement.
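For acyclic graphs, the classical Markov equivalence criterion that the Cyclic Equivalence Theorem generalizes is simple: two DAGs are Markov equivalent iff they share the same skeleton and the same unshielded colliders (v-structures). A minimal sketch of that acyclic baseline, with illustrative names; the cyclic characterization in the paper is considerably more involved.

```python
def skeleton(parents):
    """Undirected adjacencies of a DAG given as {node: [parent, ...]}."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}

def v_structures(parents):
    """Unshielded colliders a -> c <- b with a and b non-adjacent."""
    edges = skeleton(parents)
    vs = set()
    for c, ps in parents.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in edges:
                    vs.add((a, c, b))
    return vs

def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: same skeleton and same v-structures.
    Valid for acyclic graphs only."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# Chains A -> B -> C and A <- B <- C are equivalent;
# the collider A -> B <- C is not equivalent to either.
chain_fwd = {"A": [], "B": ["A"], "C": ["B"]}
chain_bwd = {"A": ["B"], "B": ["C"], "C": []}
collider = {"A": [], "B": ["A", "C"], "C": []}
```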
Learning Optimal Causal Graphs with Exact Search
Peer reviewed
Causal Confusion in Imitation Learning
Behavioral cloning reduces policy learning to supervised learning by training
a discriminative model to predict expert actions given observations. Such
discriminative models are non-causal: the training procedure is unaware of the
causal structure of the interaction between the expert and the environment. We
point out that ignoring causality is particularly damaging because of the
distributional shift in imitation learning. In particular, it leads to a
counter-intuitive "causal misidentification" phenomenon: access to more
information can yield worse performance. We investigate how this problem
arises, and propose a solution to combat it through targeted
interventions---either environment interaction or expert queries---to determine
the correct causal model. We show that causal misidentification occurs in
several benchmark control domains as well as realistic driving settings, and
validate our solution against DAgger and other baselines and ablations.
Comment: Published at NeurIPS 2019. 9 pages, plus references and appendices.
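The causal misidentification phenomenon can be illustrated with a toy simulation: a cloned policy that keys on a nuisance feature (the previous action) looks excellent on the expert's own trajectories but collapses at execution time, when the previous action becomes the policy's own output. This is a hedged sketch of the general idea, not the paper's experimental setup.

```python
import random

random.seed(0)

T = 5000
switch_p = 0.05  # the observation is persistent, switching rarely

# Expert demonstrations: observation o_t in {-1, +1}, expert plays a_t = o_t.
obs, acts = [], []
o = 1
for _ in range(T):
    if random.random() < switch_p:
        o = -o
    obs.append(o)
    acts.append(o)

# "Causally confused" cloned policy: predict the action from the
# *previous* action, a nuisance feature that is highly predictive on
# the expert's trajectories because the expert's actions are persistent.
train_acc = sum(acts[t] == acts[t - 1] for t in range(1, T)) / (T - 1)

# At execution time the previous action is the policy's own last output,
# so the policy repeats its first action forever and ignores the
# observation entirely; it matches the expert only by chance.
a = obs[0]
hits = 0
for t in range(1, T):
    hits += (a == obs[t])
rollout_acc = hits / (T - 1)

print(f"training accuracy: {train_acc:.3f}, rollout accuracy: {rollout_acc:.3f}")
```

The gap between the two numbers is exactly the distributional-shift failure the abstract describes: more (nuisance) information at training time yields worse behavior at test time.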
Markov Properties for Graphical Models with Cycles and Latent Variables
We investigate probabilistic graphical models that allow for both cycles and
latent variables. For this we introduce directed graphs with hyperedges
(HEDGes), generalizing and combining both marginalized directed acyclic graphs
(mDAGs) that can model latent (dependent) variables, and directed mixed graphs
(DMGs) that can model cycles. We define and analyse several different Markov
properties that relate the graphical structure of a HEDG with a probability
distribution on a corresponding product space over the set of nodes, for
example factorization properties, structural equations properties,
ordered/local/global Markov properties, and marginal versions of these. The
various Markov properties for HEDGes are in general not equivalent to each
other when cycles or hyperedges are present, in contrast with the simpler case
of directed acyclic graphical (DAG) models (also known as Bayesian networks).
We show how the Markov properties for HEDGes - and thus the corresponding
graphical Markov models - are logically related to each other.
Comment: 131 pages.
Joint Causal Inference from Multiple Contexts
The gold standard for discovering causal relations is by means of
experimentation. Over the last decades, alternative methods have been proposed
that can infer causal relations between variables from certain statistical
patterns in purely observational data. We introduce Joint Causal Inference
(JCI), a novel approach to causal discovery from multiple data sets from
different contexts that elegantly unifies both approaches. JCI is a causal
modeling framework rather than a specific algorithm, and it can be implemented
using any causal discovery algorithm that can take into account certain
background knowledge. JCI can deal with different types of interventions (e.g.,
perfect, imperfect, stochastic, etc.) in a unified fashion, and does not
require knowledge of intervention targets or types in case of interventional
data. We explain how several well-known causal discovery algorithms can be seen
as addressing special cases of the JCI framework, and we also propose novel
implementations that extend existing causal discovery methods for purely
observational data to the JCI setting. We evaluate different JCI
implementations on synthetic data and on flow cytometry protein expression data
and conclude that JCI implementations can considerably outperform
state-of-the-art causal discovery algorithms.
Comment: Final version, as published by JMLR.
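The pooling construction at the heart of JCI can be sketched in a few lines: concatenate the data sets and add context variables recording which regime each sample came from, then hand the augmented data to any discovery algorithm that accepts background knowledge. A minimal sketch on a hypothetical two-variable system; the variable names and mechanisms are illustrative, not from the paper.

```python
import random

random.seed(1)

def sample(n, intervene_x=None):
    """Toy structural model X -> Y.  Passing intervene_x replaces X's
    mechanism with a constant (a perfect intervention), which is one
    kind of regime change JCI can absorb without knowing the target."""
    rows = []
    for _ in range(n):
        x = intervene_x if intervene_x is not None else random.gauss(0, 1)
        y = 2.0 * x + random.gauss(0, 1)
        rows.append({"X": x, "Y": y})
    return rows

# JCI-style pooling: concatenate the observational and interventional
# data sets and add a context variable C marking the regime.
pooled = [dict(r, C=0) for r in sample(500)] + \
         [dict(r, C=1) for r in sample(500, intervene_x=3.0)]

def mean(rows, var, c):
    vals = [r[var] for r in rows if r["C"] == c]
    return sum(vals) / len(vals)

# The regime shifts both X and Y, so C is marginally dependent on each;
# this is the statistical footprint a discovery algorithm run on the
# pooled data can exploit to orient the edge X -> Y.
dx = mean(pooled, "X", 1) - mean(pooled, "X", 0)
dy = mean(pooled, "Y", 1) - mean(pooled, "Y", 0)
```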
On the Foundations of Cycles in Bayesian Networks
Bayesian networks (BNs) are a probabilistic graphical model widely used for
representing expert knowledge and reasoning under uncertainty. Traditionally,
they are based on directed acyclic graphs that capture dependencies between
random variables. However, directed cycles can naturally arise when
cross-dependencies between random variables exist, e.g., for modeling feedback
loops. Existing methods to deal with such cross-dependencies usually rely on
reductions to BNs without cycles. These approaches are fragile to generalize,
since their justifications are intermingled with additional knowledge about the
application context. In this paper, we present a foundational study regarding
semantics for cyclic BNs that are generic and conservatively extend the
cycle-free setting. First, we propose constraint-based semantics that specify
requirements for full joint distributions over a BN to be consistent with the
local conditional probabilities and independencies. Second, two kinds of limit
semantics that formalize infinite unfolding approaches are introduced and shown
to be computable by a Markov chain construction.
Comment: Full version with an appendix containing the proofs.
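The infinite-unfolding idea can be illustrated on the smallest possible cycle: two binary variables, each conditioned on the other. Repeatedly resampling X from P(X | Y) and then Y from P(Y | X) defines a Markov chain over joint states whose stationary distribution serves as a limit semantics for the cyclic network. The sketch below uses made-up conditional probability tables and power iteration; it illustrates the general mechanism, not the paper's specific construction.

```python
import itertools

# A two-node cyclic BN: X depends on Y and Y depends on X.
# Illustrative conditional probability tables P(X=1 | Y) and P(Y=1 | X).
p_x1_given_y = {0: 0.2, 1: 0.9}
p_y1_given_x = {0: 0.3, 1: 0.7}

def step(dist):
    """One round of the infinite unfolding: resample X from P(X | Y),
    then Y from P(Y | X).  dist maps joint states (x, y) to probability."""
    new = {s: 0.0 for s in dist}
    for (x, y), p in dist.items():
        for x2 in (0, 1):
            px = p_x1_given_y[y] if x2 == 1 else 1 - p_x1_given_y[y]
            for y2 in (0, 1):
                py = p_y1_given_x[x2] if y2 == 1 else 1 - p_y1_given_x[x2]
                new[(x2, y2)] += p * px * py
    return new

# Power iteration from the uniform distribution: the unfolding converges
# to the chain's stationary distribution, the limit semantics of the cycle.
dist = {s: 0.25 for s in itertools.product((0, 1), repeat=2)}
for _ in range(200):
    dist = step(dist)
```

After enough iterations the distribution is (numerically) a fixed point of `step`, i.e. it is consistent with both local conditional tables simultaneously.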