Causal inference using the algorithmic Markov condition
Inferring the causal structure that links n observables is usually based upon
detecting statistical dependences and choosing simple graphs that make the
joint measure Markovian. Here we argue why causal inference is also possible
when only single observations are present.
We develop a theory of how to generate causal graphs explaining similarities
between single objects. To this end, we replace the notion of conditional
stochastic independence in the causal Markov condition with the vanishing of
conditional algorithmic mutual information and describe the corresponding
causal inference rules.
We explain why a consistent reformulation of causal inference in terms of
algorithmic complexity implies a new inference principle that takes into
account also the complexity of conditional probability densities, making it
possible to select among Markov equivalent causal graphs. This insight provides
a theoretical foundation of a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable
complexity criteria. This can be seen as an algorithmic analog of replacing the
empirically undecidable question of statistical independence with practical
independence tests that are based on implicit or explicit assumptions on the
underlying distribution. Comment: 16 figures
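As a reading aid, the condition this abstract describes can be written out as below. This is a sketch in assumed notation that does not appear in the listing itself: x_1, ..., x_n are the observed objects (for example binary strings), pa_j^* is a shortest program computing the parents of x_j in the causal graph, nd_j are its non-descendants, K is Kolmogorov complexity, I(. : . | .) is conditional algorithmic mutual information, and the plus sign over the equality means "up to an additive constant".

```latex
I(x_j : nd_j \mid pa_j^*) \stackrel{+}{=} 0 \quad \text{for all } j,
\qquad \text{equivalently} \qquad
K(x_1, \dots, x_n) \stackrel{+}{=} \sum_{j=1}^{n} K(x_j \mid pa_j^*).
```

The second form mirrors the familiar factorization of a joint distribution along a DAG, with description lengths in place of log-probabilities.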
Distinguishing Cause and Effect via Second Order Exponential Models
We propose a method to infer causal structures containing both discrete and
continuous variables. The idea is to select causal hypotheses for which the
conditional density of every variable, given its causes, becomes smooth. We
define a family of smooth densities and conditional densities by second order
exponential models, i.e., by maximizing conditional entropy subject to first
and second statistical moments. If some of the variables take only values in
proper subsets of R^n, these conditionals can induce different families of
joint distributions even for Markov-equivalent graphs.
We consider the case of one binary and one real-valued variable where the
method can distinguish between cause and effect. Using this example, we
show that a causal hypothesis must sometimes be rejected because
P(effect|cause) and P(cause) share algorithmic information (which is untypical
if they are chosen independently). This way, our method is in the same spirit
as faithfulness-based causal inference because it also rejects non-generic
mutual adjustments among DAG-parameters. Comment: 36 pages, 8 figures
Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
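The idea in the abstract above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the stochastic complexity of a discrete sample is its maximum-likelihood code length plus the log of the multinomial NML normalizer, computed with the known linear-time recurrence C(n, k+2) = C(n, k+1) + (n/k) C(n, k). It also assumes the conditional cost is the sum of per-group complexities. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def multinomial_regret(n, k):
    """log2 of the NML normalizer C(n, k) for a k-ary multinomial, n samples."""
    if n == 0 or k == 1:
        return 0.0
    # C(n, 2) by direct summation; Python evaluates 0.0 ** 0 as 1.0.
    c2 = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    c_prev, c_cur = 1.0, c2  # C(n, 1) and C(n, 2)
    for j in range(1, k - 1):
        # Recurrence: C(n, j + 2) = C(n, j + 1) + (n / j) * C(n, j)
        c_prev, c_cur = c_cur, c_cur + (n / j) * c_prev
    return math.log2(c_cur)


def stochastic_complexity(values, k):
    """Maximum-likelihood code length in bits plus the parametric complexity."""
    n = len(values)
    counts = Counter(values)
    ml_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return ml_bits + multinomial_regret(n, k)


def conditional_sc(effect, cause, k_effect):
    """Assumed conditional score: sum of complexities of effect per cause value."""
    groups = defaultdict(list)
    for c, e in zip(cause, effect):
        groups[c].append(e)
    return sum(stochastic_complexity(g, k_effect) for g in groups.values())


def infer_direction(x, y):
    """Prefer the direction with the shorter total description length."""
    kx, ky = len(set(x)), len(set(y))
    x_to_y = stochastic_complexity(x, kx) + conditional_sc(y, x, ky)
    y_to_x = stochastic_complexity(y, ky) + conditional_sc(x, y, kx)
    if x_to_y < y_to_x:
        return "X->Y"
    if y_to_x < x_to_y:
        return "Y->X"
    return "undecided"


if __name__ == "__main__":
    x = [0, 0, 1, 1, 2, 2, 3, 3]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(infer_direction(x, y))
```

The direct summation for C(n, 2) is only suitable for small samples, since the binomial coefficients eventually exceed floating-point range; a production implementation would work in log space.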
Information-theoretic inference of common ancestors
A directed acyclic graph (DAG) partially represents the conditional
independence structure among observations of a system if the local Markov
condition holds, that is, if every variable is independent of its
non-descendants given its parents. In general, there is a whole class of DAGs
that represents a given set of conditional independence relations. We are
interested in properties of this class that can be derived from observations of
a subsystem only. To this end, we prove an information theoretic inequality
that allows for the inference of common ancestors of observed parts in any DAG
representing some unknown larger system. More explicitly, we show that a large
amount of dependence in terms of mutual information among the observations
implies the existence of a common ancestor that distributes this information.
Within the causal interpretation of DAGs our result can be seen as a
quantitative extension of Reichenbach's Principle of Common Cause to more than
two variables. Our conclusions are valid also for non-probabilistic
observations such as binary strings, since we state the proof for an
axiomatized notion of mutual information that includes the stochastic as well
as the algorithmic version. Comment: 18 pages, 4 figures
We Are Not Your Real Parents: Telling Causal from Confounded using MDL
Given data over variables (X_1, ..., X_m, Y), we consider the problem of finding out whether X jointly causes Y or whether they are all confounded by an unobserved latent variable Z. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where X causes Y and where there exists a latent variable Z confounding both X and Y, and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
Justifying additive-noise-model based causal discovery via algorithmic information theory
A recent method for causal discovery is in many cases able to infer whether X
causes Y or Y causes X for just two observed variables X and Y. It is based on
the observation that there exist (non-Gaussian) joint distributions P(X,Y) for
which Y may be written as a function of X up to an additive noise term that is
independent of X and no such model exists from Y to X. Whenever this is the
case, one prefers the causal model X --> Y.
Here we justify this method by showing that the causal hypothesis Y --> X is
unlikely because it requires a specific tuning between P(Y) and P(X|Y) to
generate a distribution that admits an additive noise model from X to Y. To
quantify the amount of tuning required we derive lower bounds on the
algorithmic information shared by P(Y) and P(X|Y). This way, our justification
is consistent with recent approaches for using algorithmic information theory
for causal reasoning. We extend this principle to the case where P(X,Y) almost
admits an additive noise model.
Our results suggest that the above conclusion is more reliable if the
complexity of P(Y) is high. Comment: 17 pages, 1 figure
Telling Cause from Effect using MDL-based Local and Global Regression
We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables X and Y from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
To tackle this problem, we follow an information theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer that X causes Y in case it is shorter to describe Y as a function of X
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm, and show through thorough empirical evaluation on both
synthetic and real-world data that it outperforms the state of the art by a
wide margin. Comment: 10 pages, To appear in ICDM1
Identifiability of Causal Graphs using Functional Models
This work addresses the following question: Under what assumptions on the
data generating process can one infer the causal graph from the joint
distribution? The approach taken by conditional independence-based causal
discovery methods is based on two assumptions: the Markov condition and
faithfulness. It has been shown that under these assumptions the causal graph
can be identified up to Markov equivalence (some arrows remain undirected)
using methods like the PC algorithm. In this work we propose an alternative by
defining Identifiable Functional Model Classes (IFMOCs). As our main theorem we
prove that if the data generating process belongs to an IFMOC, one can identify
the complete causal graph. To the best of our knowledge this is the first
identifiability result of this kind that is not limited to linear functional
relationships. We discuss how the IFMOC assumption and the Markov and
faithfulness assumptions relate to each other and explain why we believe that
the IFMOC assumption can be tested more easily on given data. We further
provide a practical algorithm that recovers the causal graph from finitely many
data; experiments on simulated data support the theoretical findings.
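The limitation mentioned in the abstract above, identification only up to Markov equivalence, can be made concrete with a small sketch. It relies on the standard characterization that two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures. The function names and the edge-list representation are assumptions for illustration, and both graphs are assumed to have the same nodes with no isolated ones.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_a, edges_b):
    """True if both DAGs have the same skeleton and the same v-structures."""
    return (
        skeleton(edges_a) == skeleton(edges_b)
        and v_structures(edges_a) == v_structures(edges_b)
    )


if __name__ == "__main__":
    chain = [("X", "Y"), ("Y", "Z")]     # X -> Y -> Z
    fork = [("Y", "X"), ("Y", "Z")]      # X <- Y -> Z
    collider = [("X", "Y"), ("Z", "Y")]  # X -> Y <- Z
    print(markov_equivalent(chain, fork))
    print(markov_equivalent(chain, collider))
    print(markov_equivalent([("X", "Y")], [("Y", "X")]))
```

The last call shows why the two-variable case is out of reach for conditional-independence methods: X -> Y and Y -> X are always Markov equivalent, so additional assumptions such as a restricted functional model class are needed to orient the edge.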