Justifying additive-noise-model based causal discovery via algorithmic information theory
A recent method for causal discovery is in many cases able to infer whether X
causes Y or Y causes X for just two observed variables X and Y. It is based on
the observation that there exist (non-Gaussian) joint distributions P(X,Y) for
which Y may be written as a function of X up to an additive noise term that is
independent of X and no such model exists from Y to X. Whenever this is the
case, one prefers the causal model X --> Y.
Here we justify this method by showing that the causal hypothesis Y --> X is
unlikely because it requires a specific tuning between P(Y) and P(X|Y) to
generate a distribution that admits an additive noise model from X to Y. To
quantify the amount of tuning required we derive lower bounds on the
algorithmic information shared by P(Y) and P(X|Y). This way, our justification
is consistent with recent approaches for using algorithmic information theory
for causal reasoning. We extend this principle to the case where P(X,Y) almost
admits an additive noise model.
Our results suggest that the above conclusion is more reliable if the
complexity of P(Y) is high. Comment: 17 pages, 1 figure
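The additive-noise asymmetry this line of work builds on can be illustrated with a minimal sketch. All names here are illustrative, not the authors' code: polynomial least squares stands in for a proper nonparametric regression, and a crude correlation-of-transforms proxy stands in for a proper independence test.

```python
# Sketch of ANM-based cause-effect inference (illustrative, simplified):
# fit a regression in each direction and prefer the direction whose
# residuals look more independent of the regressor.
import numpy as np

def dependence_proxy(a, b):
    """Crude dependence measure: max absolute correlation between a few
    nonlinear transforms of a and of b (near 0 = looks independent)."""
    score = 0.0
    for fa in (a, a**2, np.abs(a)):
        for fb in (b, b**2, np.abs(b)):
            fa_c, fb_c = fa - fa.mean(), fb - fb.mean()
            denom = fa_c.std() * fb_c.std()
            if denom > 0:
                score = max(score, abs(np.mean(fa_c * fb_c) / denom))
    return score

def anm_direction(x, y, deg=3):
    """Return 'X->Y' if the residuals of regressing y on x look more
    independent of x than the reverse residuals look of y."""
    res_xy = y - np.polyval(np.polyfit(x, y, deg), x)  # residuals of Y on X
    res_yx = x - np.polyval(np.polyfit(y, x, deg), y)  # residuals of X on Y
    if dependence_proxy(x, res_xy) < dependence_proxy(y, res_yx):
        return "X->Y"
    return "Y->X"

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 2000)            # non-Gaussian cause
y = x**3 + rng.uniform(-1, 1, 2000)     # additive noise, independent of x
print(anm_direction(x, y))
```

On data generated as Y = X^3 plus independent uniform noise, the forward residuals recover the noise while no additive-noise model fits the reverse direction, so the sketch should print X->Y.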
Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is minimax optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
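The core quantity behind such an MDL score is the stochastic complexity of the multinomial model. The following sketch computes its NML parametric complexity with the standard Kontkanen-Myllymaki recurrence; the two-sided conditional scoring that CISC itself performs on top of this is not reproduced here, and the function names are illustrative.

```python
# Stochastic complexity of a discrete sequence: maximum-likelihood
# codelength plus the log of the multinomial NML regret C(n, k),
# computed with the Kontkanen-Myllymaki linear-time recurrence.
import math
from collections import Counter

def multinomial_regret(n, k):
    """NML parametric complexity C(n, k) for k categories, n samples."""
    if n == 0 or k == 1:
        return 1.0
    # C(n, 2): sum in log-space to avoid overflow in the binomials
    c2 = 0.0
    for h in range(n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(h + 1)
                    - math.lgamma(n - h + 1))
        if 0 < h < n:
            log_term += h * math.log(h / n) + (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    if k == 2:
        return c2
    prev, cur = 1.0, c2                      # C(n, 1), C(n, 2)
    for j in range(3, k + 1):
        prev, cur = cur, cur + n / (j - 2) * prev
    return cur

def stochastic_complexity(seq, k):
    """Codelength in bits: ML code of seq plus log2 of the NML regret."""
    n = len(seq)
    ml_bits = -sum(c * math.log2(c / n) for c in Counter(seq).values())
    return ml_bits + math.log2(multinomial_regret(n, k))

print(round(multinomial_regret(2, 2), 3))    # -> 2.5
```

For instance, C(2, 2) sums binom(2,h)(h/2)^h((2-h)/2)^(2-h) over h = 0, 1, 2, giving 1 + 0.5 + 1 = 2.5.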
Identifiability of Causal Graphs using Functional Models
This work addresses the following question: Under what assumptions on the
data generating process can one infer the causal graph from the joint
distribution? The approach taken by conditional independence-based causal
discovery methods is based on two assumptions: the Markov condition and
faithfulness. It has been shown that under these assumptions the causal graph
can be identified up to Markov equivalence (some arrows remain undirected)
using methods like the PC algorithm. In this work we propose an alternative by
defining Identifiable Functional Model Classes (IFMOCs). As our main theorem we
prove that if the data generating process belongs to an IFMOC, one can identify
the complete causal graph. To the best of our knowledge this is the first
identifiability result of this kind that is not limited to linear functional
relationships. We discuss how the IFMOC assumption and the Markov and
faithfulness assumptions relate to each other and explain why we believe that
the IFMOC assumption can be tested more easily on given data. We further
provide a practical algorithm that recovers the causal graph from finitely many
data; experiments on simulated data support the theoretical findings.
Justifying Information-Geometric Causal Inference
Information Geometric Causal Inference (IGCI) is a new approach to
distinguish between cause and effect for two variables. It is based on an
independence assumption between input distribution and causal mechanism that
can be phrased in terms of orthogonality in information space. We describe two
intuitive reinterpretations of this approach that make IGCI more accessible to
a broader audience.
Moreover, we show that the described independence is related to the
hypothesis that unsupervised learning and semi-supervised learning only work
for predicting the cause from the effect and not vice versa. Comment: 3 figures
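A slope-based estimator is one common instantiation in the IGCI literature; the sketch below assumes a deterministic, monotonic mechanism, normalizes both variables to [0, 1], and infers the direction with the smaller mean log slope. Function names are illustrative.

```python
# Sketch of a slope-based IGCI estimator: after normalizing to [0, 1],
# the mean log |slope| along the curve is smaller in the causal direction
# (by a Jensen-type argument for uniform-like inputs).
import numpy as np

def normalize(v):
    return (v - v.min()) / (v.max() - v.min())

def slope_score(a, b):
    """Mean log |db/da| along the curve sorted by a."""
    order = np.argsort(a)
    da, db = np.diff(a[order]), np.diff(b[order])
    keep = (da > 0) & (db != 0)
    return np.mean(np.log(np.abs(db[keep] / da[keep])))

def igci_direction(x, y):
    x, y = normalize(x), normalize(y)
    return "X->Y" if slope_score(x, y) < slope_score(y, x) else "Y->X"

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 2000)
y = x ** 3                    # deterministic, invertible mechanism
print(igci_direction(x, y))
```

With uniform input and f(x) = x^3 on [0, 1], the forward score approximates the integral of log(3x^2), which is negative, while the backward score is its negation, so the sketch should print X->Y.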
Detecting confounding in multivariate linear models via spectral analysis
We study a model where one target variable Y is correlated with a vector
X:=(X_1,...,X_d) of predictor variables being potential causes of Y. We
describe a method that infers to what extent the statistical dependences
between X and Y are due to the influence of X on Y and to what extent due to a
hidden common cause (confounder) of X and Y. The method relies on concentration
of measure results for large dimensions d and an independence assumption
stating that, in the absence of confounding, the vector of regression
coefficients describing the influence of each X_i on Y typically has 'generic
orientation' relative to the eigenspaces of the covariance matrix of X. For the
special case of a scalar confounder we show that confounding typically spoils
this generic orientation in a characteristic way that can be used to
quantitatively estimate the amount of confounding. Comment: 27 pages, 16 figures
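The geometric intuition can be illustrated numerically. The sketch below is not the paper's estimator: it simply measures how much of the regression vector's norm falls on the top eigenvector of cov(X), which stays near the generic value 1/d without confounding but concentrates near 1 when a scalar confounder drives both X and Y.

```python
# Illustration of the spectral signature of confounding: a scalar
# confounder aligns the regression vector with a single eigendirection
# of cov(X), while an unconfounded generic mechanism does not.
import numpy as np

rng = np.random.default_rng(5)
d, n = 20, 20000

def top_eig_weight(X, y):
    """Fraction of the regression vector's squared norm that lies on the
    top eigenvector of the sample covariance of X."""
    a, *_ = np.linalg.lstsq(X - X.mean(0), y - y.mean(), rcond=None)
    w, V = np.linalg.eigh(np.cov(X.T))   # eigenvalues sorted ascending
    coeffs = V.T @ a
    return coeffs[-1] ** 2 / np.sum(coeffs ** 2)

# Unconfounded: Y = a.X + noise, with a in generic orientation
E = rng.normal(size=(n, d))
a_true = rng.normal(size=d)
unconf = top_eig_weight(E, E @ a_true + rng.normal(size=n))

# Confounded: a scalar Z drives both X and Y
z, b = rng.normal(size=n), rng.normal(size=d)
X_conf = 2 * np.outer(z, b) + rng.normal(size=(n, d))
y_conf = 5 * z + rng.normal(size=n)
conf = top_eig_weight(X_conf, y_conf)

print(round(unconf, 3), round(conf, 3))
```

Here the confounded regression vector is proportional to b, which is also (approximately) the top eigenvector of cov(X), so its concentration is near 1, while the unconfounded one is near 1/d.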
Causal Inference on Discrete Data using Additive Noise Models
Inferring the causal structure of a set of random variables from a finite
sample of the joint distribution is an important problem in science. Recently,
methods using additive noise models have been suggested to approach the case of
continuous variables. In many situations, however, the variables of interest
are discrete or even have only finitely many states. In this work we extend the
notion of additive noise models to these cases. We prove that whenever the
joint distribution P(X,Y) admits such a model in one direction, e.g.
Y = f(X) + N with N independent of X, it does not admit the reversed model
X = g(Y) + Ñ with Ñ independent of Y, as long as the model is chosen in a
generic way. Based on these deliberations we propose an efficient new algorithm
that is able to distinguish between cause and effect for a finite sample of
discrete variables. In an extensive experimental study we show that this
algorithm works on both synthetic and real data sets.
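A simplified sketch of the discrete additive-noise idea follows; it is not the paper's exact algorithm. Conditional-mode regression and a normalized chi-squared statistic stand in for its regression and independence test, and all names are illustrative.

```python
# Sketch of discrete ANM inference: regress with the conditional mode,
# then check in which direction the residuals look independent of the
# regressor (smaller normalized chi-squared statistic wins).
import numpy as np
from collections import Counter, defaultdict

def mode_regression_residuals(x, y):
    """Residuals y - f(x), where f(x) is the most frequent y per x."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    f = {xi: Counter(ys).most_common(1)[0][0] for xi, ys in groups.items()}
    return np.array([yi - f[xi] for xi, yi in zip(x, y)])

def chi2_stat(a, b):
    """Pearson chi-squared independence statistic of discrete a and b,
    divided by the sample size (near 0 = looks independent)."""
    n, stat = len(a), 0.0
    for av in np.unique(a):
        for bv in np.unique(b):
            obs = np.sum((a == av) & (b == bv))
            exp = np.sum(a == av) * np.sum(b == bv) / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat / n

def discrete_anm_direction(x, y):
    dep_xy = chi2_stat(x, mode_regression_residuals(x, y))
    dep_yx = chi2_stat(y, mode_regression_residuals(y, x))
    return "X->Y" if dep_xy < dep_yx else "Y->X"

rng = np.random.default_rng(2)
x = rng.integers(0, 5, 4000)
noise = rng.choice([-1, 0, 1], 4000, p=[0.2, 0.6, 0.2])
y = x * x + noise             # additive noise, independent of x
print(discrete_anm_direction(x, y))
```

On this data the forward residuals are exactly the noise, while the reverse residuals have a distribution that changes with Y, so the sketch should print X->Y.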
Causal Discovery with Continuous Additive Noise Models
We consider the problem of learning causal directed acyclic graphs from an
observational joint distribution. One can use these graphs to predict the
outcome of interventional experiments, from which data are often not available.
We show that if the observational distribution follows a structural equation
model with an additive noise structure, the directed acyclic graph becomes
identifiable from the distribution under mild conditions. This constitutes an
interesting alternative to traditional methods that assume faithfulness and
identify only the Markov equivalence class of the graph, thus leaving some
edges undirected. We provide practical algorithms for finitely many samples,
RESIT (Regression with Subsequent Independence Test) and two methods based on
an independence score. We prove that RESIT is correct in the population setting
and provide an empirical evaluation.
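A toy sketch of the RESIT idea, under strong simplifications: nearest-neighbour smoothing and a correlation-of-squares proxy replace the paper's nonparametric regression and proper independence test, and the helper names are illustrative.

```python
# Sketch of RESIT: repeatedly peel off the variable whose regression
# residuals (on all remaining variables) look most independent of those
# variables, marking it as a sink; output a causal order, causes first.
import numpy as np

def knn_regress(X, y, k=30):
    """Predict y from rows of X by averaging the k nearest neighbours."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        d = np.sum((X - X[i]) ** 2, axis=1)
        preds[i] = y[np.argsort(d)[:k]].mean()
    return preds

def dep_proxy(res, X):
    """Crude dependence of residuals on columns of X (0 = independent)."""
    score = 0.0
    for j in range(X.shape[1]):
        for a in (res, res ** 2):
            for b in (X[:, j], X[:, j] ** 2):
                ac, bc = a - a.mean(), b - b.mean()
                denom = ac.std() * bc.std()
                if denom > 0:
                    score = max(score, abs(np.mean(ac * bc)) / denom)
    return score

def resit_order(data):
    """Return a causal order of the columns of data, causes first."""
    remaining, order = list(range(data.shape[1])), []
    while len(remaining) > 1:
        scores = {}
        for k in remaining:
            rest = [j for j in remaining if j != k]
            res = data[:, k] - knn_regress(data[:, rest], data[:, k])
            scores[k] = dep_proxy(res, data[:, rest])
        sink = min(scores, key=scores.get)   # most independent residuals
        order.insert(0, sink)
        remaining.remove(sink)
    order.insert(0, remaining[0])
    return order

rng = np.random.default_rng(3)
x0 = rng.normal(size=800)
x1 = x0 ** 2 + 0.3 * rng.normal(size=800)  # ground truth: x0 -> x1
data = np.column_stack([x1, x0])           # columns shuffled on purpose
print(resit_order(data))
```

With the columns stored as [x1, x0], the true causal order is column 1 before column 0; the non-invertible quadratic mechanism makes the reverse regression residuals clearly dependent, so the sketch should print [1, 0].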