Causal inference using the algorithmic Markov condition
Inferring the causal structure that links n observables is usually based upon
detecting statistical dependences and choosing simple graphs that make the
joint measure Markovian. Here we argue why causal inference is also possible
when only single observations are present.
We develop a theory of how to generate causal graphs that explain similarities
between single objects. To this end, we replace the notion of conditional
stochastic independence in the causal Markov condition with the vanishing of
conditional algorithmic mutual information and describe the corresponding
causal inference rules.
We explain why a consistent reformulation of causal inference in terms of
algorithmic complexity implies a new inference principle that also takes into
account the complexity of conditional probability densities, making it
possible to select among Markov equivalent causal graphs. This insight provides
a theoretical foundation of a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable
complexity criteria. This can be seen as an algorithmic analog of replacing the
empirically undecidable question of statistical independence with practical
independence tests that are based on implicit or explicit assumptions on the
underlying distribution.
Comment: 16 figures
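The closing idea of replacing Kolmogorov complexity with decidable complexity criteria has a standard computable stand-in (my illustration, not the paper's construction): the length of a string under a real compressor upper-bounds its Kolmogorov complexity, so compressed lengths give a crude, decidable proxy for algorithmic mutual information.

```python
# Illustrative sketch, not from the paper: approximate algorithmic mutual
# information I(x : y) ~ C(x) + C(y) - C(xy) using zlib-compressed lengths
# as a decidable stand-in for (uncomputable) Kolmogorov complexity.
import zlib

def c(data: bytes) -> int:
    """Compressed length: a crude, computable upper bound on K(data)."""
    return len(zlib.compress(data, 9))

def approx_mutual_info(x: bytes, y: bytes) -> int:
    """Shared compressible structure between x and y."""
    return c(x) + c(y) - c(x + y)

a = b"0123456789" * 200     # structured object
b2 = b"0123456789" * 200    # shares all of a's structure
r = bytes(range(256)) * 8   # differently structured object

print(approx_mutual_info(a, b2), approx_mutual_info(a, r))
```

The two identical strings share far more approximate algorithmic information than the two differently structured ones, mirroring the similarity notion the abstract uses for causal inference from single objects.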
Distinguishing Cause and Effect via Second Order Exponential Models
We propose a method to infer causal structures containing both discrete and
continuous variables. The idea is to select causal hypotheses for which the
conditional density of every variable, given its causes, becomes smooth. We
define a family of smooth densities and conditional densities by second order
exponential models, i.e., by maximizing conditional entropy subject to first
and second statistical moments. If some of the variables take only values in
proper subsets of R^n, these conditionals can induce different families of
joint distributions even for Markov-equivalent graphs.
We consider the case of one binary and one real-valued variable where the
method can distinguish between cause and effect. Using this example, we
show that a causal hypothesis must sometimes be rejected because
P(effect|cause) and P(cause) share algorithmic information (which is untypical
if they are chosen independently). In this way, our method is in the same spirit
as faithfulness-based causal inference, because it also rejects non-generic
mutual adjustments among DAG parameters.
Comment: 36 pages, 8 figures
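For the binary-cause/real-effect case above, a toy caricature (my own, not the paper's algorithm) makes the asymmetry tangible: in the direction X -> Y the model only requires each conditional P(Y|X=x) to be Gaussian (the second-order exponential family on the reals), while the reverse direction would require the marginal P(Y) itself to be Gaussian, which fails for a well-separated mixture.

```python
# Toy illustration (assumed setup, not the paper's procedure): compare
# Gaussianity of the class-conditionals P(Y|X=x) against Gaussianity of
# the marginal P(Y) to decide between X -> Y and Y -> X.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)                        # binary cause
y = np.where(x == 1, 3.0, -3.0) + rng.normal(size=2000)  # Gaussian effect per class

# Direction X -> Y: each conditional P(Y | X = x) should be Gaussian.
p_cond = [stats.normaltest(y[x == v]).pvalue for v in (0, 1)]

# Direction Y -> X: would require the marginal P(Y) to be Gaussian,
# but it is a well-separated two-component mixture.
p_marg = stats.normaltest(y).pvalue

print(p_cond, p_marg)
```

The conditionals pass a normality test while the marginal is rejected overwhelmingly, so only the hypothesis X -> Y keeps all densities inside the smooth family.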
Justifying additive-noise-model based causal discovery via algorithmic information theory
A recent method for causal discovery is in many cases able to infer whether X
causes Y or Y causes X for just two observed variables X and Y. It is based on
the observation that there exist (non-Gaussian) joint distributions P(X,Y) for
which Y may be written as a function of X up to an additive noise term that is
independent of X and no such model exists from Y to X. Whenever this is the
case, one prefers the causal model X --> Y.
Here we justify this method by showing that the causal hypothesis Y --> X is
unlikely because it requires a specific tuning between P(Y) and P(X|Y) to
generate a distribution that admits an additive noise model from X to Y. To
quantify the amount of tuning required we derive lower bounds on the
algorithmic information shared by P(Y) and P(X|Y). This way, our justification
is consistent with recent approaches for using algorithmic information theory
for causal reasoning. We extend this principle to the case where P(X,Y) almost
admits an additive noise model.
Our results suggest that the above conclusion is more reliable if the
complexity of P(Y) is high.
Comment: 17 pages, 1 figure
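The inference procedure that this abstract justifies can be sketched as follows (a minimal toy version; the residual-dependence proxy is a crude stand-in of my own choosing, not a proper independence test):

```python
# Additive-noise-model causal discovery, minimal sketch: fit a regression
# in each direction and check how strongly the residuals still depend on
# the input; the direction with (near-)independent residuals is preferred.
import numpy as np

def residual_dependence(a, b, deg=3):
    """Regress b on a (polynomial fit); return a crude dependence score
    between a and the residuals (0 = residuals look independent of a)."""
    coeffs = np.polyfit(a, b, deg)
    resid = b - np.polyval(coeffs, a)
    # proxy: linear dependence of resid and resid^2 on a and a^2
    return max(abs(np.corrcoef(a, resid)[0, 1]),
               abs(np.corrcoef(a ** 2, resid ** 2)[0, 1]))

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=3000)            # non-Gaussian cause
y = x ** 3 + rng.uniform(-1, 1, size=3000)   # additive, x-independent noise

forward = residual_dependence(x, y)   # residuals of Y given X: ~independent
backward = residual_dependence(y, x)  # residuals of X given Y: still depend on Y
print(forward, backward)
```

Since only the forward direction admits an additive noise model, the forward score stays near zero while the backward score does not, and one infers X --> Y.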
The Big Four - Their Interdependence and Limitations
Four intuitions are recurrent and influential in theories about conditionals: the Ramsey test, Adams' Thesis, the Equation, and the robustness requirement. For simplicity's sake, I call these intuitions 'the big four'. My aim is to show that: (1) the big four are interdependent; (2) they express our inferential dispositions to employ a conditional in a modus ponens; (3) the disposition to employ conditionals in a modus ponens doesn't have the epistemic significance usually attributed to it, since the acceptability or truth conditions of a conditional are not necessarily associated with its employability in a modus ponens.
Invariant Models for Causal Transfer Learning
Methods of transfer learning try to combine knowledge from several related
tasks (or domains) to improve performance on a test task. Inspired by causal
methodology, we relax the usual covariate shift assumption and assume that it
holds true for a subset of predictor variables: the conditional distribution of
the target variable given this subset of predictors is invariant over all
tasks. We show how this assumption can be motivated from ideas in the field of
causality. We focus on the problem of Domain Generalization, in which no
examples from the test task are observed. We prove that, in an adversarial
setting, using this subset for prediction is optimal in Domain Generalization;
we further provide examples in which the tasks are sufficiently diverse and
the estimator therefore outperforms pooling the data, even on average. If
examples from the test task are available, we also provide a method to transfer
knowledge from the training tasks and exploit all available features for
prediction. However, we provide no guarantees for this method. We introduce a
practical method which allows for automatic inference of the above subset and
provide corresponding code. We present results on synthetic data sets and a
gene deletion data set.
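The subset-inference idea can be sketched as follows, assuming linear models and a deliberately crude invariance check (matching residual means and standard deviations across tasks); names, thresholds, and the toy data are illustrative, not the paper's:

```python
# Sketch of invariant-subset search: keep predictor subsets S for which the
# residuals of regressing the target on S look identically distributed in
# every task. Crude invariance test: residual mean/std must match across tasks.
from itertools import combinations
import numpy as np

def residuals(X, y, cols):
    """Least-squares residuals of y on the selected columns (with intercept)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def invariant_subsets(X, y, task, tol=0.2):
    """All subsets passing the crude cross-task invariance check."""
    tasks = np.unique(task)
    keep = []
    for r in range(X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), r):
            res = residuals(X, y, cols)
            means = [res[task == t].mean() for t in tasks]
            stds = [res[task == t].std() for t in tasks]
            if max(means) - min(means) < tol and max(stds) - min(stds) < tol:
                keep.append(cols)
    return keep

# Toy data: x1 causes y identically in both tasks; x2 is an effect of y
# whose mechanism shifts (an offset) between tasks, so any subset
# containing x2 breaks invariance.
rng = np.random.default_rng(2)
task = np.repeat([0, 1], 1000)
x1 = rng.normal(size=2000)
y = 2.0 * x1 + rng.normal(size=2000)
x2 = y + np.where(task == 1, 3.0, 0.0) + 0.1 * rng.normal(size=2000)
X = np.column_stack([x1, x2])

subsets = invariant_subsets(X, y, task)
print(subsets)
```

The causal predictor subset containing only x1 (column index 0) survives, while every subset containing the shifted effect x2 is rejected, which is the behaviour the adversarial-optimality result above relies on.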