Causal inference using the algorithmic Markov condition
Inferring the causal structure that links n observables is usually based upon
detecting statistical dependences and choosing simple graphs that make the
joint measure Markovian. Here we argue why causal inference is also possible
when only single observations are present.
We develop a theory of how to generate causal graphs that explain similarities
between single objects. To this end, we replace the notion of conditional
stochastic independence in the causal Markov condition with the vanishing of
conditional algorithmic mutual information and describe the corresponding
causal inference rules.
We explain why a consistent reformulation of causal inference in terms of
algorithmic complexity implies a new inference principle that also takes into
account the complexity of conditional probability densities, making it
possible to select among Markov equivalent causal graphs. This insight provides
a theoretical foundation of a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable
complexity criteria. This can be seen as an algorithmic analog of replacing the
empirically undecidable question of statistical independence with practical
independence tests that are based on implicit or explicit assumptions on the
underlying distribution.
Comment: 16 figures
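The closing idea of replacing Kolmogorov complexity with decidable complexity criteria has a standard computable stand-in (my illustration, not the paper's construction): the length of a string under a real compressor upper-bounds its Kolmogorov complexity, so compressed lengths give a crude, decidable proxy for algorithmic mutual information.

```python
# Illustrative sketch, not from the paper: approximate algorithmic mutual
# information I(x : y) ~ C(x) + C(y) - C(xy) using zlib-compressed lengths
# as a decidable stand-in for (uncomputable) Kolmogorov complexity.
import zlib

def c(data: bytes) -> int:
    """Compressed length: a crude, computable upper bound on K(data)."""
    return len(zlib.compress(data, 9))

def approx_mutual_info(x: bytes, y: bytes) -> int:
    """Shared compressible structure between x and y."""
    return c(x) + c(y) - c(x + y)

a = b"0123456789" * 200     # structured object
b2 = b"0123456789" * 200    # shares all of a's structure
r = bytes(range(256)) * 8   # differently structured object

print(approx_mutual_info(a, b2), approx_mutual_info(a, r))
```

The two identical strings share far more approximate algorithmic information than the two differently structured ones, mirroring the similarity notion the abstract uses for causal inference from single objects.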
Distinguishing Cause and Effect via Second Order Exponential Models
We propose a method to infer causal structures containing both discrete and
continuous variables. The idea is to select causal hypotheses for which the
conditional density of every variable, given its causes, becomes smooth. We
define a family of smooth densities and conditional densities by second order
exponential models, i.e., by maximizing conditional entropy subject to first
and second statistical moments. If some of the variables take only values in
proper subsets of R^n, these conditionals can induce different families of
joint distributions even for Markov-equivalent graphs.
We consider the case of one binary and one real-valued variable where the
method can distinguish between cause and effect. Using this example, we
show that a causal hypothesis must sometimes be rejected because
P(effect|cause) and P(cause) share algorithmic information (which is untypical
if they are chosen independently). In this way, our method is in the same spirit
as faithfulness-based causal inference, because it also rejects non-generic
mutual adjustments among DAG parameters.
Comment: 36 pages, 8 figures
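For the binary-cause/real-effect case above, a toy caricature (my own, not the paper's algorithm) makes the asymmetry tangible: in the direction X -> Y the model only requires each conditional P(Y|X=x) to be Gaussian (the second-order exponential family on the reals), while the reverse direction would require the marginal P(Y) itself to be Gaussian, which fails for a well-separated mixture.

```python
# Toy illustration (assumed setup, not the paper's procedure): compare
# Gaussianity of the class-conditionals P(Y|X=x) against Gaussianity of
# the marginal P(Y) to decide between X -> Y and Y -> X.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)                        # binary cause
y = np.where(x == 1, 3.0, -3.0) + rng.normal(size=2000)  # Gaussian effect per class

# Direction X -> Y: each conditional P(Y | X = x) should be Gaussian.
p_cond = [stats.normaltest(y[x == v]).pvalue for v in (0, 1)]

# Direction Y -> X: would require the marginal P(Y) to be Gaussian,
# but it is a well-separated two-component mixture.
p_marg = stats.normaltest(y).pvalue

print(p_cond, p_marg)
```

The conditionals pass a normality test while the marginal is rejected overwhelmingly, so only the hypothesis X -> Y keeps all densities inside the smooth family.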
Justifying additive-noise-model based causal discovery via algorithmic information theory
A recent method for causal discovery is in many cases able to infer whether X
causes Y or Y causes X for just two observed variables X and Y. It is based on
the observation that there exist (non-Gaussian) joint distributions P(X,Y) for
which Y may be written as a function of X up to an additive noise term that is
independent of X and no such model exists from Y to X. Whenever this is the
case, one prefers the causal model X --> Y.
Here we justify this method by showing that the causal hypothesis Y --> X is
unlikely because it requires a specific tuning between P(Y) and P(X|Y) to
generate a distribution that admits an additive noise model from X to Y. To
quantify the amount of tuning required we derive lower bounds on the
algorithmic information shared by P(Y) and P(X|Y). This way, our justification
is consistent with recent approaches for using algorithmic information theory
for causal reasoning. We extend this principle to the case where P(X,Y) almost
admits an additive noise model.
Our results suggest that the above conclusion is more reliable if the
complexity of P(Y) is high.
Comment: 17 pages, 1 figure
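The inference procedure that this abstract justifies can be sketched as follows (a minimal toy version; the residual-dependence proxy is a crude stand-in of my own choosing, not a proper independence test):

```python
# Additive-noise-model causal discovery, minimal sketch: fit a regression
# in each direction and check how strongly the residuals still depend on
# the input; the direction with (near-)independent residuals is preferred.
import numpy as np

def residual_dependence(a, b, deg=3):
    """Regress b on a (polynomial fit); return a crude dependence score
    between a and the residuals (0 = residuals look independent of a)."""
    coeffs = np.polyfit(a, b, deg)
    resid = b - np.polyval(coeffs, a)
    # proxy: linear dependence of resid and resid^2 on a and a^2
    return max(abs(np.corrcoef(a, resid)[0, 1]),
               abs(np.corrcoef(a ** 2, resid ** 2)[0, 1]))

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=3000)            # non-Gaussian cause
y = x ** 3 + rng.uniform(-1, 1, size=3000)   # additive, x-independent noise

forward = residual_dependence(x, y)   # residuals of Y given X: ~independent
backward = residual_dependence(y, x)  # residuals of X given Y: still depend on Y
print(forward, backward)
```

Since only the forward direction admits an additive noise model, the forward score stays near zero while the backward score does not, and one infers X --> Y.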
The Big Four - Their Interdependence and Limitations
Four intuitions are recurrent and influential in theories about conditionals: the Ramsey test, Adams' Thesis, the Equation, and the robustness requirement. For simplicity's sake, I call these intuitions 'the big four'. My aim is to show that: (1) the big four are interdependent; (2) they express our inferential dispositions to employ a conditional in a modus ponens; (3) the disposition to employ conditionals in a modus ponens doesn't have the epistemic significance usually attributed to it, since the acceptability or truth conditions of a conditional are not necessarily associated with its employability in a modus ponens.
Invariant Models for Causal Transfer Learning
Methods of transfer learning try to combine knowledge from several related
tasks (or domains) to improve performance on a test task. Inspired by causal
methodology, we relax the usual covariate shift assumption and assume that it
holds true for a subset of predictor variables: the conditional distribution of
the target variable given this subset of predictors is invariant over all
tasks. We show how this assumption can be motivated from ideas in the field of
causality. We focus on the problem of Domain Generalization, in which no
examples from the test task are observed. We prove that, in an adversarial
setting, using this subset for prediction is optimal in Domain Generalization;
we further provide examples in which the tasks are sufficiently diverse and
the estimator therefore outperforms pooling the data, even on average. If
examples from the test task are available, we also provide a method to transfer
knowledge from the training tasks and exploit all available features for
prediction. However, we provide no guarantees for this method. We introduce a
practical method which allows for automatic inference of the above subset and
provide corresponding code. We present results on synthetic data sets and a
gene deletion data set.
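The subset-inference idea can be sketched as follows, assuming linear models and a deliberately crude invariance check (matching residual means and standard deviations across tasks); names, thresholds, and the toy data are illustrative, not the paper's:

```python
# Sketch of invariant-subset search: keep predictor subsets S for which the
# residuals of regressing the target on S look identically distributed in
# every task. Crude invariance test: residual mean/std must match across tasks.
from itertools import combinations
import numpy as np

def residuals(X, y, cols):
    """Least-squares residuals of y on the selected columns (with intercept)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def invariant_subsets(X, y, task, tol=0.2):
    """All subsets passing the crude cross-task invariance check."""
    tasks = np.unique(task)
    keep = []
    for r in range(X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), r):
            res = residuals(X, y, cols)
            means = [res[task == t].mean() for t in tasks]
            stds = [res[task == t].std() for t in tasks]
            if max(means) - min(means) < tol and max(stds) - min(stds) < tol:
                keep.append(cols)
    return keep

# Toy data: x1 causes y identically in both tasks; x2 is an effect of y
# whose mechanism shifts (an offset) between tasks, so any subset
# containing x2 breaks invariance.
rng = np.random.default_rng(2)
task = np.repeat([0, 1], 1000)
x1 = rng.normal(size=2000)
y = 2.0 * x1 + rng.normal(size=2000)
x2 = y + np.where(task == 1, 3.0, 0.0) + 0.1 * rng.normal(size=2000)
X = np.column_stack([x1, x2])

subsets = invariant_subsets(X, y, task)
print(subsets)
```

The causal predictor subset containing only x1 (column index 0) survives, while every subset containing the shifted effect x2 is rejected, which is the behaviour the adversarial-optimality result above relies on.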