On Causal and Anticausal Learning
We consider the problem of function estimation in the case where an
underlying causal model can be inferred. This has implications for popular
scenarios such as covariate shift, concept drift, transfer learning and
semi-supervised learning. We argue that causal knowledge may facilitate some
approaches for a given problem, and rule out others. In particular, we
formulate a hypothesis for when semi-supervised learning can help, and
corroborate it with empirical results. Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012). arXiv admin note: substantial text overlap with
arXiv:1112.273
Towards a Learning Theory of Cause-Effect Inference
We pose causal inference as the problem of learning to classify probability
distributions. In particular, we assume access to a collection
$\{(S_i, l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the
probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label
indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given these data,
we build a causal inference rule in two steps. First, we featurize each $S_i$
using the kernel mean embedding associated with some characteristic kernel.
Second, we train a binary classifier on such embeddings to distinguish between
causal directions. We present generalization bounds showing the statistical
consistency and learning rates of the proposed approach, and provide a simple
implementation that achieves state-of-the-art cause-effect inference.
Furthermore, we extend our ideas to infer causal relationships between more
than two variables.
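The two steps described in this abstract (featurize each sample by a kernel mean embedding, then train a binary classifier on the embeddings) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, its approximation by random Fourier features, the nearest-centroid classifier, and the names `make_embedder`, `fit_centroids` and `predict` are all choices made here for brevity.

```python
import math
import random


def make_embedder(dim=2, n_features=100, gamma=1.0, seed=0):
    """Return a function mapping a sample (list of points) to an approximate
    kernel mean embedding for k(u, v) = exp(-gamma * ||u - v||^2).

    Assumption: random Fourier features stand in for the exact embedding.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # spectral std. dev. of this Gaussian kernel
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def embed(sample):
        n = len(sample)
        features = []
        for w, phase in zip(weights, phases):
            total = sum(
                math.cos(sum(wi * xi for wi, xi in zip(w, point)) + phase)
                for point in sample
            )
            features.append(norm * total / n)  # average feature over the sample
        return features

    return embed


def fit_centroids(embeddings, labels):
    """Nearest-centroid classifier (a stand-in for any binary classifier)."""
    centroids = {}
    for label in set(labels):
        rows = [e for e, l in zip(embeddings, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, embedding):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((c - e) ** 2 for c, e in zip(center, embedding))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```

In this sketch each sample is a list of (x, y) pairs, and the labels could be encoded as +1 for "$X_i \to Y_i$" and -1 for "$X_i \leftarrow Y_i$". The embedding has a fixed length regardless of sample size, which is what lets an ordinary classifier operate on distributions.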
Justifying Information-Geometric Causal Inference
Information Geometric Causal Inference (IGCI) is a new approach to
distinguish between cause and effect for two variables. It is based on an
independence assumption between input distribution and causal mechanism that
can be phrased in terms of orthogonality in information space. We describe two
intuitive reinterpretations of this approach that make IGCI more accessible to
a broader audience.
Moreover, we show that the described independence is related to the
hypothesis that unsupervised learning and semi-supervised learning only work
for predicting the cause from the effect and not vice versa. Comment: 3 Figures
Semi-Supervised Learning, Causality and the Conditional Cluster Assumption
While the success of semi-supervised learning (SSL) is still not fully
understood, Schölkopf et al. (2012) have established a link to the principle
of independent causal mechanisms. They conclude that SSL should be impossible
when predicting a target variable from its causes, but possible when predicting
it from its effects. Since both these cases are somewhat restrictive, we extend
their work by considering classification using cause and effect features at the
same time, such as predicting disease from both risk factors and symptoms.
While standard SSL exploits information contained in the marginal distribution
of all inputs (to improve the estimate of the conditional distribution of the
target given inputs), we argue that in our more general setting we should use
information in the conditional distribution of effect features given causal
features. We explore how this insight generalises the previous understanding,
and how it relates to and can be exploited algorithmically for SSL. Comment: 36th Conference on Uncertainty in Artificial Intelligence (2020)
(Previously presented at the NeurIPS 2019 workshop "Do the right thing":
machine learning and causal inference for improved decision making,
Vancouver, Canada.)
Learning Independent Causal Mechanisms
Statistical learning relies upon data sampled from a distribution, and we
usually do not care what actually generated it in the first place. From the
point of view of causal modeling, the structure of each distribution is induced
by physical mechanisms that give rise to dependences between observables.
Mechanisms, however, can be meaningful autonomous modules of generative models
that make sense beyond a particular entailed data distribution, lending
themselves to transfer between problems. We develop an algorithm to recover a
set of independent (inverse) mechanisms from a set of transformed data points.
The approach is unsupervised and based on a set of experts that compete for
data generated by the mechanisms, driving specialization. We analyze the
proposed method in a series of experiments on image data. Each expert learns to
map a subset of the transformed data back to a reference distribution. The
learned mechanisms generalize to novel domains. We discuss implications for
transfer learning and links to recent trends in generative modeling. Comment: ICML 201
Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is mini-max optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, as well as
real-world data, outperforming the state of the art by a margin, and scales
extremely well with regard to sample and domain sizes.
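To make the idea of scoring causal directions by stochastic complexity concrete, here is a minimal sketch for pairs of discrete variables. It assumes the standard normalized maximum likelihood (NML) code for multinomials, with the normalizer computed by the known linear-time recurrence C(k+2, n) = C(k+1, n) + (n/k) C(k, n). The function names, the use of observed distinct values as domain sizes, and the exact decision rule are assumptions of this sketch rather than details taken from the abstract.

```python
import math
from collections import Counter, defaultdict


def log2_multinomial_normalizer(n, k):
    """log2 of the NML normalizer C(k, n) for a k-valued multinomial."""
    if n == 0 or k <= 1:
        return 0.0
    # C(2, n) by direct summation over the count h of the first symbol.
    c_two = 0.0
    for h in range(n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(h + 1)
                    - math.lgamma(n - h + 1))
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c_two += math.exp(log_term)
    prev, cur = 1.0, c_two  # C(1, n) and C(2, n)
    for j in range(3, k + 1):
        prev, cur = cur, cur + (n / (j - 2)) * prev
    return math.log2(cur)  # plain floats: fine for small n and k only


def stochastic_complexity(values, k):
    """Maximum-likelihood code length in bits plus the NML regret term."""
    n = len(values)
    ml_bits = -sum(c * math.log2(c / n) for c in Counter(values).values())
    return ml_bits + log2_multinomial_normalizer(n, k)


def conditional_complexity(targets, conditions, k):
    """Sum of the complexities of the targets within each conditioning value."""
    groups = defaultdict(list)
    for t, c in zip(targets, conditions):
        groups[c].append(t)
    return sum(stochastic_complexity(g, k) for g in groups.values())


def infer_direction(xs, ys):
    """Compare total code lengths for X->Y and Y->X; lower is preferred."""
    kx, ky = len(set(xs)), len(set(ys))  # assumption: observed domain sizes
    x_to_y = stochastic_complexity(xs, kx) + conditional_complexity(ys, xs, ky)
    y_to_x = stochastic_complexity(ys, ky) + conditional_complexity(xs, ys, kx)
    if x_to_y < y_to_x:
        label = "X->Y"
    elif y_to_x < x_to_y:
        label = "Y->X"
    else:
        label = "undecided"
    return label, x_to_y, y_to_x
```

The score for each direction is the code length of the presumed cause plus the code length of the presumed effect given the cause, so the comparison needs only counts and is cheap even for large samples.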
Telling cause from effect in deterministic linear dynamical systems
Inferring a cause from its effect using observed time series data is a major
challenge in natural and social sciences. Assuming the effect is generated by
the cause through a linear system, we propose a new approach based on the
hypothesis that nature chooses the "cause" and the "mechanism that generates
the effect from the cause" independently of each other. We therefore postulate
that the power spectrum of the time series being the cause is uncorrelated with
the square of the transfer function of the linear filter generating the effect.
While most causal discovery methods for time series mainly rely on the noise,
our method relies on asymmetries of the power spectral density properties that
can be exploited even in the context of deterministic systems. We describe
mathematical assumptions in a deterministic model under which the causal
direction is identifiable with this approach. We also discuss the method's
performance under the additive noise model and its relationship to Granger
causality. Experiments show encouraging results on synthetic as well as
real-world data. Overall, this suggests that the postulate of Independence of
Cause and Mechanism is a promising principle for causal inference on empirical
time series. Comment: This article is under review for a peer-reviewed conference
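The postulate that the cause's power spectrum is uncorrelated with the squared transfer function can be illustrated numerically. The sketch below assumes both quantities are already given on a common frequency grid, and it measures "uncorrelated" as the ratio between the mean output power and the product of the mean input power and the mean squared gain. The function name `spectral_dependency_ratios` and this particular ratio are assumptions of the sketch; estimating spectra from actual time series is not covered.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def spectral_dependency_ratios(s_xx, h_sq):
    """Return (forward, backward) ratios for a deterministic linear filter.

    s_xx : power spectrum of the presumed cause on a frequency grid
    h_sq : squared magnitude of the transfer function on the same grid
           (must be strictly positive so the filter can be inverted)
    """
    # Output spectrum of a linear filter: S_yy = |h|^2 * S_xx.
    s_yy = [s * h for s, h in zip(s_xx, h_sq)]

    # Forward: equals 1 exactly when S_xx and |h|^2 are uncorrelated
    # across frequencies.
    forward = mean(s_yy) / (mean(s_xx) * mean(h_sq))

    # Backward: treat Y as the input and 1/|h|^2 as the squared gain.
    inv_h_sq = [1.0 / h for h in h_sq]
    backward = mean(s_xx) / (mean(s_yy) * mean(inv_h_sq))
    return forward, backward


# Example: white-noise input passed through a non-flat filter.
forward, backward = spectral_dependency_ratios(
    s_xx=[1.0, 1.0, 1.0, 1.0],
    h_sq=[4.0, 1.0, 0.25, 1.0],
)
```

In this example the forward ratio is 1 and the backward ratio is about 0.41. More generally, the product of the two ratios equals 1 / (mean(|h|^2) * mean(1/|h|^2)), which is at most 1, so when the postulate holds in the true direction the reverse direction shows a dependence unless the filter gain is constant across frequencies.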