Semi-Supervised Learning, Causality and the Conditional Cluster Assumption
While the success of semi-supervised learning (SSL) is still not fully
understood, Schölkopf et al. (2012) have established a link to the principle
of independent causal mechanisms. They conclude that SSL should be impossible
when predicting a target variable from its causes, but possible when predicting
it from its effects. Since both these cases are somewhat restrictive, we extend
their work by considering classification using cause and effect features at the
same time, such as predicting disease from both risk factors and symptoms.
While standard SSL exploits information contained in the marginal distribution
of all inputs (to improve the estimate of the conditional distribution of the
target given inputs), we argue that in our more general setting we should use
information in the conditional distribution of effect features given causal
features. We explore how this insight generalises the previous understanding,
and how it relates to and can be exploited algorithmically for SSL.
Comment: 36th Conference on Uncertainty in Artificial Intelligence (2020)
(Previously presented at the NeurIPS 2019 workshop "Do the right thing":
machine learning and causal inference for improved decision making,
Vancouver, Canada.)
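The contrast drawn in this abstract can be written out as a short identity. The notation is an assumption made here for illustration and is not taken from the abstract: X_C denotes the cause features, X_E the effect features, and Y a discrete target. The identity itself is just the law of total probability, so it is a sketch of why the conditional of effects given causes may carry information about the target, not a statement of the authors' algorithm.

```latex
% Standard SSL draws on the marginal of all inputs, P(X_C, X_E).
% The conditional of effect features given cause features is a mixture
% whose weights are the quantity of interest on the cause side, P(y | X_C):
P(X_E \mid X_C) \;=\; \sum_{y} P(y \mid X_C)\, P(X_E \mid y, X_C)

% The predictive distribution given both feature sets reuses the same two factors:
P(y \mid X_C, X_E) \;\propto\; P(y \mid X_C)\, P(X_E \mid y, X_C)
```

Unlabelled pairs (X_C, X_E) constrain the left-hand side of the first line, and hence, under suitable modelling assumptions, the two factors that make up the classifier in the second line.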
Domain adaptation under structural causal models
Domain adaptation (DA) arises as an important problem in statistical machine
learning when the source data used to train a model is different from the
target data used to test the model. Recent advances in DA have mainly been
application-driven and have largely relied on the idea of a common subspace for
source and target data. To understand the empirical successes and failures of
DA methods, we propose a theoretical framework via structural causal models
that enables analysis and comparison of the prediction performance of DA
methods. This framework also allows us to itemize the assumptions needed for
the DA methods to have a low target error. Additionally, with insights from our
theory, we propose a new DA method called CIRM that outperforms existing DA
methods when both the covariates and label distributions are perturbed in the
target data. We complement the theoretical analysis with extensive simulations
to show the necessity of the devised assumptions. Reproducible synthetic and
real data experiments are also provided to illustrate the strengths and
weaknesses of DA methods when parts of the assumptions in our theory are
violated.
Comment: 80 pages, 22 figures, accepted in JML
Invariant Models for Causal Transfer Learning
Methods of transfer learning try to combine knowledge from several related
tasks (or domains) to improve performance on a test task. Inspired by causal
methodology, we relax the usual covariate shift assumption and assume that it
holds true for a subset of predictor variables: the conditional distribution of
the target variable given this subset of predictors is invariant over all
tasks. We show how this assumption can be motivated from ideas in the field of
causality. We focus on the problem of Domain Generalization, in which no
examples from the test task are observed. We prove that in an adversarial
setting using this subset for prediction is optimal in Domain Generalization;
we further provide examples, in which the tasks are sufficiently diverse and
the estimator therefore outperforms pooling the data, even on average. If
examples from the test task are available, we also provide a method to transfer
knowledge from the training tasks and exploit all available features for
prediction. However, we provide no guarantees for this method. We introduce a
practical method which allows for automatic inference of the above subset and
provide corresponding code. We present results on synthetic data sets and a
gene deletion data set.
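The invariance assumption in this abstract (the conditional distribution of the target given a subset of predictors is the same in every task) can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' method or their released code: the data-generating process, the function names `simulate_task` and `invariance_score`, and the score itself (spread of per-task residual means after a pooled linear fit, minimised over subsets) are all hypothetical simplifications standing in for a proper statistical test.

```python
from itertools import combinations

import numpy as np


def simulate_task(rng, n, mu, shift):
    """One hypothetical task with structure X1 -> Y -> X2.

    Only the distribution of X1 (via mu) and the noise of X2 (via shift)
    change between tasks; the mechanism Y | X1 is shared by all tasks.
    """
    x1 = rng.normal(mu, 1.0, size=n)
    y = 2.0 * x1 + rng.normal(0.0, 1.0, size=n)
    x2 = y + rng.normal(shift, 1.0, size=n)
    return np.column_stack([x1, x2]), y


def invariance_score(tasks, subset):
    """Variance of per-task residual means after one pooled least-squares fit.

    A value close to zero is consistent with Y | X_subset being invariant.
    """
    X = np.vstack([x for x, _ in tasks])
    y = np.concatenate([t for _, t in tasks])
    design = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sizes = [len(t) for _, t in tasks]
    per_task = np.split(residuals, np.cumsum(sizes)[:-1])
    return float(np.var([r.mean() for r in per_task]))


rng = np.random.default_rng(0)
settings = [(0.0, 0.0), (2.0, 3.0), (-2.0, 3.0)]  # (mu, shift) per task
tasks = [simulate_task(rng, 500, mu, shift) for mu, shift in settings]

# Score every subset of the two predictors, including the empty set.
subsets = [s for k in range(3) for s in combinations(range(2), k)]
scores = {s: invariance_score(tasks, s) for s in subsets}
best = min(scores, key=scores.get)
print("subset with the most stable residuals:", best)
print(scores)
```

In this toy setup only the causal predictor (column 0) yields residuals whose mean is stable across tasks; adding the effect feature (column 1) improves the pooled fit but introduces a task-dependent offset.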
Learning Invariant Representations under General Interventions on the Response
It has become increasingly common nowadays to collect observations of feature
and response pairs from different environments. As a consequence, one has to
apply learned predictors to data with a different distribution due to
distribution shifts. One principled approach is to adopt the structural causal
models to describe training and test models, following the invariance principle
which says that the conditional distribution of the response given its
predictors remains the same across environments. However, this principle might
be violated in practical settings when the response is intervened. A natural
question is whether it is still possible to identify other forms of invariance
to facilitate prediction in unseen environments. To shed light on this
challenging scenario, we focus on linear structural causal models (SCMs) and
introduce invariant matching property (IMP), an explicit relation to capture
interventions through an additional feature, leading to an alternative form of
invariance that enables a unified treatment of general interventions on the
response as well as the predictors. We analyze the asymptotic generalization
errors of our method under both the discrete and continuous environment
settings, where the continuous case is handled by relating it to the
semiparametric varying coefficient models. We present algorithms that show
competitive performance compared to existing methods over various experimental
settings including a COVID dataset.
Comment: Accepted to the IEEE Journal on Selected Areas in Information Theory.
Special Issue: Causality: Fundamental Limits and Application
Causal Discovery Beyond Conditional Independences
Knowledge about causal relationships is important because it enables the prediction of the effects of interventions that perturb the observed system. Specifically, predicting the results of interventions amounts to the ability to answer questions like the following: if one or more variables are forced into a particular state, how will the probability distribution of the other variables be affected? Causal relationships can be identified through randomized experiments. However, such experiments may often be unethical, too expensive or even impossible to perform. The development of methods to infer causal relationships from observational rather than experimental data therefore constitutes a fundamental research topic. In this thesis, we address the problem of causal discovery, that is, recovering the underlying causal structure based on the joint probability distribution of the observed random variables.
The causal graph cannot be determined by the observed joint distribution alone; additional causal assumptions that link statistics to causality are necessary. Under the Markov condition and the faithfulness assumption, conditional-independence-based methods estimate a set of Markov equivalent graphs. However, these methods cannot distinguish between two graphs belonging to the same Markov equivalence class. Alternative methods investigate a different set of assumptions. The formal basis underlying these assumptions is provided by functional models, which model each variable as a function of its parents and some noise, with the noise variables assumed to be jointly independent. By restricting the function class, e.g., assuming additive noise, Markov equivalent graphs can become distinguishable. Variants of all aforementioned methods allow for the presence of confounders, which are unobserved common causes of two or more observed variables.
In this thesis, we present complementary causal discovery methods employing different kinds of assumptions from the ones mentioned above. The first part of this work concerns causal discovery allowing for the presence of confounders. We first propose a method that detects the existence of, and identifies, a finite-range confounder of a set of observed dependent variables. It is based on a kernel method to identify finite mixtures of nonparametric product distributions. Next, a property of a conditional distribution, called purity, is introduced, which is used for excluding the presence of a low-range confounder of two observed variables that completely explains their dependence (we call low-range a variable whose range has “small” cardinality).
We further study the problem of causal discovery in the two-variable case, but now assuming no confounders. To this end, we exploit the principle of independence of causal mechanisms that has been proposed in the literature. For the case of two variables, it states that, if X → Y (X causes Y), then P(X) and P(Y|X) do not contain information about each other. Instead, P(Y) and P(X|Y) may contain information about each other. Consequently, estimating P(Y|X) from P(X) should not be possible, while estimating P(X|Y) based on P(Y) may be possible. We employ this asymmetry to propose a causal discovery method which decides upon the causal direction by comparing the accuracy of the estimations of P(Y|X) and P(X|Y).
Moreover, the principle of independence has implications for common machine learning tasks such as semi-supervised learning, which are also discussed in the current work.
Finally, the goal of the last part of this dissertation is to present empirical results on the performance of estimation procedures for causal discovery using Additive Noise Models (ANMs) in the two-variable case.
Experiments on synthetic and real data show that the algorithms proposed in this thesis often outperform state-of-the-art algorithms.
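The idea of using additive noise models in the two-variable case, mentioned in this abstract, can be sketched as follows: regress each variable on the other and prefer the direction whose residuals look independent of the input. This is a minimal illustration under stated assumptions, not the estimation procedures evaluated in the thesis. The polynomial regression, the biased HSIC dependence score with a fixed unit bandwidth, the synthetic data, and the function names `hsic`, `residual_dependence` and `infer_direction` are all choices made here for illustration.

```python
import numpy as np


def _gaussian_gram(v, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample."""
    diff = v[:, None] - v[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def hsic(a, b):
    """Biased HSIC estimate between two 1-D samples (larger means more dependent)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    centring = np.eye(n) - np.ones((n, n)) / n
    gram_a = _gaussian_gram(a)
    gram_b = _gaussian_gram(b)
    return float(np.trace(gram_a @ centring @ gram_b @ centring) / n ** 2)


def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial f, then score how
    strongly the residuals depend on the putative cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def infer_direction(x, y):
    """Prefer the direction whose residuals look more independent of the input."""
    forward = residual_dependence(x, y)   # model X -> Y
    backward = residual_dependence(y, x)  # model Y -> X
    label = "X->Y" if forward < backward else "Y->X"
    return label, forward, backward


# Synthetic data generated by an additive noise model with X as the cause.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=400)
y = x ** 3 + rng.normal(0.0, 1.0, size=400)

direction, forward, backward = infer_direction(x, y)
print(direction, forward, backward)
```

In the causal direction the residuals are close to the independent noise term, whereas in the reverse direction their spread varies with the input, which the dependence score picks up. A practical method would replace the raw score comparison with a calibrated independence test.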
Semi-generative modelling: learning with cause and effect features
We consider a case of covariate shift where prior causal inference or expert knowledge has identified some features as effects, and show how this setting, when analysed from a causal perspective, gives rise to a semi-generative modelling framework: P(Y, X_eff | X_cau)
- …