92 research outputs found

    Semi-Supervised Learning, Causality and the Conditional Cluster Assumption

    While the success of semi-supervised learning (SSL) is still not fully understood, Schölkopf et al. (2012) have established a link to the principle of independent causal mechanisms. They conclude that SSL should be impossible when predicting a target variable from its causes, but possible when predicting it from its effects. Since both these cases are somewhat restrictive, we extend their work by considering classification using cause and effect features at the same time, such as predicting disease from both risk factors and symptoms. While standard SSL exploits information contained in the marginal distribution of all inputs (to improve the estimate of the conditional distribution of the target given inputs), we argue that in our more general setting we should use information in the conditional distribution of effect features given causal features. We explore how this insight generalises the previous understanding, and how it relates to and can be exploited algorithmically for SSL. Comment: 36th Conference on Uncertainty in Artificial Intelligence (2020) (Previously presented at the NeurIPS 2019 workshop "Do the right thing": machine learning and causal inference for improved decision making, Vancouver, Canada.)
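
    The argument in this abstract can be illustrated with a short factorisation. This is only a sketch: the symbols X_C (cause features) and X_E (effect features) and the assumption of a discrete target Y are introduced here for illustration and are not taken from the source.

```latex
\begin{align}
P(X_C, Y, X_E) &= P(X_C)\, P(Y \mid X_C)\, P(X_E \mid Y, X_C), \\
P(X_E \mid X_C) &= \sum_{y} P(y \mid X_C)\, P(X_E \mid y, X_C).
\end{align}
```

    The first line is the chain rule written in causal order. The second line shows that P(X_E | X_C), which can be estimated from unlabelled inputs alone, is a mixture whose weights are the target conditional P(Y | X_C). Under independent mechanisms, P(X_C) by itself is not expected to carry such information.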

    Domain adaptation under structural causal models

    Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model is different from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariates and label distributions are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when parts of the assumptions in our theory are violated. Comment: 80 pages, 22 figures, accepted in JML

    Invariant Models for Causal Transfer Learning

    Methods of transfer learning try to combine knowledge from several related tasks (or domains) to improve performance on a test task. Inspired by causal methodology, we relax the usual covariate shift assumption and assume that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks. We show how this assumption can be motivated from ideas in the field of causality. We focus on the problem of Domain Generalization, in which no examples from the test task are observed. We prove that in an adversarial setting using this subset for prediction is optimal in Domain Generalization; we further provide examples in which the tasks are sufficiently diverse and the estimator therefore outperforms pooling the data, even on average. If examples from the test task are available, we also provide a method to transfer knowledge from the training tasks and exploit all available features for prediction. However, we provide no guarantees for this method. We introduce a practical method which allows for automatic inference of the above subset and provide corresponding code. We present results on synthetic data sets and a gene deletion data set.
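
    The idea of searching for a predictor subset whose conditional is invariant across tasks can be sketched as follows. This is not the authors' code: the linear model, the exhaustive search, and the ad hoc score (spread of per-task residual means and variances) are assumptions chosen for brevity, and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def invariance_score(X, y, tasks, subset):
    """Fit pooled least squares on `subset` and measure how much the
    residual mean and variance differ across tasks (lower = more invariant).
    The score itself is an illustrative assumption, not a formal test."""
    columns = [np.ones(len(y))] + [X[:, j] for j in subset]
    A = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    labels = np.unique(tasks)
    means = np.array([residuals[tasks == t].mean() for t in labels])
    variances = np.array([residuals[tasks == t].var() for t in labels])
    return float(means.var() + variances.var())


def most_invariant_subset(X, y, tasks):
    """Exhaustively score every subset of columns (feasible only for few features)."""
    d = X.shape[1]
    candidates = [s for k in range(d + 1) for s in combinations(range(d), k)]
    scores = {s: invariance_score(X, y, tasks, s) for s in candidates}
    best = min(scores, key=scores.get)
    return best, scores


def simulate_tasks(seed=1, n=500):
    """Toy data: column 0 causes y; column 1 is an effect of y that is
    shifted differently in each task (all values are made up)."""
    rng = np.random.default_rng(seed)
    x1_means = [0.0, 1.0, -1.0]
    shifts = [0.0, 3.0, -3.0]
    X_parts, y_parts, t_parts = [], [], []
    for t in range(3):
        x1 = rng.normal(x1_means[t], 1.0, n)
        y = 2.0 * x1 + rng.normal(0.0, 1.0, n)
        x2 = y + shifts[t] + rng.normal(0.0, 0.5, n)
        X_parts.append(np.column_stack([x1, x2]))
        y_parts.append(y)
        t_parts.append(np.full(n, t))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(t_parts)
```

    Calling `most_invariant_subset(*simulate_tasks())` returns the selected column indices together with the score of every candidate subset. A practical method would replace the ad hoc score with a proper statistical test and avoid exhaustive enumeration when there are many features.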

    Learning Invariant Representations under General Interventions on the Response

    It has become increasingly common to collect observations of feature and response pairs from different environments. As a consequence, one has to apply learned predictors to data with a different distribution due to distribution shifts. One principled approach is to adopt structural causal models to describe training and test models, following the invariance principle, which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle might be violated in practical settings when the response is intervened on. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we focus on linear structural causal models (SCMs) and introduce the invariant matching property (IMP), an explicit relation to capture interventions through an additional feature, leading to an alternative form of invariance that enables a unified treatment of general interventions on the response as well as the predictors. We analyze the asymptotic generalization errors of our method under both the discrete and continuous environment settings, where the continuous case is handled by relating it to the semiparametric varying coefficient models. We present algorithms that show competitive performance compared to existing methods over various experimental settings including a COVID dataset. Comment: Accepted to the IEEE Journal on Selected Areas in Information Theory. Special Issue: Causality: Fundamental Limits and Application

    Causal Discovery Beyond Conditional Independences

    Knowledge about causal relationships is important because it enables the prediction of the effects of interventions that perturb the observed system. Specifically, predicting the results of interventions amounts to the ability of answering questions like the following: if one or more variables are forced into a particular state, how will the probability distribution of the other variables be affected? Causal relationships can be identified through randomized experiments. However, such experiments may often be unethical, too expensive or even impossible to perform. The development of methods to infer causal relationships from observational rather than experimental data constitutes therefore a fundamental research topic. In this thesis, we address the problem of causal discovery, that is, recovering the underlying causal structure based on the joint probability distribution of the observed random variables. The causal graph cannot be determined by the observed joint distribution alone; additional causal assumptions, that link statistics to causality, are necessary. Under the Markov condition and the faithfulness assumption, conditional-independence-based methods estimate a set of Markov equivalent graphs. However, these methods cannot distinguish between two graphs belonging to the same Markov equivalence class. Alternative methods investigate a different set of assumptions. A formal basis underlying these assumptions are functional models which model each variable as a function of its parents and some noise, with the noise variables assumed to be jointly independent. By restricting the function class, e.g., assuming additive noise, Markov equivalent graphs can become distinguishable. Variants of all aforementioned methods allow for the presence of confounders, which are unobserved common causes of two or more observed variables.
    In this thesis, we present complementary causal discovery methods employing different kinds of assumptions from the ones mentioned above. The first part of this work concerns causal discovery allowing for the presence of confounders. We first propose a method that detects the existence and identifies a finite-range confounder of a set of observed dependent variables. It is based on a kernel method to identify finite mixtures of nonparametric product distributions. Next, a property of a conditional distribution, called purity, is introduced and used to exclude the presence of a low-range confounder of two observed variables that completely explains their dependence (we call low-range a variable whose range has “small” cardinality). We further study the problem of causal discovery in the two-variable case, but now assuming no confounders. To this end, we exploit the principle of independence of causal mechanisms that has been proposed in the literature. For the case of two variables, it states that, if X → Y (X causes Y), then P(X) and P(Y|X) do not contain information about each other. Instead, P(Y) and P(X|Y) may contain information about each other. Consequently, estimating P(Y|X) from P(X) should not be possible, while estimating P(X|Y) based on P(Y) may be possible. We employ this asymmetry to propose a causal discovery method which decides upon the causal direction by comparing the accuracy of the estimations of P(Y|X) and P(X|Y). Moreover, the principle of independence has implications for common machine learning tasks such as semi-supervised learning, which are also discussed in the current work. Finally, the goal of the last part of this dissertation is to present empirical results on the performance of estimation procedures for causal discovery using Additive Noise Models (ANMs) in the two-variable case.
    Experiments on synthetic and real data show that the algorithms proposed in this thesis often outperform state-of-the-art algorithms.
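
    For readers unfamiliar with two-variable ANM-based causal discovery, the following minimal sketch shows the general recipe: regress in both directions and prefer the direction whose residuals look more independent of the input. The thesis's actual estimation procedures are not given in this abstract; the polynomial regression, the fixed-bandwidth Gaussian kernel, and the biased HSIC statistic used here are assumptions, and the function names are hypothetical.

```python
import numpy as np


def _gaussian_gram(v, bandwidth=1.0):
    """Gaussian kernel Gram matrix of a standardised 1-D sample."""
    v = (v - v.mean()) / v.std()
    squared_dist = (v[:, None] - v[None, :]) ** 2
    return np.exp(-squared_dist / (2.0 * bandwidth ** 2))


def hsic(a, b):
    """Biased empirical HSIC: trace(K H L H) / n^2 (larger = more dependent)."""
    n = len(a)
    K, L = _gaussian_gram(a), _gaussian_gram(b)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2


def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial f (an assumption)
    and return the dependence between the residuals and the putative cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def infer_direction(x, y, degree=3):
    """Return the direction with the smaller residual dependence, plus both scores."""
    score_xy = residual_dependence(x, y, degree)
    score_yx = residual_dependence(y, x, degree)
    direction = "X->Y" if score_xy < score_yx else "Y->X"
    return direction, score_xy, score_yx
```

    A typical call is `direction, s_xy, s_yx = infer_direction(x, y)` on two 1-D NumPy arrays of equal length. The Gram matrices need O(n^2) memory, so this sketch suits samples of at most a few thousand points. A careful implementation would also use a more flexible regressor and a significance test rather than a raw score comparison.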

    Semi-generative modelling: learning with cause and effect features

    We consider a case of covariate shift where prior causal inference or expert knowledge has identified some features as effects, and show how this setting, when analysed from a causal perspective, gives rise to a semi-generative modelling framework: P(Y, X_eff | X_cau)