    Causal Inference by Stochastic Complexity

    The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as the direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is minimax optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, and real-world data, outperforming the state of the art by a margin, and that it scales extremely well with regard to sample and domain sizes.
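    The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the score for a direction is the stochastic complexity of the cause plus the summed stochastic complexity of the effect within each value of the cause, and it computes the multinomial normalizing sum with a known recurrence over the number of categories. All function names are hypothetical, and the direct summation is only practical for small samples.

```python
import math
from collections import Counter, defaultdict


def multinomial_regret(n, k):
    """log2 of the NML normalizing sum C(n, k) for n samples over k categories."""
    if n == 0 or k == 1:
        return 0.0
    c_prev = 1.0  # C(n, 1)
    # C(n, 2) by direct summation (0.0 ** 0 evaluates to 1.0 in Python).
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    # Recurrence: C(n, j) = C(n, j - 1) + n / (j - 2) * C(n, j - 2).
    for j in range(3, k + 1):
        c_prev, c_curr = c_curr, c_curr + n / (j - 2) * c_prev
    return math.log2(c_curr)


def stochastic_complexity(values, k):
    """Bits to encode `values` with the NML code of a k-valued multinomial."""
    n = len(values)
    counts = Counter(values)
    # Code length under the maximum-likelihood multinomial.
    ml_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return ml_bits + multinomial_regret(n, k)


def conditional_complexity(effect, cause, k_effect):
    """Assumed decomposition: encode the effect separately per cause value."""
    groups = defaultdict(list)
    for c, e in zip(cause, effect):
        groups[c].append(e)
    return sum(stochastic_complexity(g, k_effect) for g in groups.values())


def infer_direction(xs, ys):
    """Return the direction with the shorter total description length."""
    k_x, k_y = len(set(xs)), len(set(ys))
    x_to_y = stochastic_complexity(xs, k_x) + conditional_complexity(ys, xs, k_y)
    y_to_x = stochastic_complexity(ys, k_y) + conditional_complexity(xs, ys, k_x)
    if x_to_y < y_to_x:
        return "X->Y"
    if y_to_x < x_to_y:
        return "Y->X"
    return "undecided"
```

    Calling `infer_direction(xs, ys)` on two equal-length sequences of discrete values returns the direction with the smaller score, or "undecided" when the two scores tie.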

    Structural Agnostic Modeling: Adversarial Learning of Causal Graphs

    A new causal discovery method, Structural Agnostic Modeling (SAM), is presented in this paper. Leveraging both conditional independencies and distributional asymmetries in the data, SAM aims at recovering full causal models from continuous observational data in a multivariate non-parametric setting. The approach is based on a game between d players, each estimating the distribution of one variable conditionally on the others as a neural net, and an adversary that tries to discriminate between the overall joint conditional distribution and that of the original data. An original learning criterion combining distribution estimation, sparsity and acyclicity constraints is used to enforce the end-to-end optimization of the graph structure and parameters through stochastic gradient descent. Besides the theoretical analysis of the approach in the large sample limit, SAM is extensively experimentally validated on synthetic and real data.

    CAUSAL ANALYSIS THEORY AND APPLICATION TO ALZHEIMER’S DISEASE (AD) AND HEART FAILURE (HF)

    Alzheimer's disease (AD) and heart failure (HF) are two complex diseases caused by a combination of genetic, epigenetic, environmental and lifestyle factors. Understanding the relationships between genetic and epigenetic variants and the other factors of such complex diseases could help researchers discover disease mechanisms and develop targeted therapies. Much of the genetic/epigenetic research on AD and heart disease has focused on association analysis, and many researchers have identified genetic/epigenetic variants and phenotypes that are significantly associated with disease pathology. However, the signals identified by association studies explain only a small proportion of the heritability of complex diseases, and a large proportion of risk factors remain undiscovered, which is a limitation of genome-wide association studies (GWAS). In addition, biological systems usually function in a systematic or causal way, so causal analysis is key to uncovering the risk mechanisms of complex diseases. Causation implies association, but association does not guarantee causation. Traditionally, the gold standard for causal analysis is intervention in randomized controlled trials (RCTs). However, RCTs are not feasible for genetic/epigenetic data, for either ethical or technical reasons. The major objective of this research is thus to propose methods to uncover the causal mechanisms between genetic/epigenetic factors and phenotypes such as environmental and lifestyle factors for complex diseases. First, I propose a bivariate causal discovery method to uncover the pairwise causal relationships between factors. Second, I propose a network analysis framework to construct the causal network among genetic/epigenetic variants and phenotypic factors. Finally, I apply the bivariate causal discovery method and the causal network construction method to data on the two complex diseases, Alzheimer's disease (AD) and heart failure (HF). Simulation and application results are discussed in the following sections.