Search CORE

459 research outputs found

Linear causal model discovery using the MML criterion

Author: Dai Honghua
Li Gang
Tu Yiqing
Publication venue: IEEE Computer Society
Publication date: 01/01/2002

Determining the causal structure of a domain is a key task in the area of Data Mining and Knowledge Discovery. The algorithm proposed by Wallace et al. [15] has demonstrated its strong ability in discovering Linear Causal Models from given data sets. However, some experiments showed that this algorithm has difficulty discovering linear relations with small deviation, and that it occasionally gives a negative message length, which should not be allowed. In this paper, a more efficient and precise MML encoding scheme is proposed to describe the model structure and the nodes in a Linear Causal Model. The estimation of different parameters is also derived. Empirical results show that the new algorithm outperforms the previous MML-based algorithm in terms of both speed and precision.

Deakin Research Online

Distinguishing cause from effect using observational data: methods and benchmarks

Author: Janzing Dominik
Mooij Joris M.
Peters Jonas
Schölkopf Bernhard
Zscheischler Jakob
Publication date: 01/01/2015

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method. Comment: 101 pages, second revision submitted to Journal of Machine Learning Research

arXiv.org e-Print Archive

MPG.PuRe

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Discovering linear causal model from incomplete data

Author: Dai Honghua
Li Gang
Tu Y.
Publication venue: Chiang Mai University, Institute for Science and Technology Research and Development
Publication date: 01/01/2003

One common drawback of algorithms for learning Linear Causal Models is that they cannot deal with incomplete data sets. This is unfortunate since many real problems involve missing data or even hidden variables. In this paper, based on multiple imputation, we propose a three-step process to learn linear causal models from incomplete data sets. Experimental results indicate that this algorithm is better than the single imputation method (EM algorithm) and the simple list deletion method, and that, for lower missing rates, it can even find models better than those found by the greedy learning algorithm MLGS working on a complete data set. In addition, the method is amenable to parallel or distributed processing, which is an important characteristic for data mining in large data sets.

Deakin Research Online

Ensemble parameter estimation for graphical models

Author: Dai Honghua
Li Gang
Tu Yiqing
Publication venue: Chiang Mai University, Institute for Science and Technology Research and Development
Publication date: 01/01/2003

Parameter estimation is one of the key issues involved in the discovery of graphical models from data. Current state-of-the-art methods have demonstrated their abilities on different kinds of graphical models. In this paper, we introduce ensemble learning into the process of parameter estimation, and examine ensemble parameter estimation methods for different kinds of graphical models under both complete and incomplete data sets. We provide experimental results which show that the ensemble method can achieve an improved result over the base parameter estimation method in terms of accuracy. In addition, the method is amenable to parallel or distributed processing, which is an important characteristic for data mining in large data sets.
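The abstract does not say which ensemble scheme is used. As a minimal sketch of the general idea, the example below assumes bagging (bootstrap aggregation): the base estimator is the least-squares coefficient of a single edge X -> Y in a linear model, and the ensemble estimate is the average of that coefficient over bootstrap resamples. The function names, the choice of base estimator, and the number of resamples are all illustrative assumptions, not the authors' method.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Base estimator: least-squares slope of y on x (one edge coefficient)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bagged_slope(xs, ys, n_estimators=50, seed=0):
    """Ensemble estimator (assumed scheme: bagging).

    Draws bootstrap resamples of the data, applies the base estimator to
    each one, and averages the resulting estimates.
    """
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in idx]
        by = [ys[k] for k in idx]
        if max(bx) == min(bx):
            continue  # degenerate resample: the slope is undefined, draw again
        estimates.append(ols_slope(bx, by))
    return fmean(estimates)


if __name__ == "__main__":
    noise = random.Random(1)
    xs = [float(k) for k in range(30)]
    ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]
    print("base estimate:    ", ols_slope(xs, ys))
    print("ensemble estimate:", bagged_slope(xs, ys))
```

Each resample is processed independently of the others, which is what makes this kind of scheme straightforward to run in parallel.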

Deakin Research Online

An examination on the performance of MML causal induction

Author: Dai Honghua
Li Gang
Zhuang L.
Publication venue: Chiang Mai University, Institute for Science and Technology Research and Development
Publication date: 01/01/2003

This paper presents an examination report on the performance of the improved MML-based causal model discovery algorithm. We first describe our improvement to the causal discovery algorithm, which introduces a new encoding scheme for measuring the cost of describing the causal structure. The Stirling function is also applied to further reduce the computational complexity, so that the algorithm works more efficiently. This is followed by a detailed examination report on the performance of our improved discovery algorithm. The experimental results of the current version of the discovery system show that: (1) the current version is capable of discovering what was discovered by the previous system; (2) the current system is capable of discovering more complicated causal networks with a large number of variables; (3) the new version works more efficiently than the previous version in terms of time complexity

Deakin Research Online

Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

Author: Hlavackova-Schindler Katerina
Melnykova Anna
Tubikanec Irene
Publication date: 05/09/2023

Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by proposing an optimization criterion and model selection algorithm based on the minimum message length (MML) principle. MML compares Granger causal models using the Occam's razor principle in the following way: even when models have a comparable goodness-of-fit to the observed data, the one generating the most concise explanation of the data is preferred. While most of the state-of-the-art methods using lasso-type penalization tend to overfit in scenarios with short time horizons, the proposed MML-based method achieves high F1 scores in these settings. We conduct a numerical study comparing the proposed algorithm to other related classical and state-of-the-art methods, where we achieve the highest F1 scores in specific sparse graph settings. We also illustrate the proposed method on G7 sovereign bond data and obtain causal connections that are in agreement with the expert knowledge available in the literature. Comment: 23 pages, 5 figures
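To make the model class concrete, the sketch below evaluates the conditional intensity of a multivariate Hawkes process with exponential decay kernels, lambda_i(t) = mu_i + sum over components j and over past events t_k of component j of alpha[i][j] * exp(-beta * (t - t_k)). The parameterization (a single decay rate `beta` shared by all kernels) and all names are assumptions chosen for illustration; the paper may parameterize the kernels differently. Under this model, a nonzero `alpha[i][j]` is what a connectivity graph would record as an edge j -> i.

```python
import math


def hawkes_intensity(i, t, mu, alpha, beta, events):
    """Conditional intensity of component i at time t.

    mu[i]       -- baseline rate of component i
    alpha[i][j] -- excitation of component i by events of component j
    beta        -- exponential decay rate (assumed shared by all kernels)
    events[j]   -- event times observed for component j
    """
    rate = mu[i]
    for j, times in enumerate(events):
        for t_k in times:
            if t_k < t:  # only strictly past events contribute
                rate += alpha[i][j] * math.exp(-beta * (t - t_k))
    return rate


def connectivity_edges(alpha, tol=0.0):
    """Edges (j, i) with |alpha[i][j]| > tol, read as 'j excites i'."""
    return [
        (j, i)
        for i, row in enumerate(alpha)
        for j, a in enumerate(row)
        if abs(a) > tol
    ]


if __name__ == "__main__":
    mu = [0.5, 0.2]
    alpha = [[0.0, 0.8],   # component 0 is excited by component 1
             [0.0, 0.0]]   # component 1 is excited by nothing
    events = [[1.0], [2.0]]
    print(hawkes_intensity(0, 3.0, mu, alpha, beta=1.0, events=events))
    print(connectivity_edges(alpha))
```

Selecting which entries of `alpha` are nonzero is the model selection problem the abstract addresses with MML; the sketch only shows how a given choice shapes the intensity.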

arXiv.org e-Print Archive

MML Probabilistic Principal Component Analysis

Author: Makalic Enes
Schmidt Daniel F.
Publication date: 16/02/2023

Principal component analysis (PCA) is perhaps the most widely used method for data dimensionality reduction. A key question in PCA decomposition of data is deciding how many factors to retain. This manuscript describes a new approach to automatically selecting the number of principal components based on the Bayesian minimum message length method of inductive inference. We also derive a new estimate of the isotropic residual variance and demonstrate, via numerical experiments, that it improves on the usual maximum likelihood approach

arXiv.org e-Print Archive

Telling Cause from Effect using MDL-based Local and Global Regression

Author: Marx Alexander
Vreeken Jilles
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer that X causes Y if it is shorter to describe Y as a function of X than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that, as we show through thorough empirical evaluation on both synthetic and real-world data, outperforms the state of the art by a wide margin. Comment: 10 pages, To appear in ICDM1
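The decision rule in this abstract (prefer the direction in which the dependent variable is cheaper to describe) can be illustrated with a deliberately simplified two-part code. The sketch below is not the Slope algorithm: it assumes a function class of low-degree polynomials, a fixed cost of `bits_per_param` per coefficient, and a Gaussian code for the residuals at a fixed data `resolution`. It also assumes the cost of describing the independent variable is the same in both directions once both variables are rescaled to [0, 1], so that cost is omitted. All names and constants are hypothetical.

```python
import math


def normalize(values):
    """Rescale to [0, 1] so both directions are coded at the same resolution."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (small degrees only)."""
    m = degree + 1
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


def code_length(xs, ys, max_degree=3, resolution=1e-3, bits_per_param=32.0):
    """Bits to describe ys given xs: model cost plus residual cost, minimized over degree."""
    n = len(xs)
    best = math.inf
    for degree in range(1, max_degree + 1):
        coeffs = fit_polynomial(xs, ys, degree)
        residuals = [
            y - sum(c * x ** k for k, c in enumerate(coeffs))
            for x, y in zip(xs, ys)
        ]
        # Floor the variance at the data resolution to keep the code length finite.
        var = max(sum(r * r for r in residuals) / n, resolution ** 2)
        data_bits = n * (0.5 * math.log2(2 * math.pi * math.e * var)
                         - math.log2(resolution))
        best = min(best, (degree + 1) * bits_per_param + data_bits)
    return best


def infer_direction(xs, ys):
    """Return 'X->Y', 'Y->X', or 'undecided' by comparing the two code lengths."""
    xs, ys = normalize(xs), normalize(ys)
    forward = code_length(xs, ys)   # describe Y as a function of X
    backward = code_length(ys, xs)  # describe X as a function of Y
    if forward < backward:
        return "X->Y"
    if backward < forward:
        return "Y->X"
    return "undecided"


if __name__ == "__main__":
    xs = [k / 199 for k in range(200)]
    ys = [x * x for x in xs]  # Y is an exact quadratic function of X
    print(infer_direction(xs, ys))
```

In this toy setting, Y is an exact quadratic in X, while X as a function of Y is a square root that low-degree polynomials fit poorly, so the forward description is shorter. Real data would need a richer function class and a principled parameter cost.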

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Learning Causal Models for Noisy Biological Data Mining: An Application to Ovarian Cancer Detection

Author: PANG Hwee Hwa
TAN Ah-Hwee
YAP Ghim-Eng
Publication venue: AAAI Press
Publication date: 01/01/2007

Singapore Management University Office of Researc

CiteSeerX

Institutional Knowledge at Singapore Management University