3 research outputs found

Confounder Detection in High Dimensional Linear Models using First Moments of Spectral Measures

Authors: Chan Laiwan; Liu Furui
Publication venue: 'MIT Press - Journals'
Publication date: 20/03/2018

In this paper, we study the confounder detection problem in the linear model, where the target variable Y is predicted using its n potential causes X_n = (x_1, ..., x_n)^T. Based on the assumption of a rotation-invariant generating process for the model, a recent study shows that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of X_n is close to a uniform measure in purely causal cases, but differs from a uniform measure characteristically in the presence of a scalar confounder. Analyzing the pattern of the spectral measure can therefore help to detect confounding. In this paper, we propose to use the first moment of the spectral measure for confounder detection. We calculate the first moment of the spectral measure induced by the regression vector and compare it with the first moment of a uniform spectral measure, both defined with respect to the covariance matrix of X_n. The two moments coincide in non-confounding cases and differ from each other in the presence of confounding. This statistical causal-confounding asymmetry can be used for confounder detection. Because it does not require analyzing the pattern of the spectral measure, our method avoids the difficulty of metric choice and multiple-parameter optimization. Experiments on synthetic and real data show the performance of this method.

Comment: Accepted at Neural Computatio
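The comparison of the two first moments described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the induced spectral measure places weight <a, phi_i>^2 / ||a||^2 on each eigenvalue lambda_i of the covariance matrix (with phi_i the matching eigenvector and a the regression vector), and that the uniform measure places weight 1/n on each eigenvalue. The function name `spectral_first_moments` is hypothetical.

```python
import numpy as np

def spectral_first_moments(sigma, a):
    """Return (induced_moment, uniform_moment) for covariance sigma and vector a.

    Assumed definitions (not taken from the abstract itself):
      induced measure: weight <a, phi_i>^2 / ||a||^2 on eigenvalue lambda_i
      uniform measure: weight 1/n on every eigenvalue
    """
    sigma = np.asarray(sigma, dtype=float)
    a = np.asarray(a, dtype=float)
    # eigh is meant for symmetric matrices such as a covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    # Squared projections of a onto the eigenvectors, normalised to sum to 1.
    weights = (eigvecs.T @ a) ** 2 / (a @ a)
    induced = float(weights @ eigvals)   # equals a^T sigma a / ||a||^2
    uniform = float(eigvals.mean())      # equals trace(sigma) / n
    return induced, uniform

# Toy covariance with eigenvalues 1, 2, 3 and a vector aligned with the largest one.
sigma_demo = np.diag([1.0, 2.0, 3.0])
a_demo = np.array([0.0, 0.0, 1.0])
induced_demo, uniform_demo = spectral_first_moments(sigma_demo, a_demo)
gap_demo = induced_demo - uniform_demo
```

In practice `sigma` and `a` would be estimated from data, for example with `np.cov` and `np.linalg.lstsq`. How large a gap between the two moments should count as evidence of confounding is not stated in the abstract and is left open here.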

arXiv.org e-Print Archive

A Causal Direction Test for Heterogeneous Populations

Authors: Asgharian Masoud; Chen Zhitang; Geng Yanhui; Hu Shoubo; Li Xinlin; Nia Vahid Partovi
Publication date: 27/09/2021

A probabilistic expert system emulates the decision-making ability of a human expert through a directional graphical model. The first step in building such systems is to understand the data generation mechanism. To this end, one may try to decompose a multivariate distribution into a product of several conditionals, evolving black-box machine learning predictive models towards transparent cause-and-effect discovery. Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications. We show that when the homogeneity assumption is violated, causal models developed under that assumption can fail to identify the correct causal direction. We propose an adjustment to a commonly used causal direction test statistic that uses a k-means type clustering algorithm, in which both the labels and the number of components are estimated from the collected data. Our simulation results show that the proposed adjustment significantly improves the performance of the causal direction test statistic for heterogeneous data. We study the large-sample behaviour of our proposed test statistic and demonstrate the application of the proposed method using real data.

arXiv.org e-Print Archive

Entropic Latent Variable Discovery

Authors: Caramanis Constantine; Dimakis Alexandros G.; Kocaoglu Murat; Shakkottai Sanjay; Vishwanath Sriram
Publication date: 26/07/2018

We consider the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. This problem has appeared in the literature as probabilistic latent semantic analysis (pLSA), and has connections to non-negative matrix factorization. When the simplicity of the variable is measured through its cardinality, we show that a solution to this latent variable discovery problem can be used to distinguish direct causal relations from spurious correlations among almost all joint distributions on simple causal graphs with two observed variables. Conjecturing that a similar identifiability result holds with Shannon entropy, we study a loss function that trades off the entropy of the latent variable against the conditional mutual information of the observed variables. We then propose a latent variable discovery algorithm, LatentSearch, and show that its stationary points are the stationary points of our loss function. We experimentally show that LatentSearch can indeed be used to distinguish direct causal relations from spurious correlations.
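The trade-off described in this abstract can be illustrated by evaluating such a loss for a given joint distribution over two observed variables X, Y and a candidate latent variable Z. This is only a sketch and does not implement LatentSearch. The additive form I(X;Y|Z) + beta * H(Z), the weight `beta`, and the function names are assumptions, since the abstract does not give the exact loss.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def conditional_mutual_information(p_xyz):
    """I(X;Y|Z) in bits for a joint array indexed as [x, y, z]."""
    p_xyz = np.asarray(p_xyz, dtype=float)
    p_xz = p_xyz.sum(axis=1)       # marginalise out y
    p_yz = p_xyz.sum(axis=0)       # marginalise out x
    p_z = p_xyz.sum(axis=(0, 1))   # marginalise out x and y
    # Identity: I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return (entropy_bits(p_xz) + entropy_bits(p_yz)
            - entropy_bits(p_xyz) - entropy_bits(p_z))

def latent_loss(p_xyz, beta=1.0):
    """Assumed loss: I(X;Y|Z) + beta * H(Z). The weighting is hypothetical."""
    p_z = np.asarray(p_xyz, dtype=float).sum(axis=(0, 1))
    return conditional_mutual_information(p_xyz) + beta * entropy_bits(p_z)

# X and Y are identical fair bits. Candidate 1: Z copies X (explains everything).
p_copy = np.zeros((2, 2, 2))
p_copy[0, 0, 0] = 0.5
p_copy[1, 1, 1] = 0.5
# Candidate 2: Z is constant (explains nothing, but has zero entropy).
p_const = np.zeros((2, 2, 1))
p_const[0, 0, 0] = 0.5
p_const[1, 1, 0] = 0.5

loss_copy = latent_loss(p_copy, beta=0.5)
loss_const = latent_loss(p_const, beta=0.5)
```

With `beta=0.5` the copying latent variable pays only for its one bit of entropy, while the constant one pays the full one bit of leftover dependence between X and Y, so the first candidate scores lower under this assumed loss.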

arXiv.org e-Print Archive