3 research outputs found
Confounder Detection in High Dimensional Linear Models using First Moments of Spectral Measures
In this paper, we study the confounder detection problem in the linear model,
where the target variable is predicted from its potential causes. Under the
assumption of a rotation-invariant generating process for the model, a recent
study shows that the spectral measure induced by the regression coefficient
vector with respect to the covariance matrix of the potential causes is close
to a uniform measure in purely causal cases, but differs characteristically
from a uniform measure in the presence of a scalar confounder. Analyzing the
pattern of the spectral measure can therefore help to detect confounding. In
this paper, we propose to use the first moment of the spectral measure for
confounder detection. We calculate the first moment of the regression vector
induced spectral measure, and compare it with the first moment of a uniform
spectral measure, both defined with respect to the covariance matrix of the
potential causes.
The two moments coincide in non-confounding cases, and differ from each other
in the presence of confounding. This statistical causal-confounding asymmetry
can be used for confounder detection. Because it does not require analyzing the
spectral measure pattern, our method avoids the difficulty of choosing a metric
and optimizing multiple parameters. Experiments on synthetic and real data
illustrate the performance of this method.
Comment: Accepted at Neural Computation
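The moment comparison described in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of a vector-induced spectral measure: for a coefficient vector a and covariance matrix Sigma, the normalized induced measure has first moment a^T Sigma a / a^T a, while the uniform measure over the eigenvalues has first moment trace(Sigma) / d. The function names are hypothetical, and the paper's estimator from finite samples and its decision rule are not given in the abstract and are not reproduced here.

```python
def induced_first_moment(sigma, a):
    """First moment of the spectral measure induced by vector a w.r.t. sigma.

    Assumes the normalized form a^T sigma a / a^T a (an assumption of this sketch).
    """
    d = len(a)
    quad = sum(a[i] * sigma[i][j] * a[j] for i in range(d) for j in range(d))
    norm_sq = sum(v * v for v in a)
    return quad / norm_sq


def uniform_first_moment(sigma):
    """First moment of the uniform spectral measure: the mean eigenvalue, trace / d."""
    d = len(sigma)
    return sum(sigma[i][i] for i in range(d)) / d


def moment_gap(sigma, a):
    """Difference between the induced and the uniform first moments."""
    return induced_first_moment(sigma, a) - uniform_first_moment(sigma)


# Toy covariance matrix with eigenvalues 2 and 4 (illustrative values only).
sigma = [[2.0, 0.0], [0.0, 4.0]]
print(moment_gap(sigma, [1.0, 1.0]))  # 0.0: weight spread evenly over the eigenvalues
print(moment_gap(sigma, [0.0, 1.0]))  # 1.0: weight concentrated on the larger eigenvalue
```

A gap near zero is consistent with the non-confounded case described above, whereas a clearly non-zero gap points towards confounding; how large a gap counts as significant is left open in this sketch.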
A Causal Direction Test for Heterogeneous Populations
A probabilistic expert system emulates the decision-making ability of a human
expert through a directed graphical model. The first step in building such
systems is to understand the data-generating mechanism. To this end, one may
try to decompose a multivariate distribution into a product of several
conditionals, moving black-box machine learning predictive models towards
transparent cause-and-effect discovery. Most causal models assume a single homogeneous
population, an assumption that may fail to hold in many applications. We show
that when the homogeneity assumption is violated, causal models developed under
this assumption can fail to identify the correct causal direction. We
propose an adjustment to a commonly used causal direction test statistic that
uses a k-means type clustering algorithm, in which both the labels and the
number of components are estimated from the collected data. Our simulation
results show that the proposed adjustment
significantly improves the performance of the causal direction test statistic
for heterogeneous data. We study the large-sample behaviour of our proposed
test statistic and demonstrate the application of the proposed method using
real data.
Entropic Latent Variable Discovery
We consider the problem of discovering the simplest latent variable that can
make two observed discrete variables conditionally independent. This problem
has appeared in the literature as probabilistic latent semantic analysis
(pLSA), and has connections to non-negative matrix factorization. When the
simplicity of the variable is measured through its cardinality, we show that a
solution to this latent variable discovery problem can be used to distinguish
direct causal relations from spurious correlations among almost all joint
distributions on simple causal graphs with two observed variables. Conjecturing
that a similar identifiability result holds for Shannon entropy, we study a
loss function that trades off the entropy of the latent variable against the
conditional mutual information of the observed variables. We then propose a
latent variable discovery algorithm, LatentSearch, and show that its
stationary points are the stationary points of our loss function. We
experimentally show that LatentSearch can indeed be used to distinguish direct
causal relations from spurious correlations.
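The trade-off described in this abstract can be made concrete with a small sketch. It evaluates one plausible form of such a loss, I(X; Y | Z) + beta * H(Z), for a given joint distribution over (x, y, z). The additive form, the weight `beta`, the use of bits, and all function names are assumptions of this sketch; the abstract does not give the exact loss, and this is not the LatentSearch algorithm itself, only an evaluation of the kind of objective it is said to optimize.

```python
from collections import defaultdict
from math import log2


def entropy_z(joint):
    """Shannon entropy H(Z) in bits, from a dict {(x, y, z): probability}."""
    pz = defaultdict(float)
    for (_, _, z), p in joint.items():
        pz[z] += p
    return -sum(p * log2(p) for p in pz.values() if p > 0)


def conditional_mutual_information(joint):
    """Conditional mutual information I(X; Y | Z) in bits."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            total += p * log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
    return total


def tradeoff_loss(joint, beta=1.0):
    """Assumed loss: I(X; Y | Z) + beta * H(Z). beta is a hypothetical weight."""
    return conditional_mutual_information(joint) + beta * entropy_z(joint)


# Z fully explains the dependence between X and Y, at the cost of one bit of entropy.
explained = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Z is constant: zero entropy, but X and Y stay dependent given Z.
trivial = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}
print(conditional_mutual_information(explained), entropy_z(explained))
print(conditional_mutual_information(trivial), entropy_z(trivial))
```

The two toy distributions show the tension: a latent variable can remove the conditional dependence only by carrying entropy of its own, and the weight decides which of the two candidate explanations scores lower.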