3,077 research outputs found

Finding Exogenous Variables in Data with Many More Variables than Observations

Author: A. Hyvärinen
A. Londei
A.V. Ivshina
C. Lorén
D. Bernardo di
E. Lehmann
J. Pearl
N. Delfosse
P. Comon
P. Spirtes
S. Shimizu
Y. Benjamini
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p<n, where p is the number of variables and n is the number of observations). However, modern datasets, including gene expression data, need high-dimensional causal modeling in challenging situations with orders of magnitude more variables than observations (p>>n). In this paper, we propose a method to find exogenous variables in a linear non-Gaussian causal model, which requires much smaller sample sizes than conventional methods and works even when p>>n. The key idea is to identify which variables are exogenous based on non-Gaussianity instead of estimating the entire structure of the model. Exogenous variables work as triggers that activate a causal chain in the model, and their identification leads to more efficient experimental designs and a better understanding of the causal mechanism. We present experiments with artificial data and real-world gene expression data to evaluate the method. Comment: A revised version of this was published in Proc. ICANN201
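The key idea in this abstract (exogenous variables tend to be the most non-Gaussian, because sums of independent variables drift toward Gaussianity) can be illustrated with a small sketch. This is not the paper's estimator: the three-variable chain, the uniform noise, the unit coefficients, and the use of absolute excess kurtosis as the non-Gaussianity score are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis: zero for a Gaussian, -1.2 for a uniform."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def simulate_chain(n, seed=0):
    """Toy linear non-Gaussian chain x1 -> x2 -> x3 (assumed, not from the paper)."""
    rng = random.Random(seed)
    data = {"x1": [], "x2": [], "x3": []}
    for _ in range(n):
        e1, e2, e3 = (rng.uniform(-1.0, 1.0) for _ in range(3))
        x1 = e1          # exogenous: depends only on its own noise
        x2 = x1 + e2     # endogenous: a sum of two independent terms
        x3 = x2 + e3     # endogenous: a sum of three independent terms
        data["x1"].append(x1)
        data["x2"].append(x2)
        data["x3"].append(x3)
    return data


def rank_by_non_gaussianity(data):
    """Order variable names from most to least non-Gaussian."""
    scores = {name: abs(excess_kurtosis(values)) for name, values in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_by_non_gaussianity(simulate_chain(5000)))
```

In this toy setting the exogenous variable should come first in the ranking. The sketch uses many observations and only three variables, so it says nothing about the p>>n regime the paper targets.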

arXiv.org e-Print Archive

CiteSeerX

Crossref

Quantifying identifiability in independent component analysis

Author: Falkeborg Benjamin
Maathuis Marloes H.
Sokol Alexander
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014

We are interested in consistent estimation of the mixing matrix in the ICA model, when the error distribution is close to (but different from) Gaussian. In particular, we consider $n$ independent samples from the ICA model $X = A\epsilon$, where we assume that the coordinates of $\epsilon$ are independent and identically distributed according to a contaminated Gaussian distribution, and the amount of contamination is allowed to depend on $n$. We then investigate how the ability to consistently estimate the mixing matrix depends on the amount of contamination. Our results suggest that in an asymptotic sense, if the amount of contamination decreases at rate $1/\sqrt{n}$ or faster, then the mixing matrix is only identifiable up to transpose products. These results also have implications for causal inference from linear structural equation models with near-Gaussian additive noise. Comment: 22 pages, 2 figures

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Modeling sparse connectivity between underlying brain sources for EEG/MEG

Author: Haufe Stefan
Kawanabe Motoaki
Mueller Klaus-Robert
Nolte Guido
Tomioka Ryota
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/12/2009

We propose a novel technique to assess functional brain connectivity in EEG/MEG signals. Our method, called Sparsely-Connected Sources Analysis (SCSA), can overcome the problem of volume conduction by modeling neural data innovatively with the following ingredients: (a) the EEG is assumed to be a linear mixture of correlated sources following a multivariate autoregressive (MVAR) model, (b) the demixing is estimated jointly with the source MVAR parameters, (c) overfitting is avoided by using the Group Lasso penalty. This approach allows us to extract the appropriate level of cross-talk between the extracted sources, and in this manner we obtain a sparse data-driven model of functional connectivity. We demonstrate the usefulness of SCSA with simulated data, and compare to a number of existing algorithms with excellent results. Comment: 9 pages, 6 figures

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

We Are Not Your Real Parents: Telling Causal from Confounded using MDL

Author: Kaltenpoth D.
Vreeken J.
Publication venue
Publication date: 01/01/2019

Given data over variables $(X_1,...,X_m, Y)$ we consider the problem of finding out whether $X$ jointly causes $Y$ or whether they are all confounded by an unobserved latent variable $Z$. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where $X$ causes $Y$ and where there exists a latent variable $Z$ confounding both $X$ and $Y$ and show our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular, probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well -- even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust as its accuracy goes hand in hand with its confidence.
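The comparison described in this abstract can be written out schematically. The notation is an assumption made for illustration: $L(\cdot)$ stands for an MDL code length, $X$ abbreviates $(X_1,...,X_m)$, and the abstract does not spell out how the authors define or compute each term.

```latex
% Hypothetical notation: L(.) is an MDL code length, X = (X_1, ..., X_m).
\begin{align*}
L_{\text{causal}}     &= L(X) + L(Y \mid X), \\
L_{\text{confounded}} &= L(Z) + L(X, Y \mid Z).
\end{align*}
% Under the stated postulate, the explanation with the shorter
% total code length is preferred.
```

Read this way, the causal model is favored when $L_{\text{causal}} < L_{\text{confounded}}$ and the confounded model otherwise.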

MPG.PuRe