Non-linear Causal Inference using Gaussianity Measures
We provide theoretical and empirical evidence for a type of asymmetry between causes and effects that is present when these are related via linear models contaminated with additive non-Gaussian noise. Assuming that the causes and the effects have the same distribution, we show that the distribution of the residuals of a linear fit in the anti-causal direction is closer to a Gaussian than the distribution of the residuals in the causal direction. This Gaussianization effect is characterized by a reduction in the magnitude of the high-order cumulants and by an increase in the differential entropy of the residuals. The problem of non-linear causal inference is addressed by performing an embedding in an expanded feature space, in which the relation between causes and effects can be assumed to be linear. The effectiveness of a method that discriminates between causes and effects based on this type of asymmetry is illustrated in a variety of experiments using different measures of Gaussianity. The proposed method is shown to be competitive with state-of-the-art techniques for causal inference.
Comment: 35 pages, 9 figures
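As a rough illustration of the linear version of this asymmetry, the sketch below fits a least-squares line in both directions and compares the excess kurtosis (a normalized fourth-order cumulant) of the two residuals, treating the direction with the less Gaussian residual as causal. It assumes a simplified setting that is not taken from the paper: a uniform cause, uniform noise, and an effect scaled to the same unit variance as the cause rather than to an identical distribution. The helper names `excess_kurtosis`, `ols_residual` and `infer_direction` are hypothetical, and the sketch covers neither the feature-space embedding nor entropy-based measures.

```python
import numpy as np

def excess_kurtosis(v):
    """Normalized fourth cumulant: 0 for a Gaussian, -1.2 for a uniform."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 4) - 3.0)

def ols_residual(target, regressor):
    """Residual of a no-intercept least-squares fit on centered data."""
    target = target - target.mean()
    regressor = regressor - regressor.mean()
    beta = (regressor @ target) / (regressor @ regressor)
    return target - beta * regressor

def infer_direction(x, y):
    """Pick the direction whose residuals are LESS Gaussian as the causal one."""
    k_xy = abs(excess_kurtosis(ols_residual(y, x)))  # fit y on x
    k_yx = abs(excess_kurtosis(ols_residual(x, y)))  # fit x on y
    return "x->y" if k_xy > k_yx else "y->x"

rng = np.random.default_rng(0)
n, a = 20000, 0.7
half_width = np.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance
x = rng.uniform(-half_width, half_width, n)  # cause
e = rng.uniform(-half_width, half_width, n)  # additive non-Gaussian noise
y = a * x + np.sqrt(1.0 - a ** 2) * e        # effect, also unit variance
print(infer_direction(x, y))  # expected: x->y
```

With a = 0.7, the causal residual is just the scaled uniform noise (excess kurtosis about -1.2), while the anti-causal residual mixes cause and noise, giving an excess kurtosis of about -0.6: the mixture is closer to Gaussian, which is the effect the abstract describes.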
Finding Exogenous Variables in Data with Many More Variables than Observations
Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p < n, where p is the number of variables and n the number of observations). However, modern datasets, including gene expression data, require high-dimensional causal modeling in challenging situations with orders of magnitude more variables than observations (p >> n). In this paper, we propose a method to find exogenous variables in a linear non-Gaussian causal model that requires much smaller sample sizes than conventional methods and works even when p >> n. The key idea is to identify which variables are exogenous based on non-Gaussianity instead of estimating the entire structure of the model. Exogenous variables act as triggers that activate a causal chain in the model, and identifying them leads to more efficient experimental designs and a better understanding of the causal mechanism. We present experiments with artificial data and real-world gene expression data to evaluate the method.
Comment: A revised version of this was published in Proc. ICANN201
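The key idea, scoring each variable for exogeneity instead of estimating the whole model, can be sketched as follows. This is not the authors' algorithm: it is a minimal, DirectLiNGAM-style illustration under assumed conditions (a three-variable linear chain with skewed, zero-mean exponential noise), and its dependence measure, a sum of correlations involving squared terms, is a crude stand-in for the independence measures used in the literature. For each candidate x_j, every other variable is regressed on x_j; if x_j is exogenous, those residuals should be independent of it, so its score should be the smallest. The functions `dependence` and `exogeneity_scores` are hypothetical names.

```python
import numpy as np

def dependence(x, r):
    """Crude non-linear dependence score between a regressor x and a residual r.

    OLS makes x and r linearly uncorrelated, so correlations involving squared
    terms are used instead; both are near 0 when x and r are independent.
    """
    x = (x - x.mean()) / x.std()
    r = (r - r.mean()) / r.std()
    return abs(np.corrcoef(x, r ** 2)[0, 1]) + abs(np.corrcoef(x ** 2, r)[0, 1])

def exogeneity_scores(X):
    """Score each column of X (shape n x p); lower means more plausibly exogenous."""
    X = X - X.mean(axis=0)
    p = X.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        xj = X[:, j]
        for i in range(p):
            if i != j:
                b = (xj @ X[:, i]) / (xj @ xj)  # OLS slope of x_i on candidate x_j
                scores[j] += dependence(xj, X[:, i] - b * xj)
    return scores

# Hypothetical chain x0 -> x1 -> x2 with skewed, zero-mean noise.
rng = np.random.default_rng(0)
n = 5000
e = rng.exponential(size=(n, 3)) - 1.0
x0 = e[:, 0]
x1 = 0.8 * x0 + e[:, 1]
x2 = 0.5 * x1 + e[:, 2]
X = np.column_stack([x2, x0, x1])  # column order deliberately scrambled
scores = exogeneity_scores(X)
print(int(np.argmin(scores)))  # expected: 1, the column holding x0
```

Each candidate needs only p - 1 simple regressions, so the scores can be computed without fitting the full causal ordering; how well such a crude measure behaves when p >> n is not something this toy example demonstrates.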
Exporting and productivity as part of the growth process: causal evidence from a data-driven structural VAR
This paper introduces a little-known category of estimators imported from the machine learning literature (linear non-Gaussian vector autoregression models, both acyclic and cyclic) to revisit a well-known debate: does exporting increase firm productivity, or is it only more productive firms that remain in the export market? We focus on a relatively well-studied country (Chile) and on already-exporting firms (i.e., the intensive margin of exporting). We explicitly look at the co-evolution of productivity and growth, and attempt to ascertain both contemporaneous and lagged causal relationships. Our findings suggest that exporting does not have any causal influence on the other variables. Instead, exporting seems to be determined by other dimensions of firm growth. With respect to learning by exporting (LBE), we find no evidence that export growth causes productivity growth within the same period, and very little evidence that export growth has a causal effect on subsequent total factor productivity (TFP) growth.
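For readers unfamiliar with this estimator class, the two-variable sketch below follows the general VAR-LiNGAM recipe: fit a reduced-form VAR by least squares, infer the contemporaneous causal order from the non-Gaussian residuals, then recover structural lagged effects via B1 = (I - B0) M1, where M1 is the reduced-form lag matrix. It is an illustration under stated assumptions, not the paper's estimation pipeline: it uses one lag, simulated skewed shocks, and only the acyclic case, and in place of ICA-based LiNGAM it uses a pairwise skewness-based direction measure in the spirit of Hyvärinen and Smith's pairwise likelihood ratios. The function names and all coefficients are hypothetical.

```python
import numpy as np

def skew_direction(u0, u1):
    """Pairwise skewness-based measure; a positive value suggests u0 -> u1."""
    z0 = (u0 - u0.mean()) / u0.std()
    z1 = (u1 - u1.mean()) / u1.std()
    z0 = z0 * np.sign(np.mean(z0 ** 3))  # the measure assumes positive skew
    z1 = z1 * np.sign(np.mean(z1 ** 3))
    rho = np.mean(z0 * z1)
    return rho * np.mean(z0 ** 2 * z1 - z0 * z1 ** 2)

def var_lingam_2d(data):
    """Two-variable, one-lag sketch: returns structural matrices (B0, B1)."""
    Y, Z = data[1:], data[:-1]
    M1 = np.linalg.lstsq(Z, Y, rcond=None)[0].T  # reduced form: x_t = M1 x_{t-1} + u_t
    U = Y - Z @ M1.T                             # reduced-form residuals u_t
    cause, effect = (0, 1) if skew_direction(U[:, 0], U[:, 1]) > 0 else (1, 0)
    B0 = np.zeros((2, 2))
    uc, ue = U[:, cause], U[:, effect]
    B0[effect, cause] = (uc @ ue) / (uc @ uc)    # contemporaneous effect
    B1 = (np.eye(2) - B0) @ M1                   # structural lagged effects
    return B0, B1

# Simulated structural VAR: x_t = B0 x_t + B1 x_{t-1} + e_t with skewed shocks e_t.
rng = np.random.default_rng(1)
T = 20000
B0_true = np.array([[0.0, 0.0], [0.6, 0.0]])  # variable 0 -> variable 1 within a period
B1_true = np.array([[0.4, 0.0], [0.2, 0.3]])
A = np.linalg.inv(np.eye(2) - B0_true)
data = np.zeros((T, 2))
for t in range(1, T):
    shocks = rng.exponential(size=2) - 1.0
    data[t] = A @ (B1_true @ data[t - 1] + shocks)

B0_hat, B1_hat = var_lingam_2d(data)
print(np.round(B0_hat, 2))  # should be close to B0_true
print(np.round(B1_hat, 2))  # should be close to B1_true
```

The point of the non-Gaussianity assumption is visible in `skew_direction`: with Gaussian shocks the reduced-form residuals would carry no information about which variable moves first within a period, and the contemporaneous matrix B0 would not be identified.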
Estimating the effect of joint interventions from observational data in sparse high-dimensional settings
We consider the estimation of joint causal effects from observational data. In particular, we propose new methods to estimate the effect of multiple simultaneous interventions (e.g., multiple gene knockouts), under the assumption that the observational data come from an unknown linear structural equation model with independent errors. We derive asymptotic variances of our estimators when the underlying causal structure is partly known, as well as high-dimensional consistency when the causal structure is fully unknown and the joint distribution is multivariate Gaussian. We also propose a generalization of our methodology to the class of nonparanormal distributions. We evaluate the estimators in simulation studies and also illustrate them on data from the DREAM4 challenge.
Comment: 30 pages, 3 figures, 45 pages supplement
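The notion of a joint intervention effect can be made concrete for a linear structural equation model whose DAG is fully known. The sketch below is a plug-in illustration under that assumption, not the estimators proposed in the paper (which also address partly known and unknown structures): it fits each structural equation by OLS on the known parents, then computes the effect of intervening on several variables at once by cutting every edge into the intervened nodes and reading the total effects off (I - B)^{-1}. The three-node graph, its coefficients, and the names `fit_linear_sem` and `joint_effects` are hypothetical.

```python
import numpy as np

def fit_linear_sem(X, parents):
    """OLS of each node on its known parents; B[i, j] is the edge weight j -> i."""
    p = X.shape[1]
    B = np.zeros((p, p))
    for i, pa in parents.items():
        if pa:  # root nodes keep an all-zero row
            coef, *_ = np.linalg.lstsq(X[:, pa], X[:, i], rcond=None)
            B[i, pa] = coef
    return B

def joint_effects(B, targets, outcome):
    """Total effects on `outcome` of simultaneously intervening on all `targets`."""
    B_cut = B.copy()
    B_cut[targets, :] = 0.0  # do(): remove every edge pointing into an intervened node
    total = np.linalg.inv(np.eye(B.shape[0]) - B_cut)
    return total[outcome, targets]

# Hypothetical DAG: X1 -> X2 -> Y and X1 -> Y (columns 0, 1, 2).
rng = np.random.default_rng(0)
n = 10000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2, y])
parents = {0: [], 1: [0], 2: [0, 1]}

B_hat = fit_linear_sem(X, parents)
print(joint_effects(B_hat, [0], 2))     # single intervention on X1: about 2.0
print(joint_effects(B_hat, [0, 1], 2))  # joint intervention on X1, X2: about [1.0, 2.0]
```

Here the single-intervention effect of X1 on Y is 1.0 + 0.5 × 2.0 = 2.0, whereas under the joint intervention the effect of X1 is only its direct coefficient 1.0, because fixing X2 blocks the path X1 -> X2 -> Y. This is why joint effects generally cannot be read off from single-intervention effects alone.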