Identifiability of Causal Graphs using Functional Models
This work addresses the following question: Under what assumptions on the
data generating process can one infer the causal graph from the joint
distribution? The approach taken by conditional independence-based causal
discovery methods is based on two assumptions: the Markov condition and
faithfulness. It has been shown that under these assumptions the causal graph
can be identified up to Markov equivalence (some arrows remain undirected)
using methods like the PC algorithm. In this work we propose an alternative by
defining Identifiable Functional Model Classes (IFMOCs). As our main theorem we
prove that if the data generating process belongs to an IFMOC, one can identify
the complete causal graph. To the best of our knowledge this is the first
identifiability result of this kind that is not limited to linear functional
relationships. We discuss how the IFMOC assumption and the Markov and
faithfulness assumptions relate to each other and explain why we believe that
the IFMOC assumption can be tested more easily on given data. We further
provide a practical algorithm that recovers the causal graph from finitely many
data; experiments on simulated data support the theoretical findings.
Structural Agnostic Modeling: Adversarial Learning of Causal Graphs
A new causal discovery method, Structural Agnostic Modeling (SAM), is
presented in this paper. Leveraging both conditional independencies and
distributional asymmetries in the data, SAM aims at recovering full causal
models from continuous observational data in a multivariate non-parametric
setting. The approach is based on a game between players, each estimating the
distribution of one variable conditionally on the others as a neural net, and an
adversary that aims to discriminate between the overall joint conditional
distribution and that of the original data. An original learning criterion combining
distribution estimation, sparsity and acyclicity constraints is used to enforce
the end-to-end optimization of the graph structure and parameters through
stochastic gradient descent. Besides the theoretical analysis of the approach
in the large sample limit, SAM is extensively experimentally validated on
synthetic and real data.
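The abstract above mentions an acyclicity constraint but does not state its form. As a minimal sketch, assuming a penalty built from traces of powers of the squared adjacency matrix (a common differentiable choice in continuous structure learning, not necessarily the one SAM uses), the idea looks like this. The function name `acyclicity_penalty` and the example graphs are hypothetical.

```python
import math
import numpy as np

def acyclicity_penalty(adj):
    """Return sum_{k=1..d} tr(M^k) / k!, where M = adj * adj elementwise.

    Assumption: this is one common penalty, not taken from the SAM paper.
    It is zero exactly when the weighted graph has no directed cycle.
    """
    m = adj * adj                      # non-negative edge weights
    d = m.shape[0]
    power = np.eye(d)
    total = 0.0
    for k in range(1, d + 1):
        power = power @ m              # M^k counts weighted walks of length k
        total += np.trace(power) / math.factorial(k)
    return float(total)

# Hypothetical 3-node graphs: a DAG, and the same graph with one back edge.
dag = np.array([[0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
cyclic = dag.copy()
cyclic[2, 0] = 1.0                     # adds the edge 2 -> 0, closing a cycle

dag_penalty = acyclicity_penalty(dag)
cyclic_penalty = acyclicity_penalty(cyclic)
print(dag_penalty, cyclic_penalty)
```

Because the penalty is a smooth function of the edge weights, it can be added to a loss and minimised by gradient descent alongside a sparsity term.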
Invariant Causal Prediction for Sequential Data
We investigate the problem of inferring the causal predictors of a response
from a set of explanatory variables. Classical
ordinary least squares regression includes all predictors that reduce the
variance of the response. Using only the causal predictors instead leads to
models that have the advantage of remaining invariant under interventions;
loosely speaking, they lead to invariance across different "environments" or
"heterogeneity patterns". More precisely, the conditional distribution of the
response given its causal
predictors remains invariant for all observations. Recent work exploits such a
stability to infer causal relations from data with different but known
environments. We show that even without having knowledge of the environments or
heterogeneity pattern, inferring causal relations is possible for time-ordered
(or any other type of sequentially ordered) data. In particular, this allows
detecting instantaneous causal relations in multivariate linear time series
which is usually not the case for Granger causality. Besides novel methodology,
we provide statistical confidence bounds and asymptotic detection results for
inferring causal predictors, and present an application to monetary policy in
macroeconomics.
Comment: 55 pages
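The invariance property described above can be illustrated on toy data. The following is a minimal sketch, not the paper's procedure: it assumes a hypothetical chain X1 -> Y -> X2 observed over two consecutive time blocks, with the mechanism of X2 changing in the second block. The regression slope of Y on its cause X1 stays stable across blocks, while the slope of Y on the non-causal X2 shifts. All names and parameter values are illustrative assumptions.

```python
import random

def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n, x2_noise_sd, rng):
    """Toy chain X1 -> Y -> X2; x2_noise_sd differs between time blocks."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * a + rng.gauss(0.0, 1.0) for a in x1]
    x2 = [b + rng.gauss(0.0, x2_noise_sd) for b in y]
    return x1, y, x2

rng = random.Random(0)
# Two consecutive blocks of a time-ordered sample; only X2's mechanism changes.
x1_a, y_a, x2_a = simulate(2000, 0.5, rng)
x1_b, y_b, x2_b = simulate(2000, 3.0, rng)

# Gap between per-block slopes: small for the causal predictor X1,
# large for the non-causal predictor X2.
causal_gap = abs(slope(x1_a, y_a) - slope(x1_b, y_b))
noncausal_gap = abs(slope(x2_a, y_a) - slope(x2_b, y_b))
print(causal_gap, noncausal_gap)
```

Splitting a sequence into consecutive blocks is only one simple way to use the time ordering when environments are unknown; the abstract does not specify how the method does this.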
Switching Regression Models and Causal Inference in the Presence of Discrete Latent Variables
Given a response and a vector of predictors,
we investigate the problem of inferring the direct causes of the response among
the predictors. Models for the response that use all of its causal covariates as predictors enjoy
the property of being invariant across different environments or interventional
settings. Given data from such environments, this property has been exploited
for causal discovery. Here, we extend this inference principle to situations in
which some (discrete-valued) direct causes of the response are unobserved. Such cases
naturally give rise to switching regression models. We provide sufficient
conditions for the existence, consistency and asymptotic normality of the MLE
in linear switching regression models with Gaussian noise, and construct a test
for the equality of such models. These results allow us to prove that the
proposed causal discovery method obtains asymptotic false discovery control
under mild conditions. We provide an algorithm, make available code, and test
our method on simulated data. It is robust against model violations and
outperforms state-of-the-art approaches. We further apply our method to a real
data set, where we show that it outputs not only causal predictors but
also a process-based clustering of data points, which could be of additional
interest to practitioners.
Comment: 46 pages, 14 figures; real-world application added in Section 5.2;
additional numerical experiments added in the Appendix
Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, it is
demonstrated that our novel large-scale methods give comparable performance
with existing methods whilst using significantly less computation time and
memory.
Comment: 29 pages, 6 figures
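Of the three approximations named above, the random Fourier feature variant is the simplest to sketch. The code below is an illustrative assumption, not the authors' implementation: it maps each variable to random cosine features that approximate a Gaussian kernel, then uses the squared Frobenius norm of the empirical cross-covariance of the features as a dependence statistic. The cost grows linearly in the number of observations rather than quadratically. The names `rff` and `rff_hsic`, the bandwidth, and the feature count are hypothetical choices.

```python
import numpy as np

def rff(v, n_features, rng, bandwidth=1.0):
    """Random Fourier features approximating a Gaussian kernel on 1-D data."""
    v = (v - v.mean()) / v.std()                        # standardise the input
    w = rng.normal(0.0, 1.0 / bandwidth, n_features)    # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)       # random phases
    return np.sqrt(2.0 / n_features) * np.cos(np.outer(v, w) + b)

def rff_hsic(x, y, n_features=50, seed=0):
    """Squared Frobenius norm of the feature cross-covariance (HSIC-like)."""
    rng = np.random.default_rng(seed)
    zx = rff(x, n_features, rng)
    zy = rff(y, n_features, rng)
    zx -= zx.mean(axis=0)                               # centre each feature
    zy -= zy.mean(axis=0)
    c = zx.T @ zy / len(x)                              # n_features x n_features
    return float(np.sum(c ** 2))

data_rng = np.random.default_rng(1)
x = data_rng.normal(size=5000)
y_dep = x ** 2 + 0.1 * data_rng.normal(size=5000)   # nonlinear, uncorrelated with x
y_ind = data_rng.normal(size=5000)                  # independent of x

stat_dep = rff_hsic(x, y_dep)
stat_ind = rff_hsic(x, y_ind)
print(stat_dep, stat_ind)
```

Turning the statistic into a test would still require a null distribution, for example from permutations; that step is omitted here.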
Causal Discovery with Continuous Additive Noise Models
We consider the problem of learning causal directed acyclic graphs from an
observational joint distribution. One can use these graphs to predict the
outcome of interventional experiments, from which data are often not available.
We show that if the observational distribution follows a structural equation
model with an additive noise structure, the directed acyclic graph becomes
identifiable from the distribution under mild conditions. This constitutes an
interesting alternative to traditional methods that assume faithfulness and
identify only the Markov equivalence class of the graph, thus leaving some
edges undirected. We provide practical algorithms for finitely many samples,
RESIT (Regression with Subsequent Independence Test) and two methods based on
an independence score. We prove that RESIT is correct in the population setting
and provide an empirical evaluation.
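The idea of regressing first and then checking residual independence can be sketched for two variables. This is a simplified illustration under stated assumptions, not the RESIT algorithm itself: it uses a cubic polynomial regression, a biased HSIC estimate with a Gaussian kernel as the dependence score, and compares scores rather than running a calibrated test. The data-generating model, function names and bandwidth are hypothetical.

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian kernel Gram matrix on standardised 1-D data (bandwidth 1)."""
    v = (v - v.mean()) / v.std()
    return np.exp(-0.5 * (v[:, None] - v[None, :]) ** 2)

def hsic(a, b):
    """Biased HSIC estimate tr(K H L H) / n^2; larger means more dependent."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    return float(np.trace(gaussian_gram(a) @ h @ gaussian_gram(b) @ h) / n ** 2)

def residual_dependence(cause, effect, degree=3):
    """Regress effect on cause, then score dependence of residuals on cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

# Hypothetical additive noise model: Y = X^3 + N, with N independent of X.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 500)
y = x ** 3 + rng.normal(0.0, 1.0, 500)

score_forward = residual_dependence(x, y)    # candidate direction X -> Y
score_backward = residual_dependence(y, x)   # candidate direction Y -> X
direction = "X -> Y" if score_forward < score_backward else "Y -> X"
print(direction)
```

In the causal direction the residuals approximate the independent noise term, so their dependence on the input is low; in the reverse direction no additive-noise fit exists for this model, and the residuals remain dependent on the input.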