Causal aggregation: estimation and inference of causal effects by constraint-based data fusion
In causal inference, it is common to estimate the causal effect of a single
treatment variable on an outcome. However, practitioners may also be interested
in the effect of simultaneous interventions on multiple covariates of a fixed
target variable. We propose a novel method that makes it possible to estimate
the effect of joint interventions using data from different experiments in
which only very few variables are manipulated. If little or no randomized data
is available, one can use observational data sets, provided that certain
parental sets are known or instrumental variables are available. If the joint causal
effect is linear, the proposed method can be used for estimation and inference
of joint causal effects, and we characterize conditions for identifiability. In
the overidentified case, we indicate how to leverage all the available causal
information across multiple data sets to efficiently estimate the causal
effects. If the dimension of the covariate vector is large, we may only have a
few samples in each data set. Under a sparsity assumption, we derive an
estimator of the causal effects in this high-dimensional scenario. In addition,
we show how to deal with the case where a lack of experimental constraints
prevents direct estimation of the causal effects. When the joint causal effects
are non-linear, we characterize conditions under which identifiability holds,
and propose a non-linear causal aggregation methodology for experimental data
sets, similar to the gradient boosting algorithm, in which each iteration
combines weak learners trained on different data sets using only unconfounded
samples. We demonstrate the effectiveness of the proposed method on simulated
and semi-synthetic data.
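The linear constraint-based fusion idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' estimator: it assumes a hypothetical three-covariate linear structural model with a hidden confounder `H`, mean-zero variables, and one data set per covariate in which only that covariate is randomized (the names `simulate` and `beta_true` are illustrative). In data set k the randomized X_k is independent of the confounded error term, so E_k[X_k (Y - X^T beta)] = 0 yields one linear equation in beta. Stacking these equations and solving by least squares (which also handles the overidentified case) recovers beta, whereas ordinary least squares on observational data does not.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 200_000
beta_true = np.array([1.0, -2.0, 0.5])  # hypothetical joint causal effect

def simulate(n, randomized=None):
    """Hypothetical linear SCM with hidden confounder H and edge X0 -> X1.

    If `randomized` is k, X_k is drawn independently of H (a randomized
    experiment on X_k); every other covariate keeps its confounded mechanism.
    """
    H = rng.normal(size=n)
    X = np.empty((n, p))
    for j in range(p):
        if j == randomized:
            X[:, j] = rng.normal(size=n)          # randomized: independent of H
        else:
            X[:, j] = H + rng.normal(size=n)      # confounded mechanism
            if j == 1:
                X[:, j] += 0.8 * X[:, 0]          # causal edge X0 -> X1
    Y = X @ beta_true + 2.0 * H + rng.normal(size=n)
    return X, Y

# One constraint per experiment: E_k[X_k * X] @ beta = E_k[X_k * Y].
rows, rhs = [], []
for k in range(p):
    Xk, Yk = simulate(n, randomized=k)
    rows.append(Xk[:, k] @ Xk / n)
    rhs.append(Xk[:, k] @ Yk / n)
A, b = np.vstack(rows), np.array(rhs)

# Least squares on the stacked constraints also covers more constraints than p.
beta_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Naive OLS on purely observational data is biased by the hidden confounder.
X_obs, Y_obs = simulate(n)
beta_ols, *_ = np.linalg.lstsq(X_obs, Y_obs, rcond=None)

print(np.round(beta_hat, 2), np.round(beta_ols, 2))
```

Here the stacked system is solvable because every covariate is randomized in some data set; with fewer experiments the constraint matrix would be rank-deficient, which is the kind of situation that identifiability conditions have to address.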
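For the non-linear case, a heavily simplified analogue is component-wise boosting for an additive model. This sketch is not the paper's algorithm: it assumes (hypothetically) that Y = f_0(X_0) + f_1(X_1) plus noise confounded through a hidden `H`, one data set per covariate with that covariate randomized, and piecewise-constant "binned mean" weak learners (`BinnedMean`, `nu`, and `n_bins` are illustrative names and settings). In each iteration the weak learner for f_k is trained only on data set k and only on X_k, the variable that is unconfounded there, and is added to the ensemble with shrinkage `nu`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_bins, nu, n_iter = 50_000, 2, 20, 0.3, 60

def f_true(j, x):
    """Hypothetical non-linear additive effects of X0 and X1 on Y."""
    return np.sin(x) if j == 0 else 0.5 * x ** 2

def simulate(k):
    """Data set k: X_k randomized; the other covariate is confounded with Y via H."""
    H = rng.normal(size=n)
    X = H[:, None] + rng.normal(size=(n, p))
    X[:, k] = rng.normal(size=n)
    Y = f_true(0, X[:, 0]) + f_true(1, X[:, 1]) + 2.0 * H + rng.normal(size=n)
    return X, Y

class BinnedMean:
    """Weak learner: piecewise-constant fit on quantile bins of one covariate."""

    def fit(self, x, r):
        self.edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
        idx = np.searchsorted(self.edges, x)
        counts = np.bincount(idx, minlength=n_bins)
        self.means = np.bincount(idx, weights=r, minlength=n_bins) / counts
        return self

    def predict(self, x):
        return self.means[np.searchsorted(self.edges, x)]

data = [simulate(k) for k in range(p)]
learners = [[] for _ in range(p)]        # weak learners for each component f_k
fits = [np.zeros(n) for _ in range(p)]   # current additive fit on each data set

for it in range(n_iter):
    k = it % p                           # cycle through the experiments
    X, Y = data[k]
    # In data set k only X_k is unconfounded, so the learner sees X_k alone.
    m = BinnedMean().fit(X[:, k], Y - fits[k])
    learners[k].append(m)
    for d in range(p):                   # update the ensemble's fit on every data set
        fits[d] += nu * m.predict(data[d][0][:, k])

def f_hat(j, x):
    """Estimated component f_j (identified only up to an additive constant)."""
    return nu * sum(m.predict(x) for m in learners[j])
```

Because X_k is independent of the hidden confounder and of the other covariate in data set k, the conditional mean of the residual given X_k is f_k minus its current estimate plus a constant; each component is therefore recovered only up to an additive shift, so shapes are best compared by correlation rather than by raw values.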
Structural Agnostic Modeling: Adversarial Learning of Causal Graphs
A new causal discovery method, Structural Agnostic Modeling (SAM), is
presented in this paper. Leveraging both conditional independencies and
distributional asymmetries in the data, SAM aims at recovering full causal
models from continuous observational data in a multivariate non-parametric
setting. The approach is based on a game between players, each estimating the
distribution of one variable conditionally on the others as a neural net, and an
adversary aimed at discriminating between the overall joint conditional
distribution and that of the original data. An original learning criterion
combining distribution estimation, sparsity and acyclicity constraints drives
the end-to-end optimization of the graph structure and parameters through
stochastic gradient descent. Besides a theoretical analysis of the approach in
the large-sample limit, SAM is extensively validated experimentally on
synthetic and real data.
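SAM's learning criterion combines a distribution-fit term with sparsity and acyclicity penalties on the graph. The sketch below shows only the two structural penalties for a weighted adjacency matrix `W`. As an assumption, the acyclicity term is a truncated version of the trace-exponential penalty popularized by NOTEARS (Zheng et al., 2018), which may differ from SAM's exact formulation; the adversarial distribution-matching term is omitted, and the weights `lam` and `mu` are hypothetical placeholders.

```python
import math

import numpy as np

def acyclicity_penalty(W: np.ndarray) -> float:
    """Truncated NOTEARS-style penalty: sum_{k=1..d} tr((W * W)^k) / k!.

    Zero iff the weighted graph W has no directed cycle (any cycle in a
    d-node graph has length <= d); strictly positive otherwise.
    """
    M = W * W                        # non-negative edge strengths
    d = M.shape[0]
    total, P = 0.0, np.eye(d)
    for k in range(1, d + 1):
        P = P @ M                    # P now equals M^k
        total += np.trace(P) / math.factorial(k)
    return float(total)

def sparsity_penalty(W: np.ndarray) -> float:
    """L1 norm of the off-diagonal weights, encouraging few edges."""
    return float(np.abs(W - np.diag(np.diag(W))).sum())

def structural_penalty(W: np.ndarray, lam: float = 0.1, mu: float = 1.0) -> float:
    """Hypothetical weighting of the two penalties; a fit loss would be added."""
    return lam * sparsity_penalty(W) + mu * acyclicity_penalty(W)
```

NOTEARS itself uses the full series, tr(exp(W * W)) - d; truncating at k = d keeps the same zero set for a d-node graph while staying a polynomial, and therefore differentiable, function of W.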
Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Modern datasets
often have thousands of variables (sometimes tens or hundreds of thousands)
and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.
Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
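A common way to use interventional data in score-based structure learning, and, as an assumption, the idea behind learning causal networks "in the presence of interventions" here, is to drop, for each node, the samples in which that node was intervened on when fitting its conditional distribution. The Python sketch below is not the sparsebn R interface; the two-node network, edge weight, intervention design, and the helper `edge_weight` are all hypothetical. It compares the least-squares weight of X0 -> X1 estimated from all samples with the estimate that excludes the rows where X1 was set externally.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical two-node Gaussian network X0 -> X1 with edge weight 1.5.
x0 = rng.normal(size=n)
x1 = 1.5 * x0 + rng.normal(size=n)

# Hypothetical design: in the second half of the samples X1 is intervened on,
# i.e. set independently of its parent X0.
intervened_on_x1 = np.arange(n) >= n // 2
x1[intervened_on_x1] = rng.normal(size=intervened_on_x1.sum())

def edge_weight(parent, child, keep):
    """Least-squares weight of parent -> child using only the rows in `keep`."""
    p, c = parent[keep], child[keep]
    return float(p @ c / (p @ p))

naive = edge_weight(x0, x1, np.ones(n, dtype=bool))   # pools all rows
aware = edge_weight(x0, x1, ~intervened_on_x1)        # drops rows where X1 was set
print(round(naive, 2), round(aware, 2))
```

Pooling all rows mixes in samples where X1 no longer depends on X0 and roughly halves the estimated weight; excluding them recovers the weight used in the simulation.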