    Causal aggregation: estimation and inference of causal effects by constraint-based data fusion

    In causal inference, it is common to estimate the causal effect of a single treatment variable on an outcome. However, practitioners may also be interested in the effect on a fixed target variable of simultaneous interventions on multiple covariates. We propose a novel method for estimating the effect of joint interventions using data from different experiments in which only very few variables are manipulated. If there is little or no randomized data, one can use observational data sets, provided certain parental sets are known or instrumental variables are available. If the joint causal effect is linear, the proposed method can be used for estimation and inference of joint causal effects, and we characterize conditions for identifiability. In the overidentified case, we indicate how to leverage all the available causal information across multiple data sets to efficiently estimate the causal effects. If the dimension of the covariate vector is large, we may only have a few samples in each data set. Under a sparsity assumption, we derive an estimator of the causal effects in this high-dimensional scenario. In addition, we show how to deal with the case where a lack of experimental constraints prevents direct estimation of the causal effects. When the joint causal effects are non-linear, we characterize conditions under which identifiability holds, and propose a non-linear causal aggregation methodology for experimental data sets. It is similar to the gradient boosting algorithm: in each iteration, we combine weak learners trained on different data sets using only unconfounded samples. We demonstrate the effectiveness of the proposed method on simulated and semi-synthetic data.

    Structural Agnostic Modeling: Adversarial Learning of Causal Graphs

    A new causal discovery method, Structural Agnostic Modeling (SAM), is presented in this paper. Leveraging both conditional independencies and distributional asymmetries in the data, SAM aims to recover full causal models from continuous observational data in a multivariate non-parametric setting. The approach is based on a game between $d$ players, each estimating the distribution of one variable conditionally on the others as a neural net, and an adversary that aims to discriminate between the overall joint conditional distribution and that of the original data. An original learning criterion combining distribution estimation, sparsity and acyclicity constraints is used to enforce the end-to-end optimization of the graph structure and parameters through stochastic gradient descent. Besides the theoretical analysis of the approach in the large sample limit, SAM is extensively validated experimentally on synthetic and real data.

    Learning Large-Scale Bayesian Networks with the sparsebn Package

    Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays datasets often have upwards of thousands (sometimes tens or hundreds of thousands) of variables and far fewer samples. To meet this challenge, we have developed a new R package called sparsebn for learning the structure of large, sparse graphical models with a focus on Bayesian networks. While there are many existing software packages for this task, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. Additionally, the sparsebn package is fully compatible with existing software packages for network analysis. Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures.