Direct Estimation of Differences in Causal Graphs
We consider the problem of estimating the differences between two causal
directed acyclic graph (DAG) models with a shared topological order given
i.i.d. samples from each model. This is of interest for example in genomics,
where changes in the structure or edge weights of the underlying causal graphs
reflect alterations in the gene regulatory networks. We here provide the first
provably consistent method for directly estimating the differences in a pair of
causal DAGs without separately learning two possibly large and dense DAG models
and computing their difference. Our two-step algorithm first uses invariance
tests between regression coefficients of the two data sets to estimate the
skeleton of the difference graph and then orients some of the edges using
invariance tests between regression residual variances. We demonstrate the
properties of our method through a simulation study and apply it to the
analysis of gene expression data from ovarian cancer and during T-cell
activation.
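The skeleton step above can be illustrated with a small sketch. This is not the authors' exact procedure, only the core idea under simplifying assumptions (linear Gaussian models, all other variables used as regressors): an edge between X_i and X_j enters the difference skeleton when the coefficient of X_i in the regression of X_j differs significantly between the two data sets. The function names and the z-statistic threshold are illustrative choices.

```python
# Illustrative sketch (not the paper's exact algorithm): test invariance of
# regression coefficients between two data sets to find the difference skeleton.
import numpy as np

def regress(X, y):
    """OLS coefficients and their standard errors (no intercept; data assumed centred)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def difference_skeleton(Xa, Xb, z_crit=4.0):
    """Edges {i, j} whose regression coefficient changes between the data sets."""
    p = Xa.shape[1]
    edges = set()
    for j in range(p):
        others = [i for i in range(p) if i != j]
        b1, se1 = regress(Xa[:, others], Xa[:, j])
        b2, se2 = regress(Xb[:, others], Xb[:, j])
        # Two-sample z-statistic for each coefficient difference.
        z = np.abs(b1 - b2) / np.sqrt(se1**2 + se2**2)
        for k, i in enumerate(others):
            if z[k] > z_crit:
                edges.add(frozenset((i, j)))
    return edges

# Toy example: X0 -> X1 with weight 1.0 in the first model and 3.0 in the
# second; X2 is independent noise in both, so only {0, 1} should appear.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
Xa = np.column_stack([x0, 1.0 * x0 + 0.5 * rng.normal(size=n), rng.normal(size=n)])
x0b = rng.normal(size=n)
Xb = np.column_stack([x0b, 3.0 * x0b + 0.5 * rng.normal(size=n), rng.normal(size=n)])
print(difference_skeleton(Xa, Xb))
```

The second step of the method, orienting edges via invariance of residual variances, would follow the same pattern with the residual variance in place of the coefficients.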
Robust causal structure learning with some hidden variables
We introduce a new method to estimate the Markov equivalence class of a
directed acyclic graph (DAG) in the presence of hidden variables, in settings
where the underlying DAG among the observed variables is sparse, and there are
a few hidden variables that have a direct effect on many of the observed ones.
Building on the so-called low rank plus sparse framework, we suggest a
two-stage approach which first removes the effect of the hidden variables, and
then estimates the Markov equivalence class of the underlying DAG under the
assumption that there are no remaining hidden variables. This approach is
consistent in certain high-dimensional regimes and performs favourably when
compared to the state of the art, both in terms of graphical structure recovery
and total causal effect estimation.
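The first stage of the low rank plus sparse idea can be sketched as follows. This is an illustrative stand-in, not the paper's estimator: a few hidden variables that load on many observed ones induce an approximately low-rank component in the data, which we remove by projecting out the top principal directions before any structure learning on the deflated data.

```python
# Illustrative sketch of "remove the hidden variables' effect first":
# strip the dominant low-rank component induced by pervasive confounders.
import numpy as np

def deflate_hidden(X, n_hidden):
    """Project out the top `n_hidden` principal directions of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_hidden]                 # principal directions (rows)
    return Xc - Xc @ top.T @ top        # data with the low-rank part removed

# Toy example: one hidden variable H loads on all 5 observed variables,
# making every pair of columns strongly correlated.
rng = np.random.default_rng(1)
n, p = 5000, 5
H = rng.normal(size=(n, 1))
X = H @ rng.uniform(1.0, 2.0, size=(1, p)) + 0.3 * rng.normal(size=(n, p))
X_clean = deflate_hidden(X, n_hidden=1)

# Off-diagonal correlations drop from near 1 to modest residual values
# (PCA deflation leaves some mild anti-correlation as a known artifact).
print(np.abs(np.corrcoef(X.T) - np.eye(p)).max())
print(np.abs(np.corrcoef(X_clean.T) - np.eye(p)).max())
```

In the second stage one would run an ordinary structure-learning procedure (e.g. a PC-style algorithm) on `X_clean` under the assumption that no hidden variables remain.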
Ancestral Causal Inference
Constraint-based causal discovery from limited data is a notoriously
difficult challenge due to the many borderline independence test decisions.
Several approaches to improve the reliability of the predictions by exploiting
redundancy in the independence information have been proposed recently. Though
promising, existing approaches can still be greatly improved in terms of
accuracy and scalability. We present a novel method that reduces the
combinatorial explosion of the search space by using a more coarse-grained
representation of causal information, drastically reducing computation time.
Additionally, we propose a method to score causal predictions based on their
confidence. Crucially, our implementation also allows one to easily combine
observational and interventional data and to incorporate various types of
available background knowledge. We prove soundness and asymptotic consistency
of our method and demonstrate that it can outperform the state-of-the-art on
synthetic data, achieving a speedup of several orders of magnitude. We
illustrate its practical feasibility by applying it on a challenging protein
data set. (In Proceedings of Advances in Neural Information Processing Systems 29, NIPS 2016.)
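The coarse-grained representation can be illustrated with a toy sketch. This is not the paper's full logic, only the flavour of it: causal information is kept at the level of ancestral relations ("x is an ancestor of y") rather than full DAGs, derived relations follow by transitivity, and each derived statement inherits the weakest confidence among its premises (a simple, hypothetical stand-in for the paper's loss-based scoring).

```python
# Toy sketch of confidence-weighted reasoning over ancestral relations.
from itertools import product

def transitive_closure(statements):
    """statements: dict mapping (x, y) -> confidence that x is an ancestor of y."""
    closed = dict(statements)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                # x ~> y and y ~> z entail x ~> z; score by the weakest premise.
                closed[(a, d)] = min(closed[(a, b)], closed[(c, d)])
                changed = True
    return closed

inputs = {("A", "B"): 0.9, ("B", "C"): 0.7, ("C", "D"): 0.8}
closure = transitive_closure(inputs)
print(closure[("A", "D")])  # derived A ~> D, scored by its weakest premise
```

Because ancestral relations abstract away edge-level detail, the space of candidate structures is far smaller than the space of DAGs, which is the source of the speedup the abstract reports.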
Distributional Robustness of K-class Estimators and the PULSE
Recently, invariance properties, such as the moment criterion that two-stage least squares estimators leverage, have been exploited for causal structure learning: for example, even when the causal parameter is not identifiable, some structure of its non-zero components may be identified, and coverage guarantees are available. Subsequently, anchor regression has been proposed to trade off invariance against predictability. The resulting estimator is
shown to have optimal predictive performance under bounded shift interventions.
In this paper, we show that the concepts of anchor regression and K-class
estimators are closely related. Establishing this connection comes with two
benefits: (1) It enables us to prove robustness properties for existing K-class
estimators when considering distributional shifts. And, (2), we propose a novel
estimator in instrumental variable settings by minimizing the mean squared
prediction error subject to the constraint that the estimator lies in an
asymptotically valid confidence region of the causal parameter. We call this
estimator PULSE (p-uncorrelated least squares estimator) and show that it can
be computed efficiently, even though the underlying optimization problem is
non-convex. We further prove that it is consistent. We perform simulation
experiments illustrating that there are several settings including weak
instrument settings, where PULSE outperforms other estimators and suffers from
less variability. (85 pages, 15 figures.)
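The K-class family at the heart of this connection can be written down in a few lines. The sketch below implements the standard K-class estimator β(κ) = (Xᵀ(I − κ M_Z)X)⁻¹ Xᵀ(I − κ M_Z)y, where M_Z projects onto the orthogonal complement of the instrument column space: κ = 0 recovers OLS and κ = 1 recovers two-stage least squares (TSLS). PULSE's data-driven choice of κ is not reproduced here; the toy model and variable names are illustrative.

```python
# Minimal K-class estimator: kappa interpolates between OLS and TSLS.
import numpy as np

def k_class(X, y, Z, kappa):
    """K-class estimate via first-stage fitted values (avoids n x n matrices)."""
    Xh = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # P_Z X
    # X'(I - kappa*M_Z) v = (1 - kappa) X'v + kappa (P_Z X)'v
    A = (1 - kappa) * X.T @ X + kappa * Xh.T @ Xh
    b = (1 - kappa) * X.T @ y + kappa * Xh.T @ y
    return np.linalg.solve(A, b)

# Toy IV model: instrument Z, hidden confounder H, endogenous regressor X.
rng = np.random.default_rng(2)
n = 3000
Z = rng.normal(size=(n, 1))
H = rng.normal(size=n)
x = 2.0 * Z[:, 0] + H + 0.5 * rng.normal(size=n)
y = 1.0 * x + H + 0.5 * rng.normal(size=n)   # true causal effect is 1.0
X = x.reshape(-1, 1)

beta_ols = k_class(X, y, Z, kappa=0.0)   # biased towards the confounded fit
beta_tsls = k_class(X, y, Z, kappa=1.0)  # consistent for the causal effect
print(beta_ols[0], beta_tsls[0])
```

Intermediate values of κ (including the data-driven κ > 1 or κ < 1 regimes discussed for anchor regression) trade off the in-sample fit of OLS against the shift-robustness of the instrumented solution.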