Causal Reasoning with Ancestral Graphs
Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl's (1995) celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl's calculus, the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993).
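For orientation, the three rules of the do-calculus for directed acyclic graphs can be written as below. This is the standard DAG formulation, not the ancestral-graph extension the abstract describes. The notation is an assumption of this sketch: $G_{\overline{X}}$ is the graph $G$ with all arrows into $X$ removed, $G_{\underline{Z}}$ is $G$ with all arrows out of $Z$ removed, and $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.

```latex
% Requires amsmath. Standard DAG version of the do-calculus rules.
\begin{align*}
\text{Rule 1:}\quad & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

Rule 1 permits inserting or deleting an observation, Rule 2 permits exchanging an intervention for an observation, and Rule 3 permits inserting or deleting an intervention.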
Massively-Parallel Feature Selection for Big Data
We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for
feature selection (FS) in Big Data settings (high dimensionality and/or sample
size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix
both in terms of rows (samples, training examples) as well as columns
(features). By employing the concepts of p-values of conditional independence
tests and meta-analysis techniques, PFBP manages to rely only on computations
local to a partition while minimizing communication costs. Then, it employs
powerful and safe (asymptotically sound) heuristics to make early, approximate
decisions, such as Early Dropping of features from consideration in subsequent
iterations, Early Stopping of consideration of features within the same
iteration, or Early Return of the winner in each iteration. PFBP provides
asymptotic guarantees of optimality for data distributions faithfully
representable by a causal network (Bayesian network or maximal ancestral
graph). Our empirical analysis confirms a super-linear speedup of the algorithm
with increasing sample size, linear scalability with respect to the number of
features and processing cores, while dominating other competitive algorithms in
its class.
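The idea of combining p-values computed locally on separate data partitions can be illustrated with Fisher's combined probability test, a standard meta-analysis technique. This is a minimal sketch under stated assumptions: the abstract does not say which combination rule PFBP uses, the function name `fisher_combine` is hypothetical, and the local p-values are assumed to come from independent row partitions.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration only; not taken from PFBP.
    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2

    # For even degrees of freedom 2k the chi-square survival function has
    # a closed form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Each worker tests the same conditional independence on its own rows
# and reports one local p-value; only these scalars are communicated.
local_p_values = [0.04, 0.20, 0.11]
combined = fisher_combine(local_p_values)
```

Only one number per partition has to be sent to the coordinator, which is the sense in which such a scheme keeps computation local and communication small.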
Estimating the effect of joint interventions from observational data in sparse high-dimensional settings
We consider the estimation of joint causal effects from observational data.
In particular, we propose new methods to estimate the effect of multiple
simultaneous interventions (e.g., multiple gene knockouts), under the
assumption that the observational data come from an unknown linear structural
equation model with independent errors. We derive asymptotic variances of our
estimators when the underlying causal structure is partly known, as well as
high-dimensional consistency when the causal structure is fully unknown and the
joint distribution is multivariate Gaussian. We also propose a generalization
of our methodology to the class of nonparanormal distributions. We evaluate the
estimators in simulation studies and also illustrate them on data from the
DREAM4 challenge.
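The quantity being estimated, the effect of a joint intervention in a linear structural equation model, can be illustrated on a toy model with known coefficients. This is a sketch of the estimand only, not of the estimators proposed in the paper. The graph, the coefficients, and the function name `mean_under_intervention` are all hypothetical, and the error terms are assumed to have mean zero.

```python
def mean_under_intervention(weights, order, do):
    """Expected value of every variable in a linear SEM under do(...).

    weights: dict mapping (parent, child) to the edge coefficient.
    order:   variables in a topological order (parents before children).
    do:      dict mapping each intervened variable to its forced value.
    Error terms are assumed to have mean zero, so they drop out.
    """
    mean = {}
    for v in order:
        if v in do:
            # An intervened variable ignores its parents.
            mean[v] = do[v]
        else:
            mean[v] = sum(w * mean[p] for (p, c), w in weights.items() if c == v)
    return mean


# Hypothetical model: X1 -> X2 -> Y and X1 -> Y.
weights = {("X1", "X2"): 0.5, ("X1", "Y"): 1.0, ("X2", "Y"): 2.0}
order = ["X1", "X2", "Y"]

# Single intervention: X1's influence also flows through X2.
single = mean_under_intervention(weights, order, {"X1": 1.0})["Y"]

# Joint intervention: X2 is held fixed, so the path X1 -> X2 -> Y is cut.
joint = mean_under_intervention(weights, order, {"X1": 1.0, "X2": 0.0})["Y"]
```

Here `single` is 2.0 (the total effect 1.0 + 0.5 * 2.0) while `joint` is 1.0, which shows why the effect of a joint intervention generally cannot be read off from single-intervention effects.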