Interpreting and using CPDAGs with background knowledge
We develop terminology and methods for working with maximally oriented
partially directed acyclic graphs (maximal PDAGs). Maximal PDAGs arise from
imposing restrictions on a Markov equivalence class of directed acyclic graphs,
or equivalently on its graphical representation as a completed partially
directed acyclic graph (CPDAG), for example when adding background knowledge
about certain edge orientations. Although maximal PDAGs often arise in
practice, causal methods have been mostly developed for CPDAGs. In this paper,
we extend such methodology to maximal PDAGs. In particular, we develop
methodology to read off possible ancestral relationships, we introduce a
graphical criterion for covariate adjustment to estimate total causal effects,
and we adapt the IDA and joint-IDA frameworks to estimate multi-sets of
possible causal effects. We also present a simulation study that illustrates
the gain in identifiability of total causal effects as the background knowledge
increases. All methods are implemented in the R package pcalg.
Comment: 17 pages, 6 figures, UAI 201
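As a rough illustration of the multi-set idea behind IDA mentioned in this abstract, the sketch below computes one candidate total effect per possible parent set of the cause, using covariate adjustment in a linear model. It is not the pcalg implementation: the helper names `solve` and `adjusted_effect`, the three-variable covariance matrix, and the hand-listed parent sets are all assumptions made for illustration. A real analysis would derive the valid parent sets from the CPDAG or maximal PDAG.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_effect(cov, x, y, adjust):
    """Coefficient of x when regressing y on x and the adjustment set.

    Following the usual IDA convention, the effect is taken to be 0 when
    y itself is among the assumed parents of x.
    """
    if y in adjust:
        return 0.0
    preds = [x] + list(adjust)
    gram = [[cov[i][j] for j in preds] for i in preds]
    rhs = [cov[i][y] for i in preds]
    return solve(gram, rhs)[0]


# Hypothetical population covariance of (Z, X, Y) generated by
# X = 0.8 Z + e1 and Y = 0.5 X + 0.7 Z + e2 with unit-variance noise.
Z, X, Y = 0, 1, 2
cov = [[1.0, 0.8, 1.1],
       [0.8, 1.64, 1.38],
       [1.1, 1.38, 2.46]]

# Assumed candidate parent sets of X when all three edges are undirected.
effects_cpdag = [adjusted_effect(cov, X, Y, s) for s in ([], [Z], [Y], [Z, Y])]

# Assumed single parent set left once background knowledge orients the edges.
effects_bk = [adjusted_effect(cov, X, Y, [Z])]
```

In this toy setting the unrestricted list contains several distinct candidate values, while the list under background knowledge contains only the coefficient 0.5 used to generate the covariance matrix.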
Quantifying identifiability in independent component analysis
We are interested in consistent estimation of the mixing matrix in the ICA
model, when the error distribution is close to (but different from) Gaussian.
In particular, we consider $n$ independent samples from the ICA model
$X = A\epsilon$, where we assume that the coordinates of $\epsilon$ are independent
and identically distributed according to a contaminated Gaussian distribution,
and the amount of contamination is allowed to depend on $n$. We then
investigate how the ability to consistently estimate the mixing matrix depends
on the amount of contamination. Our results suggest that in an asymptotic
sense, if the amount of contamination decreases at rate $1/\sqrt{n}$ or faster,
then the mixing matrix is only identifiable up to transpose products. These
results also have implications for causal inference from linear structural
equation models with near-Gaussian additive noise.
Comment: 22 pages, 2 figures
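To make the setting of this abstract concrete, here is a minimal simulation sketch of an ICA model whose noise coordinates are drawn from a contaminated Gaussian. Several choices are assumptions rather than details taken from the paper: the contaminating distribution (a unit-variance uniform), the contamination level `n ** (-gamma)` as a function of the sample size `n`, and the function name `sample_contaminated_ica`.

```python
import math
import random


def sample_contaminated_ica(mixing, n, gamma, rng):
    """Draw n samples X = A * eps with contaminated Gaussian noise.

    Each coordinate of eps is standard normal with probability 1 - alpha
    and uniform on [-sqrt(3), sqrt(3)] (unit variance) with probability
    alpha, where alpha = n ** (-gamma) is an assumed contamination rate.
    """
    d = len(mixing)
    alpha = min(1.0, n ** (-gamma))
    half_width = math.sqrt(3.0)
    samples = []
    for _ in range(n):
        eps = [
            rng.uniform(-half_width, half_width)
            if rng.random() < alpha
            else rng.gauss(0.0, 1.0)
            for _ in range(d)
        ]
        # Mix the independent noise coordinates with the matrix A.
        samples.append(
            [sum(mixing[i][k] * eps[k] for k in range(d)) for i in range(d)]
        )
    return alpha, samples


# Hypothetical 2 x 2 mixing matrix and a contamination level of about n^(-1/2).
alpha, xs = sample_contaminated_ica([[1.0, 0.0], [0.5, 1.0]], 100, 0.5,
                                    random.Random(0))
```

Varying `gamma` in such a simulation is one way to explore how quickly the noise becomes indistinguishable from Gaussian as the sample size grows.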
Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm
We consider variable selection in high-dimensional linear models where the
number of covariates greatly exceeds the sample size. We introduce the new
concept of partial faithfulness and use it to infer associations between the
covariates and the response. Under partial faithfulness, we develop a
simplified version of the PC algorithm (Spirtes et al., 2000), the PC-simple
algorithm, which is computationally feasible even with thousands of covariates
and provides consistent variable selection under conditions on the random
design matrix that are of a different nature than coherence conditions for
penalty-based approaches like the Lasso. Simulations and an application to real
data show that our method is competitive with penalty-based approaches.
We provide an efficient implementation of the algorithm in the R package pcalg.
Comment: 20 pages, 3 figures
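The selection idea described in this abstract can be sketched as follows. This is a simplified reading of PC-simple, not the pcalg implementation: a covariate is kept only if its partial correlation with the response stays nonzero given every subset of the other active covariates of growing size. The function names, the Fisher z-test with level `alpha`, and the toy correlation matrix are assumptions for illustration.

```python
import math
from itertools import combinations
from statistics import NormalDist


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, via the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_nonzero(r, n, size, alpha):
    """Fisher z-test of a (partial) correlation with `size` conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.sqrt(n - size - 3) * abs(math.atanh(r))
    return z > NormalDist().inv_cdf(1 - alpha / 2)


def pc_simple(corr, n, y, alpha=0.05):
    """Return the covariate indices retained for response index y."""
    covariates = [j for j in range(len(corr)) if j != y]
    # Order 0: keep covariates that are marginally correlated with y.
    active = [j for j in covariates if is_nonzero(corr[j][y], n, 0, alpha)]
    m = 0
    while len(active) > m:
        m += 1
        kept = []
        for j in active:
            others = [k for k in active if k != j]
            # Drop j if some size-m conditioning set makes it uncorrelated with y.
            if all(is_nonzero(partial_corr(corr, j, y, s), n, m, alpha)
                   for s in combinations(others, m)):
                kept.append(j)
        active = kept
    return active


# Hypothetical population correlations for X0, X1 = X0 + noise,
# X2 (independent), and Y = 2 * X0 + noise at index 3.
a, b, c = 1 / math.sqrt(2), 2 / math.sqrt(5), 2 / math.sqrt(10)
corr = [[1, a, 0, b],
        [a, 1, 0, c],
        [0, 0, 1, 0],
        [b, c, 0, 1]]
selected = pc_simple(corr, n=1000, y=3)
```

In the toy example, X1 is marginally correlated with Y but becomes uncorrelated once X0 is conditioned on, so only index 0 is retained. Working from a correlation matrix keeps the sketch deterministic; with real data the matrix would be estimated from the sample.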
Robust causal structure learning with some hidden variables
We introduce a new method to estimate the Markov equivalence class of a
directed acyclic graph (DAG) in the presence of hidden variables, in settings
where the underlying DAG among the observed variables is sparse, and there are
a few hidden variables that have a direct effect on many of the observed ones.
Building on the so-called low rank plus sparse framework, we suggest a
two-stage approach which first removes the effect of the hidden variables, and
then estimates the Markov equivalence class of the underlying DAG under the
assumption that there are no remaining hidden variables. This approach is
consistent in certain high-dimensional regimes and performs favourably when
compared to the state of the art, both in terms of graphical structure recovery
and total causal effect estimation.
Estimating the effect of joint interventions from observational data in sparse high-dimensional settings
We consider the estimation of joint causal effects from observational data.
In particular, we propose new methods to estimate the effect of multiple
simultaneous interventions (e.g., multiple gene knockouts), under the
assumption that the observational data come from an unknown linear structural
equation model with independent errors. We derive asymptotic variances of our
estimators when the underlying causal structure is partly known, as well as
high-dimensional consistency when the causal structure is fully unknown and the
joint distribution is multivariate Gaussian. We also propose a generalization
of our methodology to the class of nonparanormal distributions. We evaluate the
estimators in simulation studies and also illustrate them on data from the
DREAM4 challenge.
Comment: 30 pages, 3 figures, 45 pages supplement