Uniform random generation of large acyclic digraphs
Directed acyclic graphs are the basic representation of the structure
underlying Bayesian networks, which represent multivariate probability
distributions. In many practical applications, such as the reverse engineering
of gene regulatory networks, not only the estimation of model parameters but
the reconstruction of the structure itself is of great interest. A uniform
sample from the space of directed acyclic graphs is required both to assess
different structure learning algorithms in simulation studies and to evaluate
the prevalence of certain structural features. Here we analyse how
to sample acyclic digraphs uniformly at random through recursive enumeration,
an approach previously thought too computationally involved. Based on
complexity considerations, we discuss in particular how the enumeration
directly provides an exact method, which avoids the convergence issues of the
alternative Markov chain methods and is actually computationally much faster.
The limiting behaviour of the distribution of acyclic digraphs then allows us
to sample arbitrarily large graphs. Building on the ideas of recursive
enumeration based sampling we also introduce a novel hybrid Markov chain with
much faster convergence than current alternatives while still being easy to
adapt to various restrictions. Finally we discuss how to include such
restrictions in the combinatorial enumeration and the new hybrid Markov chain
method for efficient uniform sampling of the corresponding graphs.
Comment: 15 pages, 2 figures. To appear in Statistics and Computin
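As a rough illustration of the counting step that recursive enumeration rests on, the sketch below computes the number of labelled acyclic digraphs on n nodes with Robinson's recurrence, a_n = sum over k = 1..n of (-1)^(k+1) C(n,k) 2^(k(n-k)) a_(n-k), with a_0 = 1. The function name `count_dags` is chosen here for illustration; this is not the authors' sampler, only the enumeration such a sampler could build on.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence).

    The index k is the number of nodes with no incoming edges; the
    alternating sign comes from inclusion-exclusion over those nodes.
    """
    if n == 0:
        return 1  # the empty graph
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(6):
        print(n, count_dags(n))
```

Python integers are arbitrary precision, so the counts stay exact as n grows, although they grow very quickly.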
Two Optimal Strategies for Active Learning of Causal Models from Interventional Data
From observational data alone, a causal DAG is only identifiable up to Markov
equivalence. Interventional data generally improves identifiability; however,
the gain of an intervention strongly depends on the intervention target, that
is, the intervened variables. We present active learning (that is, optimal
experimental design) strategies calculating optimal interventions for two
different learning goals. The first one is a greedy approach using
single-vertex interventions that maximizes the number of edges that can be
oriented after each intervention. The second one yields in polynomial time a
minimum set of targets of arbitrary size that guarantees full identifiability.
This second approach proves a conjecture of Eberhardt (2008) indicating the
number of unbounded intervention targets which is sufficient and in the worst
case necessary for full identifiability. In a simulation study, we compare our
two active learning approaches to random interventions and an existing
approach, and analyze the influence of estimation errors on the overall
performance of active learning.
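To make the notion of Markov equivalence concrete, the sketch below tests whether two DAGs are Markov equivalent using the standard graphical criterion: same skeleton and same v-structures (colliders a -> c <- b with a and b non-adjacent). The helper names and the edge-list representation are assumptions made for this example, and both graphs are assumed to share the same node set; it does not implement the active learning strategies described above.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: a set of unordered node pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All colliders (a, child, b) whose two parents are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """True if both DAGs have the same skeleton and the same v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


if __name__ == "__main__":
    chain = [("a", "b"), ("b", "c")]        # a -> b -> c
    reversed_chain = [("b", "a"), ("c", "b")]  # a <- b <- c
    collider = [("a", "b"), ("c", "b")]     # a -> b <- c
    print(markov_equivalent(chain, reversed_chain))
    print(markov_equivalent(chain, collider))
```

The chain and its reversal cannot be told apart from observational data, whereas the collider can, which is why an intervention on a well-chosen vertex is needed to orient the remaining edges.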