1,692 research outputs found
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning
International audienceWe present a novel hybrid algorithm for Bayesian network structure learning, called Hybrid HPC (H2PC). It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. It is based on a subroutine called HPC, that combines ideas from incremental and divide-and-conquer constraint-based methods to learn the parents and children of a target variable. We conduct an experimental comparison of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning, on several benchmarks with various data sizes. Our extensive experiments show that H2PC outperforms MMHC both in terms of goodness of fit to new data and in terms of the quality of the network structure itself, which is closer to the true dependence structure of the data. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available
High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering
Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows to infer a Bayesian network via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with groundtruth networks and being orders of magnitude faster. Moreover our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data
- …