Learning Bayesian Networks with the bnlearn R Package
bnlearn is an R package (R Development Core Team 2010) which includes several algorithms for learning the structure of Bayesian networks with either discrete or continuous variables. Both constraint-based and score-based algorithms are implemented, and can use the functionality provided by the snow package (Tierney et al. 2008) to improve their performance via parallel computing. Several network scores and conditional independence algorithms are available for both the learning algorithms and independent use. Advanced plotting options are provided by the Rgraphviz package (Gentry et al. 2010). Comment: 22 pages, 4 figures
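Score-based structure learning of the kind bnlearn implements ranks candidate graphs by a decomposable network score such as BIC. The following is a minimal sketch in Python rather than R, with an invented toy dataset and a simplified BIC; it is not bnlearn's implementation:

```python
import math
from collections import Counter

def bic_score(data, dag):
    """Simplified BIC score of a discrete Bayesian network.
    data: list of dicts mapping variable name -> observed value
    dag:  dict mapping variable name -> tuple of its parent variables
    Higher scores indicate a better fit, penalized for complexity."""
    n = len(data)
    score = 0.0
    for var, parents in dag.items():
        # Counts of (parent configuration, child value) and of parent
        # configurations give the maximum-likelihood conditional tables.
        joint = Counter((tuple(row[p] for p in parents), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in parents) for row in data)
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        states = len({row[var] for row in data})
        cfgs = 1
        for p in parents:
            cfgs *= len({row[p] for row in data})
        free_params = (states - 1) * cfgs
        score += loglik - 0.5 * free_params * math.log(n)
    return score

# Toy data in which B simply copies A: the structure A -> B should win.
data = [{"A": 0, "B": 0}] * 10 + [{"A": 1, "B": 1}] * 10
with_edge = bic_score(data, {"A": (), "B": ("A",)})
no_edge = bic_score(data, {"A": (), "B": ()})
```

A score-based learner such as hill climbing would repeatedly apply single-edge changes and keep those that improve this score.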
Parallelization of the PC Algorithm
This paper describes a parallel version of the PC algorithm for learning the structure of a Bayesian network from data. The PC algorithm is a constraint-based algorithm consisting of five steps, where the first step is to perform a set of (conditional) independence tests while the remaining four steps relate to identifying the structure of the Bayesian network using the results of the (conditional) independence tests. In this paper, we describe a new approach to parallelization of the (conditional) independence testing, as experiments illustrate that this is by far the most time-consuming step. The proposed parallel PC algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that significant time performance improvements are possible using the proposed algorithm.
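As a rough sketch of the idea (not the paper's implementation: a Python thread pool stands in for its parallel scheme, only marginal G² tests are shown, and the data are invented), the independence-testing step can be farmed out like this:

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def g2_statistic(data, x, y):
    """G^2 (likelihood-ratio) statistic for marginal independence of two
    discrete variables; 0 means the observed counts match independence."""
    n = len(data)
    nx = Counter(row[x] for row in data)
    ny = Counter(row[y] for row in data)
    nxy = Counter((row[x], row[y]) for row in data)
    return sum(2.0 * c * math.log(c * n / (nx[a] * ny[b]))
               for (a, b), c in nxy.items())

def run_tests(data, pairs, workers=4):
    """Evaluate all candidate tests concurrently (step 1 of PC)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(pairs, pool.map(lambda p: g2_statistic(data, *p), pairs)))

# B copies A, while C varies independently of both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 5
stats = run_tests(data, [("A", "B"), ("A", "C")])
```

The thread pool only illustrates the dispatch pattern; for CPU-bound tests in practice one would use processes or MPI, as the paper's cluster setting suggests.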
Bayesian networks to explain the effect of label information on product perception
Interdisciplinary approaches in food research require new methods of data analysis that can deal with complexity and facilitate communication among model users. Four parallel full factorial within-subject designs were performed to examine the relative contributions of intrinsic product properties and information given on packaging to consumer product evaluation. Detailed experimental designs and results obtained from analyses of variance were published [1]. The data were analyzed again with the machine learning modelling technique of Bayesian networks. The objective of the current paper is to explain the basic features of this technique and its advantages over the standard statistical approach with regard to handling complexity and communicating results. With analysis of variance, visualization and interpretation of main effects and interaction effects become difficult in complex systems. The Bayesian network model offers the possibility to formally incorporate (domain) experts' knowledge. By combining empirical data with the predefined network structure, new relationships can be learned, thus generating an update of current knowledge. Probabilistic inference in Bayesian networks allows instant and global use of the model; its graphical representation makes it easy to visualize and communicate the results. Making the most of the data from a single experiment, as well as the ability to combine data from independent experiments, makes Bayesian networks well suited for analysing these and similarly complex and rich data sets.
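To make the inference claim concrete, here is a toy three-node network in Python with entirely invented variables and probabilities (not the study's model), queried by exhaustive enumeration over the joint distribution:

```python
# A toy discrete Bayesian network, purely illustrative:
#   Label -> Expectation -> Liking
P_label = {"organic": 0.5, "regular": 0.5}
P_expect = {  # P(Expectation | Label)
    ("organic", "high"): 0.8, ("organic", "low"): 0.2,
    ("regular", "high"): 0.3, ("regular", "low"): 0.7,
}
P_liking = {  # P(Liking | Expectation)
    ("high", "like"): 0.9, ("high", "dislike"): 0.1,
    ("low", "like"): 0.4, ("low", "dislike"): 0.6,
}

def joint(label, expect, liking):
    """Joint probability factorized along the network structure."""
    return P_label[label] * P_expect[(label, expect)] * P_liking[(expect, liking)]

def posterior_label(liking):
    """P(Label | Liking) by enumerating out the hidden Expectation node."""
    scores = {label: sum(joint(label, e, liking) for e in ("high", "low"))
              for label in P_label}
    z = sum(scores.values())
    return {label: s / z for label, s in scores.items()}

post = posterior_label("like")
```

The same model answers queries in any direction (prediction or diagnosis), which is what "instant and global use of the model" refers to; real networks would use more efficient inference than enumeration.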
A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks
Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in $O(2(d+1)n2^n)$ time and space, if the number of nodes (variables) in the Bayesian network is $n$ and the in-degree (the number of parents) per node is bounded by a constant $d$. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if $p = 2^k$ processors are used, the run-time reduces to $O(5(d+1)n2^{n-k})$ and the space usage becomes $O(n2^{n-k})$ per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an $n$-dimensional hypercube. We carefully coordinate the computation of correlated DP procedures so that a large amount of data exchange is suppressed. Further, we develop parallel techniques for two variants of the well-known \emph{zeta transform}, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways. Comment: 32 pages, 12 figures
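The zeta transform referred to above turns a function on subsets into its sums over all subsets. A sketch of the standard sequential DP (not the paper's parallel variants) in Python:

```python
def zeta_transform(f, n):
    """Given f indexed by bitmask over n elements, return g with
    g[S] = sum of f[T] over all subsets T of S, in O(n * 2**n) time
    rather than the naive O(3**n)."""
    g = list(f)
    for i in range(n):          # one pass per element
        bit = 1 << i
        for s in range(1 << n):
            if s & bit:         # fold in the subproblem without element i
                g[s] += g[s ^ bit]
    return g

# With f identically 1, g[S] counts the subsets of S, i.e. 2**|S|.
g = zeta_transform([1] * 8, 3)
```

Each of the $n$ passes updates masks in independent pairs $(S, S \setminus \{i\})$, i.e. along one dimension of the hypercube of subsets, which is what makes a hypercube-style decomposition across processors natural.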
Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark
In machine learning, the parent set identification problem is to find a set of random variables that best explain a selected variable given the data and some predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed-memory approach to the exact parent set assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach on the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and that it can be used to efficiently process data sets with 70 variables, far beyond the reach of currently available solutions.
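As a sketch of the underlying problem (a simplified MDL variant on invented toy data; not the paper's Spark implementation or its pruning bounds), exact parent set identification amounts to scoring every candidate set and keeping the best:

```python
import math
from collections import Counter
from itertools import combinations

def mdl_score(data, child, parents):
    """Simplified MDL score of `child` given candidate `parents` (lower is
    better): negative log-likelihood plus (log2 N / 2) * free parameters."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    nll = -sum(c * math.log2(c / marg[cfg]) for (cfg, _), c in joint.items())
    r_child = len({r[child] for r in data})
    q = 1
    for p in parents:
        q *= len({r[p] for r in data})
    return nll + 0.5 * math.log2(n) * (r_child - 1) * q

def best_parent_set(data, child, candidates, max_size=2):
    """Exhaustive search over parent sets up to max_size: the sequential
    baseline that a distributed approach must accelerate."""
    return min(
        (ps for k in range(max_size + 1) for ps in combinations(candidates, k)),
        key=lambda ps: mdl_score(data, child, ps),
    )

# B copies A, while C is uninformative: {A} should be selected for B.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 5
best = best_parent_set(data, "B", ["A", "C"])
```

The search space grows combinatorially with the number of variables, which is why the paper prunes it with MDL-based bounds and reorganizes the dynamic programming for a distributed-memory platform.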