
68,313 research outputs found

Learning Bayesian Networks with the bnlearn R Package

Author: Scutari Marco
Publication date: 01/01/2010

bnlearn is an R package which includes several algorithms for learning the structure of Bayesian networks with either discrete or continuous variables. Both constraint-based and score-based algorithms are implemented, and can use the functionality provided by the snow package to improve their performance via parallel computing. Several network scores and conditional independence algorithms are available for both the learning algorithms and independent use. Advanced plotting options are provided by the Rgraphviz package. Comment: 22 pages, 4 pictures

arXiv.org e-Print Archive

Directory of Open Access Journals

UCL Discovery

Oxford University Research Archive

Journal of Statistical Software

Parallelization of the PC Algorithm

Author: Jensen Frank
Langseth Helge
Madsen Anders L.
Nielsen Thomas D.
Salmerón Cerdán Antonio
Publication date: 01/01/2015

This paper describes a parallel version of the PC algorithm for learning the structure of a Bayesian network from data. The PC algorithm is a constraint-based algorithm consisting of five steps where the first step is to perform a set of (conditional) independence tests while the remaining four steps relate to identifying the structure of the Bayesian network using the results of the (conditional) independence tests. In this paper, we describe a new approach to parallelization of the (conditional) independence testing as experiments illustrate that this is by far the most time-consuming step. The proposed parallel PC algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that significant time performance improvements are possible using the proposed algorithm.
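The abstract singles out the independence tests as the step worth parallelizing. The sketch below illustrates only that general idea and is not the scheme proposed in the paper: it runs the order-0 (marginal) tests of a PC-style skeleton search concurrently. The Fisher z-test on Pearson correlation, the function names, the 1.96 threshold, and the toy data are all assumptions chosen for illustration. A thread pool is used for brevity; in CPython a process pool would be needed for real CPU speedups.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def independent(data, i, j, z_crit=1.96):
    """Fisher z-test of marginal independence between columns i and j.

    Returns True when independence is NOT rejected at roughly the 5% level.
    """
    x, y = data[i], data[j]
    r = pearson(x, y)
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(x) - 3)
    return abs(z) < z_crit


def order0_skeleton(data, workers=4):
    """Test every variable pair concurrently; keep an edge when the test rejects."""
    pairs = list(combinations(range(len(data)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: independent(data, *p), pairs))
    return {p for p, indep in zip(pairs, results) if not indep}


# Toy data (hypothetical): y depends strongly on x, z is generated independently.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [a + 0.1 * random.gauss(0, 1) for a in x]
z = [random.gauss(0, 1) for _ in range(500)]
edges = order0_skeleton([x, y, z])
```

Each pairwise test reads the data without modifying it, so the tests can be distributed freely. Higher-order tests, which condition on subsets of neighbours, would need the adjacency information from the previous level before they can be scheduled.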

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

Learning Bayesian Networks with the bnlearn R Package

Author: Marco Scutari

bnlearn is an R package (R Development Core Team 2010) which includes several algorithms for learning the structure of Bayesian networks with either discrete or continuous variables. Both constraint-based and score-based algorithms are implemented, and can use the functionality provided by the snow package (Tierney et al. 2008) to improve their performance via parallel computing. Several network scores and conditional independence algorithms are available for both the learning algorithms and independent use. Advanced plotting options are provided by the Rgraphviz package (Gentry et al. 2010).

Research Papers in Economics

Bayesian networks to explain the effect of label information on product perception

Author: Boekel M.A.J.S., van
Dekker M.
Garczarek U.
Kole A.P.W.
Phan V.A.
Publication date: 01/01/2011

Interdisciplinary approaches in food research require new methods in data analysis that are able to deal with complexity and facilitate the communication among model users. Four parallel full factorial within-subject designs were performed to examine the relative contribution to consumer product evaluation of intrinsic product properties and information given on packaging. Detailed experimental designs and results obtained from analyses of variance were published [1]. The data were analyzed again with the machine learning modelling technique Bayesian networks. The objective of the current paper is to explain basic features of this technique and its advantages over the standard statistical approach regarding handling of complexity and communication of results. With analysis of variance, visualization and interpretation of main effects and interaction effects become difficult in complex systems. The Bayesian network model offers the possibility to formally incorporate (domain) expert knowledge. By combining empirical data with the pre-defined network structure, new relationships can be learned, thus generating an update of current knowledge. Probabilistic inference in Bayesian networks allows instant and global use of the model; its graphical representation makes it easy to visualize and communicate the results. Making use of the most of data from one single experiment, as well as combining data of independent experiments makes Bayesian networks for analysing these and similarly complex and rich data set

Wageningen University & Research Publications

Parallelization of the PC Algorithm

Author: Anders L Madsen
Antonio Salmerón
Frank Jensen
Helge Langseth
Thomas D Nielsen
Publication date: 24/04/2020

Abstract. This paper describes a parallel version of the PC algorithm for learning the structure of a Bayesian network from data. The PC algorithm is a constraint-based algorithm consisting of five steps where the first step is to perform a set of (conditional) independence tests while the remaining four steps relate to identifying the structure of the Bayesian network using the results of the (conditional) independence tests. In this paper, we describe a new approach to parallelization of the (conditional) independence testing as experiments illustrate that this is by far the most time-consuming step. The proposed parallel PC algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that significant time performance improvements are possible using the proposed algorithm.

CiteSeerX

A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks

Author: Jin Tian
Olga Nikolova
Sage Bionetworks
Srinivas Aluru
Yetian Chen
Publication date: 13/08/2016

Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in $O(2(d+1)n2^n)$ time and space, if the number of nodes (variables) in the Bayesian network is $n$ and the in-degree (the number of parents) per node is bounded by a constant $d$. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all $n(n-1)$ edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if $p=2^k$ processors are used, the run-time reduces to $O(5(d+1)n2^{n-k}+k(n-k)^d)$ and the space usage becomes $O(n2^{n-k})$ per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an $n$-D hypercube. We carefully coordinate the computation of correlated DP procedures so that a large amount of data exchange is suppressed. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways. Comment: 32 pages, 12 figures
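For readers unfamiliar with the zeta transform mentioned in the abstract, here is a minimal sequential sketch of the standard subset-sum form, $(\zeta f)(S) = \sum_{T \subseteq S} f(T)$, computed in $O(n2^n)$ time. It is the textbook algorithm, not either of the parallel variants developed in the paper. The bitmask encoding of subsets and the function names are assumptions made for illustration.

```python
def zeta_transform(f, n):
    """Fast zeta transform over the subset lattice of an n-element set.

    f is a list of length 2**n indexed by bitmask; the result g satisfies
    g[S] = sum of f[T] over all subsets T of S.
    """
    g = list(f)
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                # Add the contribution of subsets that omit element i.
                g[mask] += g[mask ^ bit]
    return g


def zeta_bruteforce(f, n):
    """Direct O(4**n) definition, used only to cross-check the fast version."""
    return [
        sum(f[t] for t in range(1 << n) if t & s == t)
        for s in range(1 << n)
    ]


# With f identically 1, the transform counts subsets: g[S] = 2 ** |S|.
g = zeta_transform([1] * 8, 3)
```

The outer loop handles one dimension of the $n$-D hypercube at a time, and each pass only combines values whose masks differ in a single bit.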

arXiv.org e-Print Archive

CiteSeerX

Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark

Author: Karan Subhadeep
Zola Jaroslaw
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/10/2017

In Machine Learning, the parent set identification problem is to find a set of random variables that best explains a selected variable given the data and some predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed memory approach to the exact parent sets assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach in the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and it can be used to efficiently process data sets with 70 variables, far beyond the reach of the currently available solutions.
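To make the parent set identification problem concrete, the sketch below scores candidate parent sets for one discrete variable with a common form of the MDL score (negative log-likelihood in bits plus a $\frac{1}{2}\log_2 N$ penalty per free parameter) and picks the lowest-scoring set by brute force. It is an illustration under assumptions, not the paper's method: it uses neither the paper's search-space bounds, nor its dynamic programming, nor Spark. The function names, the size cap `max_size`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mdl_score(rows, child, parents, arities):
    """MDL score of `child` given `parents` on discrete data (lower is better)."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    marg = Counter(tuple(r[p] for p in parents) for r in rows)
    # Maximized conditional log-likelihood, in bits.
    loglik = sum(c * math.log2(c / marg[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) per parent configuration.
    n_params = (arities[child] - 1) * math.prod(arities[p] for p in parents)
    return -loglik + 0.5 * math.log2(n) * n_params


def best_parent_set(rows, child, arities, max_size=2):
    """Exhaustively search parent sets of size <= max_size."""
    candidates = [v for v in range(len(arities)) if v != child]
    best = None
    for k in range(max_size + 1):
        for parents in combinations(candidates, k):
            score = mdl_score(rows, child, parents, arities)
            if best is None or score < best[0]:
                best = (score, parents)
    return best


# Toy data (hypothetical): variable 2 copies variable 0; variable 1 is unrelated.
rows = [(a, b, a) for a in (0, 1) for b in (0, 1) for _ in range(25)]
score, parents = best_parent_set(rows, child=2, arities=[2, 2, 2])
```

The exhaustive loop grows combinatorially with the number of variables, which is why bounds on the search space and a reorganized dynamic program matter at the 70-variable scale the abstract reports.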

arXiv.org e-Print Archive

Crossref