Search CORE

95,896 research outputs found

Bayesian Structure Learning and Sampling of Bayesian Networks with the R Package BiDAG

Author: Beerenwinkel Niko
Kuipers Jack
Moffa Giusi
Suter Polina
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 28/01/2023
Field of study

The R package BiDAG implements Markov chain Monte Carlo (MCMC) methods for structure learning and sampling of Bayesian networks. The package includes tools to search for a maximum a posteriori (MAP) graph and to sample graphs from the posterior distribution given the data. A new hybrid approach to structure learning enables inference in large graphs. In the first step, we define a reduced search space by means of the PC algorithm or based on prior knowledge. In the second step, an iterative order MCMC scheme proceeds to optimize the restricted search space and estimate the MAP graph. Sampling from the posterior distribution is implemented using either order or partition MCMC. The models and algorithms can handle both discrete and continuous data. The BiDAG package also provides an implementation of MCMC schemes for structure learning and sampling of dynamic Bayesian networks

Journal of Statistical Software

Bayesian network modeling for evolutionary genetic structures

Author: Cercone Nick
Yan Lisa Jing
Publication venue: Elsevier Ltd.
Publication date: 01/01/2010
Field of study

AbstractEvolutionary theory states that stronger genetic characteristics reflect the organism’s ability to adapt to its environment and to survive the harsh competition faced by every species. Evolution normally takes millions of generations to assess and measure changes in heredity. Determining the connections, which constrain genotypes and lead superior ones to survive is an interesting problem. In order to accelerate this process,we develop an artificial genetic dataset, based on an artificial life (AL) environment genetic expression (ALGAE). ALGAE can provide a useful and unique set of meaningful data, which can not only describe the characteristics of genetic data, but also simplify its complexity for later analysis.To explore the hidden dependencies among the variables, Bayesian Networks (BNs) are used to analyze genotype data derived from simulated evolutionary processes and provide a graphical model to describe various connections among genes. There are a number of models available for data analysis such as artificial neural networks, decision trees, factor analysis, BNs, and so on. Yet BNs have distinct advantages as analytical methods which can discern hidden relationships among variables. Two main approaches, constraint based and score based, have been used to learn the BN structure. However, both suit either sparse structures or dense structures. Firstly, we introduce a hybrid algorithm, called “the E-algorithm”, to complement the benefits and limitations in both approaches for BN structure learning. Testing E-algorithm against a standardized benchmark dataset ALARM, suggests valid and accurate results. BAyesian Network ANAlysis (BANANA) is then developed which incorporates the E-algorithm to analyze the genetic data from ALGAE. The resulting BN topological structure with conditional probabilistic distributions reveals the principles of how survivors adapt during evolution producing an optimal genetic profile for evolutionary fitness

CiteSeerX

Elsevier - Publisher Connector

Advances in Learning Bayesian Networks of Bounded Treewidth

Author: de Campos Cassio Polpo
Ji Qiang
Maua Denis Deratani
Nie Siqi
Publication venue
Publication date: 01/01/2014
Field of study

This work presents novel algorithms for learning Bayesian network structures with bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed-integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in uniformly sampling

k

-trees (maximal graphs of treewidth

k

), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that

k

-tree. Some properties of these methods are discussed and proven. The approaches are empirically compared to each other and to a state-of-the-art method for learning bounded treewidth structures on a collection of public data sets with up to 100 variables. The experiments show that our exact algorithm outperforms the state of the art, and that the approximate approach is fairly accurate.Comment: 23 pages, 2 figures, 3 table

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

CiteSeerX

Repository TU/e

Learning Large-Scale Bayesian Networks with the sparsebn Package

Author: Aragam Bryon
Gu Jiaying
Zhou Qing
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 10/03/2018
Field of study

Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays datasets often have upwards of thousands---sometimes tens or hundreds of thousands---of variables and far fewer samples. To meet this challenge, we have developed a new R package called sparsebn for learning the structure of large, sparse graphical models with a focus on Bayesian networks. While there are many existing software packages for this task, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. Additionally, the sparsebn package is fully compatible with existing software packages for network analysis.Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figure

arXiv.org e-Print Archive

Journal of Statistical Software

Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks

Author: B Malone
B Shahriari
C Bielza
CE Rasmussen
D Colombo
I Tsamardinos
J Hausser
M Kalisch
M Scutari
RO Ness
SL Lauritzen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/06/2018
Field of study

The PC algorithm is a popular method for learning the structure of Gaussian Bayesian networks. It carries out statistical tests to determine absent edges in the network. It is hence governed by two parameters: (i) The type of test, and (ii) its significance level. These parameters are usually set to values recommended by an expert. Nevertheless, such an approach can suffer from human bias, leading to suboptimal reconstruction results. In this paper we consider a more principled approach for choosing these parameters in an automatic way. For this we optimize a reconstruction score evaluated on a set of different Gaussian Bayesian networks. This objective is expensive to evaluate and lacks a closed-form expression, which means that Bayesian optimization (BO) is a natural choice. BO methods use a model to guide the search and are hence able to exploit smoothness properties of the objective surface. We show that the parameters found by a BO method outperform those found by a random search strategy and the expert recommendation. Importantly, we have found that an often overlooked statistical test provides the best over-all reconstruction results

arXiv.org e-Print Archive

Crossref

Penalized Estimation of Directed Acyclic Graphs From Discrete Data

Author: Fu Fei
Gu Jiaying
Zhou Qing
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/02/2018
Field of study

Bayesian networks, with structure given by a directed acyclic graph (DAG), are a popular class of graphical models. However, learning Bayesian networks from discrete or categorical data is particularly challenging, due to the large parameter space and the difficulty in searching for a sparse structure. In this article, we develop a maximum penalized likelihood method to tackle this problem. Instead of the commonly used multinomial distribution, we model the conditional distribution of a node given its parents by multi-logit regression, in which an edge is parameterized by a set of coefficient vectors with dummy variables encoding the levels of a node. To obtain a sparse DAG, a group norm penalty is employed, and a blockwise coordinate descent algorithm is developed to maximize the penalized likelihood subject to the acyclicity constraint of a DAG. When interventional data are available, our method constructs a causal network, in which a directed edge represents a causal relation. We apply our method to various simulated and real data sets. The results show that our method is very competitive, compared to many existing methods, in DAG estimation from both interventional and high-dimensional observational data.Comment: To appear in Statistics and Computin

arXiv.org e-Print Archive

eScholarship - University of California