A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks
Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in time and space exponential in the number of nodes (variables), assuming the in-degree (the number of parents) per node is bounded by a constant. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all edges with optimal parallel space efficiency and nearly optimal parallel time efficiency: the space usage per processor decreases in proportion to the number of processors used, and the run-time nearly so. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute a hypercube. We carefully coordinate the computation of the correlated DP procedures so that large amounts of data exchange are suppressed. Further, we develop parallel techniques for two variants of the well-known \emph{zeta transform}, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.

Comment: 32 pages, 12 figures
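The zeta transform mentioned above sums a function over all subsets of each set; the standard sequential algorithm, which the paper parallelizes, runs in O(n·2^n) rather than the naive O(3^n) over all subset pairs. A minimal sketch over bitmask-indexed arrays (illustrative only, not the paper's parallel version):

```python
def zeta_transform(f, n):
    """Fast zeta transform over the subset lattice of an n-element set.

    Given f indexed by bitmask subsets S, returns g with
    g[S] = sum of f[T] over all T that are subsets of S,
    in O(n * 2^n) time by folding in one ground element at a time.
    """
    g = list(f)
    for i in range(n):
        bit = 1 << i
        for S in range(1 << n):
            if S & bit:                 # S contains element i:
                g[S] += g[S ^ bit]      # add the part missing element i
    return g
```

For n = 2 and f = [1, 2, 3, 4] (indexed 00, 01, 10, 11), the transform yields [1, 3, 4, 10]; the full-set entry is the sum of all four values.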
Scalable Population Synthesis with Deep Generative Modeling
Population synthesis is concerned with the generation of synthetic yet
realistic representations of populations. It is a fundamental problem in the
modeling of transport where the synthetic populations of micro-agents represent
a key input to most agent-based models. In this paper, a new methodological
framework for how to 'grow' pools of micro-agents is presented. The model
framework adopts a deep generative modeling approach from machine learning
based on a Variational Autoencoder (VAE). Compared with previous population synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs sampling, and traditional generative models such as Bayesian Networks or Hidden Markov Models, the proposed method can fit the full joint distribution even in high dimensions. The proposed methodology is compared with a conventional Gibbs sampler and a Bayesian Network using a large-scale Danish trip diary.
It is shown that, while these two methods outperform the VAE in the
low-dimensional case, they both suffer from scalability issues when the number
of modeled attributes increases. It is also shown that the Gibbs sampler
essentially replicates the agents from the original sample when the required
conditional distributions are estimated as frequency tables. In contrast, the
VAE addresses the problem of sampling zeros by generating agents that do not appear in the original data but have similar statistical properties. The presented approach can support agent-based modeling at all levels by enabling richer synthetic populations with smaller zones and more detailed individual characteristics.

Comment: 27 pages, 15 figures, 4 tables
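The replication behaviour attributed to the Gibbs sampler can be seen in a toy sketch: when each conditional is estimated as a frequency table, attribute combinations absent from the data receive zero probability, so the chain tends to stay on agents already in the sample. All names below are hypothetical; this is an illustration of the failure mode, not the paper's implementation:

```python
import random
from collections import Counter

def gibbs_synthesize(sample, n_agents, n_sweeps=20, seed=0):
    """Toy Gibbs sampler for population synthesis (illustrative sketch).

    Each conditional P(x_i | x_-i) is a frequency table built from
    `sample`, a list of attribute tuples.  Combinations that never
    co-occur in the data get zero probability, which is why this
    scheme tends to reproduce agents from the original sample.
    """
    rng = random.Random(seed)
    n_attrs = len(sample[0])
    agents = []
    for _ in range(n_agents):
        state = list(rng.choice(sample))      # start from an observed agent
        for _ in range(n_sweeps):
            for i in range(n_attrs):
                rest = tuple(state[:i] + state[i + 1:])
                # frequency table of attribute i given all other attributes
                counts = Counter(row[i] for row in sample
                                 if tuple(row[:i] + row[i + 1:]) == rest)
                values, weights = zip(*counts.items())
                state[i] = rng.choices(values, weights=weights)[0]
        agents.append(tuple(state))
    return agents
```

With a sample containing only the agents (0, 0) and (1, 1), every synthesized agent is one of those two; a VAE instead samples from a continuous latent space, so it can produce unseen but statistically plausible combinations.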
A Score-and-Search Approach to Learning Bayesian Networks with Noisy-OR Relations
A Bayesian network is a probabilistic graphical model that consists of a
directed acyclic graph (DAG), where each node is a random variable and attached
to each node is a conditional probability distribution (CPD). A Bayesian
network can be learned from data using the well-known score-and-search
approach, and within this approach a key consideration is how to simultaneously
learn the global structure in the form of the underlying DAG and the local
structure in the CPDs. Several useful forms of local structure have been
identified in the literature, but thus far the score-and-search approach has only been extended to handle local structure in the form of context-specific independence. In this paper, we show how to extend the score-and-search
approach to the important and widely useful case of noisy-OR relations. We provide an effective gradient descent algorithm for scoring a candidate noisy-OR relation with the widely used BIC score, and we provide pruning rules that allow the search to scale successfully to medium-sized networks. Our empirical results
provide evidence for the success of our approach to learning Bayesian networks
that incorporate noisy-OR relations.

Comment: Accepted to Probabilistic Graphical Models, 202
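For reference, the noisy-OR CPD being scored has a simple closed form: the child stays off only if the leak and every active parent independently fail to activate it. A small sketch (helper names are hypothetical; the BIC shown uses the common "higher is better" convention for structure search, which may differ from the paper's exact formulation):

```python
import math

def noisy_or_prob(active_parents, leak, params):
    """P(child = 1 | parents) under a noisy-OR CPD.

    params[i] is the probability that parent i alone activates the
    child; `leak` covers causes not modelled explicitly.  The child
    is off only if the leak and all active parents fail:
        P(X=1 | u) = 1 - (1 - leak) * prod_{active i} (1 - params[i])
    """
    p_off = 1.0 - leak
    for i in active_parents:
        p_off *= 1.0 - params[i]
    return 1.0 - p_off

def bic(log_likelihood, n_params, n_samples):
    """BIC score, penalizing each free parameter by (log N) / 2."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)
```

With no active parents the activation probability reduces to the leak; with both parents of strength 0.8 and 0.5 active and no leak it is 1 − 0.2·0.5 = 0.9.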
Large-Scale Bayesian Network Learning with the RAI Algorithm Using the Bayes Factor
Structure learning of Bayesian networks with asymptotic consistency is NP-hard. Search algorithms based on dynamic programming, A* search, and integer programming have been developed, but they remain limited to structures of roughly 60 nodes, so realizing large-scale structure learning urgently requires a fundamentally different approach. In the causal modeling literature, a structure learning approach has been proposed that dramatically reduces computation by combining conditional independence (CI) tests with edge orientation. This is known as the constraint-based approach, and the RAI algorithm is its most accurate state-of-the-art method. However, the RAI algorithm relies on either statistical hypothesis testing or conditional mutual information for its CI tests. The accuracy of the former depends on the p-value and a user-specified significance level; because the p-value shrinks as the sample size grows, the null hypothesis is known to be rejected erroneously. The accuracy of the latter is strongly affected by the choice of threshold. Consequently, neither test is guaranteed to learn the true structure asymptotically. In this paper, we incorporate into the RAI algorithm a CI test based on the Bayes factor, which has asymptotic consistency, thereby enabling large-scale structure learning with hundreds of nodes. Simulation experiments on several benchmark networks demonstrate that the proposed method empirically provides the best performance.
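A Bayes-factor CI test of the kind proposed can be sketched for the simplest case, marginal independence of two discrete variables, by comparing Dirichlet-multinomial marginal likelihoods of a joint model against a product-of-margins model; a conditional test amounts to repeating this per configuration of the conditioning set. This is an illustrative reconstruction, not the authors' exact test:

```python
import math
from collections import Counter

def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of counts under a symmetric Dirichlet(alpha) prior."""
    n, k = sum(counts), len(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out

def log_bayes_factor_indep(pairs, alpha=1.0):
    """log Bayes factor of 'X and Y independent' vs 'dependent' from (x, y) pairs.

    Positive values favour independence.  The dependent model treats each
    (x, y) combination as one cell of a joint multinomial; the independent
    model scores the X and Y margins separately.
    """
    xs = sorted({x for x, _ in pairs})
    ys = sorted({y for _, y in pairs})
    joint = Counter(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)
    log_dep = log_dirichlet_multinomial([joint[(x, y)] for x in xs for y in ys], alpha)
    log_ind = (log_dirichlet_multinomial([cx[x] for x in xs], alpha)
               + log_dirichlet_multinomial([cy[y] for y in ys], alpha))
    return log_ind - log_dep
```

Unlike a p-value threshold, the sign of the log Bayes factor needs no tuning: perfectly correlated data drives it strongly negative, while data consistent with independence keeps it positive.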
Falcon Optimization Algorithm for Bayesian Networks Structure Learning
In machine learning, one useful model for representing the structure of knowledge is the Bayesian network, which captures probabilistic dependency relationships between variables. Score-and-search is a standard method for learning the structure of a Bayesian network. The authors apply the Falcon Optimization Algorithm (FOA), which is based on the falcon's prey-search strategy, as a new approach to learning Bayesian network structures. This paper uses reversing, deleting, moving, and inserting operations to adapt the FOA to the search for an optimal Bayesian network structure. The proposed technique is compared with Pigeon Inspired Optimization, Greedy Search, and Simulated Annealing using the BDeu score function. The authors also examine the confusion-matrix performance of these techniques on several benchmark data sets. The evaluations show that the proposed method performs more reliably than the other algorithms, producing better scores and accuracy values.
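Edge operators such as the deleting, inserting, and reversing moves above must keep candidate structures acyclic. A minimal sketch over adjacency sets (names are hypothetical, and the composite "moving" operation is omitted):

```python
def copy_graph(adj):
    """Shallow copy of an adjacency-set representation {node: {children}}."""
    return {u: set(vs) for u, vs in adj.items()}

def reaches(adj, src, dst):
    """Depth-first check: is dst reachable from src along directed edges?"""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return False

def delete_edge(adj, u, v):
    g = copy_graph(adj)
    g[u].discard(v)
    return g

def insert_edge(adj, u, v):
    """Add u -> v, or return None if that would close a directed cycle."""
    if reaches(adj, v, u):            # v already reaches u: cycle
        return None
    g = copy_graph(adj)
    g[u].add(v)
    return g

def reverse_edge(adj, u, v):
    """Reverse u -> v, or return None if the reversal creates a cycle."""
    return insert_edge(delete_edge(adj, u, v), v, u)
```

On the chain 0 → 1 → 2, inserting 2 → 0 is rejected (it closes a cycle), while reversing 0 → 1 is allowed.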
Low Rank Directed Acyclic Graphs and Causal Structure Learning
Despite several important advances in recent years, learning causal
structures represented by directed acyclic graphs (DAGs) remains a challenging
task in high dimensional settings when the graphs to be learned are not sparse.
In particular, the recent formulation of structure learning as a continuous
optimization problem proved to have considerable advantages over the
traditional combinatorial formulation, but the performance of the resulting
algorithms is still wanting when the target graph is relatively large and
dense. In this paper we propose a novel approach to mitigate this problem, by
exploiting a low rank assumption regarding the (weighted) adjacency matrix of a
DAG causal model. We establish several useful results relating interpretable
graphical conditions to the low rank assumption, and show how to adapt existing
methods for causal structure learning to take advantage of this assumption. We
also provide empirical evidence for the utility of our low rank algorithms,
especially on graphs that are not sparse. Not only do they outperform state-of-the-art algorithms when the low rank condition is satisfied, but their performance on randomly generated scale-free graphs is also very competitive even though the true ranks may not be as low as assumed.
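The low rank idea can be sketched in the continuous-optimization setting the abstract refers to: parameterize the weighted adjacency matrix as A = U V^T with rank r much smaller than d, and keep the usual trace-exponential acyclicity measure. The function below is a generic NOTEARS-style sketch under that assumption, not the authors' algorithm:

```python
import numpy as np

def acyclicity(A, terms=20):
    """NOTEARS-style acyclicity measure h(A) = tr(exp(A * A)) - d,
    approximated by a truncated power series of the matrix exponential.
    h(A) = 0 exactly when the weighted graph A is a DAG."""
    d = A.shape[0]
    M = A * A                     # elementwise square kills edge signs
    term = np.eye(d)
    trace = float(d)
    for k in range(1, terms):
        term = term @ M / k       # M^k / k!
        trace += np.trace(term)
    return trace - d

# Low rank parameterisation: optimize U, V with A = U @ V.T instead of a
# full d x d matrix, cutting free parameters from d**2 to 2*d*r.
rng = np.random.default_rng(0)
d, r = 6, 2
U, V = rng.normal(size=(d, r)), rng.normal(size=(d, r))
A_low_rank = U @ V.T
```

A strictly upper-triangular matrix (a DAG) gives h = 0, while a two-cycle gives h = 2 cosh(1) − 2 ≈ 1.086 > 0, so h can serve as the smooth constraint in a continuous optimizer.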
A Bounded Error, Anytime, Parallel Algorithm for Exact Bayesian Network Structure Learning
Bayesian network structure learning is NP-hard. Several anytime structure learning algorithms have been proposed which guarantee to learn optimal networks given enough resources. In this paper, we describe a general-purpose, anytime search algorithm with bounded error that also guarantees optimality. We give an efficient, sparse representation of a key data structure for structure learning. Empirical results show that our algorithm often finds better networks more quickly than state-of-the-art methods. They also highlight that accepting a small, bounded amount of suboptimality can reduce the memory and runtime requirements of structure learning by several orders of magnitude.
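One standard way to obtain a bounded-error anytime search of the kind described is weighted A*: inflating an admissible heuristic by (1 + eps) guarantees a solution within a factor (1 + eps) of optimal while typically expanding far fewer nodes. A generic sketch on an abstract search space (illustrative only, not the paper's structure-learning instantiation):

```python
import heapq

def weighted_astar(start, goal, succ, h, eps=0.5):
    """Weighted A* with bounded suboptimality.

    succ(node) yields (neighbour, edge_cost) pairs; h is an admissible
    heuristic.  Using f = g + (1 + eps) * h returns a path whose cost is
    at most (1 + eps) times the optimal cost.
    """
    frontier = [((1 + eps) * h(start), 0.0, start)]
    best_g = {start: 0.0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue                      # stale queue entry
        for nxt, cost in succ(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + (1 + eps) * h(nxt), ng, nxt))
    return None
```

Setting eps = 0 recovers plain A* and exact optimality; larger eps trades solution quality for memory and time, mirroring the paper's observation that a small suboptimality bound buys large resource savings.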
An Exact Approach to Learning Probabilistic Relational Model
Probabilistic Graphical Models (PGMs) offer a popular framework comprising a variety of statistical formalisms, such as Bayesian networks (BNs). The latter are able to depict real-world situations with a high degree of uncertainty. Owing to their power and flexibility, several extensions have been proposed to broaden their applicability. Probabilistic Relational Models (PRMs) extend BNs to work with relational databases rather than propositional data. Their construction remains an active research area, as it is the most complicated issue. Only a few works have been proposed in this direction, and most of them do not guarantee an optimal identification of the dependency structure. In this paper, we propose an approach that is guaranteed to return an optimal PRM structure. It is inspired by a BN method whose performance has already been proven.