
    A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks

    Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in O(2(d+1)n 2^n) time and space, where n is the number of nodes (variables) in the Bayesian network and the in-degree (the number of parents) per node is bounded by a constant d. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all n(n-1) edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if p = 2^k processors are used, the run-time reduces to O(5(d+1)n 2^{n-k} + k(n-k)^d) and the space usage becomes O(n 2^{n-k}) per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an n-dimensional hypercube. We carefully coordinate the computation of the correlated DP procedures so that large amounts of data exchange are suppressed. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways. Comment: 32 pages, 12 figures
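The zeta transform mentioned above can be illustrated in a few lines. Below is a minimal, unoptimized Python sketch of the standard O(n 2^n) up-zeta transform over subset bitmasks, z[S] = sum of f[T] over all T contained in S; the subset lattice it sweeps is the same n-dimensional hypercube structure the parallel algorithm distributes across processors. The function name is ours, not the paper's.

```python
def zeta_transform(f, n):
    """Up-zeta transform over subsets of an n-element ground set:
    returns z with z[S] = sum of f[T] for all T subset of S.
    f is indexed by subset bitmask, length 2**n. Runs in O(n * 2**n)."""
    z = list(f)
    for i in range(n):                   # fold in one ground-set element at a time
        for s in range(1 << n):
            if s & (1 << i):             # if element i is in S,
                z[s] += z[s ^ (1 << i)]  # add the accumulated value without i
    return z

f = [1, 2, 3, 4]                         # n = 2: subsets {}, {0}, {1}, {0,1}
print(zeta_transform(f, 2))              # [1, 3, 4, 10]
```

The per-element loop structure is what makes the transform easy to parallelize: fixing the top k bits of the mask splits the lattice into 2^k independent sub-hypercubes.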

    Scalable Population Synthesis with Deep Generative Modeling

    Population synthesis is concerned with the generation of synthetic yet realistic representations of populations. It is a fundamental problem in the modeling of transport, where synthetic populations of micro-agents represent a key input to most agent-based models. In this paper, a new methodological framework for how to 'grow' pools of micro-agents is presented. The framework adopts a deep generative modeling approach from machine learning based on a Variational Autoencoder (VAE). Compared to previous population synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs sampling and traditional generative models such as Bayesian Networks or Hidden Markov Models, the proposed method allows fitting the full joint distribution in high dimensions. The proposed methodology is compared with a conventional Gibbs sampler and a Bayesian Network using a large-scale Danish trip diary. It is shown that, while these two methods outperform the VAE in the low-dimensional case, they both suffer from scalability issues when the number of modeled attributes increases. It is also shown that the Gibbs sampler essentially replicates the agents from the original sample when the required conditional distributions are estimated as frequency tables. In contrast, the VAE addresses the problem of sampling zeros by generating agents that do not appear in the original data but have similar statistical properties. The presented approach can support agent-based modeling at all levels by enabling richer synthetic populations with smaller zones and more detailed individual characteristics. Comment: 27 pages, 15 figures, 4 tables
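The "sampling zeros" limitation of frequency-table samplers described above can be made concrete with a toy example (the attribute names and values here are invented for illustration): any sampler driven purely by empirical frequency tables assigns probability zero to attribute combinations absent from the training sample, even when those combinations are plausible in the real population.

```python
import itertools

# Toy training sample of (mode, employment) agents; the combination
# ("bike", "full_time") is plausible but happens not to be observed.
sample = [("car", "full_time"), ("bike", "student"), ("car", "student")]

support = set(sample)  # the only combinations a frequency-table sampler can emit
all_combos = set(itertools.product({"car", "bike"}, {"full_time", "student"}))

# Sampling zeros: valid combinations unreachable for the empirical sampler.
sampling_zeros = all_combos - support
print(sampling_zeros)  # the unobserved ("bike", "full_time") pair
```

A generative model such as a VAE, by learning a smoothed latent representation instead of memorizing cell counts, can assign nonzero probability to such unseen combinations.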

    A Score-and-Search Approach to Learning Bayesian Networks with Noisy-OR Relations

    A Bayesian network is a probabilistic graphical model that consists of a directed acyclic graph (DAG), where each node is a random variable and attached to each node is a conditional probability distribution (CPD). A Bayesian network can be learned from data using the well-known score-and-search approach, and within this approach a key consideration is how to simultaneously learn the global structure in the form of the underlying DAG and the local structure in the CPDs. Several useful forms of local structure have been identified in the literature, but thus far the score-and-search approach has only been extended to handle local structure in the form of context-specific independence. In this paper, we show how to extend the score-and-search approach to the important and widely useful case of noisy-OR relations. We provide an effective gradient descent algorithm to score a candidate noisy-OR using the widely used BIC score, and pruning rules that allow the search to successfully scale to medium-sized networks. Our empirical results provide evidence for the success of our approach to learning Bayesian networks that incorporate noisy-OR relations. Comment: Accepted to Probabilistic Graphical Models, 202
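A noisy-OR CPD and the BIC score it is evaluated under are both simple to state; the sketch below shows the textbook forms (not the paper's gradient-descent scorer). The leak parameter and variable names are illustrative.

```python
import math

def noisy_or_prob(x, q, leak=0.0):
    """Textbook noisy-OR CPD: P(child = 1 | parent states x) =
    1 - (1 - leak) * prod over active parents i of (1 - q[i]),
    where q[i] is the probability that parent i alone activates the child."""
    p_off = 1.0 - leak
    for xi, qi in zip(x, q):
        if xi:
            p_off *= 1.0 - qi
    return 1.0 - p_off

def bic(log_likelihood, num_params, num_records):
    """BIC score (higher is better): log-likelihood penalized by
    (k / 2) * log(N) for k free parameters and N data records."""
    return log_likelihood - 0.5 * num_params * math.log(num_records)

print(noisy_or_prob([1, 1], [0.8, 0.5]))  # 1 - 0.2 * 0.5 = 0.9
```

Note the parameter economy that motivates scoring noisy-OR locally: a full CPD over d binary parents has 2^d free parameters, while a noisy-OR has only d (plus the leak).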

    Large-Scale Bayesian Network Learning with the RAI Algorithm Using the Bayes Factor

    Score-based structure learning of Bayesian networks with asymptotic consistency is NP-hard. Search algorithms based on dynamic programming, A* search, and integer programming have been developed, but structure learning is still limited to around 60 nodes, and a fundamentally different approach is needed to realize large-scale structure learning. Meanwhile, in the field of causal modeling, a structure learning approach that drastically reduces the computational cost through conditional independence (CI) tests and edge orientation has been proposed. This approach is called the constraint-based approach, and the RAI algorithm is known as its most accurate state-of-the-art method. However, the RAI algorithm uses either statistical hypothesis testing or conditional mutual information for its CI tests. The accuracy of the former depends on the p-value and on the significance level set by the user; p-values shrink as the number of data points grows, leading to the known problem of erroneously rejecting the null hypothesis. The accuracy of the latter is strongly affected by the choice of threshold. Consequently, neither variant is guaranteed to learn the true structure asymptotically. In this paper, we incorporate into the RAI algorithm a CI test based on the Bayes factor, which is asymptotically consistent. This enables large-scale structure learning with several hundred nodes. Simulation experiments on several benchmark networks demonstrate that the proposed method empirically provides the best performance.
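A Bayes-factor CI test of the kind described above can be sketched from first principles: compare the marginal likelihood of the contingency table under a model where X and Y are dependent given Z against one where they are conditionally independent. The sketch below uses symmetric Dirichlet(1) priors and Dirichlet-multinomial marginal likelihoods; this is a generic illustration, not necessarily the prior or decomposition the paper uses.

```python
from math import lgamma

def log_dm(counts, alpha=1.0):
    """Log marginal likelihood of categorical counts under a
    symmetric Dirichlet(alpha) prior (Dirichlet-multinomial)."""
    n, k = sum(counts), len(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

def log_bayes_factor_ci(table):
    """table[z][x][y] = count of (X=x, Y=y, Z=z). Returns the log Bayes
    factor of dependence vs. conditional independence of X and Y given Z;
    positive values favor dependence (i.e. keep the edge)."""
    dep = indep = 0.0
    for slab in table:                          # one 2-D slab per value of Z
        joint = [c for row in slab for c in row]
        x_margin = [sum(row) for row in slab]
        y_margin = [sum(col) for col in zip(*slab)]
        dep += log_dm(joint)                    # model X,Y jointly given z
        indep += log_dm(x_margin) + log_dm(y_margin)  # model them separately
    return dep - indep

# Perfectly correlated X and Y (single Z value): dependence strongly favored.
print(log_bayes_factor_ci([[[50, 0], [0, 50]]]) > 0)  # True
```

Unlike a p-value thresholded at a fixed significance level, this Bayes factor concentrates on the correct hypothesis as the sample size grows, which is the consistency property the paper exploits.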

    Falcon Optimization Algorithm for Bayesian Networks Structure Learning

    In machine learning, one useful model for representing the structure of knowledge is the Bayesian network, which captures probabilistic dependency relationships between variables. Score-and-search is a method used for learning the structure of a Bayesian network. The authors apply the Falcon Optimization Algorithm (FOA) as a new approach to learning the structure of Bayesian networks. This paper uses reversing, deleting, moving and inserting operations to adapt the FOA to search for the optimal Bayesian network structure. Essentially, the falcon's prey-search strategy is used in the FOA. The results of the proposed technique are compared with Pigeon Inspired Optimization, Greedy Search, and Simulated Annealing using the BDeu score function. The authors have also examined the confusion-matrix performance of these techniques on several benchmark data sets. As shown by the evaluations, the proposed method performs more reliably than the other algorithms, producing better scores and accuracy values.
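The edge operations such a search applies to a candidate DAG must preserve acyclicity. The sketch below shows a generic, hypothetical version of insert and reverse moves with a reachability check ("move" can be expressed as a delete followed by an insert); it illustrates the mechanics, not the FOA paper's exact operators.

```python
def creates_cycle(edges, u, v):
    """Adding edge u -> v creates a cycle iff v already reaches u."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(w for (x, w) in edges if x == node)
    return False

def insert_edge(edges, u, v):
    """Return the DAG with u -> v added, or unchanged if that breaks acyclicity."""
    if (u, v) not in edges and not creates_cycle(edges, u, v):
        return edges | {(u, v)}
    return edges

def reverse_edge(edges, u, v):
    """Return the DAG with u -> v reversed, or unchanged if illegal."""
    without = edges - {(u, v)}
    if (u, v) in edges and not creates_cycle(without, v, u):
        return without | {(v, u)}
    return edges

dag = {("A", "B"), ("B", "C")}
print(insert_edge(dag, "C", "A") == dag)  # True: C -> A would close a cycle
```

Metaheuristics like FOA then score each legal neighbor (e.g. with BDeu) and steer the population toward higher-scoring structures.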

    Low Rank Directed Acyclic Graphs and Causal Structure Learning

    Despite several important advances in recent years, learning causal structures represented by directed acyclic graphs (DAGs) remains a challenging task in high dimensional settings when the graphs to be learned are not sparse. In particular, the recent formulation of structure learning as a continuous optimization problem proved to have considerable advantages over the traditional combinatorial formulation, but the performance of the resulting algorithms is still wanting when the target graph is relatively large and dense. In this paper we propose a novel approach to mitigate this problem, by exploiting a low rank assumption regarding the (weighted) adjacency matrix of a DAG causal model. We establish several useful results relating interpretable graphical conditions to the low rank assumption, and show how to adapt existing methods for causal structure learning to take advantage of this assumption. We also provide empirical evidence for the utility of our low rank algorithms, especially on graphs that are not sparse. Not only do they outperform state-of-the-art algorithms when the low rank condition is satisfied, the performance on randomly generated scale-free graphs is also very competitive even though the true ranks may not be as low as is assumed
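The low rank assumption on the weighted adjacency matrix can be sketched directly: model W as a product of two tall, thin factors, so a continuous structure-learning objective optimizes 2nr parameters instead of n^2. The construction below is an illustrative NumPy check of the rank bound, not the paper's algorithm.

```python
import numpy as np

# Low rank parameterization: W = U @ V.T with U, V of shape (n, r), r << n.
# Any such product has rank at most r, so a continuous DAG-learning method
# can search over U and V (2*n*r parameters) instead of the full n*n matrix.
rng = np.random.default_rng(0)
n, r = 20, 2
U = rng.normal(size=(n, r))
V = rng.normal(size=(n, r))
W = U @ V.T                               # (n, n) candidate weighted adjacency

print(W.shape, np.linalg.matrix_rank(W))  # (20, 20) with rank at most r
```

Graphically, low rank corresponds to conditions such as a small number of "hub" nodes mediating many paths, which is why the assumption can hold on dense, non-sparse graphs.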

    A Bounded Error, Anytime, Parallel Algorithm for Exact Bayesian Network Structure Learning

    Bayesian network structure learning is NP-hard. Several anytime structure learning algorithms have been proposed which guarantee to learn optimal networks if given enough resources. In this paper, we describe a general-purpose, anytime search algorithm with bounded error that also guarantees optimality. We give an efficient, sparse representation of a key data structure for structure learning. Empirical results show our algorithm often finds better networks more quickly than state-of-the-art methods. They also highlight that accepting a small, bounded amount of suboptimality can reduce the memory and runtime requirements of structure learning by several orders of magnitude.
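One standard way to keep the parent-set score cache sparse, in the spirit of the sparse data structure mentioned above, is to drop any candidate parent set that is dominated by one of its own subsets, since the subset is always a legal substitute in any node ordering. The sketch below shows this classic pruning rule; it is a generic illustration, not necessarily the paper's exact representation.

```python
from itertools import combinations

def prune_parent_sets(scores):
    """scores: {frozenset(parents): local score}, higher is better.
    Keep only parent sets for which no proper subset scores at least
    as well; dominated sets can never appear in an optimal network."""
    kept = {}
    for parents, score in scores.items():
        dominated = any(
            scores.get(frozenset(sub), float("-inf")) >= score
            for k in range(len(parents))          # all proper subset sizes
            for sub in combinations(parents, k))
        if not dominated:
            kept[parents] = score
    return kept

scores = {frozenset(): -10.0,
          frozenset({"A"}): -8.0,
          frozenset({"A", "B"}): -9.0}            # worse than its subset {A}
print(prune_parent_sets(scores))                  # drops {A, B}
```

Bounded-error variants relax the comparison by a tolerance, pruning even more aggressively in exchange for a provable bound on the score of the returned network.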

    An Exact Approach to Learning Probabilistic Relational Model

    Probabilistic Graphical Models (PGMs) offer a popular framework comprising a variety of statistical formalisms, such as Bayesian networks (BNs). The latter are able to depict real-world situations with a high degree of uncertainty. Due to their power and flexibility, several extensions have been proposed, broadening their applicability. Probabilistic Relational Models (PRMs) extend BNs to work with relational databases rather than propositional data. Their construction remains an active research area, as it is the most difficult step. Only a few works have been proposed in this direction, and most of them do not guarantee optimal identification of the dependency structure. In this paper we propose an approach that is guaranteed to return an optimal PRM structure, inspired by a BN method whose performance has already been proven.