225 research outputs found

    New Results for the MAP Problem in Bayesian Networks

    This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as the basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as randomly generated cases to show their effectiveness. Comment: A couple of typos were fixed, as well as the notation in part of section 4, which was misleading. Theoretical and empirical results have not changed.
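As a rough illustration of the partial MAP query described in this abstract (maximise over some variables, sum out the rest, given evidence), the sketch below solves a tiny Naive Bayes instance by brute-force enumeration. It is not the paper's exact algorithm or its approximation scheme; the network, its probabilities, and the names `P_C`, `P_X`, `joint`, and `partial_map` are all hypothetical and chosen only for the example.

```python
from itertools import product

# Hypothetical Naive Bayes network: class C with binary children X0, X1, X2.
# All probabilities are illustrative and not taken from the paper.
P_C = {0: 0.6, 1: 0.4}
# P_X[i][c][x] = P(X_i = x | C = c)
P_X = [
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}},
    {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}},
]


def joint(c, xs):
    """Joint probability P(C = c, X = xs) under the Naive Bayes factorisation."""
    p = P_C[c]
    for table, x in zip(P_X, xs):
        p *= table[c][x]
    return p


def partial_map(map_vars, evidence):
    """Brute-force partial MAP.

    Maximises P(map_vars, evidence) over the MAP variables while summing out
    the class C and every feature that is neither queried nor observed.
    Returns the best assignment and its (unnormalised) joint probability.
    """
    n = len(P_X)
    free = [i for i in range(n) if i not in map_vars and i not in evidence]
    best, best_score = None, -1.0
    for assignment in product((0, 1), repeat=len(map_vars)):
        fixed = dict(evidence)
        fixed.update(zip(map_vars, assignment))
        score = 0.0
        for c in P_C:  # sum out the class variable
            for rest in product((0, 1), repeat=len(free)):  # sum out the rest
                full = dict(fixed)
                full.update(zip(free, rest))
                score += joint(c, [full[i] for i in range(n)])
        if score > best_score:
            best, best_score = dict(zip(map_vars, assignment)), score
    return best, best_score


if __name__ == "__main__":
    # Most probable configuration of X0 and X1 given the evidence X2 = 1.
    print(partial_map([0, 1], {2: 1}))
```

The enumeration is exponential in both the number of MAP variables and the number of summed-out variables, which is in line with the hardness results the abstract reports even for this simple structure.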

    Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks

    We discuss the computational complexity of approximating maximum a posteriori inference in sum-product networks. We first show NP-hardness in trees of height two by a reduction from maximum independent set; this implies non-approximability within a sublinear factor. We show that this is a tight bound, as we can find an approximation within a linear factor in networks of height two. We then show that, in trees of height three, it is NP-hard to approximate the problem within a factor $2^{f(n)}$ for any sublinear function $f$ of the size of the input $n$. Again, this bound is tight, as we prove that the usual max-product algorithm finds (in any network) approximations within factor $2^{c \cdot n}$ for some constant $c < 1$. Last, we present a simple algorithm, and show that it provably produces solutions at least as good as, and potentially much better than, the max-product algorithm. We empirically analyze the proposed algorithm against max-product using synthetic and realistic networks. Comment: 18 pages.
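To make the gap between max-product and exact MAP concrete, here is a minimal sketch on a hand-built sum-product network. The tuple-based node encoding, the example network `SPN`, and the functions `evaluate`, `max_product`, and `exact_map` are assumptions made for this illustration; the improved algorithm proposed in the paper is not reproduced here.

```python
from itertools import product

# Hypothetical node encoding (an assumption of this sketch):
#   ("leaf", var, value)                 indicator [var == value]
#   ("prod", [children])                 children have disjoint scopes
#   ("sum", [(weight, child), ...])      children share the same scope
SPN = ("sum", [
    (0.4, ("prod", [("leaf", "X", 0), ("leaf", "Y", 0)])),
    (0.3, ("prod", [("leaf", "X", 1), ("leaf", "Y", 1)])),
    (0.3, ("prod", [("leaf", "X", 1),
                    ("sum", [(0.9, ("leaf", "Y", 1)),
                             (0.1, ("leaf", "Y", 0))])])),
])


def evaluate(node, assignment):
    """Value of the network at a complete assignment."""
    kind = node[0]
    if kind == "leaf":
        return 1.0 if assignment[node[1]] == node[2] else 0.0
    if kind == "prod":
        value = 1.0
        for child in node[1]:
            value *= evaluate(child, assignment)
        return value
    return sum(w * evaluate(child, assignment) for w, child in node[1])


def max_product(node):
    """Max-product: treat each sum node as a max and keep the winning branch."""
    kind = node[0]
    if kind == "leaf":
        return 1.0, {node[1]: node[2]}
    if kind == "prod":
        value, assignment = 1.0, {}
        for child in node[1]:
            child_value, child_assignment = max_product(child)
            value *= child_value
            assignment.update(child_assignment)  # scopes are disjoint
        return value, assignment
    best_value, best_assignment = -1.0, None
    for w, child in node[1]:
        child_value, child_assignment = max_product(child)
        if w * child_value > best_value:
            best_value, best_assignment = w * child_value, child_assignment
    return best_value, best_assignment


def exact_map(root, variables):
    """Exact MAP by enumerating every assignment of binary variables."""
    best_value, best_assignment = -1.0, None
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        value = evaluate(root, assignment)
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


if __name__ == "__main__":
    print("max-product:", max_product(SPN))
    print("exact MAP:  ", exact_map(SPN, ["X", "Y"]))
```

In this toy network max-product commits to the single heaviest branch (X=0, Y=0), while two lighter branches jointly give more mass to (X=1, Y=1); the true value of the max-product answer is its bound here, so the returned solution is strictly worse than the optimum.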

    On Pruning for Score-Based Bayesian Network Structure Learning

    Many algorithms for score-based Bayesian network structure learning (BNSL), in particular exact ones, take as input a collection of potentially optimal parent sets for each variable in the data. Constructing such collections naively is computationally intensive since the number of parent sets grows exponentially with the number of variables. Thus, pruning techniques are not only desirable but essential. While good pruning rules exist for the Bayesian Information Criterion (BIC), current results for the Bayesian Dirichlet equivalent uniform (BDeu) score reduce the search space very modestly, hampering the use of the (often preferred) BDeu. We derive new non-trivial theoretical upper bounds for the BDeu score that considerably improve on the state of the art. Since the new bounds are mathematically proven to be tighter than previous ones and at little extra computational cost, they are a promising addition to BNSL methods.

    Confidence Statements for Ordering Quantiles

    This work proposes Quor, a simple yet effective nonparametric method to compare independent samples with respect to corresponding quantiles of their populations. The method is solely based on the order statistics of the samples, and independence is its only requirement. All computations are performed using exact distributions with no need for any asymptotic considerations, and yet can be run using a fast quadratic-time dynamic programming idea. Computational performance is essential in high-dimensional domains, such as gene expression data. We describe the approach and discuss the most important assumptions, building a parallel with assumptions and properties of widely used techniques for the same problem. Experiments using real data from biomedical studies are performed to empirically compare Quor and other methods in a classification task over a selection of high-dimensional data sets.

    Anytime Marginal MAP Inference

    This paper presents a new anytime algorithm for the marginal MAP problem in graphical models. The algorithm is described in detail, its complexity and convergence rate are studied, and relations to previous theoretical results for the problem are discussed. It is shown that the algorithm runs in polynomial time if the underlying graph of the model has bounded tree-width, and that it provides guarantees to the lower and upper bounds obtained within a fixed amount of computational resources. Experiments with both real and synthetically generated models highlight its main characteristics and show that it compares favorably against Park and Darwiche's systematic search, particularly in the case of problems with many MAP variables and moderate tree-width. Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).

    Learning Bounded Treewidth Bayesian Networks with Thousands of Variables

    We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. We propose a novel algorithm for this task, able to scale to large domains and large treewidths. Our novel approach consistently outperforms the state of the art on data sets with up to ten thousand variables.

    Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables

    Tests for dependence of continuous, discrete and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we will present three Bayesian tests for dependence of binary, continuous and mixed variables. These tests are nonparametric and based on the Dirichlet Process, which allows us to use the same prior model for all of them. Therefore, the tests are “consistent” among each other, in the sense that the probabilities that variables are dependent computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests.

    Advances in Learning Bayesian Networks of Bounded Treewidth

    This work presents novel algorithms for learning Bayesian network structures with bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed-integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in uniformly sampling $k$-trees (maximal graphs of treewidth $k$), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that $k$-tree. Some properties of these methods are discussed and proven. The approaches are empirically compared to each other and to a state-of-the-art method for learning bounded treewidth structures on a collection of public data sets with up to 100 variables. The experiments show that our exact algorithm outperforms the state of the art, and that the approximate approach is fairly accurate. Comment: 23 pages, 2 figures, 3 tables.

    Learning Bayesian Networks with Incomplete Data by Augmentation

    We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with this new approach.