113 research outputs found

    New Results for the MAP Problem in Bayesian Networks

    Full text link
    This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as random generated cases to show their effectiveness.Comment: A couple of typos were fixed, as well as the notation in part of section 4, which was misleading. Theoretical and empirical results have not change

    Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks

    Get PDF
    We discuss the computational complexity of approximating maximum a posteriori inference in sum-product networks. We first show NP-hardness in trees of height two by a reduction from maximum independent set; this implies non-approximability within a sublinear factor. We show that this is a tight bound, as we can find an approximation within a linear factor in networks of height two. We then show that, in trees of height three, it is NP-hard to approximate the problem within a factor 2f(n)2^{f(n)} for any sublinear function ff of the size of the input nn. Again, this bound is tight, as we prove that the usual max-product algorithm finds (in any network) approximations within factor 2câ‹…n2^{c \cdot n} for some constant c<1c < 1. Last, we present a simple algorithm, and show that it provably produces solutions at least as good as, and potentially much better than, the max-product algorithm. We empirically analyze the proposed algorithm against max-product using synthetic and realistic networks.Comment: 18 page

    Confidence Statements for Ordering Quantiles

    Full text link
    This work proposes Quor, a simple yet effective nonparametric method to compare independent samples with respect to corresponding quantiles of their populations. The method is solely based on the order statistics of the samples, and independence is its only requirement. All computations are performed using exact distributions with no need for any asymptotic considerations, and yet can be run using a fast quadratic-time dynamic programming idea. Computational performance is essential in high-dimensional domains, such as gene expression data. We describe the approach and discuss on the most important assumptions, building a parallel with assumptions and properties of widely used techniques for the same problem. Experiments using real data from biomedical studies are performed to empirically compare Quor and other methods in a classification task over a selection of high-dimensional data sets

    Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables

    Get PDF
    Tests for dependence of continuous, discrete and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we will present three Bayesian tests for dependence of binary, continuous and mixed variables. These tests are nonparametric and based on the Dirichlet Process, which allows us to use the same prior model for all of them. Therefore, the tests are “consistent” among each other, in the sense that the probabilities that variables are dependent computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests

    Learning Bounded Treewidth Bayesian Networks with Thousands of Variables

    Get PDF
    We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. We propose a novel algorithm for this task, able to scale to large domains and large treewidths. Our novel approach consistently outperforms the state of the art on data sets with up to ten thousand variables

    Learning Bayesian Networks with Incomplete Data by Augmentation

    Get PDF
    We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with such new approach

    Kuznetsov independence for interval-valued expectations and sets of probability distributions: Properties and algorithms

    Get PDF
    Kuznetsov independence of variables X and Y means that, for any pair of bounded functions f(X)f(X) and g(Y)g(Y), E[f(X)g(Y)]=E[f(X)]⊠E[g(Y)]E[f(X)g(Y)]=E[f(X)]⊠E[g(Y)], where E[⋅]E[⋅] denotes interval-valued expectation and ⊠ denotes interval multiplication. We present properties of Kuznetsov independence for several variables, and connect it with other concepts of independence in the literature; in particular we show that strong extensions are always included in sets of probability distributions whose lower and upper expectations satisfy Kuznetsov independence. We introduce an algorithm that computes lower expectations subject to judgments of Kuznetsov independence by mixing column generation techniques with nonlinear programming. Finally, we define a concept of conditional Kuznetsov independence, and study its graphoid properties.ThefirstauthorhasbeenpartiallysupportedbyCNPq,andthisworkhasbeensupportedbyFAPESPthroughgrant04/09568-0.ThesecondauthorhasbeenpartiallysupportedbytheHaslerFoundationgrantno.10030

    Efficient learning of Bayesian networks with bounded tree-width

    Get PDF
    Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [24,29] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. Finding the best k-tree, however, is computationally intractable. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an informative score function to characterize the quality of a k-tree. To further improve the quality of the k-trees, we propose a probabilistic hill climbing approach that locally refines the sampled k-trees. The proposed algorithm can efficiently learn a quality Bayesian network with tree-width at most k. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods

    Ordering Quantiles through Confidence Statements

    Get PDF
    Ranking variables according to their relevance to predict an outcome is an important task in biomedicine. For instance, such ranking can be used for selecting a smaller number of genes for then applying other sophisticated experiments only on genes identified as important. A nonparametric method called Quor is designed to provide a confidence value for the order of arbitrary quantiles of different populations using independent samples. This confidence may provide insights about possible differences among groups and yields a ranking of importance for the variables. Computations are efficient and use exact distributions with no need for asymptotic considerations. Experiments with simulated data and with multiple real -omics data sets are performed, and they show advantages and disadvantages of the method. Quor has no assumptions but independence of samples, thus it might be a better option when assumptions of other methods cannot be asserted. The software is publicly available on CRAN
    • …
    corecore