New Results for the MAP Problem in Bayesian Networks
This paper presents new results for the (partial) maximum a posteriori (MAP)
problem in Bayesian networks, which is the problem of querying the most
probable state configuration of some of the network variables given evidence.
First, it is demonstrated that the problem remains hard even in networks with
very simple topology, such as binary polytrees and simple trees (including the
Naive Bayes structure). Such proofs extend previous complexity results for the
problem. Inapproximability results are also derived in the case of trees if the
number of states per variable is not bounded. Although the problem is shown to
be hard and inapproximable even in very simple scenarios, a new exact algorithm
is described that is empirically fast in networks of bounded treewidth and
bounded number of states per variable. The same algorithm is used as the basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions.
Approximation schemes were generally thought to be impossible for this problem,
but we show otherwise for classes of networks that are important in practice.
The algorithms are extensively tested using some well-known networks as well as randomly generated cases to show their effectiveness.
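To make the query concrete, the following is a minimal brute-force sketch of a (partial) MAP computation on an invented three-variable chain A → B → C, where A is the MAP variable, B is summed out, and C is observed. It is illustrative only; the network and probabilities are made up, and this is not the paper's algorithm or its approximation scheme:

```python
# Toy Bayesian network A -> B -> C, all variables binary (invented).
# Factors are dicts mapping assignments to probabilities.
pA = {0: 0.6, 1: 0.4}
pB_given_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (b, a)
pC_given_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # (c, b)

def joint(a, b, c):
    return pA[a] * pB_given_A[(b, a)] * pC_given_B[(c, b)]

def map_query(evidence_c):
    """Partial MAP: maximize over A, sum out B, condition on C = c.
    The returned score is proportional to the posterior of A."""
    best_a, best_score = None, -1.0
    for a in (0, 1):
        score = sum(joint(a, b, evidence_c) for b in (0, 1))
        if score > best_score:
            best_a, best_score = a, score
    return best_a, best_score

print(map_query(evidence_c=1))  # -> (1, 0.2)
```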
Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks
We discuss the computational complexity of approximating maximum a posteriori
inference in sum-product networks. We first show NP-hardness in trees of height
two by a reduction from maximum independent set; this implies
non-approximability within a sublinear factor. We show that this is a tight
bound, as we can find an approximation within a linear factor in networks of
height two. We then show that, in trees of height three, it is NP-hard to approximate the problem within a factor 2^f(n) for any sublinear function f of the size of the input n. Again, this bound is tight, as we prove that the usual max-product algorithm finds (in any network) approximations within factor 2^(c·n) for some constant c < 1. Last, we present a simple
algorithm, and show that it provably produces solutions at least as good as,
and potentially much better than, the max-product algorithm. We empirically
analyze the proposed algorithm against max-product using synthetic and
realistic networks.
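For readers unfamiliar with max-product, here is a minimal sketch on a toy sum-product network. The node encoding and weights are invented, and this illustrates the baseline that the paper's algorithm improves upon, not the paper's own method:

```python
# Toy SPN nodes: ('leaf', var, value_probs), ('sum', [(w, child), ...]),
# or ('prod', [children]). Invented structure, for illustration only.

def max_product(node, assignment):
    """Evaluate the network with sum nodes replaced by max nodes,
    recording winning choices to decode an (approximate) MAP state."""
    kind = node[0]
    if kind == 'leaf':
        _, var, probs = node
        v = max(probs, key=probs.get)  # most likely value at the leaf
        assignment[var] = v
        return probs[v]
    if kind == 'prod':
        val = 1.0
        for child in node[1]:
            val *= max_product(child, assignment)
        return val
    # Sum node: take the max over weighted children instead of the sum.
    best_val, best_child = -1.0, None
    for w, child in node[1]:
        v = w * max_product(child, {})  # probe without committing
        if v > best_val:
            best_val, best_child = v, child
    max_product(best_child, assignment)  # commit winner's assignment
    return best_val

spn = ('sum', [(0.3, ('leaf', 'X', {0: 0.9, 1: 0.1})),
               (0.7, ('leaf', 'X', {0: 0.2, 1: 0.8}))])
a = {}
print(max_product(spn, a), a)  # -> 0.56 {'X': 1}
```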
Confidence Statements for Ordering Quantiles
This work proposes Quor, a simple yet effective nonparametric method to
compare independent samples with respect to corresponding quantiles of their
populations. The method is solely based on the order statistics of the samples,
and independence is its only requirement. All computations are performed using
exact distributions with no need for any asymptotic considerations, and yet they can be run using a fast quadratic-time dynamic programming scheme. Computational
performance is essential in high-dimensional domains, such as gene expression
data. We describe the approach and discuss its most important assumptions, drawing a parallel with the assumptions and properties of widely used techniques
for the same problem. Experiments using real data from biomedical studies are
performed to empirically compare Quor and other methods in a classification
task over a selection of high-dimensional data sets.
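The kind of exact, distribution-free statement that order-statistics methods like Quor build on can be illustrated with the classical confidence interval for a quantile, which needs nothing beyond the binomial distribution. This is a textbook sketch, not the paper's dynamic program:

```python
from math import comb

# For an i.i.d. sample of size n from a continuous distribution with
# order statistics X_(1) <= ... <= X_(n), the coverage probability
# P(X_(i) <= q_p < X_(j)) for the population p-quantile q_p is exact
# and distribution-free: it is a sum of binomial terms.

def quantile_confidence(n, p, i, j):
    """P(X_(i) <= q_p < X_(j)): probability that between i and j-1
    observations fall at or below the population p-quantile."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, j))

# Confidence that the population median lies between the 3rd and 8th
# order statistics of a sample of size 10:
print(quantile_confidence(n=10, p=0.5, i=3, j=8))  # ~0.89
```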
Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables
Tests for dependence of continuous, discrete and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we will present three Bayesian tests for dependence of binary, continuous and mixed variables. These tests are nonparametric and based on the Dirichlet Process, which allows us to use the same prior model for all of them. Therefore, the tests are “consistent” with one another, in the sense that the probabilities that variables are dependent computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests.
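As a flavour of Bayesian dependence testing in the binary case, here is a textbook Bayes-factor sketch with finite Dirichlet priors over a 2x2 contingency table. The paper's tests use the Dirichlet process instead, so this is only a simplified analogue:

```python
from math import lgamma, exp

def log_marginal(counts, alphas):
    """Log marginal likelihood of a Dirichlet-multinomial model
    (ordered data, so no multinomial coefficient)."""
    n, a = sum(counts), sum(alphas)
    out = lgamma(a) - lgamma(a + n)
    for c, al in zip(counts, alphas):
        out += lgamma(al + c) - lgamma(al)
    return out

def prob_dependent(n00, n01, n10, n11, prior_dep=0.5):
    # Dependence model: one Dirichlet(1,1,1,1) over the four cells.
    log_dep = log_marginal([n00, n01, n10, n11], [1.0] * 4)
    # Independence model: separate uniform priors on each margin.
    log_ind = (log_marginal([n00 + n01, n10 + n11], [1.0, 1.0]) +
               log_marginal([n00 + n10, n01 + n11], [1.0, 1.0]))
    bf = exp(log_dep - log_ind)  # Bayes factor for dependence
    return prior_dep * bf / (prior_dep * bf + (1 - prior_dep))

print(prob_dependent(n00=30, n01=5, n10=4, n11=31))
```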
Learning Bounded Treewidth Bayesian Networks with Thousands of Variables
We present a method for learning treewidth-bounded Bayesian networks from
data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inference. Yet, being a global
property of the graph, it considerably increases the difficulty of the learning
process. We propose a novel algorithm for this task, able to scale to large
domains and large treewidths. Our novel approach consistently outperforms the
state of the art on data sets with up to ten thousand variables.
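The connection between treewidth and inference cost can be made concrete through elimination orders: the width of any elimination order upper-bounds the treewidth, and variable elimination costs roughly O(n · s^(width+1)) for s states per variable. A small self-contained sketch (the graph is invented):

```python
def order_width(adj, order):
    """Width of an elimination order: the largest neighbourhood seen
    when vertices are eliminated in this order. This upper-bounds the
    treewidth of the graph."""
    adj = {v: set(ns) for v, ns in adj.items()}
    width = 0
    for v in order:
        ns = adj.pop(v)
        width = max(width, len(ns))
        for u in ns:                 # connect v's remaining neighbours
            adj[u] |= ns - {u}
            adj[u].discard(v)
    return width

# A 4-cycle has treewidth 2; this order certifies width 2.
graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(order_width(graph, [0, 1, 2, 3]))  # -> 2
```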
Learning Bayesian Networks with Incomplete Data by Augmentation
We present new algorithms for learning Bayesian networks from data with
missing values using a data augmentation approach. An exact Bayesian network
learning algorithm is obtained by recasting the problem into a standard
Bayesian network learning problem without missing data. To the best of our
knowledge, this is the first exact algorithm for this problem. As expected, the
exact algorithm does not scale to large domains. We build on the exact method
to create an approximate algorithm using a hill-climbing technique. This
algorithm scales to large domains so long as a suitable standard structure
learning method for complete data is available. We perform a wide range of
experiments to demonstrate the benefits of learning Bayesian networks with this new approach.
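A minimal sketch of the augmentation idea follows, assuming missing cells are encoded as None and that learn_and_score is any off-the-shelf structure learner for complete data returning a (model, score) pair. The local move and acceptance rule here are simplified placeholders, not the paper's exact procedure:

```python
import random

def augmented_hill_climbing(data, domains, learn_and_score,
                            n_iters=100, seed=0):
    """Alternate between re-imputing missing cells and re-learning the
    structure from the completed data, keeping the best completion."""
    rng = random.Random(seed)
    # Start from a random completion of every missing (None) cell.
    completion = [[v if v is not None else rng.choice(domains[j])
                   for j, v in enumerate(row)] for row in data]
    best_model, best_score = learn_and_score(completion)
    for _ in range(n_iters):
        i = rng.randrange(len(data))
        missing = [j for j, v in enumerate(data[i]) if v is None]
        if not missing:
            continue
        j = rng.choice(missing)
        old = completion[i][j]
        completion[i][j] = rng.choice(domains[j])   # propose new value
        model, s = learn_and_score(completion)
        if s > best_score:
            best_model, best_score = model, s       # accept the move
        else:
            completion[i][j] = old                  # undo the move
    return best_model

# Toy usage with a dummy learner that scores completions by their sum.
data = [[1, None], [0, 1], [None, 0]]
domains = [[0, 1], [0, 1]]
dummy = lambda comp: ("model", sum(sum(r) for r in comp))
print(augmented_hill_climbing(data, domains, dummy, n_iters=20))
```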
Kuznetsov independence for interval-valued expectations and sets of probability distributions: Properties and algorithms
Kuznetsov independence of variables X and Y means that, for any pair of bounded functions f(X) and g(Y), E[f(X)g(Y)] = E[f(X)] ⊠ E[g(Y)], where E[⋅] denotes interval-valued expectation and ⊠ denotes interval multiplication. We present properties of Kuznetsov independence for several variables, and connect it with other concepts of independence in the literature; in particular we show that strong extensions are always included in sets of probability distributions whose lower and upper expectations satisfy Kuznetsov independence. We introduce an algorithm that computes lower expectations subject to judgments of Kuznetsov independence by mixing column generation techniques with nonlinear programming. Finally, we define a concept of conditional Kuznetsov independence, and study its graphoid properties.
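Interval multiplication, the ⊠ in the definition above, is standard interval arithmetic: the product of two intervals is the range of products of their elements. A short sketch (the example intervals are made up):

```python
def interval_mult(a, b):
    """[a0, a1] ⊠ [b0, b1]: the range of x*y for x in a, y in b.
    Standard interval arithmetic; the extremes are attained at
    endpoint combinations."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

# Interval-valued expectations of f(X) and g(Y):
Ef = (-1.0, 2.0)
Eg = (0.5, 3.0)
print(interval_mult(Ef, Eg))  # -> (-3.0, 6.0)
```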
Efficient learning of Bayesian networks with bounded tree-width
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [24,29] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. Finding the best k-tree, however, is computationally intractable. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an informative score function to characterize the quality of a k-tree. To further improve the quality of the k-trees, we propose a probabilistic hill climbing approach that locally refines the sampled k-trees. The proposed algorithm can efficiently learn a high-quality Bayesian network with tree-width at most k. Experimental results demonstrate that our approach is more computationally efficient than the exact methods with comparable accuracy, and outperforms most existing approximate methods.
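A k-tree can be grown by starting from a (k+1)-clique and repeatedly attaching each new vertex to an existing k-clique. The sketch below samples uniformly at each step, which is a simplification of the paper's informative, score-guided sampler:

```python
import random
from itertools import combinations

def random_ktree(n, k, seed=0):
    """Grow a k-tree on vertices 0..n-1: start from a (k+1)-clique,
    then attach each new vertex to a uniformly chosen k-clique."""
    rng = random.Random(seed)
    assert n >= k + 1
    edges = set(combinations(range(k + 1), 2))       # initial clique
    k_cliques = list(combinations(range(k + 1), k))  # attachment spots
    for v in range(k + 1, n):
        base = list(rng.choice(k_cliques))
        edges.update((u, v) for u in base)
        # The new k-cliques all contain v together with a
        # (k-1)-subset of the chosen base clique.
        for sub in combinations(base, k - 1):
            k_cliques.append(tuple(sorted(sub + (v,))))
    return edges

print(sorted(random_ktree(n=6, k=2, seed=1)))  # a 2-tree: 9 edges
```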
Joints in Random Forests
Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions, frequently made for Bayes consistency results, we show that the consistency of GeDTs and GeFs extends to any pattern of missing input features, if missing at random. Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features.
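The marginalisation mechanism can be sketched on a toy decision tree: when a split tests a missing feature, both branches are followed and their predictions mixed by the fraction of training data that went each way. The tree below is invented and this is not the full GeF construction:

```python
# Node: ('split', feature, threshold, left_frac, left, right)
#    or ('leaf', {class: prob}); left_frac is the fraction of training
# data that went down the left branch (invented values here).
tree = ('split', 0, 2.5, 0.6,
        ('leaf', {'a': 0.9, 'b': 0.1}),
        ('split', 1, 1.0, 0.5,
         ('leaf', {'a': 0.2, 'b': 0.8}),
         ('leaf', {'a': 0.5, 'b': 0.5})))

def predict(node, x):
    """Class distribution for x, where missing features are None."""
    if node[0] == 'leaf':
        return node[1]
    _, feat, thr, left_frac, left, right = node
    if x[feat] is None:  # marginalise: mix both subtrees
        pl, pr = predict(left, x), predict(right, x)
        return {c: left_frac * pl[c] + (1 - left_frac) * pr[c]
                for c in pl}
    return predict(left if x[feat] <= thr else right, x)

print(predict(tree, [None, 0.5]))  # -> {'a': 0.62, 'b': 0.38}
```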
International Symposium on Imprecise Probabilities: Theories and Applications, ISIPTA 2019, Proceedings