
    A Monte-Carlo Algorithm for Probabilistic Propagation in Belief Networks based on Importance Sampling and Stratified Simulation Techniques

    A class of Monte Carlo algorithms for probability propagation in belief networks is given. The simulation is based on a two-step procedure. The first step is a node-deletion technique to calculate the 'a posteriori' distribution of a variable, with the particularity that when exact computations are too costly, they are carried out approximately. In the second step, the computations done in the first are used to obtain random configurations for the variables of interest. These configurations are weighted according to the importance sampling methodology. Different particular algorithms are obtained depending on the approximation procedure used in the first step and on the way the random configurations are obtained. In the latter case, a stratified sampling technique is used, adapted so that it can be applied to very large networks without round-off problems.
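
    As a concrete illustration of the weighting idea, the following sketch estimates a posterior in a toy two-node network by sampling from the prior and weighting each sample by the likelihood of the evidence (plain likelihood weighting). The network and its parameters are made-up assumptions, and the node-deletion and stratification steps of the paper's algorithm are not reproduced.

```python
# Likelihood-weighting sketch for P(A | B = 1) in an assumed two-node
# network A -> B; the structure and numbers are illustrative only.
import random

P_A = {0: 0.7, 1: 0.3}                  # prior P(A)
P_B_given_A = {0: {0: 0.9, 1: 0.1},     # P(B | A)
               1: {0: 0.2, 1: 0.8}}

def estimate_posterior(evidence_b=1, n_samples=10_000, seed=0):
    """Estimate P(A | B = evidence_b) from weighted prior samples."""
    rng = random.Random(seed)
    weights = {0: 0.0, 1: 0.0}
    for _ in range(n_samples):
        a = 0 if rng.random() < P_A[0] else 1     # sample A from its prior
        weights[a] += P_B_given_A[a][evidence_b]  # weight by evidence likelihood
    total = weights[0] + weights[1]
    return {a: w / total for a, w in weights.items()}

print(estimate_posterior())  # exact posterior is {0: ~0.226, 1: ~0.774}
```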

    Supervised Classification Using Probabilistic Decision Graphs

    A new model for supervised classification based on probabilistic decision graphs is introduced. A probabilistic decision graph (PDG) is a graphical model that efficiently captures certain context-specific independencies that are not easily represented by other graphical models traditionally used for classification, such as the Naïve Bayes (NB) or Classification Trees (CT). This means that the PDG model can capture some distributions using fewer parameters than classical models. Two approaches for constructing a PDG for classification are proposed. The first is to construct the model directly from a dataset of labelled data, while the second is to transform a previously obtained Bayesian classifier into a PDG model that can then be refined. These two approaches are compared with a wide range of classical approaches to the supervised classification problem on a number of both real-world databases and artificially generated data.
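
    The sketch below shows, on an assumed chain structure with invented numbers, how a decision-graph-like model routes a case through value-labelled edges and multiplies local parameters; letting different branches share a successor node is what allows a PDG to encode context-specific independencies with fewer parameters. It is a simplified illustration, not the construction algorithms proposed in the paper.

```python
# Sketch of routing in a decision-graph-like model; the chain structure
# and all numbers are illustrative assumptions, not the paper's learners.
from dataclasses import dataclass, field

@dataclass
class PDGNode:
    var: str                                   # variable parameterised here
    dist: dict                                 # P(var = v) at this node
    succ: dict = field(default_factory=dict)   # value -> successor node

def joint_prob(node, assignment):
    """Product of local parameters along the path the assignment selects."""
    p = 1.0
    while node is not None:
        v = assignment[node.var]
        p *= node.dist[v]
        node = node.succ.get(v)
    return p

# Branches may share successor nodes, which is how PDGs save parameters;
# here the two class values get distinct feature distributions.
leaf_lo = PDGNode("X", {0: 0.9, 1: 0.1})
leaf_hi = PDGNode("X", {0: 0.3, 1: 0.7})
root = PDGNode("C", {0: 0.6, 1: 0.4}, succ={0: leaf_lo, 1: leaf_hi})

case = {"X": 1}
scores = {c: joint_prob(root, {**case, "C": c}) for c in (0, 1)}
print(max(scores, key=scores.get))  # class with highest joint probability
```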

    Learning naive Bayes regression models with missing data using mixtures of truncated exponentials

    In recent years, mixtures of truncated exponentials (MTEs) have received much attention within the context of probabilistic graphical models, as they provide a framework for hybrid Bayesian networks that is compatible with standard inference algorithms and imposes no restriction on the structure of the network. Recently, MTEs have also been successfully applied to regression problems in which the underlying network structure is a naïve Bayes or a TAN. However, the algorithms described so far in the literature operate over complete databases. In this paper we propose an iterative algorithm for constructing naïve Bayes regression models from incomplete databases. It is based on a variation of the data augmentation method in which the missing values of the explanatory variables are filled by simulating from their posterior distributions, while the missing values of the response variable are generated from its conditional expectation given the explanatory variables. We illustrate through a set of experiments with various databases that the proposed algorithm behaves reasonably well.
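
    A minimal sketch of the data-augmentation loop follows, with plain Gaussian/linear fits standing in for MTE densities; the single explanatory variable and all modelling choices are assumptions made for illustration only.

```python
# Sketch of the iterative data-augmentation loop, with Gaussian/linear
# fits standing in for MTE densities; the single explanatory variable
# and all modelling choices are illustrative assumptions.
import numpy as np

def fit_nb_regression(x, y, n_iter=20, seed=0):
    """x, y: 1-D arrays with np.nan marking missing entries."""
    rng = np.random.default_rng(seed)
    x, y = x.astype(float).copy(), y.astype(float).copy()
    mx, my = np.isnan(x), np.isnan(y)
    x[mx], y[my] = np.nanmean(x), np.nanmean(y)   # crude initial fill
    for _ in range(n_iter):
        b1, b0 = np.polyfit(x, y, 1)              # refit y ~ x
        y[my] = b0 + b1 * x[my]                   # response: conditional expectation
        c1, c0 = np.polyfit(y, x, 1)              # reverse fit approximates p(x | y)
        sd = np.std(x - (c0 + c1 * y))
        x[mx] = rng.normal(c0 + c1 * y[mx], sd)   # predictor: simulate, do not average
    return b0, b1
```

    Note the asymmetry the abstract describes: missing predictors are simulated (so their uncertainty is propagated), while the missing response is filled deterministically with its conditional expectation.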

    Learning Bayesian Networks for Regression from Incomplete Databases

    In this paper we address the problem of inducing Bayesian network models for regression from incomplete databases. We use mixtures of truncated exponentials (MTEs) to represent the joint distribution in the induced networks. We consider two particular Bayesian network structures, the so-called naïve Bayes and TAN, which have been successfully used as regression models when learning from complete data. We propose an iterative procedure for inducing the models, based on a variation of the data augmentation method in which the missing values of the explanatory variables are filled by simulating from their posterior distributions, while the missing values of the response variable are generated using the conditional expectation of the response given the explanatory variables. We also consider the refinement of the regression models by means of variable selection and bias reduction. We illustrate the performance of the proposed algorithms through a set of experiments with various databases.
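
    Since this paper adds a refinement stage, the sketch below illustrates one plausible reading of the variable-selection step: greedy forward selection scored by validation error, with an ordinary least-squares fit standing in for the MTE regression model. All names and modelling choices here are assumptions, not the paper's procedure.

```python
# Hypothetical sketch of the variable-selection refinement: greedy forward
# selection scored by validation RMSE, with an ordinary least-squares fit
# standing in for the MTE regression model.
import numpy as np

def rmse(cols, X_tr, y_tr, X_va, y_va):
    """Validation RMSE of a least-squares fit on the given columns."""
    A = np.column_stack([X_tr[:, cols], np.ones(len(y_tr))])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    Av = np.column_stack([X_va[:, cols], np.ones(len(y_va))])
    return float(np.sqrt(np.mean((Av @ coef - y_va) ** 2)))

def forward_select(X_tr, y_tr, X_va, y_va):
    """Add, one at a time, the column that most reduces validation error."""
    chosen, best = [], np.inf
    while True:
        trials = {j: rmse(chosen + [j], X_tr, y_tr, X_va, y_va)
                  for j in range(X_tr.shape[1]) if j not in chosen}
        if not trials:
            break
        j = min(trials, key=trials.get)
        if trials[j] >= best:
            break                     # no candidate improves the error
        chosen.append(j)
        best = trials[j]
    return chosen
```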

    Hybrid Bayesian Networks Using Mixtures of Truncated Basis Functions

    This paper introduces MoTBFs, an R package for manipulating mixtures of truncated basis functions. This class of functions allows the representation of joint probability distributions involving discrete and continuous variables simultaneously, and includes mixtures of truncated exponentials and mixtures of polynomials as special cases. The package implements functions for learning the parameters of univariate, multivariate, and conditional distributions, and provides support for parameter learning in Bayesian networks with both discrete and continuous variables. Probabilistic inference using forward sampling is also implemented. Part of the functionality of the MoTBFs package relies on the bnlearn package, which includes functions for learning the structure of a Bayesian network from a data set. Leveraging this functionality, the MoTBFs package supports learning of MoTBF-based Bayesian networks over hybrid domains. We give a brief introduction to the methodological context and algorithms implemented in the package. An extensive illustrative example is used to describe the package, its functionality, and its usage.
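
    For readers unfamiliar with this family of densities, the snippet below illustrates, in Python rather than R, the functional form that MoTBF models manipulate: a piecewise density built from exponential terms. The coefficients are invented for illustration, and the snippet does not use the MoTBFs package API.

```python
# Conceptual sketch of a one-dimensional mixture of truncated exponentials,
# a special case of an MoTBF; coefficients are invented for illustration
# and this does not use the MoTBFs R package API.
import math

def mte_density(x, pieces):
    """pieces: list of (lo, hi, a0, terms); on [lo, hi) the density is
    f(x) = a0 + sum of c * exp(b * x) over (c, b) in terms."""
    for lo, hi, a0, terms in pieces:
        if lo <= x < hi:
            return a0 + sum(c * math.exp(b * x) for c, b in terms)
    return 0.0

# Two-piece potential on [0, 2); normalisation is not enforced here.
pieces = [(0.0, 1.0, 0.10, [(0.50, -1.0)]),
          (1.0, 2.0, 0.05, [(0.80, -1.5)])]
print(mte_density(0.5, pieces))
```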

    Modelling and Inference with Conditional Gaussian Probabilistic Decision Graphs

    Probabilistic decision graphs (PDGs) are probabilistic graphical models that represent a factorisation of a discrete joint probability distribution using a “decision graph”-like structure over local marginal parameters. The structure of a PDG enables the model to capture some context-specific independence relations that are not representable in the structure of more commonly used graphical models such as Bayesian networks and Markov networks. This sometimes makes operations on PDGs more efficient than on alternative models. PDGs have previously been defined only in the discrete case, assuming a multinomial joint distribution over the variables in the model. We extend PDGs to incorporate continuous variables by assuming a conditional Gaussian (CG) joint distribution. We also show how inference can be carried out in an efficient way.
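
    The sketch below illustrates the conditional Gaussian assumption on made-up numbers: each discrete configuration selects the parameters of a Gaussian over the continuous variable, and observing the continuous value updates the discrete distribution in closed form. It shows only the underlying CG factorisation, not the paper's PDG structure or inference algorithm.

```python
# Made-up conditional Gaussian factor: each discrete value selects the
# mean/variance of a Gaussian over X. This illustrates the CG assumption
# only, not the paper's PDG structure or inference algorithm.
import math

p_d = {0: 0.4, 1: 0.6}          # discrete marginal (a local PDG parameter)
cg = {0: (-1.0, 1.0),           # d -> (mean, variance) of X given d
      1: (2.0, 0.5)}

def joint_density(d, x):
    """p(d) * N(x; mu_d, sigma2_d)"""
    mu, var = cg[d]
    pdf = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p_d[d] * pdf

# Observing X = 1.2 updates the discrete distribution in closed form.
num = {d: joint_density(d, 1.2) for d in p_d}
z = sum(num.values())
print({d: round(v / z, 3) for d, v in num.items()})
```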

    Parallelization of the PC Algorithm

    This paper describes a parallel version of the PC algorithm for learning the structure of a Bayesian network from data. The PC algorithm is a constraint-based algorithm consisting of five steps, where the first step is to perform a set of (conditional) independence tests while the remaining four steps identify the structure of the Bayesian network using the results of those tests. In this paper, we describe a new approach to parallelising the (conditional) independence testing, as experiments illustrate that this is by far the most time-consuming step. The proposed parallel PC algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that significant time performance improvements are possible using the proposed algorithm.
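
    A minimal sketch of the parallelisation idea follows: the unconditional pairwise tests of the first iteration are independent of one another and can be distributed over worker processes. The chi-square test on synthetic binary data is an assumed stand-in, and the paper's scheduling and conditioning-set handling are not reproduced.

```python
# Sketch of parallelising the testing step: the unconditional pairwise
# tests of the first iteration are distributed over worker processes.
# The chi-square test on synthetic binary data is an assumed stand-in.
from itertools import combinations
from multiprocessing import Pool

import numpy as np
from scipy.stats import chi2_contingency

data = np.random.default_rng(0).integers(0, 2, size=(1000, 6))

def independence_test(pair):
    """Return the pair and the p-value of its marginal independence test."""
    i, j = pair
    table = np.zeros((2, 2))
    for a, b in zip(data[:, i], data[:, j]):
        table[a, b] += 1
    _, p, _, _ = chi2_contingency(table)
    return pair, p

if __name__ == "__main__":
    pairs = list(combinations(range(data.shape[1]), 2))
    with Pool() as pool:                      # one test per task, run in parallel
        results = pool.map(independence_test, pairs)
    # Pairs whose test fails to reject independence lose their edge.
    print([pair for pair, p in results if p > 0.05])
```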

    Worldwide comparison of ovarian cancer survival: Histological group and stage at diagnosis (CONCORD-2)

    Ovarian cancer comprises several histological groups with widely differing levels of survival. We aimed to explore international variation in survival for each group to help interpret international differences in survival from all ovarian cancers combined. We also examined differences in stage-specific survival. The CONCORD programme is the largest population-based study of global trends in cancer survival, including data from 60 countries for 695,932 women (aged 15-99 years) diagnosed with ovarian cancer during 1995-2009. We defined six histological groups: type I epithelial, type II epithelial, germ cell, sex cord-stromal, other specific non-epithelial, and non-specific morphology, and estimated age-standardised 5-year net survival for each country by histological group. We also analysed data from 67 cancer registries for 233,659 women diagnosed from 2001 to 2009, for whom information on stage at diagnosis was available. We estimated age-standardised 5-year net survival by stage at diagnosis (localised or advanced). Survival from type I epithelial ovarian tumours for women diagnosed during 2005-09 ranged from 40% to 70%. Survival from type II epithelial tumours was much lower (20-45%). Survival from germ cell tumours was higher than that from type II epithelial tumours, but also varied widely between countries. Survival from sex cord-stromal tumours was higher than for the other five groups. Survival from localised tumours was much higher than for advanced disease (80% vs. 30%). There is wide variation in survival between histological groups, and stage at diagnosis remains an important factor in ovarian cancer survival. International comparisons of ovarian cancer survival should incorporate histology.
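
    For readers unfamiliar with age standardisation, the toy computation below shows how an age-standardised figure is a weighted average of age-specific estimates. The survival values are invented and are not CONCORD results; the weights only mimic the shape of standard weight sets such as the ICSS.

```python
# Toy illustration of age standardisation: the headline figure is a
# weighted average of age-specific estimates. Survival values are invented;
# the weights mimic the shape of standard weight sets, not CONCORD data.
survival_by_age = [0.72, 0.65, 0.55, 0.42, 0.28]   # five age groups, 15-99
weights = [0.07, 0.12, 0.23, 0.29, 0.29]           # standard weights, sum to 1

standardised = sum(w * s for w, s in zip(weights, survival_by_age))
print(f"Age-standardised 5-year net survival: {standardised:.1%}")
```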