
    On the Prior and Posterior Distributions Used in Graphical Modelling

    Graphical model learning and inference are often performed using Bayesian techniques. In particular, learning is usually performed in two separate steps. First, the graph structure is learned from the data; then the parameters of the model are estimated conditional on that graph structure. While the probability distributions involved in this second step have been studied in depth, the ones used in the first step have not been explored in as much detail. In this paper, we will study the prior and posterior distributions defined over the space of the graph structures for the purpose of learning the structure of a graphical model. In particular, we will provide a characterisation of the behaviour of those distributions as a function of the possible edges of the graph. We will then use the properties resulting from this characterisation to define measures of structural variability for both Bayesian and Markov networks, and we will point out some of their possible applications. Comment: 28 pages, 6 figures
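
    As a concrete illustration of the kind of edge-wise characterisation the paper studies, the sketch below (my own, not from the paper) enumerates every DAG on three labelled nodes and computes the marginal prior probability of each directed edge under a uniform prior over structures.

```python
from itertools import product

def is_acyclic(arcs, n):
    # Kahn's algorithm: True if the arc set has no directed cycle.
    indeg = [0] * n
    for _, j in arcs:
        indeg[j] += 1
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for i, j in arcs:
            if i == v:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    return seen == n

def all_dags(n):
    # Enumerate every DAG on n labelled nodes (feasible only for tiny n).
    candidates = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product([0, 1], repeat=len(candidates)):
        arcs = {a for a, b in zip(candidates, bits) if b}
        if is_acyclic(arcs, n):
            yield arcs

dags = list(all_dags(3))   # 25 DAGs on 3 labelled nodes
for arc in [(0, 1), (1, 0)]:
    p = sum(arc in g for g in dags) / len(dags)
    print(f"P({arc[0]} -> {arc[1]}) = {p:.3f}")   # 0.320 for every arc, by symmetry
```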

    Structural learning of bayesian networks using statistical constraints

    Bayesian networks are probabilistic graphical models that encode in a compact manner the conditional probabilistic relations over a set of random variables. In this thesis we address the NP-complete problem of learning the structure of a Bayesian network from observed data. We first present two algorithms from the state of the art: the Max-Min Parents and Children (MMPC) algorithm of Tsamardinos et al., which uses statistical tests of independence to restrict the search space for a simple local search algorithm, and a recent complete branch-and-bound technique of de Campos and Ji. We propose in the thesis a novel hybrid algorithm, which uses the constraints given by the MMPC algorithm to reduce the size of the search space of the complete B&B algorithm. Two different statistical tests of independence were implemented: the simple asymptotic test from Tsamardinos et al. and a permutation-based test more recently proposed by Tsamardinos and Borboudakis. We tested the different techniques on three well-known Bayesian networks in a realistic scenario, with limited memory and data sets with small sample sizes. Our results are promising and show that the hybrid algorithm exhibits a minimal loss in score, against a considerable gain in computational time, with respect to the original branch-and-bound algorithm, and that neither of the two independence tests consistently dominates the other in terms of computational time gained.
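
    A minimal sketch of the asymptotic independence test used to prune the search space, as described above; the helper `independent` and its threshold are illustrative, and only the standard SciPy G-test is assumed:

```python
import numpy as np
from scipy.stats import chi2_contingency

def independent(x, y, alpha=0.05):
    # G-test of marginal independence between two discrete samples.
    table = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        table[xi, yi] += 1
    _, p_value, _, _ = chi2_contingency(table, lambda_="log-likelihood")
    return p_value > alpha   # True -> the arc between x and y can be pruned

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 500)
y = np.where(rng.random(500) < 0.9, x, 1 - x)   # y mostly copies x
z = rng.integers(0, 2, 500)                     # z is unrelated to x
print(independent(x, y))   # False: strong dependence detected
print(independent(x, z))   # True in most runs: no dependence found
```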

    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusion that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available. Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other authors
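
    The hill-climbing phase relies on a decomposable score, so a single edge change only requires rescoring the affected child. A minimal sketch of the local BIC score for discrete data (my own illustration, not the H2PC source, which is in R):

```python
from collections import Counter
from math import log

def local_bic(data, child, parents):
    # BIC of one node given its parents, for discrete data given as
    # a list of tuples (one tuple of variable values per sample).
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
    r = len({row[child] for row in data})   # cardinality of the child
    q = len(marg)                           # observed parent configurations
    return loglik - 0.5 * log(n) * q * (r - 1)

# The network score is the sum of local_bic over all nodes, so evaluating
# a single edge addition, deletion or reversal only rescores the children
# whose parent sets changed.
```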

    Improved Bayesian Network Structure Learning in the Model Averaging Paradigm

    A Bayesian network (BN) is a probabilistic graphical model with applications in knowledge discovery and prediction. Its structure can be learned from data using the well-known score-and-search approach, where a scoring function is used to evaluate the fit of a proposed BN to the data in an unsupervised manner, and the space of directed acyclic graphs is searched for the best-scoring BNs. However, selecting a single model (i.e., the best-scoring BN) is often not the best choice. When one is learning a BN from limited data, selecting a single model may be misleading as there may be many other BNs that have scores that are close to optimal, and the posterior probability of even the best-scoring BN is often close to zero. A preferable alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging. A widely used data analysis methodology is to: (i) learn a set of plausible networks that fit the data, (ii) perform model averaging to obtain a confidence measure for each edge, and (iii) select a threshold and report all edges with confidence higher than the threshold. In this manner, a representative network can be constructed from the edges deemed significant, which can then be examined for probabilistic dependencies and possible cause-effect relations. This thesis presents several improvements to Bayesian network structure learning that benefit the data analysis methodology. We propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, our approach only considers credible models in that they are optimal or near-optimal in score. Second, our approach is more efficient and scales to significantly larger Bayesian networks than existing approaches. We empirically study a selection of widely used and also recently proposed scoring functions. We address design limitations of previous empirical studies by scaling our experiments to larger BNs, comparing on an extensive set of both ground truth BNs and real-world datasets, considering alternative performance metrics, and comparing scoring functions on two model averaging frameworks: the bootstrap and the credible set. Contrary to previous recommendations based on finding a single structure, we find that for model averaging the BDeu scoring function is the preferred choice in most scenarios for the bootstrap framework and the recent qNML score is the preferred choice for the credible set framework. We identify an important shortcoming in a widely used threshold selection method. We then propose a simple transfer learning approach for maximizing target metrics and selecting a threshold that can be generalized from proxy datasets to the target dataset, and show on an extensive set of benchmarks that it can perform significantly better than previous approaches. We demonstrate via ensemble methods that combining results from multiple scores significantly improves both the bootstrap and the credible set approach on various metrics, and that combining all scores from both approaches still yields better results.
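
    Steps (i)-(iii) of the data analysis methodology can be made concrete with a short sketch. Here `learn_structure` is a hypothetical routine returning a set of directed edges, `data` is assumed to be a NumPy array of sample rows, and the bootstrap-and-threshold logic follows the recipe in the abstract:

```python
import numpy as np
from collections import Counter

def edge_confidences(data, learn_structure, n_boot=100, seed=0):
    # (i)-(ii): learn a structure on each bootstrap resample and average
    # the edge indicators into per-edge confidences.
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(n_boot):
        sample = data[rng.integers(0, len(data), len(data))]
        votes.update(learn_structure(sample))   # a set of (u, v) edges
    return {edge: count / n_boot for edge, count in votes.items()}

def representative_network(confidences, threshold=0.5):
    # (iii): report every edge whose confidence clears the threshold.
    return {edge for edge, conf in confidences.items() if conf >= threshold}
```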

    Learning Bayesian network equivalence classes using ant colony optimisation

    Bayesian networks have become an indispensable tool in the modelling of uncertain knowledge. Conceptually, they consist of two parts: a directed acyclic graph called the structure, and conditional probability distributions attached to each node known as the parameters. As a result of their expressiveness, understandability and rigorous mathematical basis, Bayesian networks have become one of the first methods investigated when faced with an uncertain problem domain. However, a recurring problem persists in specifying a Bayesian network. Both the structure and parameters can be difficult for experts to conceive, especially if their knowledge is tacit. To counteract these problems, research has been ongoing on learning both the structure and parameters of Bayesian networks from data. Whilst there are simple methods for learning the parameters, learning the structure has proved harder. Part of this stems from the NP-hardness of the problem and the super-exponential space of possible structures. To help solve this task, this thesis seeks to employ a relatively new technique that has had much success in tackling NP-hard problems: ant colony optimisation. Ant colony optimisation is a metaheuristic based on the behaviour of ants acting together in a colony. It uses the stochastic activity of artificial ants to find good solutions to combinatorial optimisation problems. In the current work, this method is applied to the problem of searching through the space of equivalence classes of Bayesian networks, in order to find a good match against a set of data. The system uses operators that evaluate potential modifications to a current state. Each of the modifications is scored and the results used to inform the search. In order to facilitate these steps, other techniques are also devised to speed up the learning process. The techniques are tested by sampling data from gold-standard networks and learning structures from this sampled data. These structures are analysed using various goodness-of-fit measures to see how well the algorithms perform. The measures include structural similarity metrics and Bayesian scoring metrics. The results are compared in depth against systems that also use ant colony optimisation and other methods, including evolutionary programming and greedy heuristics. Also, comparisons are made to well-known state-of-the-art algorithms and a study performed on a real-life data set. The results show favourable performance compared to the other methods and on modelling the real-life data.
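
    A minimal sketch of the generic ant colony optimisation machinery the thesis builds on; the transition rule and pheromone update below are the textbook ACO forms, and all names and constants are illustrative rather than taken from the thesis:

```python
import random

def choose_move(moves, pheromone, heuristic, alpha=1.0, beta=2.0):
    # Standard ACO transition rule: sample a move with probability
    # proportional to pheromone^alpha * heuristic^beta.
    weights = [pheromone[m] ** alpha * heuristic[m] ** beta for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]

def evaporate_and_deposit(pheromone, best_moves, best_score, rho=0.1):
    # Evaporate every trail, then reinforce the moves used by the best
    # ant in proportion to the quality of its solution.
    for m in pheromone:
        pheromone[m] *= 1.0 - rho
    for m in best_moves:
        pheromone[m] += best_score
```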

    Learning Bayesian network equivalence classes with ant colony optimization

    Bayesian networks are a useful tool in the representation of uncertain knowledge. This paper proposes a new algorithm called ACO-E, to learn the structure of a Bayesian network. It does this by conducting a search through the space of equivalence classes of Bayesian networks using Ant Colony Optimization (ACO). To this end, two novel extensions of traditional ACO techniques are proposed and implemented. Firstly, multiple types of moves are allowed. Secondly, moves can be given in terms of indices that are not based on construction graph nodes. The results of testing show that ACO-E performs better than a greedy search and other state-of-the-art and metaheuristic algorithms whilst searching in the space of equivalence classes.
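
    The first extension, moves of multiple types identified by indices rather than construction-graph nodes, might look roughly like the following sketch (illustrative only, not the ACO-E implementation):

```python
from itertools import permutations

def candidate_moves(nodes, edges):
    # Enumerate typed moves on the current graph; each move is a
    # (type, index) pair, independent of any fixed construction graph.
    moves = []
    for x, y in permutations(nodes, 2):
        if (x, y) not in edges and (y, x) not in edges:
            moves.append(("insert", (x, y)))
    for e in edges:
        moves.append(("delete", e))
        moves.append(("reverse", e))
    return moves   # each candidate is scored before an ant commits to it
```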

    The Intersection-Validation Method for Evaluating Bayesian Network Structure Learning Without Ground Truth

    Structure learning algorithms for Bayesian networks are typically evaluated by examining how accurately they recover the correct structure, given data sampled from a benchmark network. A popular metric for the evaluation is the structural Hamming distance. For real-world data there is no ground truth to compare the learned structures against. Thus, to use such data, one has been limited to evaluating the algorithms' predictive performance on separate test data or via cross-validation. The predictive performance, however, depends on the parameters of the network, for which some fixed values can be used or which can be marginalized over to obtain the posterior predictive distribution using some parameter prior. Predictive performance therefore has an intricate relationship to structural accuracy -- the two do not always perfectly mirror each other. We present intersection-validation, a method for evaluating structure learning without ground truth. The input to the method is a dataset and a set of compared algorithms. First, a partial structure, called the agreement graph, is constructed consisting of the features that the algorithms agree on given the dataset. Then, the algorithms are evaluated against the agreement graph on subsamples of the data, using a variant of the structural Hamming distance. To test the method's validity we define a set of algorithms that return a score-maximizing structure using various scoring functions in combination with an exact search algorithm. Given data sampled from benchmark networks, we compare the results of the method to those obtained through direct evaluation against the ground truth structure. Specifically, we consider whether the rankings for the algorithms determined by the distances measured using the two methods conform with each other, and whether there is a strong positive correlation between the two distances. We find that across the experiments the method gives a correct ranking for two algorithms (relative to each other) with an accuracy of approximately 0.9, including when the method is applied to a set of only two algorithms. The Pearson correlations between the distances are fairly strong but vary to a great extent, depending on the benchmark network, the amount of data given as input to intersection-validation and the sample size at which the distances are measured. We also attempt to predict when the method produces accurate results from information available in situations where the method would be used in practice, namely, without knowledge of the ground truth. The results from these experiments indicate that although some predictors can be found they do not have the same strength in all instances of use of the method. Finally, to illustrate the uses for the method we apply it to a number of real-world datasets in order to study the effect of structure priors on learning.
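
    A minimal sketch of the core of the method as I read the abstract (not the authors' code): the agreement graph is the intersection of the learned edge sets, and each algorithm is then scored against it with a Hamming-style distance restricted to the agreed features:

```python
def agreement_graph(structures):
    # The features (here: edges) that all compared algorithms agree on.
    return set.intersection(*(set(s) for s in structures))

def restricted_hamming(learned, agreement):
    # Hamming-style distance counting only the agreed-on edges that a
    # learned structure fails to recover on a subsample of the data.
    return sum(1 for edge in agreement if edge not in learned)
```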

    Three algorithms for causal learning

    The field of causal learning has grown in the past decade, establishing itself as a major focus in artificial intelligence research. Traditionally, approaches to causal learning are split into two areas. One area involves the learning of structures from observational data alone and the second involves the methodologies of conducting and learning from experiments. In this dissertation, I investigate three different aspects of causal learning, all of which are based on the causal Bayesian network framework. Constraint-based structure search algorithms that learn partially directed acyclic graphs as causal models from observational data rely on the faithfulness assumption, which is often violated due to inaccurate statistical tests on finite datasets. My first contribution is a modification of the traditional approaches to achieve greater robustness in the light of these faults. Secondly, I present a new algorithm to infer the parent set of a variable when a specific type of experiment called a 'hard intervention' is performed. I also present an auxiliary result of this effort, a fast algorithm to estimate the Kullback-Leibler divergence of high-dimensional distributions from datasets. Thirdly, I introduce a fast heuristic algorithm to optimize the number and sequence of experiments required towards complete causal discovery for different classes of causal graphs, and provide suggestions for implementing an interactive version. Finally, I provide numerical simulation results for each algorithm discussed and present some directions for future research.
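
    For the auxiliary KL-divergence result, a simple plug-in estimator conveys the idea, though it is not necessarily the fast algorithm of the dissertation; samples are assumed to be hashable values (for example tuples, in the high-dimensional case):

```python
from collections import Counter
from math import log

def kl_estimate(samples_p, samples_q, eps=1e-9):
    # Plug-in estimate of KL(P || Q) from empirical frequencies; eps
    # guards against values never observed under Q.
    p, q = Counter(samples_p), Counter(samples_q)
    n_p, n_q = len(samples_p), len(samples_q)
    return sum((c / n_p) * log((c / n_p) / max(q[x] / n_q, eps))
               for x, c in p.items())
```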

    Improved Scalability and Accuracy of Bayesian Network Structure Learning in the Score-and-Search Paradigm

    A Bayesian network is a probabilistic graphical model that consists of a directed acyclic graph (DAG), where each node is a random variable and attached to each node is a conditional probability distribution (CPD). A Bayesian network (BN) can either be constructed by a domain expert or learned automatically from data using the well-known score-and-search approach, a form of unsupervised machine learning. Our interest here is in BNs as a knowledge discovery or data analysis tool, where the BN is learned automatically from data and the resulting BN is then studied for the insights that it provides on the domain, such as possible cause-effect relationships, probabilistic dependencies, and conditional independence relationships. Previous work has shown that the accuracy of a data analysis can be improved by (i) incorporating structured representations of the CPDs into the score-and-search approach for learning the DAG and by (ii) learning a set of DAGs from a dataset, rather than a single DAG, and performing a technique called model averaging to obtain a representative DAG. This thesis focuses on improving the accuracy of the score-and-search approach for learning a BN and on scaling the approach to datasets with larger numbers of random variables. We introduce a novel model averaging approach to learning a BN motivated by performance guarantees in approximation algorithms. Our approach considers all optimal and all near-optimal networks for model averaging. We provide pruning rules that retain optimality while enabling our approach to scale to BNs significantly larger than the current state of the art. We extend our model averaging approach to simultaneously learn the DAG and the local structure of the CPDs in the form of a noisy-OR representation. We provide an effective gradient descent algorithm to score a candidate noisy-OR using the widely used BIC score, and we provide pruning rules that allow the search to successfully scale to medium-sized networks. Our empirical results provide evidence for the success of our approach to learning Bayesian networks that incorporate noisy-OR relations. We also extend our model averaging approach to simultaneously learn the DAG and the local structure of the CPD using neural network representations. Our approach compares favourably with approaches like decision trees, and performs well in instances with low amounts of data. Finally, we introduce a score-and-search approach to simultaneously learn a DAG and model linear and non-linear local probabilistic relationships between variables using multivariate adaptive regression splines (MARS). MARS are polynomial regression models represented as piecewise spline functions. We show on a set of discrete and continuous benchmark instances that our proposed approach can improve the accuracy of the learned graph while scaling to instances with over 1,000 variables.
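
    The noisy-OR representation mentioned above has a standard closed form: each active parent independently fails to trigger the child with its own probability, and a leak term covers unmodelled causes. A minimal sketch with illustrative parameter names:

```python
import numpy as np

def noisy_or_prob(parent_states, fail_probs, leak=0.05):
    # P(child = 1 | parents): the child stays off only if the leak cause
    # and every active parent independently fail to trigger it.
    active = [q for x, q in zip(parent_states, fail_probs) if x == 1]
    return 1.0 - (1.0 - leak) * np.prod(active)

print(noisy_or_prob([1, 1], [0.2, 0.4]))   # 1 - 0.95 * 0.2 * 0.4 = 0.924
```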