    Methods for Learning Directed and Undirected Graphical Models

    Probabilistic graphical models provide a general framework for modeling relationships between multiple random variables. The main tool in this framework is a mathematical object called a graph, which visualizes the assertions of conditional independence between the variables. This thesis investigates methods for learning these graphs from observational data. Regarding undirected graphical models, we first propose a new scoring criterion for learning the dependence structure of a Gaussian graphical model. The scoring criterion is derived as an approximation to the often intractable Bayesian marginal likelihood. We prove that the scoring criterion is consistent and demonstrate its applicability to high-dimensional problems when combined with an efficient search algorithm. Second, we present a non-parametric method for learning undirected graphs from continuous data. The method combines a conditional mutual information estimator with a permutation test in order to perform conditional independence testing without assuming any specific parametric distribution for the involved random variables. Coupling this test with a constraint-based structure learning algorithm yields a method that performs well in numerical experiments when the data-generating mechanisms involve non-linearities. For directed graphical models, we propose a new scoring criterion for learning Bayesian network structures from discrete data. The criterion approximates a hard-to-compute quantity called the normalized maximum likelihood. We study the theoretical properties of the score and compare it experimentally to popular alternatives. Experiments show that the proposed criterion provides a robust and safe choice for structure learning and prediction over a wide variety of settings. Finally, as an application of directed graphical models, we derive a closed-form expression for the Bayesian network Fisher kernel. This provides us with a similarity measure over discrete data vectors that takes into account the dependence structure between the components. We illustrate the similarity measured by this kernel with an example in which we use it to seek sets of observations that are important and representative of the underlying Bayesian network model.

    [Abstract in Finnish, translated:] Graphical probability models are a general-purpose way of modeling relationships between several random variables. The central tool in these models is a network, or graph, with which the dependence structure between the variables can be represented visually. This dissertation examines various methods for learning undirected and directed graphs from observed data. Regarding undirected graphs, the work presents two methods, suited to different situations, for learning graph structures. First, a model selection criterion is presented for learning graph structures when the variables are normally distributed. The criterion is derived as an approximation to the often computationally demanding Bayesian marginal likelihood. The work studies the theoretical properties of the criterion and shows experimentally that it performs well in situations where the number of variables is large. The second method presented is non-parametric, meaning, roughly, that no precise assumptions about the distribution of the input variables are required. The method makes use of information-theoretic quantities estimated from the data together with a permutation test. Experimental results show that the method performs well when the dependencies between the variables of the input data are non-linear. The second part of the dissertation deals with Bayesian networks, which are directed graphical models. The work presents a new model selection criterion for learning Bayesian networks over discrete variables. This criterion is studied theoretically and compared experimentally with other commonly used model selection criteria. Finally, the dissertation presents an application of directed graphical models by deriving a Bayesian-network-based Fisher kernel. The resulting Fisher kernel can be used to measure the similarity of data vectors while taking into account the dependencies between the components of the vectors, which is illustrated experimentally.
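
    The thesis describes the non-parametric test only at this high level. Below is a minimal sketch of the general recipe under stated assumptions: a simple equal-width binned plug-in estimator of conditional mutual information stands in for the thesis's estimator, and the null distribution is generated by permuting X within bins of Z. All function names here are illustrative.

```python
# Sketch of a permutation-based conditional independence test built on a
# plug-in conditional mutual information (CMI) estimator. The binned
# estimator below is a stand-in; the thesis uses a more refined one.
import numpy as np

def cmi_binned(x, y, z, bins=5):
    """Plug-in estimate of I(X; Y | Z) from equal-width binned counts."""
    xb = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yb = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    zb = np.digitize(z, np.histogram_bin_edges(z, bins)[1:-1])
    cmi = 0.0
    for zv in np.unique(zb):
        m = zb == zv
        pz = m.mean()
        pxy = np.zeros((bins, bins))
        np.add.at(pxy, (xb[m], yb[m]), 1.0)     # joint counts within stratum
        pxy /= pxy.sum()
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)
        nz = pxy > 0
        cmi += pz * np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    return cmi

def perm_ci_test(x, y, z, n_perm=200, seed=0):
    """P-value for H0: X independent of Y given Z. X is permuted within
    bins of Z so resampling roughly preserves the X-Z dependence."""
    rng = np.random.default_rng(seed)
    observed = cmi_binned(x, y, z)
    zb = np.digitize(z, np.histogram_bin_edges(z, 5)[1:-1])
    hits = 0
    for _ in range(n_perm):
        xp = x.copy()
        for zv in np.unique(zb):
            idx = np.flatnonzero(zb == zv)
            xp[idx] = rng.permutation(xp[idx])
        hits += cmi_binned(xp, y, z) >= observed
    return (1 + hits) / (1 + n_perm)
```

    A small p-value is taken as evidence against conditional independence, which is exactly the decision a constraint-based structure learner needs at each step.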

    High dimensional Sparse Gaussian Graphical Mixture Model

    This paper considers the problem of network reconstruction from heterogeneous data using a Gaussian Graphical Mixture Model (GGMM). It is well known that parameter estimation in this context is challenging due to the large number of variables coupled with the degeneracy of the likelihood. As a solution, we propose a penalized maximum likelihood technique that imposes an $l_1$ penalty on the precision matrix. Our approach shrinks the parameters, thereby resulting in better identifiability and variable selection. We use the Expectation-Maximization (EM) algorithm, which involves the graphical LASSO, to estimate the mixing coefficients and the precision matrices. We show that under certain regularity conditions the Penalized Maximum Likelihood (PML) estimates are consistent. We demonstrate the performance of the PML estimator through simulations, and we show the utility of our method for high-dimensional data analysis in a genomic application.
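
    The abstract specifies an EM algorithm whose M-step involves the graphical LASSO. The sketch below shows one EM iteration consistent with that description, using scikit-learn's graphical_lasso for the penalized precision update; the penalty scaling and the overall loop structure are illustrative assumptions, not details taken from the paper.

```python
# Sketch of one EM iteration for an l1-penalized Gaussian graphical
# mixture. E-step: responsibilities; M-step: weighted moments, then a
# graphical-LASSO update of each component's precision matrix.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def em_step(X, weights, means, covs, alpha=0.1):
    """One EM pass; `alpha` is the l1 penalty (illustrative scaling)."""
    n, K = len(X), len(weights)
    # E-step: r[i, k] proportional to pi_k * N(x_i | mu_k, Sigma_k).
    r = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    r /= r.sum(axis=1, keepdims=True)
    new_w, new_means, new_covs, new_precs = [], [], [], []
    for k in range(K):
        rk = r[:, k]
        nk = rk.sum()
        mu = rk @ X / nk
        Xc = X - mu
        emp_cov = (rk[:, None] * Xc).T @ Xc / nk  # weighted covariance
        # Penalized M-step: graphical LASSO sparsifies the precision.
        cov_k, prec_k = graphical_lasso(emp_cov, alpha=alpha)
        new_w.append(nk / n)
        new_means.append(mu)
        new_covs.append(cov_k)
        new_precs.append(prec_k)
    return np.array(new_w), new_means, new_covs, new_precs
```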

    Approximate learning of high dimensional Bayesian network structures via pruning of Candidate Parent Sets.

    Score-based algorithms that learn Bayesian Network (BN) structures provide solutions ranging from different levels of approximate learning to exact learning. Approximate solutions exist because exact learning is generally not applicable to networks of moderate or higher complexity. In general, approximate solutions tend to sacrifice accuracy for speed, where the aim is to minimise the loss in accuracy and maximise the gain in speed. While some approximate algorithms are optimised to handle thousands of variables, even these algorithms may be unable to learn such high-dimensional structures. Some of the most efficient score-based algorithms cast the structure learning problem as a combinatorial optimisation over candidate parent sets. This paper explores a strategy for pruning the size of candidate parent sets, aimed at high-dimensional problems. The results illustrate how different levels of pruning affect learning speed relative to the loss in accuracy in terms of model fitting, and show that aggressive pruning may be required to produce approximate solutions for high-complexity problems.
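
    As a concrete illustration of this combinatorial setup, the toy sketch below enumerates candidate parent sets per node and applies the standard safe rule that a set can be discarded whenever one of its strict subsets scores at least as well (shrinking a parent set can never break acyclicity, so the superset is never needed under a decomposable score). The local_score function is a placeholder for a score such as BIC or BDeu; the paper's own pruning strategies are more aggressive than this safe rule.

```python
# Toy sketch: enumerate candidate parent sets up to a size cap, score
# them with a decomposable local score, and discard any set beaten by
# one of its own subsets. `local_score(node, parents)` is a placeholder.
from itertools import combinations

def prune_candidate_parent_sets(node, variables, local_score, max_parents=3):
    others = [v for v in variables if v != node]
    scores = {
        frozenset(ps): local_score(node, ps)
        for size in range(max_parents + 1)
        for ps in combinations(others, size)
    }
    pruned = {}
    for parents, score in scores.items():
        # Safe rule: a strict subset scoring at least as well means the
        # superset can never be part of an optimal network.
        beaten = any(
            scores[frozenset(sub)] >= score
            for size in range(len(parents))
            for sub in combinations(parents, size)
        )
        if not beaten:
            pruned[parents] = score
    return pruned
```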

    The Intersection-Validation Method for Evaluating Bayesian Network Structure Learning Without Ground Truth

    Structure learning algorithms for Bayesian networks are typically evaluated by examining how accurately they recover the correct structure, given data sampled from a benchmark network. A popular metric for this evaluation is the structural Hamming distance. For real-world data there is no ground truth to compare the learned structures against, so with such data one has been limited to evaluating the algorithms' predictive performance on separate test data or via cross-validation. Predictive performance, however, depends on the parameters of the network, for which fixed values can be used or which can be marginalized over to obtain the posterior predictive distribution under some parameter prior. Predictive performance therefore has an intricate relationship to structural accuracy; the two do not always perfectly mirror each other. We present intersection-validation, a method for evaluating structure learning without ground truth. The input to the method is a dataset and a set of compared algorithms. First, a partial structure, called the agreement graph, is constructed, consisting of the features that the algorithms agree on given the dataset. Then, the algorithms are evaluated against the agreement graph on subsamples of the data, using a variant of the structural Hamming distance. To test the method's validity we define a set of algorithms that return a score-maximizing structure using various scoring functions in combination with an exact search algorithm. Given data sampled from benchmark networks, we compare the results of the method to those obtained through direct evaluation against the ground truth structure. Specifically, we consider whether the rankings of the algorithms determined by the distances measured under the two methods agree with each other, and whether there is a strong positive correlation between the two distances. We find that across the experiments the method ranks two algorithms correctly (relative to each other) with an accuracy of approximately 0.9, including when the method is applied to a set of only two algorithms. The Pearson correlations between the distances are fairly strong but vary considerably, depending on the benchmark network, the amount of data given as input to intersection-validation, and the sample size at which the distances are measured. We also attempt to predict when the method produces accurate results from information available in situations where the method would be used in practice, namely, without knowledge of the ground truth. The results of these experiments indicate that although some predictors can be found, they do not have the same strength in all instances of use of the method. Finally, to illustrate uses for the method, we apply it to a number of real-world datasets in order to study the effect of structure priors on learning.
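
    An edge-level sketch of the method's two core steps, under the simplifying assumption that structures are plain sets of directed edges; the method as described operates on more general structural features.

```python
# Edge-level sketch of intersection-validation. Structures are sets of
# directed edges (u, v); the method as described uses richer features.

def agreement_graph(structures):
    """Partial structure on which all compared algorithms agree,
    given their outputs on the full dataset."""
    return set.intersection(*(set(s) for s in structures))

def partial_shd(learned, agreement):
    """Structural-Hamming-distance variant restricted to the agreement
    graph: each agreed edge missing or reversed in `learned` is one error."""
    return sum((u, v) not in learned for (u, v) in agreement)

# Usage: build the agreement graph from runs on the full data, then
# re-run each algorithm on subsamples and score it with partial_shd.
```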

    Minimum Description Length Revisited

    This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and for hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can to a large extent be viewed from a unified perspective.
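
    For reference, the worst-case approach mentioned above is embodied by the normalized maximum likelihood (NML) distribution, whose standard definition for a parametric model class and an observed sample is as follows (standard MDL material, not specific to this overview):

```latex
% NML distribution and code length for a model class {p_theta} and
% an observed sample x^n:
\[
  \bar p_{\mathrm{NML}}(x^n)
    = \frac{p\bigl(x^n \mid \hat\theta(x^n)\bigr)}
           {\sum_{y^n} p\bigl(y^n \mid \hat\theta(y^n)\bigr)},
  \qquad
  L_{\mathrm{NML}}(x^n) = -\log \bar p_{\mathrm{NML}}(x^n),
\]
% where \hat\theta(\cdot) is the maximum likelihood estimator and the
% denominator (the parametric complexity) sums over all samples of the
% same length.
```

    The familiar BIC penalty can be recovered as an asymptotic approximation to this worst-case optimal code length.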

    Greedy structure learning from data that contains systematic missing values

    Learning from data that contain missing values is a common phenomenon in many domains. Relatively few Bayesian Network (BN) structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing values that are not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit the potential bias caused by missing values. The first two variants can be viewed as sub-versions of the third and best-performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm, in terms of both learning accuracy and efficiency, both when data are missing at random and when they are not.
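
    The abstract names pairwise deletion and inverse probability weighting (IPW) without detail; the sketch below shows one plausible way such weighted counts could be assembled for a decomposable score. The logistic missingness model and the requirement that predictor columns be fully observed and numeric are illustrative assumptions, not the paper's specification.

```python
# Sketch: IPW-weighted contingency counts over rows where the queried
# columns are observed (pairwise-style deletion). The logistic
# missingness model is an illustrative choice.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_weights(df, target_cols, predictor_cols):
    """Weight each row that is complete on target_cols by
    1 / P(complete | predictors); incomplete rows get weight 0."""
    complete = df[target_cols].notna().all(axis=1).astype(int)
    model = LogisticRegression().fit(df[predictor_cols], complete)
    p_obs = model.predict_proba(df[predictor_cols])[:, 1]
    return complete / np.clip(p_obs, 1e-6, None)

def weighted_counts(df, cols, weights):
    """IPW-weighted counts that stand in for raw counts when a
    decomposable score is evaluated over `cols`."""
    sub = df[cols].dropna()
    return sub.assign(_w=weights.loc[sub.index]).groupby(cols)["_w"].sum()
```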

    Bayesian stochastic blockmodels for community detection in networks and community-structured covariance selection

    Networks have been widely used to describe interactions among objects in diverse fields. Given the interest in explaining a network by its structure, much attention has been drawn to finding clusters of nodes with dense connections within clusters but sparse connections between clusters. Such clusters are called communities, and identifying them is known as community detection. Here, to perform community detection, I focus on stochastic blockmodels (SBM), a class of statistically based generative models. I present a flexible SBM that represents different types of data as well as node attributes under a Bayesian framework. The proposed models explicitly capture community behavior by guaranteeing that connections are denser within communities than between communities. First, I present a degree-corrected SBM based on a logistic regression formulation to model binary networks. To fit the model, I obtain posterior samples via Gibbs sampling based on Polya-Gamma latent variables. I conduct inference based on a novel, canonically mapped centroid estimator that formally addresses label non-identifiability and captures representative community assignments. Next, to accommodate large-scale datasets, I extend the degree-corrected SBM to a broader family of generalized linear models with group correction terms. To conduct exact inference efficiently, I develop an iteratively reweighted least squares procedure that implicitly updates sufficient statistics on the network to obtain maximum a posteriori (MAP) estimators. I demonstrate the proposed models and estimation procedures on simulated benchmark networks and various real-world datasets. Finally, I develop a Bayesian SBM for community-structured covariance selection. Here, I assume that the data at each node are Gaussian, and I posit a latent network in which two nodes are unconnected if their observations are conditionally independent given the observations at all other nodes. In biological and social applications, this latent network can be expected to show a block dependency structure that reflects community behavior. Thus, to identify the latent network and detect communities, I propose a hierarchical prior with two levels: a spike-and-slab prior on the off-diagonal entries of the concentration matrix for variable selection, and a degree-corrected SBM to capture community behavior. I develop an efficient inference routine based on ridge regularization and MAP estimation.
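
    As a small illustration of the logistic degree-corrected SBM described above, the sketch below samples a binary network under the common parameterization in which the log-odds of an edge are the sum of two node-specific degree effects and a block term; the dissertation's exact formulation may differ.

```python
# Generative sketch of a degree-corrected logistic SBM: the log-odds of
# an edge between i and j are theta_i + theta_j + B[z_i, z_j].
import numpy as np

def sample_dc_logistic_sbm(z, theta, B, rng=None):
    """z: integer community labels (n,); theta: node degree effects (n,);
    B: block log-odds matrix (K, K). Returns a symmetric 0/1 adjacency."""
    rng = rng or np.random.default_rng()
    logits = theta[:, None] + theta[None, :] + B[np.ix_(z, z)]
    p = 1.0 / (1.0 + np.exp(-logits))   # edge probabilities (sigmoid)
    A = (rng.random(p.shape) < p).astype(int)
    A = np.triu(A, 1)                   # keep one triangle, no self-loops
    return A + A.T                      # symmetrize

# Connections are denser within communities than between them whenever
# the diagonal of B dominates its off-diagonal entries, matching the
# abstract's guarantee of community behavior.
rng = np.random.default_rng(1)
z = np.repeat([0, 1], 25)
A = sample_dc_logistic_sbm(z, theta=rng.normal(0.0, 0.5, size=50),
                           B=np.array([[1.0, -2.0], [-2.0, 1.0]]), rng=rng)
```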
