225 research outputs found

    Competent Program Evolution, Doctoral Dissertation, December 2006

    Heuristic optimization methods are adaptive when they sample problem solutions based on knowledge of the search space gathered from past sampling. Recently, competent evolutionary optimization methods have been developed that adapt via probabilistic modeling of the search space. However, their effectiveness requires the existence of a compact problem decomposition in terms of prespecified solution parameters. How can we use these techniques to effectively and reliably solve program learning problems, given that program spaces will rarely have compact decompositions? One method is to manually build a problem-specific representation that is more tractable than the general space. But can this process be automated? My thesis is that the properties of programs and program spaces can be leveraged as inductive bias to reduce the burden of manual representation-building, leading to competent program evolution. The central contributions of this dissertation are a synthesis of the requirements for competent program evolution, and the design of a procedure, meta-optimizing semantic evolutionary search (MOSES), that meets these requirements. In support of my thesis, experimental results are provided to analyze and verify the effectiveness of MOSES, demonstrating scalability and real-world applicability.
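    The adaptive loop the abstract alludes to (sample solutions from a probabilistic model of the search space, then refit the model to the best samples) can be sketched as a minimal univariate estimation-of-distribution algorithm. This is a toy stand-in on a bit-string fitness, not MOSES itself; the fitness function and all parameters below are illustrative.

    ```python
    import random

    def onemax(bits):
        # Toy fitness: number of 1-bits; stands in for evaluating a candidate solution.
        return sum(bits)

    def univariate_eda(n=20, pop=50, elite=10, iters=60, seed=0):
        """Sample from a product-of-Bernoullis model of the search space,
        then refit the model to the elite samples: the simplest form of
        adapting via probabilistic modeling of past sampling."""
        rng = random.Random(seed)
        p = [0.5] * n                  # model: one independent bit probability per position
        best = None
        for _ in range(iters):
            samples = [[1 if rng.random() < p[i] else 0 for i in range(n)]
                       for _ in range(pop)]
            samples.sort(key=onemax, reverse=True)
            if best is None or onemax(samples[0]) > onemax(best):
                best = samples[0]
            winners = samples[:elite]
            # Refit each marginal to the elite samples, clamped away from 0/1
            # so the model never loses the ability to explore.
            for i in range(n):
                freq = sum(w[i] for w in winners) / elite
                p[i] = min(0.95, max(0.05, freq))
        return best

    best = univariate_eda()
    ```

    A competent method in the dissertation's sense would additionally model dependencies between variables; the univariate model here is the degenerate case with no decomposition structure at all.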

    Methods for Learning Directed and Undirected Graphical Models

    Probabilistic graphical models provide a general framework for modeling relationships between multiple random variables. The main tool in this framework is a mathematical object called a graph, which visualizes the assertions of conditional independence between the variables. This thesis investigates methods for learning these graphs from observational data. Regarding undirected graphical models, we propose a new scoring criterion for learning a dependence structure of a Gaussian graphical model. The scoring criterion is derived as an approximation to the often intractable Bayesian marginal likelihood. We prove that the scoring criterion is consistent and demonstrate its applicability to high-dimensional problems when combined with an efficient search algorithm. Secondly, we present a non-parametric method for learning undirected graphs from continuous data. The method combines a conditional mutual information estimator with a permutation test in order to perform conditional independence testing without assuming any specific parametric distributions for the involved random variables. Pairing this test with a constraint-based structure learning algorithm yields a method that performs well in numerical experiments when the data-generating mechanisms involve non-linearities. For directed graphical models, we propose a new scoring criterion for learning Bayesian network structures from discrete data. The criterion approximates a hard-to-compute quantity called the normalized maximum likelihood. We study the theoretical properties of the score and compare it experimentally to popular alternatives. Experiments show that the proposed criterion provides a robust and safe choice for structure learning and prediction over a wide variety of different settings. Finally, as an application of directed graphical models, we derive a closed-form expression for the Bayesian network Fisher kernel.
This provides us with a similarity measure over discrete data vectors that is capable of taking into account the dependence structure between the components. We illustrate the similarity measure defined by this kernel with an example where we use it to seek sets of observations that are important and representative of the underlying Bayesian network model.
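    The permutation-based conditional independence test described in the abstract can be sketched as follows. This is a simplified, discrete stand-in (a plug-in conditional mutual information estimate with permutations of X within strata of Z, which preserves the X-Z relation while breaking any residual X-Y link); the thesis's actual method uses a non-parametric estimator for continuous data, and all names and parameters here are illustrative.

    ```python
    import random
    from collections import Counter
    from math import log

    def cmi(x, y, z):
        """Plug-in estimate of conditional mutual information I(X;Y|Z)
        for discrete samples: sum p(x,y,z) log[p(x,y,z)p(z) / (p(x,z)p(y,z))]."""
        n = len(x)
        nz, nxz = Counter(z), Counter(zip(x, z))
        nyz, nxyz = Counter(zip(y, z)), Counter(zip(x, y, z))
        return sum((k / n) * log(k * nz[c] / (nxz[(a, c)] * nyz[(b, c)]))
                   for (a, b, c), k in nxyz.items())

    def ci_permutation_test(x, y, z, n_perm=200, seed=0):
        """P-value for H0: X independent of Y given Z, by permuting x
        within each stratum of z and recomputing the CMI statistic."""
        rng = random.Random(seed)
        observed = cmi(x, y, z)
        strata = {}
        for i, c in enumerate(z):
            strata.setdefault(c, []).append(i)
        exceed = 0
        for _ in range(n_perm):
            xp = list(x)
            for idx in strata.values():
                vals = [x[i] for i in idx]
                rng.shuffle(vals)
                for i, v in zip(idx, vals):
                    xp[i] = v
            if cmi(xp, y, z) >= observed:
                exceed += 1
        return (exceed + 1) / (n_perm + 1)

    # Synthetic check: X and Y both driven by Z (independent given Z),
    # versus Y depending on X directly.
    rng = random.Random(1)
    z = [rng.randint(0, 1) for _ in range(400)]
    x = [zi ^ (rng.random() < 0.3) for zi in z]
    y = [zi ^ (rng.random() < 0.3) for zi in z]
    y_dep = [xi ^ (rng.random() < 0.1) for xi in x]
    p_null = ci_permutation_test(x, y, z)
    p_alt = ci_permutation_test(x, y_dep, z)
    ```

    Because the null distribution is generated by permutation rather than an asymptotic approximation, no parametric assumption on the variables' distributions is needed, which is the point of the non-parametric approach.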

    Influence modelling and learning between dynamic Bayesian networks using score-based structure learning

    A Ph.D. thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science, May 2018. Although partially observable stochastic processes are ubiquitous in many fields of science, little work has been devoted to discovering and analysing the means by which several such processes may interact to influence each other. In this thesis we extend probabilistic structure learning between random variables to the context of temporal models which represent partially observable stochastic processes. Learning an influence structure and distribution between processes can be useful for density estimation and knowledge discovery. A common approach to structure learning, in observable data, is score-based structure learning, where we search for the most suitable structure by using a scoring metric to value structural configurations relative to the data. Most popular structure scores are variations on the likelihood score, which calculates the probability of the data given a potential structure. In observable data, the decomposability of the likelihood score, which is the ability to represent the score as a sum of family scores, allows for efficient learning procedures and significant computational savings. However, in incomplete data (whether due to latent variables or missing samples), the likelihood score is not decomposable and we have to perform inference to evaluate it. This forces us to use non-linear optimisation techniques to optimise the likelihood function. Furthermore, local changes to the network can affect other parts of the network, which makes learning with incomplete data all the more difficult. We define two general types of influence scenarios, direct influence and delayed influence, which can be used to define influence around richly structured spaces consisting of multiple processes that are interrelated in various ways.
We will see that although it is possible to capture both types of influence in a single complex model by a suitable setting of the parameters, complex representations run into fragmentation issues. We handle this by extending the language of dynamic Bayesian networks, allowing us to construct single compact models that capture the properties of a system's dynamics and produce influence distributions dynamically. The novelty of our approach lies in learning the optimal influence structure in layers. We first learn a set of independent temporal models, and thereafter optimise a structure score over possible structural configurations between these temporal models. Since the search for the optimal structure is done using complete data, we can take advantage of efficient learning procedures from the structure learning literature. We provide the following contributions: we (a) introduce the notion of influence between temporal models; (b) extend traditional structure scores for random variables to structure scores for temporal models; (c) provide a complete algorithm to recover the influence structure between temporal models; (d) provide a notion of structural assemblies to relate temporal models for types of influence; and finally, (e) provide empirical evidence for the effectiveness of our method with respect to generative ground-truth distributions. The presented results emphasise the trade-off between the likelihood of an influence structure relative to the ground truth and the computational complexity of expressing it. Depending on the availability of samples, we might choose different learning methods to express influence relations between processes. On one hand, when given too few samples, we may choose to learn a sparse structure using tree-based structure learning, or even use no influence structure at all.
On the other hand, when given an abundant number of samples, we can use penalty-based procedures that achieve rich, meaningful representations using local search techniques. Once we consider high-level representations of dynamic influence between temporal models, we open the door to very rich and expressive representations, which emphasises the importance of knowledge discovery and density estimation in the temporal setting.
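    The decomposability property the abstract highlights (a structure score that is a sum of per-family scores) can be illustrated with a BIC-style score for discrete data. This is a generic sketch of decomposable scoring, not the thesis's temporal-model score; the data layout and function names are illustrative.

    ```python
    from collections import Counter
    from math import log

    def family_score(data, child, parents):
        """Maximum log-likelihood minus a BIC penalty for one family
        (child, parents). Because the full structure score is just the sum
        of these terms, a local change to one family re-scores only that term."""
        n = len(data)
        joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        ll = sum(k * log(k / parent_counts[pa]) for (pa, _), k in joint.items())
        child_card = len({row[child] for row in data})
        n_params = (child_card - 1) * len(parent_counts)
        return ll - 0.5 * log(n) * n_params

    def structure_score(data, parent_sets):
        # parent_sets maps each variable index to the tuple of its parent indices.
        return sum(family_score(data, v, ps) for v, ps in parent_sets.items())

    # Toy data: variable 1 matches variable 0 in 80% of the 100 rows.
    rows = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 1)] * 40 + [(1, 0)] * 10
    with_edge = structure_score(rows, {0: (), 1: (0,)})
    no_edge = structure_score(rows, {0: (), 1: ()})
    ```

    With complete data this sum evaluates in one pass; the abstract's point is that with incomplete data the score no longer factorises this way, so each candidate structure requires inference instead of a cheap local update.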

    The Intersection-Validation Method for Evaluating Bayesian Network Structure Learning Without Ground Truth

    Structure learning algorithms for Bayesian networks are typically evaluated by examining how accurately they recover the correct structure, given data sampled from a benchmark network. A popular metric for this evaluation is the structural Hamming distance. For real-world data there is no ground truth to compare the learned structures against. Thus, with such data, one has been limited to evaluating the algorithms' predictive performance on separate test data or via cross-validation. The predictive performance, however, depends on the parameters of the network, for which some fixed values can be used or which can be marginalized over to obtain the posterior predictive distribution under some parameter prior. Predictive performance therefore has an intricate relationship to structural accuracy: the two do not always perfectly mirror each other. We present intersection-validation, a method for evaluating structure learning without ground truth. The input to the method is a dataset and a set of compared algorithms. First, a partial structure, called the agreement graph, is constructed, consisting of the features that the algorithms agree on given the dataset. Then, the algorithms are evaluated against the agreement graph on subsamples of the data, using a variant of the structural Hamming distance. To test the method's validity we define a set of algorithms that return a score-maximizing structure using various scoring functions in combination with an exact search algorithm. Given data sampled from benchmark networks, we compare the results of the method to those obtained through direct evaluation against the ground-truth structure. Specifically, we consider whether the rankings for the algorithms determined by the distances measured using the two methods conform with each other, and whether there is a strong positive correlation between the two distances.
We find that across the experiments the method gives a correct ranking for two algorithms (relative to each other) with an accuracy of approximately 0.9, including when the method is applied to a set of only two algorithms. The Pearson correlations between the distances are fairly strong but vary to a great extent, depending on the benchmark network, the amount of data given as input to intersection-validation, and the sample size at which the distances are measured. We also attempt to predict when the method produces accurate results from information available in situations where the method would be used in practice, namely, without knowledge of the ground truth. The results from these experiments indicate that although some predictors can be found, they do not have the same strength in all instances of use of the method. Finally, to illustrate uses of the method, we apply it to a number of real-world datasets in order to study the effect of structure priors on learning.
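    The agreement-graph construction can be sketched by treating each structural feature as the state of an ordered variable pair. This is an illustrative simplification: the feature set and the exact structural-Hamming-distance variant used in the thesis may differ, and all names below are hypothetical.

    ```python
    from itertools import combinations

    def edge_state(g, a, b):
        # State of the pair (a, b) in a directed graph given as a set of arcs.
        if (a, b) in g:
            return "a->b"
        if (b, a) in g:
            return "b->a"
        return "none"

    def agreement_graph(structures, nodes):
        """Keep only the pairs on whose state every compared algorithm agrees,
        together with that agreed state; disagreed pairs are simply omitted."""
        agreed = {}
        for a, b in combinations(nodes, 2):
            states = {edge_state(g, a, b) for g in structures}
            if len(states) == 1:
                agreed[(a, b)] = states.pop()
        return agreed

    def shd_to_agreement(g, agreed):
        """SHD variant: count agreed-upon pairs where g deviates from
        the agreed state; pairs outside the agreement graph are ignored."""
        return sum(edge_state(g, a, b) != s for (a, b), s in agreed.items())

    # Three hypothetical learned structures over four variables.
    learned = [{(0, 1), (1, 2)}, {(0, 1), (1, 2), (2, 3)}, {(0, 1), (2, 1)}]
    agreed = agreement_graph(learned, [0, 1, 2, 3])
    ```

    Structures learned on subsamples would then be scored with `shd_to_agreement`, replacing the unavailable ground-truth network with the consensus of the compared algorithms.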

    Structural learning of Bayesian networks using statistical constraints

    Bayesian networks are probabilistic graphical models that encode in a compact manner the conditional probabilistic relations over a set of random variables. In this thesis we address the NP-complete problem of learning the structure of a Bayesian network from observed data. We first present two algorithms from the state of the art: the Max-Min Parents and Children algorithm (MMPC) of Tsamardinos et al., which uses statistical tests of independence to restrict the search space for a simple local search algorithm, and a recent complete branch-and-bound technique of de Campos and Ji. We propose in the thesis a novel hybrid algorithm, which uses the constraints given by the MMPC algorithm to reduce the size of the search space of the complete branch-and-bound algorithm. Two different statistical tests of independence were implemented: the simple asymptotic test of Tsamardinos et al. and a permutation-based test proposed more recently by Tsamardinos and Borboudakis. We tested the different techniques on three well-known Bayesian networks in a realistic scenario, with limited memory and data sets of small sample size. Our results are promising and show that the hybrid algorithm exhibits a minimal loss in score, against a considerable gain in computational time, with respect to the original branch-and-bound algorithm, and that neither of the two independence tests consistently dominates the other in terms of computational time gained.
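    The search-space reduction the hybrid algorithm relies on can be sketched as follows: the exact search enumerates candidate parent sets only from the neighbours that the constraint phase did not rule out, instead of from all other variables. The scoring function below is a hypothetical stand-in for a real decomposable score, and all names are illustrative.

    ```python
    from itertools import combinations

    def best_parents(v, candidates, score, max_parents=2):
        """Exhaustively score parent subsets of variable v drawn only from
        `candidates` (the MMPC-style constraint output). With c candidates
        instead of all n-1 variables, the enumeration shrinks from
        O(n^max_parents) to O(c^max_parents) subsets."""
        best_set, best_val = (), score(v, ())
        for k in range(1, max_parents + 1):
            for ps in combinations(candidates, k):
                s = score(v, ps)
                if s > best_val:
                    best_set, best_val = ps, s
        return best_set, best_val

    def toy_score(v, parents):
        # Hypothetical decomposable score: variable 2 truly depends on
        # variable 0; every extra parent costs one point.
        target = {2: {0}}
        return -len(parents) + (2 if v in target and set(parents) == target[v] else 0)

    # Suppose the independence tests left {0, 1, 3} as candidates for variable 2.
    parents, val = best_parents(2, [0, 1, 3], toy_score)
    ```

    In the thesis's setting the same pruning feeds a complete branch-and-bound search rather than a per-variable enumeration, but the principle is identical: constraints from independence tests shrink the space the exact method must certify.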

    Non-stationary continuous dynamic Bayesian networks
