471 research outputs found

    Markov chain Monte Carlo for exact inference for diffusions

    We develop exact Markov chain Monte Carlo methods for discretely sampled, directly and indirectly observed diffusions. The qualification "exact" refers to the fact that the invariant and limiting distribution of the Markov chains is the posterior distribution of the parameters, free of any discretisation error. The class of processes to which our methods directly apply are those which can be simulated using the most general exact simulation algorithm available to date. The article introduces various methods to boost the performance of the basic scheme, including reparametrisations and auxiliary Poisson sampling. We contrast both theoretically and empirically how this new approach compares to irreducible high frequency imputation, which is the state-of-the-art alternative for the class of processes we consider, and we uncover intriguing connections. All methods discussed in the article are tested on typical examples. Comment: 23 pages, 6 figures, 3 tables

    DEUM: a framework for an estimation of distribution algorithm based on Markov random fields.

    Estimation of Distribution Algorithms (EDAs) belong to the class of population-based optimisation algorithms. They are motivated by the idea of discovering and exploiting the interaction between variables in the solution. They estimate a probability distribution from a population of solutions, and sample it to generate the next population. Many EDAs use probabilistic graphical modelling techniques for this purpose. In particular, directed graphical models (Bayesian networks) have been widely used in EDAs. This thesis proposes an undirected graphical model (Markov Random Field (MRF)) approach to estimate and sample the distribution in EDAs. The interaction between variables in the solution is modelled as an undirected graph and the joint probability of a solution is factorised as a Gibbs distribution. The thesis describes a model of the fitness function that approximates the energy in the Gibbs distribution, and shows how this model can be fitted to a population of solutions to estimate the parameters of the MRF. The estimated MRF is then sampled to generate the next population. This approach is applied to estimation of distribution in a general EDA framework called Distribution Estimation using Markov Random Fields (DEUM). The thesis then proposes several variants of DEUM using different sampling techniques and tests their performance on a range of optimisation problems. The results show that, for most of the tested problems, the DEUM algorithms significantly outperform other EDAs, both in terms of the number of fitness evaluations and the quality of the solutions found. There are two main explanations for the success of the DEUM algorithms. Firstly, DEUM builds a model of the fitness function to approximate the MRF. This contrasts with other EDAs, which build a model of the selected solutions, and it allows DEUM to use fitness in the variation part of the evolution. Secondly, DEUM exploits the temperature coefficient in the Gibbs distribution to regulate the behaviour of the algorithm. In particular, with higher temperature the distribution is closer to uniform, and with lower temperature it concentrates near some global optima. This gives DEUM explicit control over the convergence of the algorithm, resulting in better optimisation.
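
The temperature effect described in this abstract can be illustrated with a minimal sketch. It is not the DEUM implementation: the function names, the toy energy and the brute-force enumeration of states are all assumptions made for illustration, and the sketch only uses the standard Gibbs form p(x) proportional to exp(-U(x)/T).

```python
import math
from itertools import product


def gibbs_distribution(energy, n_bits, temperature):
    """Return {state: probability} with p(x) proportional to exp(-U(x) / T).

    Enumerates all 2**n_bits states, so it is only usable for tiny n_bits.
    """
    states = list(product((0, 1), repeat=n_bits))
    weights = [math.exp(-energy(x) / temperature) for x in states]
    z = sum(weights)  # normalising constant (partition function)
    return {x: w / z for x, w in zip(states, weights)}


def toy_energy(x):
    # Hypothetical energy: minus the number of ones, so (1, 1, 1) is the unique minimum.
    return -sum(x)


hot = gibbs_distribution(toy_energy, 3, temperature=100.0)   # close to uniform
cold = gibbs_distribution(toy_energy, 3, temperature=0.1)    # concentrated on the minimum
```

At the high temperature all eight states receive nearly equal probability, while at the low temperature almost all of the mass sits on the minimum-energy state, which matches the behaviour the abstract attributes to the temperature coefficient.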

    New methods for generating populations in Markov network based EDAs: Decimation strategies and model-based template recombination

    Methods for generating a new population are a fundamental component of estimation of distribution algorithms (EDAs). They serve to transfer the information contained in the probabilistic model to the newly generated population. In EDAs based on Markov networks, methods for generating new populations usually discard information contained in the model to gain efficiency. Other methods, such as Gibbs sampling, use information about all interactions in the model but are computationally very costly. In this paper we propose new methods for generating new solutions in EDAs based on Markov networks. We introduce approaches based on inference methods for computing the most probable configurations and on model-based template recombination. We show that the application of different variants of inference methods can increase the EDAs' convergence rate and reduce the number of function evaluations needed to find the optimum of binary and non-binary discrete functions.

    The IBMAP approach for Markov networks structure learning

    In this work we consider the problem of learning the structure of Markov networks from data. We present an approach for tackling this problem called IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC algorithm, designed to avoid important limitations of existing independence-based algorithms. These algorithms proceed by performing statistical independence tests on data, trusting completely the outcome of each test. In practice tests may be incorrect, resulting in potential cascading errors and a consequent reduction in the quality of the structures learned. IBMAP accounts for this uncertainty in the outcome of the tests through a probabilistic maximum-a-posteriori approach. The approach is instantiated in the IBMAP-HC algorithm, a structure selection strategy that performs a polynomial heuristic local search in the space of possible structures. We present an extensive empirical evaluation on synthetic and real data, showing that our algorithm significantly outperforms the current independence-based algorithms in terms of data efficiency and quality of learned structures, with equivalent computational complexity. We also show the performance of IBMAP-HC in a real-world application of knowledge discovery: EDAs, which are evolutionary algorithms that use structure learning in each generation to model the distribution of populations. The experiments show that when IBMAP-HC is used to learn the structure, EDAs improve their convergence to the optimum.

    A review on probabilistic graphical models in evolutionary computation

    Thanks to their inherent properties, probabilistic graphical models are one of the prime candidates for machine learning and decision making tasks, especially in uncertain domains. Their capabilities, such as representation, inference and learning, if used effectively, can greatly help to build intelligent systems that are able to act appropriately in different problem domains. Evolutionary computation is one such discipline that has employed probabilistic graphical models to improve the search for optimal solutions in complex problems. This paper shows how probabilistic graphical models have been used in evolutionary algorithms to improve their performance in solving complex problems. Specifically, we give a survey of probabilistic model-building evolutionary algorithms, called estimation of distribution algorithms, and compare different methods for probabilistic modeling in these algorithms.
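
The estimate-then-sample loop that these abstracts attribute to EDAs can be sketched with the simplest possible model, a vector of independent bit probabilities. This is a generic univariate sketch in the style of UMDA, not the method of any listed paper; the OneMax fitness (count of ones), the function name and all parameter values are assumptions chosen for illustration.

```python
import random


def umda_onemax(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """Univariate EDA maximising the number of ones in a bit string."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit is 1 with probability 0.5
    best = None
    for _ in range(generations):
        # Sample a new population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Truncation selection: keep the fittest individuals.
        population.sort(key=sum, reverse=True)
        selected = population[:n_select]
        if best is None or sum(selected[0]) > sum(best):
            best = selected[0]
        # Estimate the new model from the selected individuals.
        probs = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
        # Keep probabilities away from 0 and 1 so that no bit becomes fixed.
        probs = [min(max(p, 1 / n_bits), 1 - 1 / n_bits) for p in probs]
    return best


best = umda_onemax()
```

EDAs built on Bayesian networks or Markov networks keep the same loop and replace the independent-bit model with one that captures interactions between variables.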

    Solving the Ising spin glass problem using a bivariate EDA based on Markov random fields.

    Markov Random Field (MRF) modelling techniques have recently been proposed as a novel approach to probabilistic modelling for Estimation of Distribution Algorithms (EDAs). An EDA using this technique was called Distribution Estimation using Markov Random Fields (DEUM). DEUM was later extended to DEUMd. DEUM and DEUMd use a univariate model of the probability distribution, and have been shown to perform better than other univariate EDAs on a range of optimization problems. This paper extends DEUM to use a bivariate model and applies it to Ising spin glass problems. We propose two variants of DEUM that use different sampling techniques. Our experimental results show a noticeable gain in performance.
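
For readers unfamiliar with the benchmark, the following sketch shows the usual form of a 2-D Ising spin glass objective: spins of +1 or -1 on a grid, a coupling on every neighbouring pair, and an energy equal to minus the sum of coupling times spin product. The periodic boundary, the +/-1 couplings and the function names are assumptions for illustration; the abstract does not specify the instances used.

```python
import random


def make_couplings(side, rng):
    """Random +/-1 couplings to the right and down neighbours on a side x side torus.

    Intended for side >= 3 so that every neighbouring pair appears exactly once.
    """
    couplings = {}
    for r in range(side):
        for c in range(side):
            i = r * side + c
            right = r * side + (c + 1) % side
            down = ((r + 1) % side) * side + c
            couplings[(i, right)] = rng.choice((-1, 1))
            couplings[(i, down)] = rng.choice((-1, 1))
    return couplings


def ising_energy(spins, couplings):
    """Energy H(s) = -sum over coupled pairs of J_ab * s_a * s_b, with spins in {-1, +1}."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


rng = random.Random(0)
couplings = make_couplings(3, rng)
spins = [rng.choice((-1, 1)) for _ in range(9)]
energy = ising_energy(spins, couplings)
```

Because the energy is a sum over pairs of variables, a bivariate model is the smallest model that can represent the interactions, which is consistent with the motivation given in the abstract.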

    Estimation of Distribution Algorithms and Minimum Relative Entropy

    In the field of optimization using probabilistic models of the search space, this thesis identifies and elaborates several advancements in which the principles of maximum entropy and minimum relative entropy from information theory are used to estimate a probability distribution. The probability distribution within the search space is represented by a graphical model (factorization, Bayesian network or junction tree). An estimation of distribution algorithm (EDA) is an evolutionary optimization algorithm which uses a graphical model to sample a population within the search space and then estimates a new graphical model from the selected individuals of the population. - So far, the Factorized Distribution Algorithm (FDA) builds a factorization or Bayesian network from a given additive structure of the objective function to be optimized, using a greedy algorithm which only considers a subset of the variable dependencies. Important connections can be lost by this method. This thesis presents a heuristic subfunction merge algorithm which is able to consider all dependencies between the variables (as long as the marginal distributions of the model do not become too large). On a 2-D grid structure, this algorithm builds a pentavariate factorization which makes it possible to solve the deceptive grid benchmark problem with a much smaller population size than the conventional factorization. Especially for small population sizes, calculating large marginal distributions from smaller ones using maximum entropy and iterative proportional fitting leads to a further improvement. - The second topic is the generalization of graphical models to loopy structures. Using the Bethe-Kikuchi approximation, the loopy graphical model (region graph) can learn the Boltzmann distribution of an objective function by a generalized belief propagation algorithm (GBP). It minimizes the free energy, a notion adopted from statistical physics which is equivalent to the relative entropy to the Boltzmann distribution. Previous attempts to combine the Kikuchi approximation with EDA have relied on an expensive Gibbs sampling procedure for generating a population from this loopy probabilistic model. In this thesis a combination with a factorization is presented which allows more efficient sampling. The free energy is generalized to incorporate the inverse temperature β. The factorization building algorithm mentioned above can be employed here, too. The dynamics of GBP is investigated, and the method is applied to Ising spin glass ground state search. Small instances (7 x 7) are solved without difficulty. Larger instances (10 x 10 and 15 x 15) do not converge to the true optimum with large β, but sampling from the factorization can find the optimum with about 1000-10000 sampling attempts, depending on the instance. If GBP does not converge, it can be replaced by a concave-convex procedure which guarantees convergence. - Third, if no probabilistic structure is given for the objective function, a Bayesian network can be learned to capture the dependencies in the population. The relative entropy between the population-induced distribution and the Bayesian network distribution is equivalent to the log-likelihood of the model. The log-likelihood has been generalized to the BIC/MDL score, which reduces overfitting by penalizing complicated structure of the Bayesian network. A previous information theoretic analysis of BIC/MDL in the context of EDA is continued, and empirical evidence is given that the method is able to learn the correct structure of an objective function, given a sufficiently large population. - Finally, a way to reduce the search space of EDA is presented by combining it with a local search heuristic.
The Kernighan-Lin hillclimber, known originally for the traveling salesman problem and graph bipartitioning, is generalized to arbitrary binary problems. It can be applied in a stand-alone manner, as an iterative 1+1 search algorithm, or combined with EDA. On the MAXSAT problem its performance is on a scale similar to that of the specialized SAT solver Walksat. An analysis of the Kernighan-Lin local optima indicates that the combination with an EDA is favorable. The thesis shows how evolutionary optimization can be improved using interdisciplinary results from information theory, statistics, probability calculus and statistical physics. The principles of information theory for estimating probability distributions are applicable in many areas. EDAs are a good application because an improved estimation directly affects the optimization success.
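
The equivalence between relative entropy and log-likelihood mentioned in the thesis abstract follows from a standard identity, written out below as a sketch. The notation is an assumption, not taken from the thesis: \hat p is the empirical distribution of the N selected individuals x^{(1)}, ..., x^{(N)}, q_G is the distribution of a Bayesian network with structure G, H is the Shannon entropy, and dim(G) is the number of free parameters of G. The second line is the usual textbook form of the BIC score.

```latex
\begin{align}
D(\hat p \,\|\, q_G)
  &= \sum_x \hat p(x) \log \frac{\hat p(x)}{q_G(x)}
   = -H(\hat p) - \frac{1}{N} \sum_{i=1}^{N} \log q_G\bigl(x^{(i)}\bigr), \\
\mathrm{BIC}(G)
  &= \sum_{i=1}^{N} \log q_G\bigl(x^{(i)}\bigr) - \frac{\dim(G)}{2} \log N .
\end{align}
```

Since H(\hat p) does not depend on the model, minimizing the relative entropy over q_G is the same as maximizing the log-likelihood, and the second term of the BIC score is the penalty on complicated structures.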