880 research outputs found

    A quantitative analysis of estimation of distribution algorithms based on Bayesian networks

    The successful application of estimation of distribution algorithms (EDAs) to different kinds of problems has reinforced their candidature as promising black-box optimization tools. However, their internal behavior is still not completely understood, and further work in this direction is needed to advance their development. This paper presents a new methodology of analysis which provides new information about the behavior of EDAs by quantitatively analyzing the probabilistic models learned during the search. We particularly focus on calculating the probabilities of the optimal solutions, the most probable solution given by the model and the best individual of the population at each step of the algorithm. We carry out the analysis by optimizing functions of different natures such as Trap5, two variants of Ising spin glass and Max-SAT. By using different structures in the probabilistic models, we also analyze the influence of the structural model accuracy on the quantitative behavior of EDAs. In addition, the objective function values of our analyzed key solutions are contrasted with their probability values in order to study the connection between function and probabilistic models. The results not only show information about the EDA behavior, but also about the quality of the optimization process and setup of the parameters, the relationship between the probabilistic model and the fitness function, and even about the problem itself. Furthermore, the results allow us to discover common patterns of behavior in EDAs and propose new ideas for the development of this type of algorithm.
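
As a rough illustration of the quantities tracked in this abstract, the sketch below evaluates Trap5 and computes the probability a model assigns to a given solution. It assumes a univariate model with independent bit marginals as a simplified stand-in for the Bayesian networks studied in the paper; the function names are hypothetical.

```python
import numpy as np

def trap5(x):
    """Concatenated Trap5: sum of deceptive traps over consecutive 5-bit blocks."""
    x = np.asarray(x)
    total = 0
    for i in range(0, len(x), 5):
        u = int(x[i:i + 5].sum())        # number of ones in the block
        total += 5 if u == 5 else 4 - u  # best at all ones, slope leads toward all zeros
    return total

def estimate_marginals(population):
    """Univariate model (an assumption of this sketch): p[i] = frequency of ones at bit i."""
    return np.asarray(population, dtype=float).mean(axis=0)

def prob_under_univariate(x, p):
    """Probability of bit string x under independent marginals p[i] = P(x_i = 1)."""
    x = np.asarray(x)
    p = np.asarray(p, dtype=float)
    return float(np.prod(np.where(x == 1, p, 1.0 - p)))
```

Calling `prob_under_univariate` on the known optimum after each generation gives a curve of the kind the abstract describes; with a Bayesian network the product would run over conditional probabilities instead of marginals.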

    Causal Discovery with Continuous Additive Noise Models

    We consider the problem of learning causal directed acyclic graphs from an observational joint distribution. One can use these graphs to predict the outcome of interventional experiments, from which data are often not available. We show that if the observational distribution follows a structural equation model with an additive noise structure, the directed acyclic graph becomes identifiable from the distribution under mild conditions. This constitutes an interesting alternative to traditional methods that assume faithfulness and identify only the Markov equivalence class of the graph, thus leaving some edges undirected. We provide practical algorithms for finitely many samples: RESIT (Regression with Subsequent Independence Test) and two methods based on an independence score. We prove that RESIT is correct in the population setting and provide an empirical evaluation.

    The role of Walsh structure and ordinal linkage in the optimisation of pseudo-Boolean functions under monotonicity invariance.

    Optimisation heuristics rely on implicit or explicit assumptions about the structure of the black-box fitness function they optimise. A review of the literature shows that an understanding of structure and linkage is helpful to the design and analysis of heuristics. The aim of this thesis is to investigate the role that problem structure plays in heuristic optimisation. Many heuristics use ordinal operators, that is, operators that are invariant under monotonic transformations of the fitness function. In this thesis we develop a classification of pseudo-Boolean functions based on rank-invariance. This approach classifies functions which are monotonic transformations of one another as equivalent, and so partitions an infinite set of functions into a finite set of classes. Reasoning about heuristics composed of ordinal operators is, by construction, invariant over these classes. We perform a complete analysis of 2-bit and 3-bit pseudo-Boolean functions. We use Walsh analysis to define concepts of necessary, unnecessary, and conditionally necessary interactions, and of Walsh families. This helps to make precise some existing ideas in the literature such as benign interactions. Many algorithms are invariant under the classes we define, which allows us to examine the difficulty of pseudo-Boolean functions in terms of function classes. We analyse a range of ordinal selection operators for an EDA. Using a concept of directed ordinal linkage, we define precedence networks and precedence profiles to represent key algorithmic steps and their interdependency in terms of problem structure. The precedence profiles provide a measure of problem difficulty. This corresponds to problem difficulty and algorithmic steps for optimisation. This work develops insight into the relationship between function structure and problem difficulty for optimisation, which may be used to direct the development of novel algorithms. Concepts of structure are also used to construct easy and hard problems for a hill-climber.
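
The two ingredients named in this abstract, Walsh analysis and rank-invariance, can be sketched for small functions as follows. This is a minimal illustration under our own assumptions, not the thesis's code: bit strings are encoded as integers, the helper names are hypothetical, and the Walsh coefficients use the common 1/2^n normalisation.

```python
def walsh_coefficients(f, n):
    """Walsh coefficients w_j = 2^-n * sum_x f(x) * (-1)^popcount(j & x) for an n-bit function."""
    size = 1 << n
    values = [f(x) for x in range(size)]
    coeffs = []
    for j in range(size):
        s = 0.0
        for x in range(size):
            sign = -1 if bin(j & x).count("1") % 2 else 1
            s += sign * values[x]
        coeffs.append(s / size)
    return coeffs

def rank_signature(f, n):
    """Dense ranks of all fitness values; equal signatures mean the same rank-invariance class."""
    values = [f(x) for x in range(1 << n)]
    levels = sorted(set(values))
    return tuple(levels.index(v) for v in values)

def onemax(x):
    return bin(x).count("1")

def onemax_cubed(x):
    return onemax(x) ** 3  # strictly increasing transformation of onemax
```

On 2 bits, `onemax` and `onemax_cubed` share the rank signature `(0, 1, 1, 2)`, yet only the cubed version has a non-zero second-order Walsh coefficient (1.5 instead of 0.0), which shows why a classification that is invariant under monotonic transformations cannot rely on raw Walsh coefficients alone.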

    Denoising Autoencoders for fast Combinatorial Black Box Optimization

    Estimation of Distribution Algorithms (EDAs) require flexible probability models that can be efficiently learned and sampled. Autoencoders (AE) are generative stochastic networks with these desired properties. We integrate a special type of AE, the Denoising Autoencoder (DAE), into an EDA and evaluate the performance of DAE-EDA on several combinatorial optimization problems with a single objective. We assess the number of fitness evaluations as well as the required CPU times. We compare the results to the performance of the Bayesian Optimization Algorithm (BOA) and RBM-EDA, another EDA which is based on a generative neural network and which has proven competitive with BOA. For the considered problem instances, DAE-EDA is considerably faster than BOA and RBM-EDA, sometimes by orders of magnitude. The number of fitness evaluations is higher than for BOA, but competitive with RBM-EDA. These results show that DAEs can be useful tools for problems with low but non-negligible fitness evaluation costs.

    Uncovering latent structure in valued graphs: A variational approach

    As more and more network-structured data sets become available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case. We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graph models and allows covariates to be included. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host-parasite interaction networks in forest ecosystems. Comment: Published at http://dx.doi.org/10.1214/10-AOAS361 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    An analysis of the local optima storage capacity of Hopfield network based fitness function models

    A Hopfield Neural Network (HNN) with a new weight update rule can be treated as a second order Estimation of Distribution Algorithm (EDA) or Fitness Function Model (FFM) for solving optimisation problems. The HNN models promising solutions and has a capacity for storing a certain number of local optima as low energy attractors. Solutions are generated by sampling the patterns stored in the attractors. The number of attractors a network can store (its capacity) has an impact on solution diversity and, consequently, solution quality. This paper introduces two new HNN learning rules and presents the Hopfield EDA (HEDA), which learns weight values from samples of the fitness function. It investigates the attractor storage capacity of the HEDA and shows it to be equal to that known in the literature for a standard HNN. The relationship between HEDA capacity and linkage order is also investigated.
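
To make the notion of patterns stored as low energy attractors concrete, here is a minimal sketch of a standard Hopfield network. It assumes the classical Hebbian rule and ±1 states; the new learning rules introduced in the paper are not given in the abstract and are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def hebbian_weights(patterns):
    """Classical Hebbian rule (an assumption of this sketch): W = P^T P / n with zero diagonal."""
    P = np.asarray(patterns, dtype=float)  # shape (m, n), entries +1 or -1
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """Hopfield energy E(s) = -1/2 * s^T W s; stored patterns sit in low energy states."""
    s = np.asarray(s, dtype=float)
    return float(-0.5 * s @ W @ s)

def settle(W, s, max_sweeps=100):
    """Asynchronous sign updates until no unit changes, i.e. until an attractor is reached."""
    s = np.array(s, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if W[i] @ s >= 0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s
```

Storing the single pattern `[1, -1, 1, -1, 1, -1]` and starting `settle` from a copy with the first bit flipped recovers the stored pattern, whose energy (-2.5) is lower than that of the corrupted start.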

    Estimation of Distribution Algorithms and Minimum Relative Entropy

    In the field of optimization using probabilistic models of the search space, this thesis identifies and elaborates several advancements in which the principles of maximum entropy and minimum relative entropy from information theory are used to estimate a probability distribution. The probability distribution within the search space is represented by a graphical model (factorization, Bayesian network or junction tree). An estimation of distribution algorithm (EDA) is an evolutionary optimization algorithm which uses a graphical model to sample a population within the search space and then estimates a new graphical model from the selected individuals of the population. - So far, the Factorized Distribution Algorithm (FDA) builds a factorization or Bayesian network from a given additive structure of the objective function to be optimized using a greedy algorithm which only considers a subset of the variable dependencies. Important connections can be lost by this method. This thesis presents a heuristic subfunction merge algorithm which is able to consider all dependencies between the variables (as long as the marginal distributions of the model do not become too large). On a 2-D grid structure, this algorithm builds a pentavariate factorization which makes it possible to solve the deceptive grid benchmark problem with a much smaller population size than the conventional factorization. Especially for small population sizes, calculating large marginal distributions from smaller ones using Maximum Entropy and iterative proportional fitting leads to a further improvement. - The second topic is the generalization of graphical models to loopy structures. Using the Bethe-Kikuchi approximation, the loopy graphical model (region graph) can learn the Boltzmann distribution of an objective function by a generalized belief propagation algorithm (GBP). 
It minimizes the free energy, a notion adopted from statistical physics which is equivalent to the relative entropy to the Boltzmann distribution. Previous attempts to combine the Kikuchi approximation with EDA have relied on an expensive Gibbs sampling procedure for generating a population from this loopy probabilistic model. In this thesis a combination with a factorization is presented which allows more efficient sampling. The free energy is generalized to incorporate the inverse temperature ß. The factorization building algorithm mentioned above can be employed here, too. The dynamics of GBP are investigated, and the method is applied to Ising spin glass ground state search. Small instances (7 x 7) are solved without difficulty. Larger instances (10 x 10 and 15 x 15) do not converge to the true optimum with large ß, but sampling from the factorization can find the optimum with about 1000-10000 sampling attempts, depending on the instance. If GBP does not converge, it can be replaced by a concave-convex procedure which guarantees convergence. - Third, if no probabilistic structure is given for the objective function, a Bayesian network can be learned to capture the dependencies in the population. The relative entropy between the population-induced distribution and the Bayesian network distribution is equivalent to the log-likelihood of the model. The log-likelihood has been generalized to the BIC/MDL score, which reduces overfitting by penalizing complicated structures of the Bayesian network. A previous information theoretic analysis of BIC/MDL in the context of EDA is continued, and empirical evidence is given that the method is able to learn the correct structure of an objective function, given a sufficiently large population. - Finally, a way to reduce the search space of EDA is presented by combining it with a local search heuristic. 
The Kernighan-Lin hillclimber, known originally for the traveling salesman problem and graph bipartitioning, is generalized to arbitrary binary problems. It can be applied in a stand-alone manner, as an iterative 1+1 search algorithm, or combined with EDA. On the MAXSAT problem it performs on a similar scale to the specialized SAT solver Walksat. An analysis of the Kernighan-Lin local optima indicates that the combination with an EDA is favorable. The thesis shows how evolutionary optimization can be improved using interdisciplinary results from information theory, statistics, probability calculus and statistical physics. The principles of information theory for estimating probability distributions are applicable in many areas. EDAs are a good application because an improved estimation directly affects the optimization success.
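
The EDA loop defined in this abstract (sample a population from the model, select individuals, estimate a new model from them) can be sketched in a few lines. This is a minimal illustration that assumes the simplest possible model, independent bit marginals in the style of UMDA, rather than the factorizations, region graphs or Bayesian networks developed in the thesis; the function name, parameters and defaults are hypothetical.

```python
import numpy as np

def umda(fitness, n_bits, pop_size=100, n_select=50, generations=30, seed=0):
    """Univariate EDA sketch: the 'graphical model' is just a vector of bit marginals."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)                     # initial model: uniform over bit strings
    best_x, best_f = None, -np.inf
    for _ in range(generations):
        # 1. Sample a population from the current model.
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)
        fit = np.array([fitness(x) for x in pop])
        # 2. Select the fittest individuals (truncation selection).
        order = np.argsort(fit)[::-1]
        if fit[order[0]] > best_f:
            best_f, best_x = fit[order[0]], pop[order[0]].copy()
        selected = pop[order[:n_select]]
        # 3. Estimate a new model from the selected individuals.
        p = selected.mean(axis=0)
        p = np.clip(p, 1.0 / n_bits, 1.0 - 1.0 / n_bits)  # keep marginals away from 0 and 1
    return best_x, best_f, p
```

A call such as `umda(lambda x: int(x.sum()), 20)` maximises the number of ones in a 20-bit string. Replacing step 3 with structure learning scored by BIC/MDL, or with a factorization built from a known additive structure, leads toward the variants discussed in the abstract.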

    Spatial modeling of extreme snow depth

    The spatial modeling of extreme snow is important for adequate risk management in Alpine and high altitude countries. A natural approach to such modeling is through the theory of max-stable processes, an infinite-dimensional extension of multivariate extreme value theory. In this paper we describe the application of such processes in modeling the spatial dependence of extreme snow depth in Switzerland, based on data for the winters 1966-2008 at 101 stations. The models we propose rely on a climate transformation that allows us to account for the presence of climate regions and for directional effects resulting from synoptic weather patterns. Estimation is performed through pairwise likelihood inference and the models are compared using penalized likelihood criteria. The max-stable models provide a much better fit to the joint behavior of the extremes than do independence or full dependence models. Comment: Published at http://dx.doi.org/10.1214/11-AOAS464 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).