15 research outputs found

    Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

    Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES
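    A minimal sketch of the kind of novelty-driven ES update described above, assuming a user-supplied rollout function that returns a behaviour characterization and a NumPy array of archived behaviours; the normalisation here is plain standardisation rather than the paper's rank-based scheme, and the meta-population of agents and the archive bookkeeping are omitted.

```python
import numpy as np

def novelty(behavior, archive, k=10):
    """Novelty: mean distance to the k nearest behaviours in the archive."""
    dists = np.sort(np.linalg.norm(archive - behavior, axis=1))
    return dists[:k].mean()

def ns_es_step(theta, rollout, archive, pop_size=50, sigma=0.02, lr=0.01):
    """One ES update whose weights are novelty scores instead of rewards (NS-ES style)."""
    eps = np.random.randn(pop_size, theta.size)                  # Gaussian parameter perturbations
    scores = np.empty(pop_size)
    for i in range(pop_size):
        behavior = rollout(theta + sigma * eps[i])               # behaviour of the perturbed policy
        scores[i] = novelty(behavior, archive)
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)    # normalise the novelty scores
    grad = scores @ eps / (pop_size * sigma)                     # ES gradient estimate w.r.t. novelty
    return theta + lr * grad
```

    NSR-ES would blend a reward score with the novelty score before the same weighted sum, and NSRA-ES would adapt that blending weight over time.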

    Landscapes and Effective Fitness

    The concept of a fitness landscape arose in theoretical biology, while that of effective fitness has its origin in evolutionary computation. Both have emerged as useful conceptual tools with which to understand the dynamics of evolutionary processes, especially in the presence of complex genotype-phenotype relations. In this contribution we attempt to provide a unified discussion of these two approaches, discussing both their advantages and disadvantages in the context of some simple models. We also discuss how fitness and effective fitness change under various transformations of the configuration space of the underlying genetic model, concentrating on coarse-graining transformations and on a particular coordinate transformation that provides an appropriate basis for illuminating the structure and consequences of recombination
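    One common formalization of effective fitness, sketched here under the usual infinite-population, discrete-generation assumptions; the symbols (genotype frequencies, fitnesses, and transmission probabilities) are introduced for illustration and may differ from the paper's own notation.

```latex
% Frequency dynamics of genotype i under selection (fitness f_i) followed by
% genetic operators with transmission probabilities T_{ij} (probability that
% the operators turn genotype j into genotype i):
\[
  x_i(t+1) \;=\; \sum_j T_{ij}\,\frac{f_j}{\bar f(t)}\,x_j(t),
  \qquad
  \bar f(t) \;=\; \sum_j f_j\,x_j(t).
\]
% Effective fitness is defined so that the same dynamics read as selection alone:
\[
  x_i(t+1) \;=\; \frac{f_i^{\mathrm{eff}}(t)}{\bar f(t)}\,x_i(t)
  \quad\Longrightarrow\quad
  f_i^{\mathrm{eff}}(t) \;=\; \frac{1}{x_i(t)}\sum_j T_{ij}\,f_j\,x_j(t).
\]
% Regions into which the operators funnel probability mass thus acquire a higher
% effective fitness than their reproductive fitness alone would suggest.
```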

    Impact of Data Selection on the Accuracy of Atmospheric Refractivity Inversions Performed over Marine Surfaces

    Within the Earth’s atmosphere there is a planetary boundary layer that extends from the surface to roughly 1 km above the surface. Within this planetary boundary layer exists the marine atmospheric boundary layer, a complex turbulent surface layer that extends from the sea surface to roughly 100 m in altitude. The turbulent nature of this layer, combined with interactions across the air-sea interface, causes ever-changing environmental conditions within it, including atmospheric properties that affect the index of refraction, or atmospheric refractivity. Variations in atmospheric refractivity lead to many types of anomalous propagation of electromagnetic (EM) signals; thus, improving the performance of EM systems requires in-situ knowledge of the refractivity. Efforts to inversely obtain refractivity from radar power returns have used both reflected sea clutter and bi-static radar approaches; both types of inversion method are driven by radar measurements. This study applies a bi-static radar data inversion process to estimate atmospheric refractivity parameters in evaporative ducting conditions and examines the impacts of radar propagation loss data quantity and source location on the accuracy of refractivity inversions. Genetic algorithms and the Variable Terrain Radio Parabolic Equation radar propagation model are used to perform the inversions for three refractivity parameters. Numerical experiments test various randomly distributed amounts of synthetic data from a 100 m altitude by 60 km range domain. To compare the impact of data location on the inverse solutions, three domains from which data were sourced were examined: the whole domain (0 m to 100 m altitude and 0 km to 60 km range), a lower domain (0 m to 60 m altitude and 0 km to 60 km range), and a long-range domain (0 m to 100 m altitude and 30 km to 60 km range). Inversion performance across experiments was compared using several metrics: fitness scores, fitness-distance correlations, root-mean-square errors of refractivity profiles, and percent errors of each individual refractivity parameter. The results of the data quantity experiments show that propagation loss measurement coverage of approximately 1% of the prediction domain yields the most accurate refractivity estimates; this amount of data is needed to sufficiently eliminate the non-unique solutions observed with smaller data quantities. The results of the regional study indicate that the long-range domain produced slightly more accurate results with less data than the other regions. From the results of these experiments and prior studies, four specific sampling patterns hypothesized to generate accurate inversion results were developed. The pattern containing the most data cells with the widest spread over the domain generated inversion results with the highest parameter and refractivity accuracy, although a second pattern that concentrated data in a short-range, low-altitude region performed similarly with significantly less data. The results of this study advance refractivity inversion techniques by providing insight into where and how many EM measurements are needed for successful refractivity inversions; improvements in refractivity inversion enable performance improvements of EM sensing and communication technologies.
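    A hedged sketch of the kind of GA-driven inversion loop described above. The real study couples genetic algorithms with the Variable Terrain Radio Parabolic Equation model; the propagation_loss function below is only an illustrative stand-in, and the parameter names, bounds, and GA settings are assumptions rather than the study's configuration.

```python
import numpy as np

def propagation_loss(params, cells):
    """Illustrative stand-in for a propagation model such as VTRPE: maps three
    evaporative-duct refractivity parameters to loss (dB) at (range, height) cells."""
    duct_height, m_deficit, slope = params
    r, h = cells[:, 0], cells[:, 1]
    return 0.2 * r + m_deficit * np.exp(-h / max(duct_height, 1e-3)) + slope * h

def fitness(params, cells, observed):
    """Negative RMSE between modelled and observed propagation loss at the sampled cells."""
    return -np.sqrt(np.mean((propagation_loss(params, cells) - observed) ** 2))

def ga_inversion(cells, observed, bounds, pop_size=40, generations=100, sigma=0.05,
                 rng=np.random.default_rng(0)):
    """Genetic-algorithm estimation of the three refractivity parameters."""
    lo, hi = np.asarray(bounds, dtype=float).T              # per-parameter search bounds
    pop = rng.uniform(lo, hi, size=(pop_size, lo.size))
    for _ in range(generations):
        scores = np.array([fitness(ind, cells, observed) for ind in pop])
        pairs = rng.integers(pop_size, size=(pop_size, 2))                 # binary tournaments
        winners = np.where(scores[pairs[:, 0]] >= scores[pairs[:, 1]],
                           pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        mates = parents[rng.permutation(pop_size)]
        mask = rng.random(pop.shape) < 0.5                                 # uniform crossover
        children = np.where(mask, parents, mates)
        children += rng.normal(0.0, sigma * (hi - lo), size=pop.shape)     # Gaussian mutation
        pop = np.clip(children, lo, hi)
    scores = np.array([fitness(ind, cells, observed) for ind in pop])
    return pop[int(np.argmax(scores))]                      # best-fitting parameter vector
```

    The returned parameter vector would then be assessed against the reference profile via the study's metrics, such as the root-mean-square error of the refractivity profile and per-parameter percent errors.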

    Novelty grammar swarms

    Master's thesis, Engenharia Informática (Sistemas de Informação), Universidade de Lisboa, Faculdade de Ciências, 2015.
    Particle Swarm Optimization (PSO) is a well-known population-based optimization algorithm. Most often it is applied to optimize fitness functions that specify the goal of reaching a desired objective or behavior, so that search focuses on higher-fitness areas. In problems with many local optima, search often becomes stuck and thus can fail to find the intended objective. To remedy this problem in certain kinds of domains, this thesis introduces Novelty-driven Particle Swarm Optimization (NdPSO). Taking its motivation from the novelty search algorithm in evolutionary computation, NdPSO drives search only towards finding instances significantly different from those found before. In this way it completely ignores the objective in its pursuit of novelty, making it less susceptible to deception and local optima. Because novelty search has previously shown potential for solving tasks in Genetic Programming, and in particular in Grammatical Evolution, NdPSO is implemented here as an extension of the Grammatical Swarm method, which is in effect a combination of PSO and Genetic Programming. The resulting implementation was tested in three different domains representative of those in which it might provide an advantage over objective-driven PSO: domains that are deceptive and in which a meaningful high-level description of novel behavior is easy to derive. In each of the tested domains NdPSO outperforms standard objective-based PSO, one of its best-known variants (Barebones PSO), and random search, demonstrating its promise as a tool for solving deceptive problems. Since this is the first application of the search for novelty outside the evolutionary paradigm, an empirical comparison of the new algorithm with a standard novelty search evolutionary algorithm is also performed.
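    A minimal sketch of a novelty-driven PSO loop of the sort described above, assuming a user-supplied behavior function (standing in for the grammar mapping and evaluation performed by Grammatical Swarm); the exact update rules and archive policy of the thesis may differ.

```python
import numpy as np

def novelty(b, archive, k=15):
    """Mean distance to the k nearest behaviours seen so far."""
    d = np.sort(np.linalg.norm(np.asarray(archive) - b, axis=1))
    return d[:min(k, len(d))].mean()

def ndpso(behavior, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
          rng=np.random.default_rng(0)):
    x = rng.uniform(-1, 1, (n_particles, dim))            # particle positions (solution encodings)
    v = np.zeros_like(x)
    archive = [behavior(p) for p in x]                    # behaviours of the initial swarm
    pbest = x.copy()
    pbest_nov = np.array([novelty(b, archive) for b in archive])
    gbest = pbest[pbest_nov.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # standard PSO velocity update
        x = x + v
        for i, p in enumerate(x):
            b = behavior(p)
            nov = novelty(b, archive)                     # novelty replaces objective fitness
            archive.append(b)
            if nov > pbest_nov[i]:
                pbest[i], pbest_nov[i] = p.copy(), nov
        gbest = pbest[pbest_nov.argmax()].copy()
    return pbest                                          # a diverse set of solutions, not one optimum
```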

    Novelty-assisted Interactive Evolution Of Control Behaviors

    The field of evolutionary computation is inspired by the achievements of natural evolution, in which there is no final objective. Yet the pursuit of objectives is ubiquitous in simulated evolution, because evolutionary algorithms that can consistently achieve established benchmarks are lauded as successful, thus reinforcing this paradigm. A significant problem is that such objective approaches assume that intermediate stepping stones will increasingly resemble the final objective, when in fact they often do not. The consequence is that while solutions may exist, searching for such objectives may not discover them. This problem with objectives is demonstrated through an experiment in this dissertation showing that images discovered serendipitously during interactive evolution in an online system called Picbreeder cannot be rediscovered when they become the final objective of the very same algorithm that originally evolved them. This negative result demonstrates that pursuing an objective limits evolution by selecting offspring only on the basis of the final objective. Furthermore, even when high fitness is achieved, the experimental results suggest that the resulting solutions are typically brittle, piecewise representations that only perform well by exploiting idiosyncratic features in the target. In response to this problem, the dissertation next highlights the importance of leveraging human insight during search as an alternative to articulating explicit objectives. In particular, a new approach called novelty-assisted interactive evolutionary computation (NA-IEC) combines human intuition with a method called novelty search for the first time to facilitate the serendipitous discovery of agent behaviors. In this approach, the human user directs evolution by selecting what is interesting from the on-screen population of behaviors. However, unlike in typical IEC, the user can then request that the next generation be filled with novel descendants rather than only direct descendants. The result of such an approach, unconstrained by a priori objectives, is that it traverses key stepping stones that ultimately accumulate meaningful domain knowledge. To establish this new evolutionary approach based on the serendipitous discovery of key stepping stones during evolution, this dissertation makes four key contributions: (1) the first establishes the deleterious effects of a priori objectives on evolution; (2) the second introduces the NA-IEC approach as an alternative to traditional objective-based approaches; (3) the third is a proof-of-concept demonstrating that combining human insight with novelty search finds solutions significantly faster and at lower genomic complexities than fully automated processes, including pure novelty search, suggesting an important role for human users in the search for solutions; and (4) finally, the NA-IEC approach is applied in a challenge domain wherein leveraging human intuition and domain knowledge accelerates the evolution of solutions for the nontrivial octopus-arm control task. The culmination of these contributions demonstrates the importance of incorporating human insights into simulated evolution as a means of discovering better solutions more rapidly than traditional approaches.
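    A hedged sketch of the "request novel descendants" step described above: offspring of a user-selected parent are ranked by novelty against an archive of previously seen behaviours, and the most novel are shown next. The mutate and behavior functions and all parameters are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def novel_descendants(parent, mutate, behavior, archive, n_offspring=200, n_keep=12, k=10):
    """Fill the next on-screen generation with the most novel descendants of a
    user-selected parent; `archive` is a non-empty list of past behaviour vectors."""
    candidates = [mutate(parent) for _ in range(n_offspring)]
    behaviors = np.array([behavior(c) for c in candidates])
    arc = np.asarray(archive)
    scores = []
    for b in behaviors:
        d = np.sort(np.linalg.norm(arc - b, axis=1))
        scores.append(d[:k].mean())                      # novelty: mean distance to k nearest
    order = np.argsort(scores)[::-1]                     # most novel first
    archive.extend(behaviors[order[:n_keep]])            # grow the behaviour archive
    return [candidates[i] for i in order[:n_keep]]       # shown to the user for the next selection
```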

    Analysis of Linkage-Friendly Genetic Algorithms

    Evolutionary algorithms (EAs) are stochastic population-based algorithms inspired by the natural processes of selection, mutation, and recombination. EAs are often employed as optimum seeking techniques. A formal framework for EAs is proposed, in which evolutionary operators are viewed as mappings from parameter spaces to spaces of random functions. Formal definitions within this framework capture the distinguishing characteristics of the classes of recombination, mutation, and selection operators. EAs which use strictly invariant selection operators and order invariant representation schemes comprise the class of linkage-friendly genetic algorithms (lfGAs). Fast messy genetic algorithms (fmGAs) are lfGAs which use binary tournament selection (BTS) with thresholding, periodic filtering of a fixed number of randomly selected genes from each individual, and generalized single-point crossover. Probabilistic variants of thresholding and filtering are proposed. EAs using the probabilistic operators are generalized fmGAs (gfmGAs). A dynamical systems model of lfGAs is developed which permits prediction of expected effectiveness. BTS with probabilistic thresholding is modeled at various levels of abstraction as a Markov chain. Transitions at the most detailed level involve decisions between classes of individuals. The probability of correct decision making is related to appropriate maximal order statistics, the distributions of which are obtained. Existing filtering models are extended to include probabilistic individual lengths. Sensitivity of lfGA effectiveness to exogenous parameters limits practical applications. The lfGA parameter selection problem is formally posed as a constrained optimization problem in which the cost functional is related to expected effectiveness. Kuhn-Tucker conditions for the optimality of gfmGA parameters are derived
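    A hedged sketch of binary tournament selection with thresholding as used in messy GAs: fitness is compared only when two (possibly under-specified) individuals share enough gene positions to make the comparison meaningful. The data structure and the random fallback for incomparable pairs are simplifications, and the probabilistic thresholding variant proposed in the dissertation is not reproduced here.

```python
import random
from dataclasses import dataclass

@dataclass
class MessyIndividual:
    genes: frozenset        # loci specified by this (possibly under-specified) individual
    fitness: float

def binary_tournament_thresholded(population, threshold, rng=random.Random(0)):
    """Binary tournament selection with thresholding (messy-GA style)."""
    selected = []
    for _ in range(len(population)):
        a, b = rng.sample(population, 2)
        if len(a.genes & b.genes) >= threshold:
            selected.append(a if a.fitness > b.fitness else b)   # enough common genes: compare
        else:
            selected.append(rng.choice([a, b]))                  # incomparable: pick at random
    return selected
```

    Practical fmGA implementations typically search a shuffle window for a comparable competitor rather than falling back to a random pick; the fallback above keeps the sketch short.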

    Exploring and Exploiting Models of the Fitness Landscape: a Case Against Evolutionary Optimization

    In recent years, the theories of natural selection and biological evolution have proved popular metaphors for understanding and solving optimization problems in engineering design. This thesis identifies some fundamental problems associated with this use of such metaphors. Key objections are the failure of evolutionary optimization techniques to represent explicitly the goal of the optimization process, and their poor use of knowledge developed during the process. It is also suggested that convergent behaviour of an optimization algorithm is an undesirable quality if the algorithm is to be applied to multimodal problems. An alternative approach to optimization is suggested, based on the explicit use of knowledge and/or assumptions about the nature of the optimization problem to construct Bayesian probabilistic models of the surface being optimized and of the goal of the optimization. Distinct exploratory and exploitative strategies are identified for carrying out optimization based on such models: exploration attempts to maximally reduce an entropy-based measure of the total uncertainty concerning the satisfaction of the optimization goal over the space, while exploitation evaluates the point judged most likely to achieve the goal. A composite strategy combines exploration and exploitation in a principled manner. The behaviour of these strategies is empirically investigated on a number of test problems. Results suggest that the approach taken may well provide effective optimization in a way that addresses the criticisms made of the evolutionary metaphor, provided the computational cost of the approach can be satisfactorily addressed.
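    A hedged sketch of the two strategies, assuming a Gaussian surrogate predict(x) that returns a posterior mean and standard deviation, and a scalar goal threshold; both are assumptions for illustration. Exploitation picks the candidate most likely to satisfy the goal; the exploration proxy below uses the entropy of the pointwise "goal met?" variable rather than the thesis's total-uncertainty measure over the whole space.

```python
import numpy as np
from math import erf, sqrt

def prob_goal(mean, std, goal):
    """P(f(x) >= goal) under a Gaussian predictive distribution N(mean, std^2)."""
    return 0.5 * (1.0 - erf((goal - mean) / (std * sqrt(2.0) + 1e-12)))

def exploit_step(candidates, predict, goal):
    """Exploitative strategy: evaluate the point judged most likely to satisfy the goal."""
    probs = [prob_goal(*predict(x), goal) for x in candidates]
    return candidates[int(np.argmax(probs))]

def explore_step(candidates, predict, goal):
    """Exploratory strategy (proxy): pick the point whose 'goal met?' outcome is most uncertain."""
    def bernoulli_entropy(p):
        p = min(max(p, 1e-12), 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))
    ents = [bernoulli_entropy(prob_goal(*predict(x), goal)) for x in candidates]
    return candidates[int(np.argmax(ents))]
```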

    New approaches to optimization in aerospace conceptual design

    Aerospace design can be viewed as an optimization process, but conceptual studies are rarely performed using formal search algorithms. Three issues that restrict the success of automatic search are identified in this work. New approaches are introduced to address the integration of analyses and optimizers, to avoid the need for accurate gradient information and a smooth search space (required for calculus-based optimization), and to remove the restrictions imposed by fixed-complexity problem formulations. (1) Optimization should be performed in a flexible environment. A quasi-procedural architecture is used to conveniently link analysis modules and automatically coordinate their execution; it efficiently controls large-scale design tasks. (2) Genetic algorithms provide a search method for discontinuous or noisy domains. The utility of genetic optimization is demonstrated here, but parameter encodings and constraint-handling schemes must be carefully chosen to avoid premature convergence to suboptimal designs. The relationship between genetic and calculus-based methods is explored. (3) A variable-complexity genetic algorithm is created to permit flexible parameterization, so that the level of description can change during optimization; this new optimizer automatically discovers novel designs in structural and aerodynamic tasks.
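    A hedged sketch of the variable-complexity idea in point (3): the genome can gain or lose design variables during optimization, so the level of description is not fixed in advance. The operator structure and rates below are illustrative assumptions, not the thesis's encoding.

```python
import random

def mutate_variable_complexity(genome, p_tweak=0.8, p_add=0.1, p_remove=0.1,
                               rng=random.Random(0)):
    """Variable-complexity mutation: besides tweaking values, the genome can gain or
    lose design parameters, changing the level of description during optimization."""
    g = list(genome)
    r = rng.random()
    if r < p_tweak or len(g) == 1:
        i = rng.randrange(len(g))
        g[i] += rng.gauss(0.0, 0.1)                                   # perturb an existing variable
    elif r < p_tweak + p_add:
        g.insert(rng.randrange(len(g) + 1), rng.uniform(-1.0, 1.0))   # add a new design variable
    else:
        g.pop(rng.randrange(len(g)))                                  # drop a design variable
    return g
```

    Crossover between genomes of different lengths then needs some alignment rule, which is one of the design choices such an optimizer has to make.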

    Rapid and Thorough Exploration of Low Dimensional Phenotypic Landscapes

    PhD thesis. This thesis presents two novel algorithms for the evolutionary optimisation of agent populations through divergent search of low dimensional phenotypic landscapes. As the field of Evolutionary Robotics (ER) develops towards more complex domains, which often involve deception and uncertainty, the promotion of phenotypic diversity has become of increasing interest. Divergent exploration of the phenotypic feature space has been shown to avoid convergence towards local optima and to provide diverse sets of solutions to a given objective. Novelty Search (NS) and the more recent Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) are two state-of-the-art algorithms which utilise divergent phenotypic search. In this thesis, the individual merits and weaknesses of these algorithms are built upon in order to further develop the study of divergent phenotypic search within ER. An observation that the diverse range of individuals produced through the optimisation of novelty will likely contain solutions to multiple independent objectives is utilised to develop Multiple Assessment Directed Novelty Search (MADNS). The MADNS algorithm is introduced as an extension to NS for the simultaneous optimisation of multiple independent objectives, and is shown to become more effective than NS as the size of the state space increases. The central contribution of this thesis is the introduction of a novel algorithm for rapid and thorough divergent search of low dimensional phenotypic landscapes. The Spatial, Hierarchical, Illuminated NeuroEvolution (SHINE) algorithm differs from previous divergent search algorithms in that it utilises a tree structure for the maintenance and selection of potential candidates. Unlike previous approaches, SHINE iteratively focusses upon sparsely visited areas of the phenotypic landscape without the computationally expensive distance comparison required by NS; rather, the sparseness of the area within the landscape where a potential solution resides is inferred through its depth within the tree. Experimental results in a range of domains show that SHINE significantly outperforms NS and MAP-Elites in both performance and exploration.
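    A speculative sketch of the tree idea as read from this abstract: a quadtree over a two-dimensional phenotype descriptor subdivides whenever a region accumulates too many individuals, so depth becomes a proxy for how densely visited a region is, and parents can be drawn from the shallowest occupied leaves without any nearest-neighbour distance computations. The published SHINE procedure may differ in its selection and scoring details; all names and parameters below are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Quadtree node over a 2-D phenotype descriptor in [x0, x1) x [y0, y1)."""
    x0: float
    x1: float
    y0: float
    y1: float
    depth: int = 0
    capacity: int = 4          # individuals a region holds before it subdivides
    max_depth: int = 12        # guard against endless subdivision of identical phenotypes
    members: list = field(default_factory=list)     # (phenotype, genome) pairs
    children: list = field(default_factory=list)

    def insert(self, phen, genome):
        if self.children:
            self._child(phen).insert(phen, genome)
            return
        self.members.append((phen, genome))
        if len(self.members) > self.capacity and self.depth < self.max_depth:
            self._split()                            # densely visited region: refine it

    def _child(self, phen):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(phen[0] >= mx) + 2 * (phen[1] >= my)]

    def _split(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        args = dict(depth=self.depth + 1, capacity=self.capacity, max_depth=self.max_depth)
        self.children = [Node(self.x0, mx, self.y0, my, **args),
                         Node(mx, self.x1, self.y0, my, **args),
                         Node(self.x0, mx, my, self.y1, **args),
                         Node(mx, self.x1, my, self.y1, **args)]
        for phen, genome in self.members:
            self._child(phen).insert(phen, genome)
        self.members = []

def shallowest_members(root):
    """Individuals stored in the shallowest occupied leaves lie in the most sparsely
    visited regions; draw the next parents from there."""
    leaves, stack = [], [root]
    while stack:
        n = stack.pop()
        stack.extend(n.children)
        if not n.children and n.members:
            leaves.append(n)
    if not leaves:
        return []
    d = min(n.depth for n in leaves)
    return [m for n in leaves if n.depth == d for m in n.members]
```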