4,956 research outputs found

    Evolutionary Reinforcement Learning: A Survey

    Full text link
    Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements in a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, there remain several crucial challenges, including brittle convergence properties caused by sensitive hyperparameters, difficulties in temporal credit assignment with long time horizons and sparse rewards, a lack of diverse exploration, especially in continuous search space scenarios, difficulties in credit assignment in multi-agent reinforcement learning, and conflicting objectives for rewards. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in the field of EvoRL, highlighting the important challenges and opportunities for future research. With the help of this survey, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary research field

    Spatial Evolutionary Generative Adversarial Networks

    Full text link
    Generative adversary networks (GANs) suffer from training pathologies such as instability and mode collapse. These pathologies mainly arise from a lack of diversity in their adversarial interactions. Evolutionary generative adversarial networks apply the principles of evolutionary computation to mitigate these problems. We hybridize two of these approaches that promote training diversity. One, E-GAN, at each batch, injects mutation diversity by training the (replicated) generator with three independent objective functions then selecting the resulting best performing generator for the next batch. The other, Lipizzaner, injects population diversity by training a two-dimensional grid of GANs with a distributed evolutionary algorithm that includes neighbor exchanges of additional training adversaries, performance based selection and population-based hyper-parameter tuning. We propose to combine mutation and population approaches to diversity improvement. We contribute a superior evolutionary GANs training method, Mustangs, that eliminates the single loss function used across Lipizzaner's grid. Instead, each training round, a loss function is selected with equal probability, from among the three E-GAN uses. Experimental analyses on standard benchmarks, MNIST and CelebA, demonstrate that Mustangs provides a statistically faster training method resulting in more accurate networks

    Enhanced parallel Differential Evolution algorithm for problems in computational systems biology

    Get PDF
    [Abstract] Many key problems in computational systems biology and bioinformatics can be formulated and solved using a global optimization framework. The complexity of the underlying mathematical models require the use of efficient solvers in order to obtain satisfactory results in reasonable computation times. Metaheuristics are gaining recognition in this context, with Differential Evolution (DE) as one of the most popular methods. However, for most realistic applications, like those considering parameter estimation in dynamic models, DE still requires excessive computation times. Here we consider this latter class of problems and present several enhancements to DE based on the introduction of additional algorithmic steps and the exploitation of parallelism. In particular, we propose an asynchronous parallel implementation of DE which has been extended with improved heuristics to exploit the specific structure of parameter estimation problems in computational systems biology. The proposed method is evaluated with different types of benchmarks problems: (i) black-box global optimization problems and (ii) calibration of non-linear dynamic models of biological systems, obtaining excellent results both in terms of quality of the solution and regarding speedup and scalability.Ministerio de Economía y Competitividad; DPI2011-28112-C04-03Consejo Superior de Investigaciones Científicas; PIE-201170E018Ministerio de Ciencia e Innovación; TIN2013-42148-PGalicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/05

    Embodied Evolution in Collective Robotics: A Review

    Get PDF
    This paper provides an overview of evolutionary robotics techniques applied to on-line distributed evolution for robot collectives -- namely, embodied evolution. It provides a definition of embodied evolution as well as a thorough description of the underlying concepts and mechanisms. The paper also presents a comprehensive summary of research published in the field since its inception (1999-2017), providing various perspectives to identify the major trends. In particular, we identify a shift from considering embodied evolution as a parallel search method within small robot collectives (fewer than 10 robots) to embodied evolution as an on-line distributed learning method for designing collective behaviours in swarm-like collectives. The paper concludes with a discussion of applications and open questions, providing a milestone for past and an inspiration for future research.Comment: 23 pages, 1 figure, 1 tabl

    Treasure hunt : a framework for cooperative, distributed parallel optimization

    Get PDF
    Orientador: Prof. Dr. Daniel WeingaertnerCoorientadora: Profa. Dra. Myriam Regattieri DelgadoTese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 27/05/2019Inclui referências: p. 18-20Área de concentração: Ciência da ComputaçãoResumo: Este trabalho propõe um framework multinível chamado Treasure Hunt, que é capaz de distribuir algoritmos de busca independentes para um grande número de nós de processamento. Com o objetivo de obter uma convergência conjunta entre os nós, este framework propõe um mecanismo de direcionamento que controla suavemente a cooperação entre múltiplas instâncias independentes do Treasure Hunt. A topologia em árvore proposta pelo Treasure Hunt garante a rápida propagação da informação pelos nós, ao mesmo tempo em que provê simutaneamente explorações (pelos nós-pai) e intensificações (pelos nós-filho), em vários níveis de granularidade, independentemente do número de nós na árvore. O Treasure Hunt tem boa tolerância à falhas e está parcialmente preparado para uma total tolerância à falhas. Como parte dos métodos desenvolvidos durante este trabalho, um método automatizado de Particionamento Iterativo foi proposto para controlar o balanceamento entre explorações e intensificações ao longo da busca. Uma Modelagem de Estabilização de Convergência para operar em modo Online também foi proposto, com o objetivo de encontrar pontos de parada com bom custo/benefício para os algoritmos de otimização que executam dentro das instâncias do Treasure Hunt. Experimentos em benchmarks clássicos, aleatórios e de competição, de vários tamanhos e complexidades, usando os algoritmos de busca PSO, DE e CCPSO2, mostram que o Treasure Hunt melhora as características inerentes destes algoritmos de busca. O Treasure Hunt faz com que os algoritmos de baixa performance se tornem comparáveis aos de boa performance, e os algoritmos de boa performance possam estender seus limites até problemas maiores. Experimentos distribuindo instâncias do Treasure Hunt, em uma rede cooperativa de até 160 processos, demonstram a escalabilidade robusta do framework, apresentando melhoras nos resultados mesmo quando o tempo de processamento é fixado (wall-clock) para todas as instâncias distribuídas do Treasure Hunt. Resultados demonstram que o mecanismo de amostragem fornecido pelo Treasure Hunt, aliado à maior cooperação entre as múltiplas populações em evolução, reduzem a necessidade de grandes populações e de algoritmos de busca complexos. Isto é especialmente importante em problemas de mundo real que possuem funções de fitness muito custosas. Palavras-chave: Inteligência artificial. Métodos de otimização. Algoritmos distribuídos. Modelagem de convergência. Alta dimensionalidade.Abstract: This work proposes a multilevel framework called Treasure Hunt, which is capable of distributing independent search algorithms to a large number of processing nodes. Aiming to obtain joint convergences between working nodes, Treasure Hunt proposes a driving mechanism that smoothly controls the cooperation between the multiple independent Treasure Hunt instances. The tree topology proposed by Treasure Hunt ensures quick propagation of information, while providing simultaneous explorations (by parents) and exploitations (by children), on several levels of granularity, regardless the number of nodes in the tree. Treasure Hunt has good fault tolerance and is partially prepared to full fault tolerance. As part of the methods developed during this work, an automated Iterative Partitioning method is proposed to control the balance between exploration and exploitation as the search progress. A Convergence Stabilization Modeling to operate in Online mode is also proposed, aiming to find good cost/benefit stopping points for the optimization algorithms running within the Treasure Hunt instances. Experiments on classic, random and competition benchmarks of various sizes and complexities, using the search algorithms PSO, DE and CCPSO2, show that Treasure Hunt boosts the inherent characteristics of these search algorithms. Treasure Hunt makes algorithms with poor performances to become comparable to good ones, and algorithms with good performances to be capable of extending their limits to larger problems. Experiments distributing Treasure Hunt instances in a cooperative network up to 160 processes show the robust scaling of the framework, presenting improved results even when fixing a wall-clock time for the instances. Results show that the sampling mechanism provided by Treasure Hunt, allied to the increased cooperation between multiple evolving populations, reduce the need for large population sizes and complex search algorithms. This is specially important on real-world problems with time-consuming fitness functions. Keywords: Artificial intelligence. Optimization methods. Distributed algorithms. Convergence modeling. High dimensionality

    Enhancing Exploration and Safety in Deep Reinforcement Learning

    Get PDF
    A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trials and errors in large state spaces. However, this learning paradigm requires a non-trivial amount of interactions in the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria to consider while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damages to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms to foster efficient exploration and safer behaviors in simulation and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitates the development of further optimization for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors over desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free complimentary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance both in single and multi-agent applications. For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, proposing an architecture that favors cooperation without affecting exploration
    corecore