26 research outputs found

    Multiagent Learning Through Indirect Encoding

    Designing a system of multiple, heterogeneous agents that cooperate to achieve a common goal is a difficult task, but it is also a common real-world problem. Multiagent learning addresses this problem by training the team to cooperate through a learning algorithm. However, most traditional approaches treat multiagent learning as a combination of multiple single-agent learning problems. This perspective leads to many inefficiencies in learning such as the problem of reinvention, whereby fundamental skills and policies that all agents should possess must be rediscovered independently for each team member. For example, in soccer, all the players know how to pass and kick the ball, but a traditional algorithm has no way to share such vital information because it has no way to relate the policies of agents to each other. In this dissertation a new approach to multiagent learning that seeks to address these issues is presented. This approach, called multiagent HyperNEAT, represents teams as a pattern of policies rather than individual agents. The main idea is that an agent’s location within a canonical team layout (such as a soccer team at the start of a game) tends to dictate its role within that team, called the policy geometry. For example, as soccer positions move from goal to center they become more offensive and less defensive, a concept that is compactly represented as a pattern. The first major contribution of this dissertation is a new method for evolving neural network controllers called HyperNEAT, which forms the foundation of the second contribution and primary focus of this work, multiagent HyperNEAT. Multiagent learning in this dissertation is investigated in predator-prey, room-clearing, and patrol domains, providing a real-world context for the approach.
Interestingly, because the teams in multiagent HyperNEAT are represented as patterns they can scale up to an infinite number of multiagent policies that can be sampled from the policy geometry as needed. Thus the third contribution is a method for teams trained with multiagent HyperNEAT to dynamically scale their size without further learning. Fourth, the capabilities to both learn and scale in multiagent HyperNEAT are compared to the traditional multiagent SARSA(λ) approach in a comprehensive study. The fifth contribution is a method for efficiently learning and encoding multiple policies for each agent on a team to facilitate learning in multi-task domains. Finally, because there is significant interest in practical applications of multiagent learning, multiagent HyperNEAT is tested in a real-world military patrolling application with actual Khepera III robots. The ultimate goal is to provide a new perspective on multiagent learning and to demonstrate the practical benefits of training heterogeneous, scalable multiagent teams through generative encoding

    Worldwide Infrastructure for Neuroevolution: A Modular Library to Turn Any Evolutionary Domain into an Online Interactive Platform

    Across many scientific disciplines, there has emerged an open opportunity to utilize the scale and reach of the Internet to collect scientific contributions from scientists and non-scientists alike. This process, called citizen science, has already shown great promise in the fields of biology and astronomy. Within the fields of artificial life (ALife) and evolutionary computation (EC), experiments in collaborative interactive evolution (CIE) have demonstrated the ability to collect thousands of experimental contributions from hundreds of users across the globe. However, such collaborative evolutionary systems can take nearly a year to build with a small team of researchers. This dissertation introduces a new developer framework enabling researchers to easily build fully persistent online collaborative experiments around almost any evolutionary domain, thereby reducing the time to create such systems to weeks for a single researcher. To add collaborative functionality to any potential domain, this framework, called Worldwide Infrastructure for Neuroevolution (WIN), exploits an important unifying principle among all evolutionary algorithms: regardless of the overall methods and parameters of the evolutionary experiment, every individual created has an explicit parent-child relationship, wherein one individual is considered the direct descendant of another. This principle alone is enough to capture and preserve the relationships and results for a wide variety of evolutionary experiments, while allowing multiple human users to meaningfully contribute. The WIN framework is first validated through two experimental domains, image evolution and a new two-dimensional virtual creature domain, Indirectly Encoded SodaRace (IESoR), that is shown to produce a visually diverse variety of ambulatory creatures.
Finally, an Android application built with WIN, filters, allows users to interactively evolve custom image effects to apply to personalized photographs, thereby introducing the first CIE application available for any mobile device. Together, these collaborative experiments and new mobile application establish a comprehensive new platform for evolutionary computation that can change how researchers design and conduct citizen science online

    Algebraic Neural Architecture Representation, Evolutionary Neural Architecture Search, and Novelty Search in Deep Reinforcement Learning

    Evolutionary algorithms have recently re-emerged as powerful tools for machine learning and artificial intelligence, especially when combined with advances in deep learning developed over the last decade. In contrast to the use of fixed architectures and rigid learning algorithms, we leveraged the open-endedness of evolutionary algorithms to make both theoretical and methodological contributions to deep reinforcement learning. This thesis explores and develops two major areas at the intersection of evolutionary algorithms and deep reinforcement learning: generative network architectures and behaviour-based optimization. Over three distinct contributions, both theoretical and experimental methods were applied to deliver a novel mathematical framework and experimental method for generative, modular neural network architecture search for reinforcement learning, and a generalized formulation of a behaviour-based optimization framework for reinforcement learning called novelty search. Experimental results indicate that both alternative, behaviour-based optimization and neural architecture search can each be used to improve learning in the popular Atari 2600 benchmark compared to DQN, a popular gradient-based method. These results are in line with related work demonstrating that strictly gradient-free methods are competitive with gradient-based reinforcement learning. These contributions, together with other successful combinations of evolutionary algorithms and deep learning, demonstrate that alternative architectures and learning algorithms to those conventionally used in deep learning should be seriously investigated in an effort to drive progress in artificial intelligence

    Evolutionary Reinforcement Learning: A Survey

    Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements in a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, there remain several crucial challenges, including brittle convergence properties caused by sensitive hyperparameters, difficulties in temporal credit assignment with long time horizons and sparse rewards, a lack of diverse exploration, especially in continuous search space scenarios, difficulties in credit assignment in multi-agent reinforcement learning, and conflicting objectives for rewards. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in the field of EvoRL, highlighting the important challenges and opportunities for future research. With the help of this survey, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary research field

    Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance

    Reinforcement learning (RL) problems often feature deceptive local optima, and learning methods that optimize purely for reward signal often fail to learn strategies for overcoming them. Deep neuroevolution and novelty search have been proposed as effective alternatives to gradient-based methods for learning RL policies directly from pixels. In this paper, we introduce and evaluate the use of novelty search over agent action sequences by string edit metric distance as a means for promoting innovation. We also introduce a method for stagnation detection and population resampling inspired by recent developments in the RL community that uses the same mechanisms as novelty search to promote and develop innovative policies. Our methods extend a state-of-the-art method for deep neuroevolution using a simple-yet-effective genetic algorithm (GA) designed to efficiently learn deep RL policy network weights. Experiments using four games from the Atari 2600 benchmark were conducted. Results provide further evidence that GAs are competitive with gradient-based algorithms for deep RL. Results also demonstrate that novelty search over action sequences is an effective source of selection pressure that can be integrated into existing evolutionary algorithms for deep RL. Comment: Submitted to GECCO 201
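
    The idea of scoring novelty by edit distance between action sequences can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of plain Levenshtein distance, and the choice of averaging over the k nearest archive entries (the usual novelty search formulation) are all assumptions made for illustration.

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (e.g. discrete action ids)."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def novelty(candidate: Sequence, archive: List[Sequence], k: int = 3) -> float:
    """Mean edit distance from candidate to its k nearest archive entries.

    Assumption: k-nearest-neighbour averaging; the abstract does not state
    how distances are aggregated into a novelty score.
    """
    if not archive:
        return 0.0
    nearest = sorted(edit_distance(candidate, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical action sequences recorded from three earlier policies.
    archive = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]
    print(novelty([0, 0, 0], archive, k=2))
```

    A higher score means the candidate's behaviour differs more from what has already been seen, so the score could be used as (or mixed into) the selection criterion of a genetic algorithm.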

    Neuroevolutional Methods for Decision Support Under Uncertainty

    The article presents a comparative analysis of the fundamental neuroevolutionary methods that are widely applied to add intelligence to decision support systems under uncertainty. Based on this analysis, a new neuroevolutionary method is introduced. It modifies both the topology and the parameters of the neural network and does not impose additional constraints on the individual. The methods are evaluated experimentally on a series of benchmark tasks in adaptive control, classification, and restoration of damaged data. The evaluation criteria are the number of failures and the total number of evolution epochs

    Bio-inspired Dynamic Control Systems with Time Delays

    The world around us exhibits a rich and ever changing environment of startling, bewildering and fascinating complexity. Almost everything is never as simple as it seems, but through the chaos we may catch fleeting glimpses of the mechanisms within. Throughout the history of human endeavour we have mimicked nature to harness it for our own ends. Our attempts to develop truly autonomous and intelligent machines have however struggled with the limitations of our human ability. This has encouraged some to shirk this responsibility and instead model biological processes and systems to do it for us. This Thesis explores the introduction of continuous time delays into biologically inspired dynamic control systems. We seek to exploit rich temporal dynamics found in physical and biological systems for modelling complex or adaptive behaviour through the artificial evolution of networks to control robots. Throughout, arguments have been presented for the modelling of delays not only to better represent key facets of physical and biological systems, but to increase the computational potential of such systems for the synthesis of control. The thorough investigation of the dynamics of small delayed networks with a wide range of time delays has been undertaken, with a detailed mathematical description of the fixed points of the system and possible oscillatory modes developed to fully describe the behaviour of a single node. Exploration of the behaviour for even small delayed networks illustrates the range of complex behaviour possible and guides the development of interesting solutions. To further exploit the potential of the rich dynamics in such systems, a novel approach to the 3D simulation of locomotory robots has been developed focussing on minimising the computational cost. To verify this simulation tool a simple quadruped robot was developed and the motion of the robot when undergoing a manually designed gait evaluated. 
The results displayed a high degree of agreement between the simulation and laser tracker data, verifying the accuracy of the model developed. A new model of a dynamic system which includes continuous time delays has been introduced, and its utility demonstrated in the evolution of networks for the solution of simple learning behaviours. A range of methods has been developed for determining the time delays, including the novel concept of representing the time delays as related to the distance between nodes in a spatial representation of the network. The application of these tools to a range of examples has been explored, from Gene Regulatory Networks (GRNs) to robot control and neural networks. The performance of these systems has been compared and contrasted with the efficacy of evolutionary runs for the same task over the whole range of network and delay types. It has been shown that delayed dynamic neural systems are at least as capable as traditional Continuous Time Recurrent Neural Networks (CTRNNs) and show significant performance improvements in the control of robot gaits. Experiments in adaptive behaviour, where there is not such a direct link between the enhanced system dynamics and performance, showed no such discernible improvement. Whilst we hypothesise that the ability of such delayed networks to generate switched pattern generating nodes may be useful in Evolutionary Robotics (ER) this was not borne out here. The spatial representation of delays was shown to be more efficient for larger networks, however these techniques restricted the search to lower complexity solutions or led to a significant falloff as the network structure becomes more complex. This would suggest that for anything other than a simple genotype, the direct method for encoding delays is likely most appropriate. 
With proven benefits for robot locomotion and the open potential for adaptive behaviour, delayed dynamic systems for evolved control remain an interesting and promising field in complex systems research