
    Novelty search creates robots with general skills for exploration

    Novelty Search, a new type of Evolutionary Algorithm, has shown much promise in the last few years. Instead of selecting for phenotypes that are closer to an objective, Novelty Search assigns rewards based on how different the phenotypes are from those already generated. A common criticism of Novelty Search is that it is effectively random or exhaustive search because it tries solutions in an unordered manner until a correct one is found. Its creators respond that over time Novelty Search accumulates information about the environment in the form of skills relevant to reaching uncharted territory, but to date no evidence for that hypothesis has been presented. In this paper we test that hypothesis by transferring robots evolved under Novelty Search to new environments (here, mazes) to see if the skills they've acquired generalize. Three lines of evidence support the claim that Novelty Search agents do indeed learn general exploration skills. First, robot controllers evolved via Novelty Search in one maze and then transferred to a new maze explore significantly more of the new environment than non-evolved (randomly generated) agents. Second, a Novelty Search process to solve the new mazes works significantly faster when seeded with the transferred controllers than with randomly generated ones. Third, no significant difference exists when comparing two types of transferred agents: those evolved in the original maze under (1) Novelty Search vs. (2) a traditional, objective-based fitness function. The evidence gathered suggests that, like traditional Evolutionary Algorithms with objective-based fitness functions, Novelty Search is not a random or exhaustive search process, but instead accumulates information about the environment, resulting in phenotypes possessing the skills needed to explore their world.
    Keywords: Novelty Search; Evolutionary Robotics
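    To make the selection mechanism described above concrete, here is a minimal sketch of the usual novelty-scoring rule (mean distance to the k nearest neighbors in a behavior space). The behavior descriptor, the value of k, and the archive policy are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def novelty_score(behavior, archive, k=15):
    """Mean distance to the k nearest neighbors in behavior space.

    `behavior` could be, e.g., the robot's final (x, y) position in the maze;
    `archive` holds descriptors of previously generated individuals.
    Both choices are illustrative, not the paper's exact setup.
    """
    if len(archive) == 0:
        return 0.0
    dists = np.linalg.norm(np.asarray(archive) - np.asarray(behavior), axis=1)
    nearest = np.sort(dists)[:k]
    return float(nearest.mean())

# Selection then rewards high novelty rather than proximity to an objective:
# individuals whose behaviors are far from anything seen before are favored.
```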

    Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

    Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES and achieve higher performance on Atari and on simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
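    As a rough illustration of how such a hybrid can work, the sketch below shows one antithetic evolution-strategies update in which the scalar score is pluggable: using a novelty score gives an NS-ES-flavored step, while averaging reward and novelty gives an NSR-ES-flavored step. The hyperparameters and the antithetic-sampling detail are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def es_step(theta, score_fn, sigma=0.02, alpha=0.01, n_pairs=64, rng=None):
    """One antithetic evolution-strategies update on parameter vector `theta`.

    `score_fn` maps parameters to a scalar. Plugging in a novelty score gives
    an NS-ES-style step; plugging in (reward + novelty) / 2 gives an
    NSR-ES-style step. Hyperparameters here are placeholders.
    """
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((n_pairs, theta.size))
    scores = np.array([score_fn(theta + sigma * e) - score_fn(theta - sigma * e)
                       for e in eps])
    grad = (eps * scores[:, None]).mean(axis=0) / (2 * sigma)
    return theta + alpha * grad
```

    Because each perturbation can be evaluated independently, the update parallelizes across workers, which is the scalability property the abstract emphasizes.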

    A digest of the main insights and achievements of the project IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots

    This document illustrates the main achievements and insights gained by the IM-CLeVeR project. These achievements suggest new hypotheses for research on intrinsic motivations and autonomous cumulative open-ended learning, both for running new neuroscience and psychology experiments and for building future autonomously developing robots. IM-CLeVeR is a project funded by the European Commission under the 7th Framework Programme (FP7/2007-2013), "Challenge 2 - Cognitive Systems, Interaction, Robotics", grant agreement No. ICTIP-231722.

    Intrinsic Motivation Systems for Autonomous Mental Development

    Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this drive towards novel or curious situations. After discussing related research from developmental psychology, neuroscience, developmental robotics, and active learning, this paper presents the mechanism of Intelligent Adaptive Curiosity, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress. This drive makes the robot focus on situations which are neither too predictable nor too unpredictable, thus permitting autonomous mental development. The complexity of the robot's activities autonomously increases and complex developmental sequences self-organize without being constructed in a supervised manner. Two experiments are presented illustrating the stage-like organization emerging with this mechanism. In one of them, a physical robot is placed on a baby play mat with objects that it can learn to manipulate. Experimental results show that the robot first spends time in situations which are easy to learn, then shifts its attention progressively to situations of increasing difficulty, avoiding situations in which nothing can be learned. Finally, these various results are discussed in relation to more complex forms of behavioral organization and data coming from developmental psychology.
    Keywords: Active learning, autonomy, behavior, complexity, curiosity, development, developmental trajectory, epigenetic robotics, intrinsic motivation, learning, reinforcement learning, values
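    A common way to operationalize the "neither too predictable nor too unpredictable" drive is to track prediction error per sensorimotor region and reward its recent decrease. The sketch below, with arbitrary window sizes and a deliberately simplified region model, is only an illustration of that learning-progress idea, not the paper's implementation.

```python
from collections import deque

class RegionProgress:
    """Learning-progress tracker for one sensorimotor region (illustrative sketch).

    Learning progress is approximated as the drop in mean prediction error
    between an older and a more recent window; window sizes are arbitrary.
    """
    def __init__(self, window=25):
        self.errors = deque(maxlen=2 * window)
        self.window = window

    def add_error(self, err):
        self.errors.append(err)

    def learning_progress(self):
        if len(self.errors) < 2 * self.window:
            return 0.0
        errs = list(self.errors)
        older = sum(errs[:self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent  # positive when prediction error is decreasing

# A curious agent keeps sampling the region with the highest learning progress:
# already-mastered regions (error flat and low) and unlearnable regions
# (error flat and high) both show no progress and are therefore avoided.
```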

    GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning

    Designing agents capable of autonomously learning a wide range of skills is critical in order to increase the scope of reinforcement learning. It will both increase the diversity of learned skills and reduce the burden of manually designing reward functions for each skill. Self-supervised agents that set their own goals and try to maximize the diversity of those goals have shown great promise towards this end. However, a currently known limitation of agents trying to maximize the diversity of sampled goals is that they tend to get attracted to noise or, more generally, to parts of the environment that cannot be controlled (distractors). When agents have access to predefined goal features or expert knowledge, absolute Learning Progress (ALP) provides a way to distinguish between regions that can be controlled and those that cannot. However, those methods often fall short when the agents are only provided with raw sensory inputs such as images. In this work we extend those concepts to unsupervised image-based goal exploration. We propose a framework that allows agents to autonomously identify and ignore noisy distracting regions while searching for novelty in the learnable regions, both to improve overall performance and to avoid catastrophic forgetting. Our framework can be combined with any state-of-the-art novelty-seeking goal exploration approach. We construct a rich 3D image-based environment with distractors. Experiments in this environment show that agents using our framework successfully identify interesting regions of the environment, resulting in drastically improved performance. The source code is available at https://sites.google.com/view/grimgep
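    To illustrate how ALP can steer goal sampling away from distractors, here is a minimal sketch that samples a goal-space cluster in proportion to its absolute learning progress. The clustering of the goal space, the epsilon-greedy floor, and the function name are illustrative assumptions, not GRIMGEP's actual interface.

```python
import numpy as np

def sample_goal_cluster(alp_per_cluster, epsilon=0.2, rng=None):
    """Pick a goal-space cluster with probability proportional to its ALP.

    Clusters whose outcomes never become more predictable or controllable
    (e.g. uncontrollable distractors) accumulate near-zero absolute learning
    progress and are rarely sampled; the epsilon-greedy floor keeps a little
    exploration. All numbers are placeholders for illustration.
    """
    rng = rng or np.random.default_rng()
    alp = np.asarray(alp_per_cluster, dtype=float)
    if rng.random() < epsilon or alp.sum() <= 0:
        return int(rng.integers(len(alp)))
    return int(rng.choice(len(alp), p=alp / alp.sum()))

# Example: a distractor cluster with ALP ~0 is almost never chosen.
print(sample_goal_cluster([0.01, 0.6, 0.3]))
```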