Research outputs

    Deep Neuroevolution of Recurrent and Discrete World Models

    Neural architectures inspired by our own human cognitive system, such as the recently introduced world models, have been shown to outperform traditional deep reinforcement learning (RL) methods in a variety of domains. Instead of the relatively simple architectures employed in most RL experiments, world models rely on multiple distinct neural components responsible for visual information processing, memory, and decision-making. However, so far the components of these models have had to be trained separately and through a variety of specialized training methods. This paper demonstrates the surprising finding that models with exactly the same components can instead be trained efficiently end-to-end through a genetic algorithm (GA), reaching performance comparable to the original world model on a challenging car-racing task. An analysis of the evolved visual and memory systems indicates that they develop representations similarly effective to those of the system trained through gradient descent. Additionally, in contrast to gradient-descent methods that struggle with discrete variables, GAs work directly with such representations, opening up opportunities for classical planning in latent space. This paper adds further evidence of the effectiveness of deep neuroevolution for tasks that require the intricate orchestration of multiple components in complex heterogeneous architectures.
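    To make the end-to-end training idea concrete, below is a minimal sketch of a simple genetic algorithm that mutates one flat parameter vector spanning all world-model components (vision, memory, controller) and selects by episode return. The population size, mutation scale, and the evaluate_agent placeholder are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def evaluate_agent(params: np.ndarray) -> float:
    """Placeholder fitness: in the paper this would decode `params` into the
    vision, memory, and controller networks and return the episode reward
    from a CarRacing rollout."""
    return -float(np.linalg.norm(params))  # dummy objective so the sketch runs

def simple_ga(num_params, pop_size=64, num_elites=8, sigma=0.05, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    # One flat genome per individual; all components are mutated jointly (end-to-end).
    pop = [sigma * rng.standard_normal(num_params) for _ in range(pop_size)]
    elites = pop[:num_elites]
    for _ in range(generations):
        fitness = np.array([evaluate_agent(p) for p in pop])
        order = np.argsort(fitness)[::-1]          # best first
        elites = [pop[i] for i in order[:num_elites]]
        # Offspring: copy a random elite and add Gaussian mutation noise.
        pop = elites + [elites[rng.integers(num_elites)]
                        + sigma * rng.standard_normal(num_params)
                        for _ in range(pop_size - num_elites)]
    return elites[0]

best_genome = simple_ga(num_params=1000)
```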

    Deep learning for video game playing

    In this article, we review recent deep learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces, and sparse rewards.

    Automated Curriculum Learning by Rewarding Temporally Rare Events

    Reward shaping allows reinforcement learning (RL) agents to accelerate learning by receiving additional reward signals. However, these signals can be difficult to design manually, especially for complex RL tasks. We propose a simple and general approach that determines the reward of pre-defined events by their rarity alone. Events become less rewarding as they are experienced more often, which encourages the agent to continually explore new types of events as it learns. The adaptiveness of this reward function results in a form of automated curriculum learning that does not have to be specified by the experimenter. We demonstrate that this Rarity of Events (RoE) approach enables the agent to succeed in challenging VizDoom scenarios without access to the extrinsic reward from the environment. Furthermore, the results demonstrate that RoE learns a more versatile policy that adapts well to critical changes in the environment. Rewarding events based on their rarity could help in many unsolved RL environments that are characterized by sparse extrinsic rewards but a plethora of known event types.
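    A minimal sketch of the rarity-based reward idea is given below, assuming a fixed set of named event types and an exponentially decayed running mean of how often each fires per episode. The class name, decay constant, event names, and exact normalisation are assumptions for illustration and may differ from the paper's formulation.

```python
from collections import defaultdict

class RarityOfEventsReward:
    def __init__(self, decay=0.99, eps=1e-3):
        self.decay = decay                       # how quickly old episodes are forgotten
        self.eps = eps                           # avoids division by zero for unseen events
        self.mean_count = defaultdict(float)     # running mean occurrences per event type

    def end_episode(self, episode_counts):
        # Update the temporal average of each event type's per-episode frequency.
        for event, count in episode_counts.items():
            self.mean_count[event] = (self.decay * self.mean_count[event]
                                      + (1.0 - self.decay) * count)

    def reward(self, event):
        # Rare events (low running mean) yield large rewards; common ones fade out.
        return 1.0 / (self.mean_count[event] + self.eps)

# Hypothetical usage: the rare pickup stays highly rewarding, frequent movement does not.
roe = RarityOfEventsReward()
roe.end_episode({"picked_up_armor": 1, "moved": 250})
print(roe.reward("picked_up_armor"), roe.reward("moved"))
```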

    Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems

    Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. Evolving online learning mechanisms, in contrast, makes it possible to incorporate learning strategies into an agent that can (i) evolve memory when required and (ii) optimize adaptation speed for specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects, such as detecting cues that reveal implicit rewards and evolving location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
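    The plasticity mechanism this line of work builds on can be sketched as a neuromodulated Hebbian update, where a modulatory signal gates how strongly each connection changes. The coefficients, shapes, and the way the modulatory signal is produced below are illustrative and do not reproduce the paper's evolved architecture.

```python
import numpy as np

def neuromodulated_hebbian_step(W, pre, post, modulation, eta=0.1,
                                A=1.0, B=0.0, C=0.0, D=0.0):
    """W: weight matrix (post x pre); pre/post: activation vectors;
    modulation: per-postsynaptic-neuron modulatory signal that gates plasticity."""
    hebb = A * np.outer(post, pre) + B * pre[None, :] + C * post[:, None] + D
    return W + eta * modulation[:, None] * hebb

# Tiny usage example with random activations; in the paper the modulatory signal
# would come from evolved modulatory neurons reacting to the agent's experience.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))
pre, post = rng.standard_normal(3), rng.standard_normal(4)
modulation = np.tanh(rng.standard_normal(4))
W = neuromodulated_hebbian_step(W, pre, post, modulation)
```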

    Deep Innovation Protection: Confronting the Credit Assignment Problem in Training Heterogeneous Neural Architectures

    Deep reinforcement learning approaches have shown impressive results in a variety of domains; however, more complex heterogeneous architectures such as world models require the different neural components to be trained separately instead of end-to-end. While a simple genetic algorithm recently showed that end-to-end training is possible, it failed to solve a more complex 3D task. This paper presents a method called Deep Innovation Protection (DIP) that addresses the credit assignment problem in training complex heterogeneous neural network models end-to-end for such environments. The main idea behind the approach is to employ multiobjective optimization to temporarily reduce the selection pressure on specific components in a multi-component network, allowing other components to adapt. We investigate the emergent representations of these evolved networks, which learn to predict properties important for the survival of the agent, without the need for a specific forward-prediction loss.
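    A minimal sketch of the innovation-protection idea follows: selection treats task fitness and the recency of a component change (a low age) as two objectives and keeps Pareto-optimal fronts. The plain non-dominated sort and the age bookkeeping below are assumptions for illustration rather than the paper's exact procedure.

```python
def dominates(a, b):
    # a dominates b if it is at least as good on every objective and strictly better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_select(population, fitnesses, ages, num_parents):
    # Objective 1: task fitness (maximise). Objective 2: -age (protect recent innovations).
    objs = [(f, -a) for f, a in zip(fitnesses, ages)]
    remaining = list(range(len(population)))
    parents = []
    while remaining and len(parents) < num_parents:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        parents.extend(front)
        remaining = [i for i in remaining if i not in front]
    return [population[i] for i in parents[:num_parents]]

# Tiny usage example: an age of 0 marks a genome whose vision/memory component
# was just mutated, so it is protected even if its fitness has not caught up yet.
pop = ["g0", "g1", "g2", "g3"]
print(pareto_select(pop, fitnesses=[10.0, 12.0, 9.0, 11.0], ages=[3, 5, 0, 1], num_parents=2))
```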

    Evolutionary reinforcement learning for vision-based general video game playing.

    Over the past decade, video games have become increasingly utilised for research in artificial intelligence. Perhaps the most extensive use of video games has been as benchmark problems in the field of reinforcement learning. Part of the reason for this is that video games are designed to challenge humans, and as a result, developing methods capable of mastering them is considered a stepping stone to achieving human-level performance in real-world tasks. Of particular interest are vision-based general video game playing (GVGP) methods. These are methods that learn from pixel inputs and can be applied, without modification, across sets of games. One of the challenges in evolutionary computing is scaling up neuroevolution methods, which have proven effective at solving simpler reinforcement learning problems in the past, to tasks with high-dimensional input spaces, such as video games. This thesis proposes a novel method for vision-based GVGP that combines the representational learning power of deep neural networks and the policy learning benefits of neuroevolution. This is achieved by separating state representation and policy learning and applying neuroevolution only to the latter. The method, AutoEncoder-augmented NeuroEvolution of Augmented Topologies (AE-NEAT), uses a deep autoencoder to learn compact state representations that are used as input for policy networks evolved using NEAT. Experiments on a selection of Atari games showed that this approach can successfully evolve high-performing agents and scale neuroevolution methods that evolve both weights and topology to domains with high-dimensional inputs. Overall, the experiments and results demonstrate a proof-of-concept of this separated state representation and policy learning approach and show that hybrid deep learning and neuroevolution-based GVGP methods are a promising avenue for future research.
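    The separation AE-NEAT relies on can be sketched as follows: a fixed, pretrained encoder compresses raw frames into a small latent vector, and only the policy operating on that latent is evolved. The Encoder random-projection placeholder, the policy interface, and the environment interface below are illustrative assumptions, not the thesis code.

```python
import numpy as np

class Encoder:
    """Stand-in for the trained autoencoder's encoder half (frame -> latent).
    A fixed random projection is used here only so the sketch runs; the thesis
    trains a deep convolutional autoencoder instead."""
    def __init__(self, frame_dim, latent_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((latent_dim, frame_dim)) / np.sqrt(frame_dim)

    def encode(self, frame):
        return self.proj @ np.asarray(frame).ravel()

def evaluate_policy(policy, env, encoder, episodes=1):
    """policy: an evolved network mapping latent vectors to actions (e.g. a NEAT
    genome turned into a feed-forward net); env: a hypothetical Atari-style
    environment whose step() returns (observation, reward, done)."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            z = encoder.encode(obs)       # fixed, pretrained state representation
            action = policy.act(z)        # only this latent-to-action mapping is evolved
            obs, reward, done = env.step(action)
            total += reward
    return total / episodes
```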