Deep Neuroevolution of Recurrent and Discrete World Models
Neural architectures inspired by our own human cognitive system, such as the
recently introduced world models, have been shown to outperform traditional
deep reinforcement learning (RL) methods in a variety of different domains.
Instead of the relatively simple architectures employed in most RL experiments,
world models rely on multiple different neural components that are responsible
for visual information processing, memory, and decision-making. However, the
components of these models have so far had to be trained separately, through a
variety of specialized training methods. This paper demonstrates the surprising
finding that models with precisely the same parts can instead be trained
efficiently end-to-end through a genetic algorithm (GA), reaching performance
comparable to the original world model on a challenging car-racing task. An
analysis of the evolved visual and memory systems indicates that they learn
effective representations similar to those of the system trained through
gradient descent. Additionally, in contrast to gradient-descent methods, which
struggle with discrete variables, GAs can work directly with such
representations, opening up opportunities for classical planning in latent
space. This paper adds further evidence of the effectiveness of deep
neuroevolution for tasks that require the intricate orchestration of multiple
components in complex heterogeneous architectures.
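To make the idea concrete, here is a minimal sketch of the kind of simple, mutation-only genetic algorithm the abstract refers to, applied to one flat genome that encodes all components of the model at once. The evaluate function is a hypothetical stand-in for a full car-racing rollout, and the population size, elite count, and mutation strength are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def evaluate(genome):
    # Hypothetical stand-in: in the paper's setting this would decode the
    # genome into vision, memory, and controller components, roll out the
    # agent in the car-racing environment, and return the episode score.
    return -np.sum((genome - 1.0) ** 2)  # toy objective for illustration

def simple_ga(n_params, pop_size=64, n_elite=8, sigma=0.01, generations=100):
    # One flat parameter vector per individual, so selection acts on the
    # heterogeneous architecture as a whole (end-to-end training).
    pop = [np.random.randn(n_params) * 0.1 for _ in range(pop_size)]
    for _ in range(generations):
        scores = [evaluate(p) for p in pop]
        elite_idx = np.argsort(scores)[-n_elite:]
        elites = [pop[i] for i in elite_idx]
        # Mutation-only reproduction: children are Gaussian perturbations
        # of randomly chosen elites.
        pop = elites + [
            elites[np.random.randint(n_elite)] + sigma * np.random.randn(n_params)
            for _ in range(pop_size - n_elite)
        ]
    return max(pop, key=evaluate)
```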
Evolutionary Machine Learning and Games
Evolutionary machine learning (EML) has been applied to games in multiple
ways, and for multiple different purposes. Importantly, AI research in games is
not only about playing games; it is also about generating game content,
modeling players, and many other applications. Many of these applications pose
interesting problems for EML. We will structure this chapter on EML for games
based on whether evolution is used to augment machine learning (ML) or ML is
used to augment evolution. For completeness, we also briefly discuss the usage
of ML and evolution separately in games.
Comment: 27 pages, 5 figures; part of the Evolutionary Machine Learning book (https://link.springer.com/book/10.1007/978-981-99-3814-8).
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces, and coping with sparse rewards.
Automated Curriculum Learning by Rewarding Temporally Rare Events
Reward shaping allows reinforcement learning (RL) agents to accelerate
learning by receiving additional reward signals. However, these signals can be
difficult to design manually, especially for complex RL tasks. We propose a
simple and general approach that determines the reward of pre-defined events by
their rarity alone. Here events become less rewarding as they are experienced
more often, which encourages the agent to continually explore new types of
events as it learns. The adaptiveness of this reward function results in a form
of automated curriculum learning that does not have to be specified by the
experimenter. We demonstrate that this Rarity of Events (RoE) approach
enables the agent to succeed in challenging VizDoom scenarios without access to
the extrinsic reward from the environment. Furthermore, the results demonstrate
that RoE learns a more versatile policy that adapts well to critical changes in
the environment. Rewarding events based on their rarity could help in many
unsolved RL environments that are characterized by sparse extrinsic rewards but
a plethora of known event types.
Comment: 8 pages.
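A minimal sketch of how rarity-based reward shaping could be implemented, assuming a fixed set of named event types and an exponential running mean of per-episode event counts; this is one plausible reading of the idea, not the paper's exact formulation, and the decay and floor constants are illustrative.

```python
from collections import defaultdict

class RarityOfEvents:
    """Reward pre-defined events by their rarity alone: events become
    less rewarding as they are experienced more often."""

    def __init__(self, event_types, decay=0.01, floor=1e-3):
        # Running mean of how often each event occurs per episode.
        self.mean_counts = {e: floor for e in event_types}
        self.decay = decay
        self.floor = floor
        self.episode_counts = defaultdict(int)

    def reward(self, event):
        # Rarer events (lower running mean) yield larger shaped rewards.
        self.episode_counts[event] += 1
        return 1.0 / self.mean_counts[event]

    def end_episode(self):
        # Move the running means toward this episode's counts, so events
        # the agent triggers frequently gradually lose value; the floor
        # keeps rewards bounded for events that never occur.
        for e in self.mean_counts:
            observed = max(self.episode_counts[e], self.floor)
            self.mean_counts[e] = (1 - self.decay) * self.mean_counts[e] \
                + self.decay * observed
        self.episode_counts.clear()
```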
Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems
Rapid online adaptation to changing tasks is an important problem in machine
learning and, recently, a focus of meta-reinforcement learning. However,
reinforcement learning (RL) algorithms struggle in POMDP environments because
the state of the system, essential in an RL framework, is not always fully observable.
Additionally, hand-designed meta-RL architectures may not include suitable
computational structures for specific learning problems. The evolution of
online learning mechanisms, by contrast, can incorporate
learning strategies into an agent that can (i) evolve memory when required and
(ii) optimize adaptation speed to specific online learning problems. In this
paper, we exploit the highly adaptive nature of neuromodulated neural networks
to evolve a controller that uses the latent space of an autoencoder in a POMDP.
An analysis of the evolved networks reveals the proposed algorithm's ability
to acquire inborn knowledge in a variety of forms, such as detecting cues that
reveal implicit rewards and evolving location neurons that aid navigation.
The integration of inborn knowledge
and online plasticity enabled fast adaptation and better performance in
comparison to some non-evolutionary meta-reinforcement learning algorithms. The
algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
Comment: 9 pages. Accepted as a full paper at the Genetic and Evolutionary Computation Conference (GECCO 2020).
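As a rough illustration of the mechanism involved, below is a generic sketch of one step of a neuromodulated plastic layer: Hebbian weight changes gated by an evolved modulatory signal. The shapes and the specific update rule are assumptions for illustration, not the paper's exact architecture, which additionally feeds the controller the latent space of an autoencoder.

```python
import numpy as np

def neuromodulated_step(x, W, H, W_mod, eta=0.1):
    """One step of a plastic layer (generic sketch, not the paper's exact rule).

    x:     input vector (e.g. the autoencoder's latent code), shape (n_in,)
    W:     fixed, evolved weights, shape (n_in, n_out)
    H:     plastic Hebbian trace, same shape as W
    W_mod: evolved weights producing a per-output modulatory signal
    """
    y = np.tanh(x @ (W + H))          # output uses fixed + plastic weights
    m = np.tanh(x @ W_mod)            # modulatory signal, shape (n_out,)
    H = H + eta * m * np.outer(x, y)  # Hebbian update, gated by modulation
    return y, H
```

Because weight changes only occur where the modulatory signal is non-zero, evolution can decide when and where plasticity is applied, which is what lets inborn knowledge and online adaptation coexist in one network.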
Deep Innovation Protection: Confronting the Credit Assignment Problem in Training Heterogeneous Neural Architectures
Deep reinforcement learning approaches have shown impressive results in a
variety of different domains; however, more complex heterogeneous architectures
such as world models require the different neural components to be trained
separately instead of end-to-end. While a simple genetic algorithm recently
showed that end-to-end training is possible, it failed to solve a more complex 3D
task. This paper presents a method called Deep Innovation Protection (DIP) that
addresses the credit assignment problem in training complex heterogeneous neural
network models end-to-end for such environments. The main idea behind the
approach is to employ multiobjective optimization to temporarily reduce the
selection pressure on specific components in a multi-component network, allowing
other components to adapt. We investigate the emergent representations of these
evolved networks, which learn to predict properties important for the survival
of the agent, without the need for a specific forward-prediction loss.
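The multiobjective idea can be sketched as Pareto-based selection over two objectives: task fitness and a protection objective, here the negated number of generations since an individual's components were last mutated, so recently changed genomes are shielded from immediate elimination. This is a simplified illustration of the concept, assuming a plain non-dominated front rather than the paper's full multiobjective evolutionary loop.

```python
def dominates(a, b):
    # a dominates b if a is at least as good in every objective
    # and strictly better in at least one (both objectives maximized).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(objectives):
    # Indices of non-dominated individuals; selection favors this front.
    return [
        i for i, oi in enumerate(objectives)
        if not any(dominates(oj, oi) for j, oj in enumerate(objectives) if j != i)
    ]

# Each individual scored as (task fitness, -generations since last mutation).
# The young, still-adapting genome (fitness 8.0, age 0) stays on the front
# even though an older genome has higher fitness.
objectives = [(10.0, -5), (8.0, 0), (6.0, -4)]
print(pareto_front(objectives))  # -> [0, 1]
```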