Evolution-Guided Policy Gradient in Reinforcement Learning
Deep Reinforcement Learning (DRL) algorithms have been successfully applied
to a range of challenging control tasks. However, these methods typically
suffer from three core difficulties: temporal credit assignment with sparse
rewards, lack of effective exploration, and brittle convergence properties that
are extremely sensitive to hyperparameters. Collectively, these challenges
severely limit the applicability of these approaches to real-world problems.
Evolutionary Algorithms (EAs), a class of black box optimization techniques
inspired by natural evolution, are well suited to address each of these three
challenges. However, EAs typically suffer from high sample complexity and
struggle to solve problems that require optimization of a large number of
parameters. In this paper, we introduce Evolutionary Reinforcement Learning
(ERL), a hybrid algorithm that leverages the population of an EA to provide
diversified data to train an RL agent, and reinserts the RL agent into the EA
population periodically to inject gradient information into the EA. ERL
inherits the EA's ability to perform temporal credit assignment through a
fitness metric, to explore effectively with a diverse set of policies, and to
converge stably as a population-based approach, and complements these
strengths with off-policy DRL's ability to leverage gradients for higher
sample efficiency and faster learning.
Experiments in a range of challenging continuous control benchmarks demonstrate
that ERL significantly outperforms prior DRL and EA methods.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.
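To make the hybrid loop concrete, here is a minimal sketch of the ERL scheme described above, written with a synthetic fitness function standing in for episode returns. The helpers `fitness` and `rl_step` are illustrative stand-ins, not the authors' implementation (which pairs a genetic algorithm with an off-policy learner such as DDPG).

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, POP, ELITES, SYNC_EVERY = 8, 10, 2, 5

def fitness(theta):
    # Stand-in for an episode return obtained by rolling out policy theta.
    return -np.sum((theta - 1.0) ** 2)

def rl_step(agent, experience):
    # Stand-in for an off-policy gradient update trained on the pooled
    # experience of the population; here we just nudge toward the best sample.
    best = max(experience, key=fitness)
    return agent + 0.2 * (best - agent)

population = [rng.normal(size=DIM) for _ in range(POP)]
rl_agent = rng.normal(size=DIM)

for gen in range(50):
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:ELITES]
    # EA step: keep elites, refill the population with mutated copies.
    parents = [elites[rng.integers(ELITES)] for _ in range(POP - ELITES)]
    population = elites + [p + rng.normal(scale=0.1, size=DIM) for p in parents]
    # RL step: the population supplies diversified training data.
    rl_agent = rl_step(rl_agent, population)
    # Periodic reinsertion injects gradient information into the EA.
    if gen % SYNC_EVERY == 0:
        population[-1] = rl_agent.copy()
```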
Neural Network Quine
Self-replication is a key aspect of biological life that has been largely
overlooked in Artificial Intelligence systems. Here we describe how to build
and train self-replicating neural networks. The network replicates itself by
learning to output its own weights. The network is designed using a loss
function that can be optimized with either gradient-based or non-gradient-based
methods. We also describe a method we call regeneration to train the network
without explicit optimization, by injecting the network with predictions of its
own parameters. The best solution for a self-replicating network was found by
alternating between regeneration and optimization steps. Finally, we describe a
design for a self-replicating neural network that can solve an auxiliary task
such as MNIST image classification. We observe that there is a trade-off
between the network's ability to classify images and its ability to replicate,
but training is biased towards increasing its specialization at image
classification at the expense of replication. This is analogous to the
trade-off between reproduction and other tasks observed in nature. We suggest
that a self-replication mechanism for artificial intelligence is useful because
it introduces the possibility of continual improvement through natural
selection.
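As a rough illustration of the self-replication objective, the sketch below defines a network whose output is a prediction of its own flattened weights and applies the regeneration step, replacing the weights with their own predictions. The coordinate-embedding setup and all names here are illustrative assumptions, not the paper's architecture; the paper's best results alternate such regeneration with ordinary optimization steps.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16                                  # all trainable parameters live in W
W = rng.normal(scale=0.1, size=(H, H))
# Fixed (untrained) random embedding, one column per entry of W, through
# which the network is asked about each of its own parameters.
E = np.tanh(rng.normal(size=(H, H * H)))

def predict_weights(W):
    # The network's output: a prediction of its own flattened weights.
    return (W @ E).mean(axis=0)

def quine_loss(W):
    # Self-replication loss: gap between predicted and actual weights.
    return np.mean((predict_weights(W) - W.ravel()) ** 2)

for step in range(100):
    # Regeneration: inject the network with predictions of its own
    # parameters instead of taking an explicit gradient step.
    W = predict_weights(W).reshape(H, H)

print("self-replication loss:", quine_loss(W))
```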
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with
arXiv:1404.782
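Since the experiments are deferred to separate papers, the following is only a structural sketch, under our own assumptions, of the division of labour the abstract describes: a controller C learns to form queries to a recurrent world model M and to act on M's answers. All weights below are untrained stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, QRY, ACT = 4, 32, 8, 2

# Hypothetical weights: a recurrent world model M and a controller C.
M_in  = rng.normal(scale=0.1, size=(HID, OBS + QRY))
M_rec = rng.normal(scale=0.1, size=(HID, HID))
C_q   = rng.normal(scale=0.1, size=(QRY, OBS))       # C forms a query from obs
C_act = rng.normal(scale=0.1, size=(ACT, OBS + HID))  # C acts on obs + M's answer

h = np.zeros(HID)
obs = rng.normal(size=OBS)
for t in range(10):
    query = np.tanh(C_q @ obs)              # C actively queries the model ...
    h = np.tanh(M_in @ np.concatenate([obs, query]) + M_rec @ h)
    action = np.tanh(C_act @ np.concatenate([obs, h]))  # ... and acts on its answer
    obs = rng.normal(size=OBS)              # stand-in for an environment step
```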
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent settings and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey"
Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing
applications to Reinforcement Learning. It facilitates learning a population of
agents where each member is optimized to simultaneously accumulate high
task-returns and exhibit behavioral diversity compared to other members. In
this paper, we build on a recent kernel-based method for training a QD policy
ensemble with Stein variational gradient descent. With kernels based on
f-divergence between the stationary distributions of policies, we convert the
problem to that of efficient estimation of the ratio of these stationary
distributions. We then study various distribution ratio estimators used
previously for off-policy evaluation and imitation and re-purpose them to
compute the gradients for policies in an ensemble such that the resultant
population is diverse and of high quality.
Comment: CoRL 2020 camera-ready.
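A minimal sketch of the ensemble update this describes: a Stein variational step in which each member follows a kernel-weighted quality gradient plus a repulsive diversity term. Here `return_grad` and `ratio_kernel` are crude stand-ins (a real kernel would be built from an estimated ratio of stationary distributions, and the quality gradient from a policy-gradient estimator), so this shows the structure of the update, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
N, DIM, LR = 4, 6, 0.05
ensemble = [rng.normal(size=DIM) for _ in range(N)]

def return_grad(theta):
    # Stand-in for a policy-gradient estimate of task return (quality).
    return -(theta - 1.0)

def ratio_kernel(ta, tb):
    # Stand-in kernel; the paper builds it from estimated ratios of the
    # policies' stationary distributions. An RBF on parameters plays
    # that role here.
    return np.exp(-np.sum((ta - tb) ** 2))

for step in range(100):
    new = []
    for ti in ensemble:
        phi = np.zeros(DIM)
        for tj in ensemble:
            k = ratio_kernel(tj, ti)
            # Stein variational update: kernel-weighted quality gradient
            # plus a repulsive term that keeps the population diverse.
            phi += k * return_grad(tj) + 2.0 * (ti - tj) * k
        new.append(ti + LR * phi / N)
    ensemble = new
```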
Evolving Static Representations for Task Transfer
An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird's eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.
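A small sketch of the core idea, under assumed conventions (normalised field coordinates, one channel per object type, a fixed 20x20 grid): the same bird's eye view function rasterises both 3 vs. 2 and 4 vs. 3 Keepaway, so the representation never changes across tasks.

```python
import numpy as np

def bird_eye_view(keepers, takers, ball, size=20):
    """Rasterise any Keepaway task onto one fixed-size 2D map, so that
    3 vs. 2 and 4 vs. 3 share the same input representation."""
    bev = np.zeros((3, size, size))  # channels: keepers, takers, ball
    for ch, points in enumerate((keepers, takers, [ball])):
        for x, y in points:  # field coordinates assumed normalised to [0, 1)
            bev[ch, int(y * size), int(x * size)] = 1.0
    return bev

# The same function handles both tasks without changing the representation:
bev3v2 = bird_eye_view([(0.2, 0.3), (0.5, 0.5), (0.8, 0.2)],
                       [(0.4, 0.4), (0.6, 0.6)], ball=(0.2, 0.3))
bev4v3 = bird_eye_view([(0.2, 0.3), (0.5, 0.5), (0.8, 0.2), (0.7, 0.8)],
                       [(0.4, 0.4), (0.6, 0.6), (0.3, 0.7)], ball=(0.2, 0.3))
assert bev3v2.shape == bev4v3.shape
```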
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
Much of model-based reinforcement learning involves learning a model of an
agent's world, and training an agent to leverage this model to perform a task
more efficiently. While these models are demonstrably useful for agents, every
naturally occurring model of the world of which we are aware---e.g., a
brain---arose as the byproduct of competing evolutionary pressures for
survival, not minimization of a supervised forward-predictive loss via gradient
descent. That useful models can arise out of the messy and slow optimization
process of evolution suggests that forward-predictive modeling can arise as a
side-effect of optimization under the right circumstances. Crucially, this
optimization process need not explicitly be a forward-predictive loss. In this
work, we introduce a modification to traditional reinforcement learning which
we call observational dropout, whereby we limit the agent's ability to observe
the real environment at each timestep. In doing so, we can coerce an agent into
learning a world model to fill in the observation gaps during reinforcement
learning. We show that the emerged world model, while not explicitly trained to
predict the future, can help the agent learn key skills required to perform
well in its environment. Videos of our results are available at
https://learningtopredict.github.io/
Comment: To appear at the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019).
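Here is a minimal sketch of observational dropout as the abstract describes it, with stand-in dynamics, policy, and world model (in the paper, the model is trained only through the policy's task performance, never with a prediction loss). The only essential piece is the peek probability gating what the agent sees.

```python
import numpy as np

rng = np.random.default_rng(0)
PEEK_PROB = 0.1  # probability that the real observation gets through

def world_model(prev_obs, action):
    # Stand-in learned model; in the paper its weights are shaped only by
    # the policy's task performance, never by a forward-prediction loss.
    return prev_obs + 0.1 * action

def policy(obs):
    return -0.5 * obs             # stand-in controller

def env_step(state, action):
    return 0.9 * state + action   # stand-in true dynamics

state = rng.normal(size=4)
obs = state.copy()
for t in range(100):
    action = policy(obs)
    state = env_step(state, action)
    if rng.random() < PEEK_PROB:
        obs = state.copy()              # the agent sees the real environment
    else:
        obs = world_model(obs, action)  # the model fills the observation gap
```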
An automatic selection of optimal recurrent neural network architecture for processes dynamics modelling purposes
This paper addresses the development of algorithms that find the structure of
an artificial neural network used for behavioural (black-box) modelling of
selected dynamic processes. The research includes four original proposals of
algorithms dedicated to neural network architecture search, based on
well-known optimisation techniques such as evolutionary algorithms and
gradient descent methods. A recurrent artificial neural network is used,
whose architecture is selected in an optimised way by the above-mentioned
algorithms. Optimality is understood as a trade-off between the size of the
neural network and its accuracy in capturing the response of the mathematical
model from which it has been learnt. Original specialised evolutionary
operators are proposed for the optimisation. The research involves an
extended validation study based on data generated from a mathematical model
of the fast processes occurring in a pressurised water nuclear reactor.
Comment: 32 pages, 17 figures, code available.
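To illustrate how the size-versus-accuracy trade-off can drive an evolutionary architecture search, here is a hedged sketch; `model_error`, the architecture encoding (a tuple of layer widths), the penalty weight `LAMBDA`, and the mutation operator are all illustrative assumptions, not the paper's operators.

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 1e-4  # weight of the size penalty (an assumption, not the paper's value)

def model_error(arch):
    # Stand-in for training a recurrent net with this architecture against
    # the reactor model's response and returning its validation error.
    return 1.0 / (1.0 + sum(arch))  # bigger nets simply fit better here

def fitness(arch):
    # The trade-off being optimised: accuracy versus network size.
    n_params = sum(a * b for a, b in zip(arch, arch[1:]))
    return -model_error(arch) - LAMBDA * n_params

def mutate(arch):
    # A specialised operator: grow or shrink one randomly chosen layer.
    i = rng.integers(len(arch))
    out = list(arch)
    out[i] = max(1, out[i] + rng.choice([-4, 4]))
    return tuple(out)

population = [tuple(rng.integers(4, 32, size=3)) for _ in range(8)]
for gen in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]
    population = survivors + [mutate(a) for a in survivors]
best = max(population, key=fitness)
```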
Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL
In the real world, linguistic agents are also embodied agents: they perceive
and act in the physical world. The notion of Language Grounding questions the
interactions between language and embodiment: how do learning agents connect or
ground linguistic representations to the physical world? This question has
recently been approached by the Reinforcement Learning community under the
framework of instruction-following agents. In these agents, behavioral policies
or reward functions are conditioned on the embedding of an instruction
expressed in natural language. This paper proposes another approach: using
language to condition goal generators. Given any goal-conditioned policy, one
could train a language-conditioned goal generator to generate language-agnostic
goals for the agent. This method makes it possible to decouple sensorimotor
learning from language acquisition and enables agents to demonstrate a
diversity of behaviors
for any given instruction. We propose a particular instantiation of this
approach and demonstrate its benefits.
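A minimal sketch of the proposed decoupling under our own assumptions (a fixed stand-in sentence embedding, an untrained linear generator): language conditions a goal generator, whose noise input yields different language-agnostic goals for the same instruction, while the goal-conditioned policy itself never sees language.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, GOAL, OBS = 8, 4, 6

# Hypothetical weights of a language-conditioned goal generator: it maps an
# instruction embedding plus noise to a language-agnostic goal vector.
G = rng.normal(scale=0.1, size=(GOAL, EMB + GOAL))

def embed(instruction):
    # Stand-in sentence embedding (hash-seeded so it is fixed per string).
    return np.random.default_rng(abs(hash(instruction)) % 2**32).normal(size=EMB)

def generate_goal(instruction):
    noise = rng.normal(size=GOAL)  # noise lets one instruction yield many goals
    return np.tanh(G @ np.concatenate([embed(instruction), noise]))

def goal_conditioned_policy(obs, goal):
    # Any pretrained goal-conditioned policy can be reused unchanged;
    # it consumes goals, never language.
    return np.tanh(obs[:GOAL] - goal)

obs = rng.normal(size=OBS)
for _ in range(3):  # diverse behaviors for one and the same instruction
    goal = generate_goal("grasp the red block")
    action = goal_conditioned_policy(obs, goal)
```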
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists of estimating the value of an optimal policy,
from which a policy can be recovered, while the other, called policy search,
works directly in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. Besides, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research", Springer.
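As a concrete instance of the value-based family sketched here, the following is a minimal tabular Q-learning loop on a hypothetical chain environment: the agent learns an estimate of the optimal value function by trial and error and then recovers a greedy policy from it.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def env_step(s, a):
    # Stand-in chain environment: action 1 moves right, action 0 moves left;
    # reaching the right end yields reward 1 and ends the episode.
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1), s2 == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))
s = 0
for t in range(5000):
    # Trial and error: epsilon-greedy exploration around current estimates.
    a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
    s2, r, done = env_step(s, a)
    # Value-based learning: estimate the optimal value function, from which
    # a (greedy) policy can be recovered.
    target = r if done else r + GAMMA * Q[s2].max()
    Q[s, a] += ALPHA * (target - Q[s, a])
    s = 0 if done else s2

policy = Q.argmax(axis=1)  # the policy recovered from the learned values
```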