Proximal Distilled Evolutionary Reinforcement Learning
Reinforcement Learning (RL) has achieved impressive performance in many
complex environments due to the integration with Deep Neural Networks (DNNs).
At the same time, Genetic Algorithms (GAs), often seen as a competing approach
to RL, have had limited success in scaling up to the DNNs required to solve
challenging tasks. Contrary to this dichotomic view, in the physical world,
evolution and learning are complementary processes that continuously interact.
The recently proposed Evolutionary Reinforcement Learning (ERL) framework has
demonstrated the mutual performance benefits of combining the two methods.
However, ERL has not fully addressed the scalability problem of GAs. In this
paper, we show that this problem is rooted in an unfortunate combination of a
simple genetic encoding for DNNs and the use of traditional
biologically-inspired variation operators. When applied to these encodings, the
standard operators are destructive and cause catastrophic forgetting of the
traits the networks acquired. We propose a novel algorithm called Proximal
Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by
a hierarchical integration between evolution and learning. The main innovation
of PDERL is the use of learning-based variation operators that compensate for
the simplicity of the genetic representation. Unlike traditional operators, our
proposals meet the functional requirements of variation operators when applied
to directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings
from the OpenAI gym. Our method outperforms ERL, as well as two
state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
Comment: Camera-ready version for AAAI-20. Contains 10 pages, 11 figures.
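As a rough illustration of the kind of learning-based, sensitivity-aware variation PDERL advocates, the sketch below perturbs each weight with noise scaled inversely by how strongly the policy's actions respond to that weight. The policy architecture, the batch of states, and all hyperparameters are illustrative assumptions; the paper's actual operators (distillation-based crossover and proximal mutation) differ in detail.

import torch
import torch.nn as nn

def proximal_style_mutation(policy, states, sigma=0.1, eps=1e-8):
    # Sensitivity of the summed actions with respect to every parameter.
    actions = policy(states)
    grads = torch.autograd.grad(actions.sum(), list(policy.parameters()))
    with torch.no_grad():
        for p, g in zip(policy.parameters(), grads):
            # High-sensitivity weights receive proportionally smaller noise,
            # so mutation is less likely to destroy acquired behaviour.
            p.add_(torch.randn_like(p) * (sigma / (g.abs() + eps)))

# Hypothetical usage: a small policy and a batch of states standing in
# for samples from an experience buffer.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))
states = torch.randn(32, 8)
proximal_style_mutation(policy, states)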
Evolutionary Reinforcement Learning: A Survey
Reinforcement learning (RL) is a machine learning approach that trains agents
to maximize cumulative rewards through interactions with environments. The
integration of RL with deep learning has recently resulted in impressive
achievements in a wide range of challenging tasks, including board games,
arcade games, and robot control. Despite these successes, there remain several
crucial challenges, including brittle convergence properties caused by
sensitive hyperparameters, difficulties in temporal credit assignment with long
time horizons and sparse rewards, a lack of diverse exploration, especially in
continuous search space scenarios, difficulties in credit assignment in
multi-agent reinforcement learning, and conflicting objectives for rewards.
Evolutionary computation (EC), which maintains a population of learning agents,
has demonstrated promising performance in addressing these limitations. This
article presents a comprehensive survey of state-of-the-art methods for
integrating EC into RL, referred to as evolutionary reinforcement learning
(EvoRL). We categorize EvoRL methods according to key research fields in RL,
including hyperparameter optimization, policy search, exploration, reward
shaping, meta-RL, and multi-objective RL. We then discuss future research
directions in terms of efficient methods, benchmarks, and scalable platforms.
This survey serves as a resource for researchers and practitioners interested
in the field of EvoRL, highlighting the important challenges and opportunities
for future research. With the help of this survey, researchers and
practitioners can develop more efficient methods and tailored benchmarks for
EvoRL, further advancing this promising cross-disciplinary research field.
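To make one of the surveyed categories concrete, below is a minimal sketch of population-based policy search with a simple evolution strategy; evaluate is a hypothetical episodic-return function and all hyperparameters are placeholders, not anything prescribed by the survey.

import numpy as np

def es_policy_search(evaluate, dim, pop_size=50, sigma=0.1, lr=0.02, iters=100):
    theta = np.zeros(dim)                       # mean policy parameters
    for _ in range(iters):
        noise = np.random.randn(pop_size, dim)  # one perturbation per member
        returns = np.array([evaluate(theta + sigma * n) for n in noise])
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta += lr / (pop_size * sigma) * noise.T @ advantages  # ES update
    return theta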
Evolutionary Reinforcement Learning via Cooperative Coevolutionary Negatively Correlated Search
Evolutionary algorithms (EAs) have been successfully applied to optimize the
policies for Reinforcement Learning (RL) tasks due to their exploration
ability. The recently proposed Negatively Correlated Search (NCS) provides a
distinct parallel exploration search behavior and is expected to facilitate RL
more effectively. Considering that commonly adopted neural policies usually
involve millions of parameters to be optimized, the direct application of NCS
to RL faces the great challenge of a large-scale search space. To address
this issue, this paper presents an NCS-friendly Cooperative Coevolution (CC)
framework to scale up NCS while largely preserving its parallel exploration
search behavior. The way in which traditional CC can deteriorate NCS is also
discussed. Empirical studies on 10 popular Atari games show that the proposed
method can significantly outperform three state-of-the-art deep RL methods with
50% less computational time by effectively exploring a 1.7 million-dimensional
search space.
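The core cooperative-coevolution idea can be sketched as follows: the parameter vector is split into groups, and each group is perturbed and selected while the others are frozen at their current best values. evaluate and the grouping are hypothetical placeholders, and the paper's NCS-specific negatively correlated selection is omitted for brevity.

import numpy as np

def cc_search(evaluate, dim, n_groups=10, pop=5, sigma=0.05, cycles=20):
    best = np.zeros(dim)
    best_score = evaluate(best)
    groups = np.array_split(np.arange(dim), n_groups)
    for _ in range(cycles):
        for idx in groups:                  # optimize one subcomponent at a time
            cands = np.repeat(best[None, :], pop, axis=0)
            cands[:, idx] += sigma * np.random.randn(pop, len(idx))
            scores = np.array([evaluate(c) for c in cands])
            if scores.max() > best_score:   # keep any improvement
                best, best_score = cands[scores.argmax()], scores.max()
    return best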
Self-Organisation of Neural Topologies by Evolutionary Reinforcement Learning
In this article we present EANT, "Evolutionary Acquisition of Neural Topologies", a method that creates neural networks (NNs) by evolutionary reinforcement learning. The structure of the NNs is developed using mutation operators, starting from a minimal structure, and their parameters are optimised using CMA-ES. EANT can create NNs that are very specialised; they achieve very good performance while remaining relatively small. This is demonstrated in experiments where our method competes with NEAT, "NeuroEvolution of Augmenting Topologies", to create networks that control a robot in a visual servoing scenario.
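The "grow from a minimal structure" idea can be illustrated with a toy genome; the encoding below is invented for illustration and is far simpler than EANT's (or NEAT's) actual representation.

import random

def minimal_genome(n_in, n_out):
    # Fully connect inputs to outputs; no hidden nodes yet.
    return {"nodes": list(range(n_in + n_out)),
            "edges": [(i, n_in + o, random.gauss(0, 1))
                      for i in range(n_in) for o in range(n_out)]}

def mutate_add_node(genome):
    # Structural mutation: split a random edge by inserting a hidden node.
    src, dst, w = genome["edges"].pop(random.randrange(len(genome["edges"])))
    new = max(genome["nodes"]) + 1
    genome["nodes"].append(new)
    genome["edges"] += [(src, new, 1.0), (new, dst, w)]

g = minimal_genome(3, 1)   # start minimal, then grow by mutation
mutate_add_node(g)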
BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization
Evolutionary reinforcement learning (ERL) algorithms have recently attracted
attention for tackling complex reinforcement learning (RL) problems due to
their high parallelism, but they are prone to insufficient exploration or
model collapse without carefully tuned hyperparameters (aka meta-parameters).
In this paper, we propose a general meta ERL framework via bilevel
optimization (BiERL) to jointly update hyperparameters in parallel with
training the ERL model within a single agent, which relieves the need for
prior domain knowledge or costly optimization procedures before model
deployment. We design an elegant meta-level
architecture that embeds the inner-level's evolving experience into an
informative population representation and introduce a simple and feasible
evaluation of the meta-level fitness function to facilitate learning
efficiency. We perform extensive experiments in MuJoCo and Box2D tasks to
verify that as a general framework, BiERL outperforms various baselines and
consistently improves the learning performance for a diversity of ERL
algorithms.
Comment: Published as a conference paper at the European Conference on Artificial Intelligence (ECAI) 2023.
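A schematic of the bilevel scheme: an outer population of meta-parameters (here just a mutation strength) is evolved while each candidate drives inner ERL training. inner_erl_step is a hypothetical stand-in returning the population fitness achieved under the given meta-parameters; BiERL's meta-level embedding of the inner population is omitted.

import numpy as np

def bilevel_erl(inner_erl_step, n_meta=8, outer_iters=50, meta_sigma=0.05):
    metas = np.abs(np.random.randn(n_meta)) * 0.1    # candidate mutation strengths
    best_meta, best_fit = metas[0], -np.inf
    for _ in range(outer_iters):
        fits = np.array([inner_erl_step(m) for m in metas])  # inner-level fitness
        if fits.max() > best_fit:
            best_meta, best_fit = metas[fits.argmax()], fits.max()
        elites = metas[np.argsort(fits)[-n_meta // 2:]]      # keep best half
        children = elites + meta_sigma * np.random.randn(len(elites))
        metas = np.abs(np.concatenate([elites, children]))   # next meta-population
    return best_meta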
Evolutionary Reinforcement Learning of Spoken Dialogue Strategies
From a system developer's perspective, designing a spoken dialogue system can be a time-consuming and difficult process. A developer may spend a lot of time anticipating how a potential user might interact with the system and then deciding on the most appropriate system response. These decisions are encoded in a dialogue strategy, essentially a mapping between anticipated user inputs and appropriate system outputs.
To reduce the time and effort associated with developing a dialogue strategy, recent work has concentrated on modelling the development of a dialogue strategy as a sequential decision problem. Using this model, reinforcement learning algorithms have been employed to generate dialogue strategies automatically. These algorithms learn strategies by interacting with simulated users. Some progress has been made with this method but a number of important challenges remain. For instance, relatively little success has been achieved with the large state representations that are typical of real-life systems. Another crucial issue is the time and effort associated with the creation of simulated users.
In this thesis, I propose an alternative to existing reinforcement learning methods of dialogue strategy development. More specifically, I explore how XCS, an evolutionary reinforcement learning algorithm, can be used to find dialogue strategies that cover large state spaces. Furthermore, I suggest that hand-coded simulated users are sufficient for the learning of useful dialogue strategies. I argue that the use of evolutionary reinforcement learning and hand-coded simulated users is an effective approach to the rapid development of spoken dialogue strategies.
Finally, I substantiate this claim by evaluating a learned strategy with real users. Both the learned strategy and a state-of-the-art hand-coded strategy were integrated into an end-to-end spoken dialogue system. The dialogue system allowed real users to make flight enquiries using a live database for an Edinburgh-based airline. The performance of the learned and hand-coded strategies was compared. The evaluation results show that the learned strategy performs as well as the hand-coded one (81% and 77% task completion respectively) but takes much less time to design (two days instead of two weeks). Moreover, the learned strategy compares favourably with previous user evaluations of learned strategies.
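The classifier-rule idea behind XCS can be sketched as follows: condition strings with wildcards map dialogue-state features to system actions, and each rule's payoff prediction is updated from reward. This is a bare illustration with invented state encodings and action labels, far simpler than a full XCS implementation.

class Rule:
    def __init__(self, condition, action):
        self.condition = condition   # e.g. "1#0"; '#' matches anything
        self.action = action         # a system response label
        self.prediction = 0.0        # estimated payoff, learned from reward

    def matches(self, state):        # state: a bit-string such as "110"
        return all(c in ("#", s) for c, s in zip(self.condition, state))

    def update(self, reward, beta=0.2):
        self.prediction += beta * (reward - self.prediction)

rules = [Rule("1#0", "confirm_flight"), Rule("0##", "ask_destination")]
state = "110"
matching = [r for r in rules if r.matches(state)]
best = max(matching, key=lambda r: r.prediction)
best.update(reward=1.0)              # reward from the simulated user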
Optimizing thermodynamic trajectories using evolutionary and gradient-based reinforcement learning
Using a model heat engine, we show that neural network-based reinforcement
learning can identify thermodynamic trajectories of maximal efficiency. We
consider both gradient and gradient-free reinforcement learning. We use an
evolutionary learning algorithm to evolve a population of neural networks,
subject to a directive to maximize the efficiency of a trajectory composed of a
set of elementary thermodynamic processes; the resulting networks learn to
carry out the maximally-efficient Carnot, Stirling, or Otto cycles. When given
an additional irreversible process, this evolutionary scheme learns a
previously unknown thermodynamic cycle. Gradient-based reinforcement learning
is able to learn the Stirling cycle, whereas an evolutionary approach achieves
the optimal Carnot cycle. Our results show how the reinforcement learning
strategies developed for game playing can be applied to solve physical problems
conditioned upon path-extensive order parameters.
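The gradient-free scheme described above can be sketched as a simple generational algorithm: a population of parameter vectors (standing in for network weights) is evolved by truncation selection and mutation to maximize a trajectory-efficiency score. efficiency is a hypothetical black-box stand-in for the model heat-engine simulation.

import numpy as np

def evolve(efficiency, dim, pop=32, sigma=0.1, gens=200):
    population = np.random.randn(pop, dim)
    best, best_score = population[0], -np.inf
    for _ in range(gens):
        scores = np.array([efficiency(p) for p in population])
        if scores.max() > best_score:
            best, best_score = population[scores.argmax()].copy(), scores.max()
        parents = population[np.argsort(scores)[-pop // 4:]]   # truncation selection
        idx = np.random.randint(len(parents), size=pop)
        population = parents[idx] + sigma * np.random.randn(pop, dim)
    return best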