Proximal Distilled Evolutionary Reinforcement Learning
Reinforcement Learning (RL) has achieved impressive performance in many
complex environments due to the integration with Deep Neural Networks (DNNs).
At the same time, Genetic Algorithms (GAs), often seen as a competing approach
to RL, had limited success in scaling up to the DNNs required to solve
challenging tasks. Contrary to this dichotomic view, in the physical world,
evolution and learning are complementary processes that continuously interact.
The recently proposed Evolutionary Reinforcement Learning (ERL) framework has
demonstrated mutual benefits to performance when combining the two methods.
However, ERL has not fully addressed the scalability problem of GAs. In this
paper, we show that this problem is rooted in an unfortunate combination of a
simple genetic encoding for DNNs and the use of traditional
biologically-inspired variation operators. When applied to these encodings, the
standard operators are destructive and cause catastrophic forgetting of the
traits the networks acquired. We propose a novel algorithm called Proximal
Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by
a hierarchical integration between evolution and learning. The main innovation
of PDERL is the use of learning-based variation operators that compensate for
the simplicity of the genetic representation. Unlike traditional operators, our
proposals meet the functional requirements of variation operators when applied
on directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings
from the OpenAI gym. Our method outperforms ERL, as well as two
state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
Comment: Camera-ready version for AAAI-20. Contains 10 pages, 11 figures.
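The learning-based variation operators described above can be illustrated with a toy sketch. The code below is a hypothetical, simplified stand-in for PDERL's distillation crossover: instead of cloning behaviour from the parents' experience buffers, a child with a linear policy is trained by gradient descent to imitate the fitter parent's actions on a shared batch of states. The policy form, fitness values, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(weights, states):
    """Deterministic linear policy: action = states @ weights."""
    return states @ weights

def distillation_crossover(parent_a, parent_b, states, fitness_a, fitness_b,
                           lr=0.1, steps=200):
    """Toy learning-based crossover: the child is trained to imitate,
    on each state, the actions of the fitter parent, rather than
    inheriting raw weights (which would be destructive for DNNs)."""
    # Target actions come from whichever parent is fitter overall.
    teacher = parent_a if fitness_a >= fitness_b else parent_b
    targets = act(teacher, states)
    child = rng.normal(scale=0.1, size=parent_a.shape)
    for _ in range(steps):
        pred = act(child, states)
        grad = 2.0 * states.T @ (pred - targets) / len(states)  # MSE gradient
        child -= lr * grad
    return child

states = rng.normal(size=(64, 4))
parent_a = rng.normal(size=(4, 2))
parent_b = rng.normal(size=(4, 2))
child = distillation_crossover(parent_a, parent_b, states,
                               fitness_a=1.0, fitness_b=0.5)
imitation_error = np.mean((act(child, states) - act(parent_a, states)) ** 2)
```

The point of the sketch is that the child's behaviour, not its genome, is what crossover preserves, which is why such operators avoid the catastrophic forgetting that naive weight-mixing causes on directly-encoded networks.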
Transformer Reinforcement Learning for Procedural Level Generation
This paper examines how recent advances in sequence modeling translate to machine-learning-assisted procedural level generation. We explore the use of Transformer-based models like DistilGPT-2 to generate platformer levels, specifically for the game Super Mario Bros., and how reinforcement learning can push the model towards a task such as generating levels that are actually beatable. We found that large language models (LLMs) can be used, without any major modifications from the original NLP-focused models, to generate levels for the aforementioned game.
However, the main focus of this research is how the advances that reinforcement learning (RL) algorithms, specifically PPO, have brought to NLP translate to procedural level generation in cases where levels can be treated as token sequences.
However, we did not find any combination of hyperparameters that allowed PPO to reach better results than our baseline model trained for next-token prediction. Despite RL's success in NLP, we failed to find hyperparameters for which a reward applied to the whole level improved generation. There are, however, methods we have not yet tried, such as rewarding and penalizing specific parts of the level.
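The idea of optimizing token-sequence levels against a whole-level reward can be sketched in a few lines. The code below is a deliberately tiny stand-in: a single shared categorical distribution over tiles plays the role of the language model, a toy "no two gaps in a row" check plays the role of the beatability test, and plain REINFORCE stands in for PPO. The tile vocabulary, reward, and learning rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tile vocabulary: 0 = ground, 1 = gap, 2 = enemy.
VOCAB, LEVEL_LEN = 3, 12
logits = np.zeros(VOCAB)  # one shared categorical over tiles (toy "model")

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def beatable(level):
    """Toy stand-in for a playability check: a level is 'beatable'
    here if it never has two gaps in a row."""
    return all(not (a == 1 and b == 1) for a, b in zip(level, level[1:]))

# REINFORCE on a whole-level reward: sample a level, score it with a
# single sequence-level reward, and push token probabilities toward
# rewarded sequences.
for step in range(300):
    probs = softmax(logits)
    level = rng.choice(VOCAB, size=LEVEL_LEN, p=probs)
    reward = 1.0 if beatable(level) else 0.0
    # Gradient of log p(level) w.r.t. logits for repeated categorical
    # draws: token counts minus expected counts under the policy.
    counts = np.bincount(level, minlength=VOCAB)
    grad = counts - LEVEL_LEN * probs
    logits += 0.05 * reward * grad

final = softmax(logits)
```

The sketch makes the paper's difficulty concrete: a single scalar reward for the whole level gives a very sparse, high-variance learning signal, which is consistent with the suggestion that rewarding or penalizing specific parts of the level may work better.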
Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
In reinforcement learning, domain randomisation is an increasingly popular
technique for learning more general policies that are robust to domain-shifts
at deployment. However, naively aggregating information from randomised domains
may lead to high variance in gradient estimation and an unstable learning process.
To address this issue, we present a peer-to-peer online distillation strategy
for RL termed P2PDRL, where multiple workers are each assigned to a different
environment, and exchange knowledge through mutual regularisation based on
Kullback-Leibler divergence. Our experiments on continuous control tasks show
that P2PDRL enables robust learning across a wider randomisation distribution
than baselines, and more robust generalisation to new environments at testing time.
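The mutual-regularisation idea can be sketched concretely. In the toy code below, each worker's objective is its own task loss plus a weighted average KL divergence to its peers' policies on the same states; the function name, the `alpha` weight, and the use of simple categorical policies are illustrative assumptions, not P2PDRL's exact formulation.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for two categorical distributions (no zero entries)."""
    return float(np.sum(p * np.log(p / q)))

def peer_regularised_loss(task_losses, policies, alpha=0.1):
    """Hypothetical sketch of peer-to-peer distillation: each worker's
    loss is its task loss plus an alpha-weighted mean KL divergence to
    every other worker's policy, pulling the policies together."""
    n = len(policies)
    losses = []
    for i in range(n):
        peer_kl = sum(kl(policies[i], policies[j]) for j in range(n) if j != i)
        losses.append(task_losses[i] + alpha * peer_kl / (n - 1))
    return losses
```

Because each worker trains in its own randomised domain, the KL term acts as a low-variance consensus signal: workers whose policies drift away from their peers pay an extra cost, which is one plausible reading of how the regularisation stabilises learning across the randomisation distribution.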