Proximal Distilled Evolutionary Reinforcement Learning
Reinforcement Learning (RL) has achieved impressive performance in many
complex environments due to the integration with Deep Neural Networks (DNNs).
At the same time, Genetic Algorithms (GAs), often seen as a competing approach
to RL, had limited success in scaling up to the DNNs required to solve
challenging tasks. Contrary to this dichotomic view, in the physical world,
evolution and learning are complementary processes that continuously interact.
The recently proposed Evolutionary Reinforcement Learning (ERL) framework has
demonstrated mutual benefits to performance when combining the two methods.
However, ERL has not fully addressed the scalability problem of GAs. In this
paper, we show that this problem is rooted in an unfortunate combination of a
simple genetic encoding for DNNs and the use of traditional
biologically-inspired variation operators. When applied to these encodings, the
standard operators are destructive and cause catastrophic forgetting of the
traits the networks acquired. We propose a novel algorithm called Proximal
Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by
a hierarchical integration between evolution and learning. The main innovation
of PDERL is the use of learning-based variation operators that compensate for
the simplicity of the genetic representation. Unlike traditional operators, our
proposals meet the functional requirements of variation operators when applied
on directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings
from the OpenAI gym. Our method outperforms ERL, as well as two
state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
Comment: Camera-ready version for AAAI-20. Contains 10 pages, 11 figures.
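The learning-based variation operators described above can be illustrated with a toy sketch. The code below is a hypothetical, simplified stand-in for PDERL's distillation crossover: instead of cloning behaviour from the parents' experience buffers, a child with a linear policy is trained by gradient descent to imitate the fitter parent's actions on a shared batch of states. The policy form, fitness values, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(weights, states):
    """Deterministic linear policy: action = states @ weights."""
    return states @ weights

def distillation_crossover(parent_a, parent_b, states, fitness_a, fitness_b,
                           lr=0.1, steps=200):
    """Toy learning-based crossover: the child is trained to imitate,
    on each state, the actions of the fitter parent, rather than
    inheriting raw weights (which would be destructive for DNNs)."""
    # Target actions come from whichever parent is fitter overall.
    teacher = parent_a if fitness_a >= fitness_b else parent_b
    targets = act(teacher, states)
    child = rng.normal(scale=0.1, size=parent_a.shape)
    for _ in range(steps):
        pred = act(child, states)
        grad = 2.0 * states.T @ (pred - targets) / len(states)  # MSE gradient
        child -= lr * grad
    return child

states = rng.normal(size=(64, 4))
parent_a = rng.normal(size=(4, 2))
parent_b = rng.normal(size=(4, 2))
child = distillation_crossover(parent_a, parent_b, states,
                               fitness_a=1.0, fitness_b=0.5)
imitation_error = np.mean((act(child, states) - act(parent_a, states)) ** 2)
```

The point of the sketch is that the child's behaviour, not its genome, is what crossover preserves, which is why such operators avoid the catastrophic forgetting that naive weight-mixing causes on directly-encoded networks.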
Transformer Reinforcement Learning for Procedural Level Generation
This paper examines how recent advances in sequence modeling translate to machine-learning-assisted procedural level generation. We explore the use of Transformer-based models like DistilGPT-2 to generate platformer levels, specifically for the game Super Mario Bros., and how reinforcement learning can push the model towards a task such as generating levels that are actually beatable. We found that large language models (LLMs) can be used, without any major modifications from the original NLP-focused models, to generate levels for the aforementioned game.
However, the main focus of this research is how the advances that reinforcement learning (RL) algorithms, specifically PPO, have brought to NLP translate to procedural level generation in cases where levels can be treated as token sequences.
However, we did not find any combination of hyperparameters that allowed PPO to reach better results than our baseline model trained for next-token prediction. Despite RL's success in NLP, we failed to find hyperparameters for which a reward applied to the whole level improved generation. There are, however, methods we have not yet tried, such as rewarding and penalizing specific parts of the level.
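The idea of optimizing token-sequence levels against a whole-level reward can be sketched in a few lines. The code below is a deliberately tiny stand-in: a single shared categorical distribution over tiles plays the role of the language model, a toy "no two gaps in a row" check plays the role of the beatability test, and plain REINFORCE stands in for PPO. The tile vocabulary, reward, and learning rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tile vocabulary: 0 = ground, 1 = gap, 2 = enemy.
VOCAB, LEVEL_LEN = 3, 12
logits = np.zeros(VOCAB)  # one shared categorical over tiles (toy "model")

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def beatable(level):
    """Toy stand-in for a playability check: a level is 'beatable'
    here if it never has two gaps in a row."""
    return all(not (a == 1 and b == 1) for a, b in zip(level, level[1:]))

# REINFORCE on a whole-level reward: sample a level, score it with a
# single sequence-level reward, and push token probabilities toward
# rewarded sequences.
for step in range(300):
    probs = softmax(logits)
    level = rng.choice(VOCAB, size=LEVEL_LEN, p=probs)
    reward = 1.0 if beatable(level) else 0.0
    # Gradient of log p(level) w.r.t. logits for repeated categorical
    # draws: token counts minus expected counts under the policy.
    counts = np.bincount(level, minlength=VOCAB)
    grad = counts - LEVEL_LEN * probs
    logits += 0.05 * reward * grad

final = softmax(logits)
```

The sketch makes the paper's difficulty concrete: a single scalar reward for the whole level gives a very sparse, high-variance learning signal, which is consistent with the suggestion that rewarding or penalizing specific parts of the level may work better.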
Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
In reinforcement learning, domain randomisation is an increasingly popular
technique for learning more general policies that are robust to domain-shifts
at deployment. However, naively aggregating information from randomised domains
may lead to high variance in gradient estimation and an unstable learning process.
To address this issue, we present a peer-to-peer online distillation strategy
for RL termed P2PDRL, where multiple workers are each assigned to a different
environment, and exchange knowledge through mutual regularisation based on
Kullback-Leibler divergence. Our experiments on continuous control tasks show
that P2PDRL enables robust learning across a wider randomisation distribution
than baselines, and more robust generalisation to new environments at testing time.
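The mutual-regularisation idea can be sketched concretely. In the toy code below, each worker's objective is its own task loss plus a weighted average KL divergence to its peers' policies on the same states; the function name, the `alpha` weight, and the use of simple categorical policies are illustrative assumptions, not P2PDRL's exact formulation.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for two categorical distributions (no zero entries)."""
    return float(np.sum(p * np.log(p / q)))

def peer_regularised_loss(task_losses, policies, alpha=0.1):
    """Hypothetical sketch of peer-to-peer distillation: each worker's
    loss is its task loss plus an alpha-weighted mean KL divergence to
    every other worker's policy, pulling the policies together."""
    n = len(policies)
    losses = []
    for i in range(n):
        peer_kl = sum(kl(policies[i], policies[j]) for j in range(n) if j != i)
        losses.append(task_losses[i] + alpha * peer_kl / (n - 1))
    return losses
```

Because each worker trains in its own randomised domain, the KL term acts as a low-variance consensus signal: workers whose policies drift away from their peers pay an extra cost, which is one plausible reading of how the regularisation stabilises learning across the randomisation distribution.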