The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors
Though deep reinforcement learning has led to breakthroughs in many difficult
domains, these successes have required an ever-increasing number of samples. As
state-of-the-art reinforcement learning (RL) systems require an exponentially
increasing number of samples, their development is restricted to a continually
shrinking segment of the AI community. Likewise, many of these systems cannot
be applied to real-world problems, where environment samples are expensive.
Resolution of these limitations requires new, sample-efficient methods. To
facilitate research in this direction, we introduce the MineRL Competition on
Sample Efficient Reinforcement Learning using Human Priors.
The primary goal of the competition is to foster the development of
algorithms which can efficiently leverage human demonstrations to drastically
reduce the number of samples needed to solve complex, hierarchical, and sparse
environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task,
a sequential decision making environment requiring long-term planning,
hierarchical control, and efficient exploration methods; and (2) the MineRL-v0
dataset, a large-scale collection of over 60 million state-action pairs of
human demonstrations that can be resimulated into embodied trajectories with
arbitrary modifications to game state and visuals.
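To make the dataset concrete, here is a minimal sketch of streaming MineRL-v0 transitions with the public minerl Python package. The calls follow the v0.3-era minerl API and may differ in later releases; the data directory and environment ID usage are illustrative assumptions.

```python
import minerl

# One-time download of the ObtainDiamond demonstrations
# (v0.3-era signature; check the current minerl docs if it has changed).
minerl.data.download(directory='data', environment='MineRLObtainDiamond-v0')

# Open a pipeline over the recorded human trajectories.
data = minerl.data.make('MineRLObtainDiamond-v0', data_dir='data')

# Stream (state, action, reward, next_state, done) transitions in batches.
for obs, action, reward, next_obs, done in data.batch_iter(
        batch_size=32, seq_len=1, num_epochs=1):
    pov = obs['pov']  # RGB frames, shape (batch, seq, 64, 64, 3)
    print(pov.shape, reward.mean())
    break  # one batch is enough for this illustration
```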
Participants will compete to develop systems which solve the ObtainDiamond
task with a limited number of samples from the environment simulator, Malmo.
The competition is structured into two rounds in which competitors are provided
several paired versions of the dataset and environment with different game
textures. At the end of each round, competitors will submit containerized
versions of their learning algorithms and they will then be trained/evaluated
from scratch on a hold-out dataset-environment pair for a total of 4 days on a
prespecified hardware platform.
Comment: accepted at NeurIPS 2019, 28 pages
Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning
To facilitate research in the direction of sample efficient reinforcement
learning, we held the MineRL Competition on Sample Efficient Reinforcement
Learning Using Human Priors at the Thirty-third Conference on Neural
Information Processing Systems (NeurIPS 2019). The primary goal of this
competition was to promote the development of algorithms that use human
demonstrations alongside reinforcement learning to reduce the number of samples
needed to solve complex, hierarchical, and sparse environments. We describe the
competition, outlining the primary challenge, the competition design, and the
resources that we provided to the participants. We provide an overview of the
top solutions, each of which use deep reinforcement learning and/or imitation
learning. We also discuss the impact of our organizational decisions on the
competition and future directions for improvement.
Comment: To appear in Proceedings of Machine Learning Research: NeurIPS 2019 Competition & Demonstration Track Postproceedings. 12 pages, 2 figures
Guaranteeing Reproducibility in Deep Learning Competitions
To encourage the development of methods with reproducible and robust training
behavior, we propose a challenge paradigm where competitors are evaluated
directly on the performance of their learning procedures rather than
pre-trained agents. Since competition organizers re-train proposed methods in a
controlled setting they can guarantee reproducibility, and -- by retraining
submissions using a held-out test set -- help ensure generalization past the
environments on which they were trained.
Comment: Accepted as a poster presentation at the 2019 NeurIPS Challenges in Machine Learning workshop (CiML).
Playing Minecraft with Behavioural Cloning
The MineRL 2019 competition challenged participants to train sample-efficient
agents to play Minecraft, using a dataset of human gameplay and a limited
number of steps in the environment. We approached this task with behavioural
cloning, predicting which actions human players would take, and reached fifth
place in the final ranking. Despite the simplicity of the approach, we observed
that its performance can vary significantly depending on when training is
stopped. In this paper, we detail our submission to the competition, run
further experiments to study how performance varied over training, and examine
how different engineering decisions affected these results.
Comment: To appear in Post Proceedings of the Competitions & Demonstrations Track @ NeurIPS 2019. Source code available at
https://github.com/Miffyli/minecraft-b
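For readers unfamiliar with the method, behavioural cloning reduces to supervised classification over demonstrated actions. The PyTorch sketch below is illustrative only: the network shape, the single discretized action set of size N_ACTIONS, and all names are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

N_ACTIONS = 30  # assumed size of a discretized Minecraft action set

class BCPolicy(nn.Module):
    """Small CNN mapping 64x64 RGB observations to action logits."""
    def __init__(self, n_actions=N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):  # x: (batch, 3, 64, 64) float tensor
        return self.net(x)

def bc_step(policy, optimizer, frames, actions):
    """One supervised step: maximize likelihood of the human action."""
    logits = policy(frames)                      # (batch, n_actions)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training simply loops bc_step over batches of demonstration frames and action labels; the paper's observation that results depend heavily on when this loop is stopped applies directly to such a setup.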
Action Space Shaping in Deep Reinforcement Learning
Reinforcement learning (RL) has been successful at training agents in various
learning environments, including video games. However, such work often modifies
and shrinks the game's original action space, both to avoid trying "pointless"
actions and to ease implementation. Currently, this is mostly done based on
intuition, with little systematic research supporting the design decisions. In
this work, we aim to gain insight into these action space modifications by
conducting extensive experiments in video-game environments. Our results show
how domain-specific removal of actions and discretization of continuous
actions can be crucial for successful learning. With these insights, we hope to
ease the use of RL in new environments by clarifying which action spaces are
easy to learn.
Comment: To appear in IEEE Conference on Games 2020. Experiment code is
available at https://github.com/Miffyli/rl-action-space-shapin
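As a hedged illustration of such shaping, the wrapper below maps a handful of discrete composite actions onto a MineRL-style dict action space. The particular six-action set is an example, not a configuration the paper evaluates.

```python
import gym

class DiscreteActionWrapper(gym.ActionWrapper):
    """Shrink a dict action space to a few discrete composite actions."""

    # Each entry overrides fields of a no-op MineRL-style dict action;
    # camera deltas are (pitch, yaw) in degrees.
    ACTIONS = [
        {},                              # no-op
        {'forward': 1},                  # move forward
        {'forward': 1, 'jump': 1},       # forward + jump
        {'attack': 1},                   # attack / mine
        {'camera': (0.0, -10.0)},        # turn camera left 10 degrees
        {'camera': (0.0, 10.0)},         # turn camera right 10 degrees
    ]

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(self.ACTIONS))

    def action(self, idx):
        # minerl action spaces expose a no-op template via noop().
        act = self.env.action_space.noop()
        act.update(self.ACTIONS[idx])
        return act
```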
Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft
We present the Hierarchical Deep Q-Network from Demonstrations (HDQfD), which
took first place in the MineRL competition. HDQfD works with imperfect
demonstrations and exploits the hierarchical structure of expert trajectories.
We introduce a procedure for extracting an effective sequence of meta-actions
and subgoals from demonstration data. We present a structured, task-dependent
replay buffer and an adaptive prioritizing technique that allow the HDQfD
agent to gradually erase poor-quality expert data from the buffer. In this
paper, we present the details of the HDQfD algorithm and report experimental
results in the Minecraft domain.
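The sketch below illustrates one way to realize the "gradually erase poor-quality expert data" idea: demonstration transitions carry a priority bonus that decays as training proceeds, so weak demonstrations are sampled less and less. The class, names, and decay schedule are a simplification, not the HDQfD implementation.

```python
import random

class DemoDecayReplayBuffer:
    """Replay buffer mixing demonstration and agent transitions.

    Demo transitions get a priority bonus that decays every sampling
    step, a simplified stand-in for HDQfD's adaptive prioritization.
    """

    def __init__(self, demo_bonus=1.0, decay=0.9999, eps=1e-3):
        self.storage = []        # entries: [transition, priority, is_demo]
        self.demo_bonus = demo_bonus
        self.decay = decay
        self.eps = eps           # keeps every entry sampleable

    def add(self, transition, priority, is_demo=False):
        self.storage.append([transition, priority, is_demo])

    def sample(self, batch_size):
        # Effective priority: base (e.g. TD-error) priority
        # plus the decaying bonus for demonstration data.
        weights = [p + (self.demo_bonus if d else 0.0) + self.eps
                   for _, p, d in self.storage]
        batch = random.choices(self.storage, weights=weights, k=batch_size)
        self.demo_bonus *= self.decay  # demos gradually lose their edge
        return [t for t, _, _ in batch]
```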
Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft
Sample inefficiency of deep reinforcement learning methods is a major
obstacle to their use in real-world applications. In this work, we show how
human demonstrations can improve the final performance of agents on the
Minecraft minigame ObtainDiamond with only 8M frames of environment
interaction. We propose a training procedure in which policy networks are
first trained on human data and later fine-tuned by reinforcement learning.
Using a policy exploitation mechanism, experience replay, and an additional
loss against catastrophic forgetting, our best agent was able to achieve a
mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL
Competition for Sample-Efficient Reinforcement Learning.
Comment: 10 pages, 2 figures
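Schematically, the two-phase procedure looks like the sketch below: supervised pretraining on demonstrations, then RL fine-tuning with an auxiliary cloning loss as one possible guard against catastrophic forgetting. The helper names, the bc_weight coefficient, and the cross-entropy losses are placeholder assumptions, not the authors' exact recipe.

```python
import torch.nn.functional as F

def pretrain(policy, optimizer, human_batches):
    """Phase 1: supervised learning on human demonstrations."""
    for frames, actions in human_batches:
        loss = F.cross_entropy(policy(frames), actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def finetune_step(policy, optimizer, rl_loss, demo_frames, demo_actions,
                  bc_weight=0.1):
    """Phase 2: one RL fine-tuning step. The auxiliary cloning loss on
    demonstration data pulls the policy toward the pretrained behaviour,
    mitigating catastrophic forgetting."""
    bc_loss = F.cross_entropy(policy(demo_frames), demo_actions)
    loss = rl_loss + bc_weight * bc_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```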
Distilling Reinforcement Learning Tricks for Video Games
Reinforcement learning (RL) research focuses on general solutions that can be
applied across different domains. This results in methods that RL practitioners
can use in almost any domain. However, recent studies often lack the
engineering steps ("tricks") which may be needed to effectively use RL, such as
reward shaping, curriculum learning, and splitting a large task into smaller
chunks. Such tricks are common, if not necessary, to achieve state-of-the-art
results and win RL competitions. To ease the engineering efforts, we distill
descriptions of tricks from state-of-the-art results and study how well these
tricks can improve a standard deep Q-learning agent. The long-term goal of this
work is to enable combining proven RL methods with domain-specific tricks by
providing a unified software framework and accompanying insights in multiple
domains.
Comment: To appear in IEEE Conference on Games 2021. Experiment code is
available at https://github.com/Miffyli/rl-human-prior-trick
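As one concrete example of such a trick, the wrapper below adds potential-based reward shaping, r' = r + gamma*phi(s') - phi(s), a form known to preserve the optimal policy. The potential function phi is a placeholder for domain knowledge (e.g. progress toward a goal), and the wrapper assumes gym's pre-0.26 four-tuple step API.

```python
import gym

class ShapedRewardWrapper(gym.Wrapper):
    """Add a potential-based shaping term to the environment reward."""

    def __init__(self, env, phi, gamma=0.99):
        super().__init__(env)
        self.phi = phi          # potential function over observations
        self.gamma = gamma
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        shaped = reward + self.gamma * self.phi(obs) - self.phi(self._last_obs)
        self._last_obs = obs
        return obs, shaped, done, info
```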
Scaling Imitation Learning in Minecraft
Imitation learning is a powerful family of techniques for learning
sensorimotor coordination in immersive environments. We apply imitation
learning to attain state-of-the-art performance on hard exploration problems in
the Minecraft environment. We report experiments that highlight the influence
of network architecture, loss function, and data augmentation. An early version
of our approach reached second place in the MineRL competition at NeurIPS 2019.
Here we report stronger results that can be used as a starting point for future
competition entries and related research. Our code is available at
https://github.com/amiranas/minerl_imitation_learning
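To make the data-augmentation axis concrete, a minimal observation-side augmentation might look like the sketch below; the specific transforms are examples, not the authors' pipeline. Label preservation matters here: a horizontal flip, for instance, would invert the camera actions paired with each frame.

```python
import numpy as np

def augment(frame, rng):
    """Label-preserving augmentations for an RGB observation.

    frame: (H, W, 3) uint8 array; rng: a numpy Generator.
    """
    # Brightness/contrast jitter.
    out = frame.astype(np.float32)
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-10.0, 10.0)
    out = np.clip(out, 0, 255)
    # Random circular shift of up to 4 pixels in each direction.
    dx, dy = rng.integers(-4, 5, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))
    return out.astype(np.uint8)

# Usage: rng = np.random.default_rng(0); frame_aug = augment(frame, rng)
```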
The MineRL BASALT Competition on Learning from Human Feedback
The last decade has seen a significant increase in interest in deep learning
research, with many public successes that have demonstrated its potential. As
a result, these systems are now being incorporated into commercial products. With
this comes an additional challenge: how can we build AI systems that solve
tasks where there is not a crisp, well-defined specification? While multiple
solutions have been proposed, in this competition we focus on one in
particular: learning from human feedback. Rather than training AI systems using
a predefined reward function or using a labeled dataset with a predefined set
of categories, we instead train the AI system using a learning signal derived
from some form of human feedback, which can evolve over time as the
understanding of the task changes, or as the capabilities of the AI system
improve.
The MineRL BASALT competition aims to spur forward research on this important
class of techniques. We design a suite of four tasks in Minecraft for which we
expect it will be hard to write down hardcoded reward functions. These tasks
are defined by a paragraph of natural language: for example, "create a
waterfall and take a scenic picture of it", with additional clarifying details.
Participants must train a separate agent for each task, using any method they
want. Agents are then evaluated by humans who have read the task description.
To help participants get started, we provide a dataset of human demonstrations
on each of the four tasks, as well as an imitation learning baseline that
leverages these demonstrations.
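One common instantiation of learning from human feedback, beyond the provided imitation baseline, is fitting a reward model to pairwise human preferences with a Bradley-Terry loss. The sketch below is a generic illustration of that idea with made-up dimensions and names; it is not a competition baseline.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Map a (pre-embedded) observation to a scalar reward estimate."""
    def __init__(self, obs_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def preference_loss(model, seg_a, seg_b, human_prefers_a):
    """Bradley-Terry loss on one pair of trajectory segments.

    seg_a, seg_b: (T, obs_dim) tensors; human_prefers_a: 1 if the
    human judged segment A better, else 0.
    """
    r_a = model(seg_a).sum()   # total predicted reward of segment A
    r_b = model(seg_b).sum()
    # P(A preferred) = exp(r_a) / (exp(r_a) + exp(r_b))
    logits = torch.stack([r_a, r_b]).unsqueeze(0)    # shape (1, 2)
    target = torch.tensor([1 - human_prefers_a])     # index of the winner
    return nn.functional.cross_entropy(logits, target)
```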
Our hope is that this competition will improve our ability to build AI
systems that do what their designers intend them to do, even when the intent
cannot be easily formalized. Besides allowing AI to solve more tasks, this can
also enable more effective regulation of AI systems, as well as making progress
on the value alignment problem.
Comment: NeurIPS 2021 Competition Track