The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors
Though deep reinforcement learning has led to breakthroughs in many difficult
domains, these successes have required an ever-increasing number of samples. As
state-of-the-art reinforcement learning (RL) systems require an exponentially
increasing number of samples, their development is restricted to a continually
shrinking segment of the AI community. Likewise, many of these systems cannot
be applied to real-world problems, where environment samples are expensive.
Resolution of these limitations requires new, sample-efficient methods. To
facilitate research in this direction, we introduce the MineRL Competition on
Sample Efficient Reinforcement Learning using Human Priors.
The primary goal of the competition is to foster the development of
algorithms which can efficiently leverage human demonstrations to drastically
reduce the number of samples needed to solve complex, hierarchical, and sparse
environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task,
a sequential decision making environment requiring long-term planning,
hierarchical control, and efficient exploration methods; and (2) the MineRL-v0
dataset, a large-scale collection of over 60 million state-action pairs of
human demonstrations that can be resimulated into embodied trajectories with
arbitrary modifications to game state and visuals.
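To make the dataset concrete, here is a minimal sketch of streaming MineRL-v0 transitions with the public minerl Python package. The calls follow the v0.3-era minerl API and may differ in later releases; the data directory and environment ID usage are illustrative assumptions.

```python
import minerl

# One-time download of the ObtainDiamond demonstrations
# (v0.3-era signature; check the current minerl docs if it has changed).
minerl.data.download(directory='data', environment='MineRLObtainDiamond-v0')

# Open a pipeline over the recorded human trajectories.
data = minerl.data.make('MineRLObtainDiamond-v0', data_dir='data')

# Stream (state, action, reward, next_state, done) transitions in batches.
for obs, action, reward, next_obs, done in data.batch_iter(
        batch_size=32, seq_len=1, num_epochs=1):
    pov = obs['pov']  # RGB frames, shape (batch, seq, 64, 64, 3)
    print(pov.shape, reward.mean())
    break  # one batch is enough for this illustration
```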
Participants will compete to develop systems which solve the ObtainDiamond
task with a limited number of samples from the environment simulator, Malmo.
The competition is structured into two rounds in which competitors are provided
several paired versions of the dataset and environment with different game
textures. At the end of each round, competitors will submit containerized
versions of their learning algorithms and they will then be trained/evaluated
from scratch on a hold-out dataset-environment pair for a total of 4 days on a
prespecified hardware platform.
Comment: accepted at NeurIPS 2019, 28 pages
Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning
To facilitate research in the direction of sample efficient reinforcement
learning, we held the MineRL Competition on Sample Efficient Reinforcement
Learning Using Human Priors at the Thirty-third Conference on Neural
Information Processing Systems (NeurIPS 2019). The primary goal of this
competition was to promote the development of algorithms that use human
demonstrations alongside reinforcement learning to reduce the number of samples
needed to solve complex, hierarchical, and sparse environments. We describe the
competition, outlining the primary challenge, the competition design, and the
resources that we provided to the participants. We provide an overview of the
top solutions, each of which use deep reinforcement learning and/or imitation
learning. We also discuss the impact of our organizational decisions on the
competition and future directions for improvement.
Comment: To appear in Proceedings of Machine Learning Research: NeurIPS 2019 Competition & Demonstration Track Postproceedings. 12 pages, 2 figures
Guaranteeing Reproducibility in Deep Learning Competitions
To encourage the development of methods with reproducible and robust training
behavior, we propose a challenge paradigm where competitors are evaluated
directly on the performance of their learning procedures rather than
pre-trained agents. Since competition organizers re-train proposed methods in a
controlled setting they can guarantee reproducibility, and -- by retraining
submissions using a held-out test set -- help ensure generalization past the
environments on which they were trained.
Comment: Accepted as a poster presentation at the 2019 NeurIPS Challenges in Machine Learning workshop (CiML).
Playing Minecraft with Behavioural Cloning
The MineRL 2019 competition challenged participants to train sample-efficient
agents to play Minecraft, using a dataset of human gameplay and a limited
number of steps in the environment. We approached this task with behavioural
cloning, predicting which actions human players would take, and reached fifth
place in the final ranking. Despite the simplicity of the approach, we observed
that its performance can vary significantly depending on when training is
stopped. In this paper, we detail our submission to the competition, run
further experiments to study how performance varied over training, and examine
how different engineering decisions affected these results.
Comment: To appear in Post Proceedings of the Competitions & Demonstrations Track @ NeurIPS 2019. Source code available at
https://github.com/Miffyli/minecraft-b
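For readers unfamiliar with the method, behavioural cloning reduces to supervised classification over demonstrated actions. The PyTorch sketch below is illustrative only: the network shape, the single discretized action set of size N_ACTIONS, and all names are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

N_ACTIONS = 30  # assumed size of a discretized Minecraft action set

class BCPolicy(nn.Module):
    """Small CNN mapping 64x64 RGB observations to action logits."""
    def __init__(self, n_actions=N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):  # x: (batch, 3, 64, 64) float tensor
        return self.net(x)

def bc_step(policy, optimizer, frames, actions):
    """One supervised step: maximize likelihood of the human action."""
    logits = policy(frames)                      # (batch, n_actions)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training simply loops bc_step over batches of demonstration frames and action labels; the paper's observation that results depend heavily on when this loop is stopped applies directly to such a setup.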
Action Space Shaping in Deep Reinforcement Learning
Reinforcement learning (RL) has been successful at training agents in various
learning environments, including video games. However, such work often modifies
and shrinks the game's original action space, both to avoid trying "pointless"
actions and to ease implementation. Currently, this is mostly done based on
intuition, with little systematic research supporting the design decisions. In
this work, we aim to gain insight into these action space modifications by
conducting extensive experiments in video-game environments. Our results show
how domain-specific removal of actions and discretization of continuous
actions can be crucial for successful learning. With these insights, we hope to
ease the use of RL in new environments by clarifying which action spaces are
easy to learn.
Comment: To appear in IEEE Conference on Games 2020. Experiment code is
available at https://github.com/Miffyli/rl-action-space-shapin
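As a hedged illustration of such shaping, the wrapper below maps a handful of discrete composite actions onto a MineRL-style dict action space. The particular six-action set is an example, not a configuration the paper evaluates.

```python
import gym

class DiscreteActionWrapper(gym.ActionWrapper):
    """Shrink a dict action space to a few discrete composite actions."""

    # Each entry overrides fields of a no-op MineRL-style dict action;
    # camera deltas are (pitch, yaw) in degrees.
    ACTIONS = [
        {},                              # no-op
        {'forward': 1},                  # move forward
        {'forward': 1, 'jump': 1},       # forward + jump
        {'attack': 1},                   # attack / mine
        {'camera': (0.0, -10.0)},        # turn camera left 10 degrees
        {'camera': (0.0, 10.0)},         # turn camera right 10 degrees
    ]

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(self.ACTIONS))

    def action(self, idx):
        # minerl action spaces expose a no-op template via noop().
        act = self.env.action_space.noop()
        act.update(self.ACTIONS[idx])
        return act
```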
Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft
We present the Hierarchical Deep Q-Network from Demonstrations (HDQfD), which
took first place in the MineRL competition. HDQfD works with imperfect
demonstrations and exploits the hierarchical structure of expert trajectories.
We introduce a procedure for extracting an effective sequence of meta-actions
and subgoals from demonstration data. We present a structured, task-dependent
replay buffer and an adaptive prioritizing technique that allow the HDQfD
agent to gradually erase poor-quality expert data from the buffer. In this
paper, we present the details of the HDQfD algorithm and report experimental
results in the Minecraft domain.
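The sketch below illustrates one way to realize the "gradually erase poor-quality expert data" idea: demonstration transitions carry a priority bonus that decays as training proceeds, so weak demonstrations are sampled less and less. The class, names, and decay schedule are a simplification, not the HDQfD implementation.

```python
import random

class DemoDecayReplayBuffer:
    """Replay buffer mixing demonstration and agent transitions.

    Demo transitions get a priority bonus that decays every sampling
    step, a simplified stand-in for HDQfD's adaptive prioritization.
    """

    def __init__(self, demo_bonus=1.0, decay=0.9999, eps=1e-3):
        self.storage = []        # entries: [transition, priority, is_demo]
        self.demo_bonus = demo_bonus
        self.decay = decay
        self.eps = eps           # keeps every entry sampleable

    def add(self, transition, priority, is_demo=False):
        self.storage.append([transition, priority, is_demo])

    def sample(self, batch_size):
        # Effective priority: base (e.g. TD-error) priority
        # plus the decaying bonus for demonstration data.
        weights = [p + (self.demo_bonus if d else 0.0) + self.eps
                   for _, p, d in self.storage]
        batch = random.choices(self.storage, weights=weights, k=batch_size)
        self.demo_bonus *= self.decay  # demos gradually lose their edge
        return [t for t, _, _ in batch]
```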
Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft
Sample inefficiency of deep reinforcement learning methods is a major
obstacle to their use in real-world applications. In this work, we show how
human demonstrations can improve the final performance of agents on the
Minecraft minigame ObtainDiamond with only 8M frames of environment
interaction. We propose a training procedure in which policy networks are
first trained on human data and later fine-tuned by reinforcement learning.
Using a policy exploitation mechanism, experience replay, and an additional
loss against catastrophic forgetting, our best agent was able to achieve a
mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL
Competition for Sample-Efficient Reinforcement Learning.
Comment: 10 pages, 2 figures
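Schematically, the two-phase procedure looks like the sketch below: supervised pretraining on demonstrations, then RL fine-tuning with an auxiliary cloning loss as one possible guard against catastrophic forgetting. The helper names, the bc_weight coefficient, and the cross-entropy losses are placeholder assumptions, not the authors' exact recipe.

```python
import torch.nn.functional as F

def pretrain(policy, optimizer, human_batches):
    """Phase 1: supervised learning on human demonstrations."""
    for frames, actions in human_batches:
        loss = F.cross_entropy(policy(frames), actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def finetune_step(policy, optimizer, rl_loss, demo_frames, demo_actions,
                  bc_weight=0.1):
    """Phase 2: one RL fine-tuning step. The auxiliary cloning loss on
    demonstration data pulls the policy toward the pretrained behaviour,
    mitigating catastrophic forgetting."""
    bc_loss = F.cross_entropy(policy(demo_frames), demo_actions)
    loss = rl_loss + bc_weight * bc_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```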
Distilling Reinforcement Learning Tricks for Video Games
Reinforcement learning (RL) research focuses on general solutions that can be
applied across different domains. This results in methods that RL practitioners
can use in almost any domain. However, recent studies often lack the
engineering steps ("tricks") which may be needed to effectively use RL, such as
reward shaping, curriculum learning, and splitting a large task into smaller
chunks. Such tricks are common, if not necessary, to achieve state-of-the-art
results and win RL competitions. To ease the engineering efforts, we distill
descriptions of tricks from state-of-the-art results and study how well these
tricks can improve a standard deep Q-learning agent. The long-term goal of this
work is to enable combining proven RL methods with domain-specific tricks by
providing a unified software framework and accompanying insights in multiple
domains.
Comment: To appear in IEEE Conference on Games 2021. Experiment code is
available at https://github.com/Miffyli/rl-human-prior-trick
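As one concrete example of such a trick, the wrapper below adds potential-based reward shaping, r' = r + gamma*phi(s') - phi(s), a form known to preserve the optimal policy. The potential function phi is a placeholder for domain knowledge (e.g. progress toward a goal), and the wrapper assumes gym's pre-0.26 four-tuple step API.

```python
import gym

class ShapedRewardWrapper(gym.Wrapper):
    """Add a potential-based shaping term to the environment reward."""

    def __init__(self, env, phi, gamma=0.99):
        super().__init__(env)
        self.phi = phi          # potential function over observations
        self.gamma = gamma
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        shaped = reward + self.gamma * self.phi(obs) - self.phi(self._last_obs)
        self._last_obs = obs
        return obs, shaped, done, info
```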
Scaling Imitation Learning in Minecraft
Imitation learning is a powerful family of techniques for learning
sensorimotor coordination in immersive environments. We apply imitation
learning to attain state-of-the-art performance on hard exploration problems in
the Minecraft environment. We report experiments that highlight the influence
of network architecture, loss function, and data augmentation. An early version
of our approach reached second place in the MineRL competition at NeurIPS 2019.
Here we report stronger results that can be used as a starting point for future
competition entries and related research. Our code is available at
https://github.com/amiranas/minerl_imitation_learning
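To make the data-augmentation axis concrete, a minimal observation-side augmentation might look like the sketch below; the specific transforms are examples, not the authors' pipeline. Label preservation matters here: a horizontal flip, for instance, would invert the camera actions paired with each frame.

```python
import numpy as np

def augment(frame, rng):
    """Label-preserving augmentations for an RGB observation.

    frame: (H, W, 3) uint8 array; rng: a numpy Generator.
    """
    # Brightness/contrast jitter.
    out = frame.astype(np.float32)
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-10.0, 10.0)
    out = np.clip(out, 0, 255)
    # Random circular shift of up to 4 pixels in each direction.
    dx, dy = rng.integers(-4, 5, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))
    return out.astype(np.uint8)

# Usage: rng = np.random.default_rng(0); frame_aug = augment(frame, rng)
```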
The MineRL BASALT Competition on Learning from Human Feedback
The last decade has seen a significant increase in interest in deep learning
research, with many public successes that have demonstrated its potential. As
a result, these systems are now being incorporated into commercial products. With
this comes an additional challenge: how can we build AI systems that solve
tasks where there is not a crisp, well-defined specification? While multiple
solutions have been proposed, in this competition we focus on one in
particular: learning from human feedback. Rather than training AI systems using
a predefined reward function or using a labeled dataset with a predefined set
of categories, we instead train the AI system using a learning signal derived
from some form of human feedback, which can evolve over time as the
understanding of the task changes, or as the capabilities of the AI system
improve.
The MineRL BASALT competition aims to spur forward research on this important
class of techniques. We design a suite of four tasks in Minecraft for which we
expect it will be hard to write down hardcoded reward functions. These tasks
are defined by a paragraph of natural language: for example, "create a
waterfall and take a scenic picture of it", with additional clarifying details.
Participants must train a separate agent for each task, using any method they
want. Agents are then evaluated by humans who have read the task description.
To help participants get started, we provide a dataset of human demonstrations
on each of the four tasks, as well as an imitation learning baseline that
leverages these demonstrations.
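One common instantiation of learning from human feedback, beyond the provided imitation baseline, is fitting a reward model to pairwise human preferences with a Bradley-Terry loss. The sketch below is a generic illustration of that idea with made-up dimensions and names; it is not a competition baseline.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Map a (pre-embedded) observation to a scalar reward estimate."""
    def __init__(self, obs_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def preference_loss(model, seg_a, seg_b, human_prefers_a):
    """Bradley-Terry loss on one pair of trajectory segments.

    seg_a, seg_b: (T, obs_dim) tensors; human_prefers_a: 1 if the
    human judged segment A better, else 0.
    """
    r_a = model(seg_a).sum()   # total predicted reward of segment A
    r_b = model(seg_b).sum()
    # P(A preferred) = exp(r_a) / (exp(r_a) + exp(r_b))
    logits = torch.stack([r_a, r_b]).unsqueeze(0)    # shape (1, 2)
    target = torch.tensor([1 - human_prefers_a])     # index of the winner
    return nn.functional.cross_entropy(logits, target)
```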
Our hope is that this competition will improve our ability to build AI
systems that do what their designers intend them to do, even when the intent
cannot be easily formalized. Besides allowing AI to solve more tasks, this can
also enable more effective regulation of AI systems, as well as making progress
on the value alignment problem.
Comment: NeurIPS 2021 Competition Track