141 research outputs found
The NetHack Learning Environment
Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source and available at https://github.com/facebookresearch/nle
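NLE exposes the game through a standard Gym-style interaction loop. As a rough illustration, the reset/step cycle an agent runs against a procedurally generated, stochastic environment can be sketched with a toy stand-in class (this is not NLE's actual API; every name below is invented for illustration):

```python
import random

class ToyRoguelikeEnv:
    """Tiny stand-in with a Gym-style reset()/step() interface.

    Not NLE's actual API; it only illustrates the loop an agent runs
    against a procedurally generated, stochastic environment where
    each episode is freshly generated.
    """

    ACTIONS = ["north", "south", "east", "west"]

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        # New procedurally generated episode: a random target dungeon depth.
        self.depth = 0
        self.goal_depth = self.rng.randint(3, 6)
        return {"depth": self.depth}

    def step(self, action):
        assert action in self.ACTIONS
        # Stochastic transition: a move sometimes finds a staircase down.
        if self.rng.random() < 0.3:
            self.depth += 1
        reward = 1.0 if self.depth == self.goal_depth else 0.0
        done = self.depth >= self.goal_depth
        return {"depth": self.depth}, reward, done, {}

env = ToyRoguelikeEnv(seed=42)
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, info = env.step(env.rng.choice(env.ACTIONS))
    total_reward += reward
```

Because the simulator is cheap to step, this loop can be run at very high throughput, which is the property the abstract emphasizes.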
Insights from the NeurIPS 2021 NetHack Challenge
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., ‘ascend’ in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack’s suitability as a long-term benchmark for AI research.
Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games, utilizing their general world knowledge and planning abilities. However, previous work does little to explore what environment state information is provided to LLM actors via language. Exhaustively describing high-dimensional states can impair performance and raise inference costs for LLM actors. Previous LLM actors avoid the issue by relying on hand-engineered, task-specific protocols to determine which features to communicate about a state and which to leave out. In this work, we propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions by learning a value function for task-conditioned state descriptions. We evaluate BLINDER on the challenging video game NetHack and a robotic manipulation task. Our method improves task success rate, reduces input size and compute costs, and generalizes between LLM actors.
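The core idea, keeping only the description lines a learned value function deems useful, can be sketched in a few lines. The greedy loop and the hand-written value function below are illustrative stand-ins, not the paper's implementation:

```python
# Toy sketch of value-based state-description selection in the spirit of
# BLINDER; the value function is a hand-written stand-in for the learned,
# task-conditioned model, and all names here are illustrative.

def select_description(candidates, value_fn, max_lines=3):
    """Greedily keep the candidate lines that raise the estimated value."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < max_lines:
        best = max(remaining, key=lambda line: value_fn(selected + [line]))
        if value_fn(selected + [best]) <= value_fn(selected):
            break  # no remaining line improves the description
        selected.append(best)
        remaining.remove(best)
    return selected

# Stand-in value function: reward task-relevant lines, penalize length.
RELEVANT = {"hungry", "stair", "kobold"}
def toy_value(description):
    hits = sum(any(k in line for k in RELEVANT) for line in description)
    return hits - 0.1 * len(description)

candidates = [
    "You feel hungry.",
    "There is a staircase down here.",
    "A kobold blocks your path.",
    "The floor is made of stone.",
    "Your shirt is blue.",
]
selected = select_description(candidates, toy_value)
```

The selected subset is what would be passed to the LLM actor, which is how input size and inference cost shrink without hand-engineered filtering rules.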
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents
Large Language Models (LLMs) have shown great success as high-level planners for zero-shot game-playing agents. However, these agents are primarily evaluated on Minecraft, where long-term planning is relatively straightforward. In contrast, agents tested in dynamic robot environments face limitations due to simplistic environments with only a few objects and interactions. To fill this gap in the literature, we present NetPlay, the first LLM-powered zero-shot agent for the challenging roguelike NetHack. NetHack is a particularly challenging environment due to its diverse set of items and monsters, complex interactions, and many ways to die.

NetPlay uses an architecture designed for dynamic robot environments, modified for NetHack. Like previous approaches, it prompts the LLM to choose from predefined skills and tracks past interactions to enhance decision-making. Given NetHack's unpredictable nature, NetPlay detects important game events to interrupt running skills, enabling it to react to unforeseen circumstances. While NetPlay demonstrates considerable flexibility and proficiency in interacting with NetHack's mechanics, it struggles with ambiguous task descriptions and a lack of explicit feedback. Our findings demonstrate that NetPlay performs best with detailed context information, indicating the necessity for dynamic methods in supplying context information for complex games such as NetHack.
NetHack is Hard to Hack
Neural policy learning methods have achieved remarkable results in various control problems, ranging from Atari games to simulated locomotion. However, these methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents outperformed neural approaches by over four times in median game score. In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. To conduct this study, we analyze the winning symbolic agent, extending its codebase to track internal strategy selection in order to generate one of the largest available demonstration datasets. Utilizing this dataset, we examine (i) the advantages of an action hierarchy; (ii) enhancements in neural architecture; and (iii) the integration of reinforcement learning with imitation learning. Our investigations produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score. However, we also demonstrate that mere scaling is insufficient to bridge the performance gap with the best symbolic models or even the top human players.
Dungeons and Data: A Large-Scale NetHack Dataset
Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go [50], StarCraft [58], or DOTA [3], have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost to work with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run [23]. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and, accompanying code for users to record, load and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms including online and offline RL, as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision making tasks.
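The record/load/stream workflow such tooling provides can be pictured with a toy compressed format. The functions and on-disk layout below are invented for illustration; NLD's real storage format and loading code differ:

```python
import pickle
import zlib

# Toy sketch of recording and streaming compressed trajectories, in the
# spirit of NLD's compressed trajectory storage. The format here is
# invented purely for illustration.

def record(trajectories):
    """Serialize a list of trajectories, each a list of
    (state, action, score) tuples, into one zlib-compressed blob."""
    return zlib.compress(pickle.dumps(trajectories))

def stream(blob):
    """Lazily yield transitions back out of a compressed blob."""
    for trajectory in pickle.loads(zlib.decompress(blob)):
        for transition in trajectory:
            yield transition

traj = [[("@ on floor", "move_east", 0), ("@ at stairs", "descend", 50)]]
blob = record(traj)
transitions = list(stream(blob))
```

Streaming transitions out of a compressed archive, rather than materializing billions of them in memory, is what makes datasets of this scale practical to train on.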
Improving Policy Learning via Language Dynamics Distillation
Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, VAE pretraining, and methods that learn from unlabeled demonstrations in inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample-efficiency and generalization across environments, and that dynamics modeling with expert demonstrations is more effective than with non-experts
Katakomba: Tools and Benchmarks for Data-Driven NetHack
NetHack is known as the frontier of reinforcement learning research, where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets, similar to recent developments in robotics, recommender systems, and more, under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: resource-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud. (Comment: Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/katakomb)
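A D4RL-style task interface typically amounts to a registry mapping task names to datasets and metadata. A minimal sketch, with hypothetical names rather than the library's actual API:

```python
# Hypothetical sketch of a D4RL-style task registry; the task name,
# fields, and functions below are invented for illustration and are not
# the library's actual API.

TASKS = {}

def register_task(name, **spec):
    """Register a named offline-RL task with its dataset and metadata."""
    TASKS[name] = spec

def make_task(name):
    """Look up a task spec; a real library would also build the
    environment and load the dataset here."""
    return dict(TASKS[name])

register_task(
    "nethack-human-v0",
    dataset="human_trajectories",
    score_range=(0.0, 5000.0),
)
task = make_task("nethack-human-v0")
```

Pre-defining tasks this way is what lets offline-RL papers report comparable numbers without each group re-implementing data loading and evaluation.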
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Progress in deep reinforcement learning (RL) is heavily driven by the availability of challenging benchmarks used for training agents. However, benchmarks that are widely adopted by the community are not explicitly designed for evaluating specific capabilities of RL methods. While there exist environments for assessing particular open problems in RL (such as exploration, transfer learning, unsupervised environment design, or even language-assisted RL), it is generally difficult to extend these to richer, more complex environments once research goes beyond proof-of-concept results. We present MiniHack, a powerful sandbox framework for easily designing novel RL environments. MiniHack is a one-stop shop for RL experiments with environments ranging from small rooms to complex, procedurally generated worlds. By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use. With this sandbox framework, novel environments can be designed easily, either using a human-readable description language or a simple Python interface. In addition to a variety of RL tasks and baselines, MiniHack can wrap existing RL benchmarks and provide ways to seamlessly add additional complexity.
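Designing a level through a simple programmatic interface can be sketched as follows; the builder class here is hypothetical and is not MiniHack's actual Python API, only an illustration of describing a small custom map in code:

```python
# Hypothetical level-builder sketch; not MiniHack's real API.

class ToyLevelBuilder:
    def __init__(self, width, height):
        self.width, self.height = width, height
        # '.' is floor; the border is rendered as walls.
        self.grid = [["." for _ in range(width)] for _ in range(height)]

    def place(self, symbol, x, y):
        """Put an entity glyph (e.g. a staircase or monster) on a tile."""
        self.grid[y][x] = symbol
        return self

    def render(self):
        horizontal = "-" * (self.width + 2)
        rows = ["|" + "".join(row) + "|" for row in self.grid]
        return "\n".join([horizontal] + rows + [horizontal])

level = (
    ToyLevelBuilder(5, 3)
    .place("d", 1, 1)   # a hostile monster
    .place(">", 4, 2)   # downward staircase (the goal)
    .render()
)
```

In a real sandbox, such a description would be compiled into a playable environment backed by the full NetHack dynamics, which is what makes small, targeted testbeds cheap to create.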