Rapid Exploration for Open-World Navigation with Latent Goal Models
We describe a robotic learning system for autonomous exploration and
navigation in diverse, open-world environments. At the core of our method is a
learned latent variable model of distances and actions, along with a
non-parametric topological memory of images. We use an information bottleneck
to regularize the learned policy, giving us (i) a compact visual representation
of goals, (ii) improved generalization capabilities, and (iii) a mechanism for
sampling feasible goals for exploration. Trained on a large offline dataset of
prior experience, the model acquires a representation of visual goals that is
robust to task-irrelevant distractors. We demonstrate our method on a mobile
ground robot in open-world exploration scenarios. Given an image of a goal that
is up to 80 meters away, our method leverages its representation to explore and
discover the goal in under 20 minutes, even amidst previously-unseen obstacles
and weather conditions. Please check out the project website for videos of our
experiments and information about the real-world dataset used at
https://sites.google.com/view/recon-robot.
Comment: Accepted for presentation at the 5th Annual Conference on Robot Learning
(CoRL 2021), London, UK, as an oral talk. Project page and dataset release at
https://sites.google.com/view/recon-robo
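The information-bottleneck idea described above can be sketched in a few lines. This is a minimal, illustrative NumPy version of a variational bottleneck over goal images, not RECON's actual architecture: the linear encoder, dimensions, and function names are assumptions for illustration. The key points are the KL penalty that compresses the goal code, and the fact that a prior-regularized latent space lets you sample plausible goals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
OBS_DIM, LATENT_DIM = 64, 8
W_mu = rng.standard_normal((LATENT_DIM, OBS_DIM)) * 0.1
W_logvar = rng.standard_normal((LATENT_DIM, OBS_DIM)) * 0.1

def encode_goal(goal_image):
    """Map a flattened goal image to a Gaussian latent: (mean, log-variance).
    A linear map stands in for the learned encoder."""
    x = np.ravel(goal_image)
    return W_mu @ x, W_logvar @ x

def sample_latent(mu, logvar):
    """Reparameterized sample z ~ N(mu, diag(exp(logvar)))."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def kl_to_prior(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ): the information-bottleneck penalty.
    Minimizing it squeezes task-irrelevant detail out of the goal code."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def sample_exploration_goal():
    """Because latents are regularized toward the prior, sampling
    z ~ N(0, I) yields feasible candidate goals for exploration."""
    return rng.standard_normal(LATENT_DIM)
```

Training would add the KL term, weighted by a bottleneck coefficient, to the distance/action prediction loss; the trade-off between compression and prediction is what yields distractor-robust goal representations.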
Language Decision Transformers with Exponential Tilt for Interactive Text Environments
Text-based game environments are challenging because agents must deal with
long sequences of text, execute compositional actions using text and learn from
sparse rewards. We address these challenges by proposing Language Decision
Transformers (LDTs), a framework that is based on transformer language models
and decision transformers (DTs). Our LDTs extend DTs with three components: (1)
exponential tilt to guide the agent towards high yet obtainable goals, (2) novel
goal conditioning methods yielding better results than the traditional
return-to-go (sum of all future rewards), and (3) a model of future
observations that improves agent performance. LDTs are the first to address
offline RL with DTs on these challenging games. Our experiments show that LDTs
achieve the highest scores among many different types of agents on some of the
most challenging Jericho games, such as Enchanter.
Comment: 19 pages, 6 figures, 5 tables
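The contrast between traditional return-to-go conditioning and exponential tilt can be made concrete. The sketch below is a hedged NumPy illustration, not the LDT implementation: `return_to_go` computes the standard DT conditioning target, and `tilted_weights` shows the exponential-tilt idea of reweighting returns observed in the offline data toward high but actually-achieved values.

```python
import numpy as np

def return_to_go(rewards):
    """Traditional DT conditioning: at each step t, the sum of all
    future rewards r_t + r_{t+1} + ... + r_T."""
    r = np.asarray(rewards, dtype=float)
    return np.flip(np.cumsum(np.flip(r)))

def tilted_weights(returns, kappa):
    """Exponential tilt over returns seen in the offline data:
    w_i proportional to exp(kappa * R_i) * p_data(R_i).
    kappa = 0 recovers the data distribution; larger kappa shifts
    conditioning toward returns that are high yet were actually
    obtained, keeping the target feasible."""
    R = np.asarray(returns, dtype=float)
    z = kappa * (R - R.max())  # shift by max for numerical stability
    w = np.exp(z)
    return w / w.sum()
```

For example, `return_to_go([1, 0, 2])` yields `[3, 2, 2]`, and with `kappa > 0` the tilted weights concentrate on the largest observed return rather than on an arbitrary, possibly unreachable target.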
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
Recent work has demonstrated the effectiveness of formulating decision making
as a supervised learning problem on offline-collected trajectories. However,
the benefits of performing sequence modeling on trajectory data are not yet
clear. In this work we investigate whether sequence modeling can
condense trajectories into useful representations that contribute to policy
learning. To achieve this, we adopt a two-stage framework that first summarizes
trajectories with sequence modeling techniques, and then employs these
representations to learn a policy along with a desired goal. This design allows
many existing supervised offline RL methods to be considered as specific
instances of our framework. Within this framework, we introduce
Goal-Conditioned Predictive Coding (GCPC), an approach that brings powerful
trajectory representations and leads to performant policies. We conduct
extensive empirical evaluations on AntMaze, FrankaKitchen and Locomotion
environments, and observe that sequence modeling has a significant impact on
some decision making tasks. In addition, we demonstrate that GCPC learns a
goal-conditioned latent representation about the future, which serves as an
"implicit planner", and enables competitive performance on all three
benchmarks
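The two-stage framework can be sketched schematically. This is an illustrative NumPy stand-in, not GCPC itself: mean-pooled linear features replace the paper's transformer-based sequence model, and all dimensions and names are assumptions. It shows only the structure, first condensing a trajectory into a fixed-size representation, then conditioning a policy on the observation, the goal, and that summary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; GCPC itself uses learned transformer encoders.
OBS_DIM, GOAL_DIM, REPR_DIM, ACT_DIM = 16, 4, 8, 2
W_enc = rng.standard_normal((OBS_DIM, REPR_DIM)) * 0.1
W_pi = rng.standard_normal((ACT_DIM, OBS_DIM + GOAL_DIM + REPR_DIM)) * 0.1

def summarize_trajectory(traj):
    """Stage 1: condense a trajectory (T x OBS_DIM) into a fixed-size
    representation. Mean-pooled linear features stand in for the
    sequence model that learns a summary about the future."""
    return (traj @ W_enc).mean(axis=0)

def policy(obs, goal, traj_repr):
    """Stage 2: a goal-conditioned policy that also consumes the
    trajectory summary, which acts as the 'implicit plan'."""
    x = np.concatenate([obs, goal, traj_repr])
    return np.tanh(W_pi @ x)
```

Many supervised offline RL methods fit this template by varying what stage 1 predicts and how stage 2 consumes the representation; in GCPC, the stage-1 representation is goal-conditioned and predictive of future observations.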