Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Pre-training on Internet data has proven to be a key ingredient for broad
generalization in many modern ML systems. What would it take to enable such
capabilities in robotic reinforcement learning (RL)? Offline RL methods, which
learn from datasets of robot experience, offer one way to incorporate prior
data into the robotic learning pipeline. However, these methods have a "type
mismatch" with video data (such as Ego4D), the largest prior datasets available
for robotics, since video offers observation-only experience without the action
or reward annotations needed for RL methods. In this paper, we develop a system
for leveraging large-scale human video datasets in robotic offline RL, based
entirely on learning value functions via temporal-difference learning. We show
that value learning on video datasets learns representations that are more
conducive to downstream robotic offline RL than other approaches for learning
from video data. Our system, called V-PTR, combines the benefits of
pre-training on video data with robotic offline RL approaches that train on
diverse robot data, resulting in value functions and policies for manipulation
tasks that perform better, act robustly, and generalize broadly. On several
manipulation tasks on a real WidowX robot, our framework produces policies that
greatly improve over prior methods. Our video and additional details can be
found at https://dibyaghosh.com/vptr/
Comment: First three authors contributed equally
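The core idea above, learning value functions from action-free video via temporal-difference (TD) updates, can be illustrated with a minimal sketch. This is not the authors' implementation: since video carries no reward labels, it uses the common goal-reaching surrogate (reward 1 when the next frame is the final, goal frame, 0 otherwise) on a toy frame sequence; the function name and all parameters are illustrative assumptions.

```python
import numpy as np

def td_value_from_video(num_frames=5, gamma=0.9, lr=0.5, iters=200):
    """TD(0) value learning on an observation-only 'video' sequence.

    No actions or reward annotations are needed: the reward is a
    goal-reaching surrogate (1 on transitioning into the final frame).
    """
    V = np.zeros(num_frames)
    for _ in range(iters):
        for t in range(num_frames - 1):
            # Surrogate reward: 1 only when the next frame is the goal frame.
            r = 1.0 if t + 1 == num_frames - 1 else 0.0
            target = r + gamma * V[t + 1]   # TD(0) bootstrap target
            V[t] += lr * (target - V[t])    # move V toward the target
    return V

V = td_value_from_video()
# Frames closer to the goal frame receive higher value.
```

The point of the sketch is that TD learning extracts a temporally meaningful signal (distance-to-goal) from raw observation sequences alone, which is the kind of structure the pre-trained value function is argued to provide for downstream robotic offline RL.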
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
Learning to control an agent from data collected offline in a rich
pixel-based visual observation space is vital for real-world applications of
reinforcement learning (RL). A major challenge in this setting is the presence
of input information that is hard to model and irrelevant to controlling the
agent. This problem has been approached by the theoretical RL community through
the lens of exogenous information, i.e., any control-irrelevant information
contained in observations. For example, a robot navigating in busy streets
needs to ignore irrelevant information, such as other people walking in the
background, textures of objects, or birds in the sky. In this paper, we focus
on the setting with visually detailed exogenous information, and introduce new
offline RL benchmarks offering the ability to study this problem. We find that
contemporary representation learning techniques can fail on datasets where the
noise is a complex, time-dependent process, which is prevalent in practical
applications. To address this, we propose to use multi-step inverse models,
which have seen a great deal of interest in the RL theory community, to learn
Agent-Controller Representations for Offline-RL (ACRO). Despite being simple
and requiring no reward, we show theoretically and empirically that the
representation created by this objective greatly outperforms baselines.
Comment: ICML 202
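The multi-step inverse objective described above can be sketched in a few lines. This is a hypothetical toy setup, not the paper's architecture: observations concatenate a controllable 1-D position with control-irrelevant exogenous noise, and a linear predictor is fit to recover the action a_t from the change between o_t and o_{t+k} (a simplified linear stand-in for a learned encoder applied to both observations). The recovered weights concentrate on the controllable coordinate and ignore the exogenous one.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 2000, 3                                      # trajectory length, step gap

# Controllable state: a random walk driven by actions in {-1, +1}.
actions = rng.choice([-1.0, 1.0], size=T)
pos = np.concatenate([[0.0], np.cumsum(actions)])
# Exogenous distractor: noise the agent cannot influence.
noise = rng.normal(size=T + 1)
obs = np.stack([pos, noise], axis=1)                # o_t = [pos_t, noise_t]

# Multi-step inverse regression: predict a_t from (o_t, o_{t+k}),
# here via the linear feature o_{t+k} - o_t.
X = obs[k:T] - obs[:T - k]
y = actions[:T - k]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
# w[0] (controllable coordinate) carries the signal; w[1] (noise) is near zero.
```

Because the action is only predictable from the controllable part of the observation, the inverse objective has no incentive to represent the exogenous noise, which is the intuition behind ACRO's robustness to complex, time-dependent distractors.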