96,514 research outputs found
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Offline goal-conditioned RL (GCRL) offers a feasible paradigm to learn
general-purpose policies from diverse and multi-task offline datasets. Despite
notable recent progress, the predominant offline GCRL methods have been
restricted to model-free approaches, constraining their capacity to tackle
limited data budgets and unseen goal generalization. In this work, we propose a
novel two-stage model-based framework, Goal-conditioned Offline Planning
(GOPlan), including (1) pretraining a prior policy capable of capturing
multi-modal action distribution within the multi-goal dataset; (2) employing
the reanalysis method with planning to generate imagined trajectories for
fine-tuning policies. Specifically, the prior policy is based on an
advantage-weighted Conditional Generative Adversarial Network that exhibits
distinct mode separation to overcome the pitfalls of out-of-distribution (OOD)
actions. For further policy optimization, the reanalysis method generates
high-quality imaginary data by planning with learned models for both
intra-trajectory and inter-trajectory goals. Through experimental evaluations,
we demonstrate that GOPlan achieves state-of-the-art performance on various
offline multi-goal manipulation tasks. Moreover, our results highlight the
superior ability of GOPlan to handle small data budgets and generalize to OOD
goals.

Comment: Spotlight Presentation at Goal-conditioned Reinforcement Learning
Workshop at NeurIPS, 202
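The two-stage recipe in the abstract can be sketched minimally. The snippet below is an illustrative stand-in, not the authors' implementation: `advantage_weights` shows the exponentiated-advantage weighting that a CGAN prior would be trained with, and `plan` shows goal-reaching planning with a learned model by scoring candidate actions against the goal. All function names and the nearest-goal scoring rule are assumptions for illustration.

```python
import numpy as np

def advantage_weights(advantages, beta=1.0):
    # Exponentiated-advantage weights (normalized per batch), as used
    # in advantage-weighted training of a prior policy.
    w = np.exp(beta * (advantages - advantages.max()))
    return w / w.sum()

def plan(model, state, goal, candidate_actions):
    # Score each candidate action with the learned dynamics model and
    # pick the one whose predicted next state lands closest to the goal.
    best, best_dist = None, np.inf
    for a in candidate_actions:
        s_next = model(state, a)
        d = np.linalg.norm(s_next - goal)
        if d < best_dist:
            best, best_dist = a, d
    return best
```

In the paper's setting the planner would roll out multi-step trajectories toward both intra-trajectory and inter-trajectory goals; a one-step nearest-goal search is the simplest instance of that idea.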
Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
General-purpose robots require diverse repertoires of behaviors to complete
challenging tasks in real-world unstructured environments. To address this
issue, goal-conditioned reinforcement learning aims to acquire policies that
can reach configurable goals for a wide range of tasks on command. However,
such goal-conditioned policies are notoriously difficult and time-consuming to
train from scratch. In this paper, we propose Planning to Practice (PTP), a
method that makes it practical to train goal-conditioned policies for
long-horizon tasks that require multiple distinct types of interactions to
solve. Our approach is based on two key ideas. First, we decompose the
goal-reaching problem hierarchically, with a high-level planner that sets
intermediate subgoals using conditional subgoal generators in the latent space
for a low-level model-free policy. Second, we propose a hybrid approach which
first pre-trains both the conditional subgoal generator and the policy on
previously collected data through offline reinforcement learning, and then
fine-tunes the policy via online exploration. This fine-tuning process is
itself facilitated by the planned subgoals, which breaks down the original
target task into short-horizon goal-reaching tasks that are significantly
easier to learn. We conduct experiments in both simulation and the real world,
in which the policy is pre-trained on demonstrations of short primitive
behaviors and fine-tuned for temporally extended tasks that are unseen in the
offline data. Our experimental results show that PTP can generate feasible
sequences of subgoals that enable the policy to efficiently solve the target
tasks.
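The hierarchical decomposition described above can be sketched as follows. This is a toy illustration under stated assumptions: `compose_subgoals` uses linear interpolation in latent space as a stand-in for PTP's learned conditional subgoal generator, and `run_hierarchy` shows the low-level policy chasing each subgoal in turn.

```python
import numpy as np

def compose_subgoals(start, goal, n_subgoals):
    # Stand-in for a learned conditional subgoal generator:
    # evenly spaced interpolants between start and goal in latent space.
    alphas = np.linspace(0.0, 1.0, n_subgoals + 2)[1:-1]
    return [(1 - a) * start + a * goal for a in alphas]

def run_hierarchy(policy_step, state, goal, n_subgoals=3, max_steps=50):
    # High level: plan a chain of subgoals ending at the target goal.
    # Low level: a goal-conditioned policy pursues each subgoal until
    # it is reached (or a step budget runs out), then moves on.
    targets = compose_subgoals(state, goal, n_subgoals) + [goal]
    for target in targets:
        for _ in range(max_steps):
            state = policy_step(state, target)
            if np.linalg.norm(state - target) < 1e-3:
                break
    return state
```

Because each leg is a short-horizon goal-reaching problem, the low-level policy only ever needs to cover small distances, which is the intuition behind the paper's claim that fine-tuning becomes significantly easier.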
ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes
Understanding the continuous states of objects is essential for task learning
and planning in the real world. However, most existing task learning benchmarks
assume discrete (e.g., binary) object goal states, which poses challenges for
the learning of complex tasks and transferring learned policy from simulated
environments to the real world. Furthermore, state discretization limits a
robot's ability to follow human instructions based on the grounding of actions
and states. To tackle these challenges, we present ARNOLD, a benchmark that
evaluates language-grounded task learning with continuous states in realistic
3D scenes. ARNOLD is comprised of 8 language-conditioned tasks that involve
understanding object states and learning policies for continuous goals. To
promote language-instructed learning, we provide expert demonstrations with
template-generated language descriptions. We assess task performance by
utilizing the latest language-conditioned policy learning models. Our results
indicate that current models for language-conditioned manipulation continue to
experience significant challenges in novel goal-state generalizations, scene
generalizations, and object generalizations. These findings highlight the need
to develop new algorithms that address this gap and underscore the potential
for further research in this area. See our project page at:
https://arnold-benchmark.github.io

Comment: The first two authors contributed equally; 20 pages; 17 figures;
project available: https://arnold-benchmark.github.io