Reinforcement Learning and Planning for Preference Balancing Tasks
Robots are often highly non-linear dynamical systems with many degrees of freedom, making solving motion problems computationally challenging. One solution has been reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state spaces, either discrete or continuous action spaces, and unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions, and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions, and even when preconditions are not verifiable, show that this is a valuable method to use before other, more expensive approaches. Evaluation is done on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum, both in simulation and experimentally. Additionally, PEARL is leveraged outside of robotics as an array sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications.
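As a rough illustration of the preference-balancing idea, the sketch below encodes task preferences as state features and scores candidate accelerations with a weighted combination of those features. The feature choices, weights, and double-integrator dynamics are hypothetical placeholders for illustration only, not PEARL's actual formulation.

    import numpy as np

    # Hypothetical preference features for an acceleration-controlled agent:
    # each maps a state to a scalar the task prefers to drive toward zero.
    def preference_features(state, goal):
        pos, vel = state[:2], state[2:]
        return np.array([
            -np.sum((pos - goal) ** 2),   # prefer being close to the goal
            -np.sum(vel ** 2),            # prefer low speed (e.g., gentle delivery)
        ])

    # Value estimate as a weighted balance of preferences; the weights would be
    # learned on a small sub-domain and then reused on the expanded domain.
    def value(state, goal, weights):
        return weights @ preference_features(state, goal)

    # Placeholder double-integrator dynamics (unknown to the learner in the paper).
    def dynamics(state, accel, dt):
        pos, vel = state[:2], state[2:]
        vel = vel + dt * np.asarray(accel)
        return np.concatenate([pos + dt * vel, vel])

    def greedy_action(state, goal, weights, actions, dt=0.05):
        # Pick the acceleration whose one-step successor has the highest value.
        return max(actions, key=lambda a: value(dynamics(state, a, dt), goal, weights))

With learned weights, the same greedy step can be applied from any start state, which mirrors the transfer from a small training subset to an expanded domain described in the abstract.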
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. Next, the sampling-based planners provide roadmaps that connect robot configurations that can be successfully navigated by the RL agent. The same RL agents are used to control the robot under the direction of the planner, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed using feature-based and deep neural network policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. Our results show improvement in task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 m long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than the one used in training.
Comment: 9 pages, 7 figures
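The hierarchy can be pictured with a small sketch: sample robot configurations, keep a roadmap edge only when a short-range policy rollout succeeds between its two endpoints, then query the roadmap for a long-range route. The rollout check, 2-D configurations, and parameters below are illustrative stand-ins, not the actual PRM-RL implementation.

    import heapq, math, random

    def rl_policy_reaches(start, goal):
        # Stand-in for rolling out the learned short-range policy in simulation;
        # here we simply accept nearby pairs with a small failure probability.
        return math.dist(start, goal) < 3.0 and random.random() > 0.1

    def build_roadmap(n_samples=200, k=5, bounds=(0.0, 20.0)):
        nodes = [(random.uniform(*bounds), random.uniform(*bounds)) for _ in range(n_samples)]
        edges = {i: [] for i in range(n_samples)}
        for i, p in enumerate(nodes):
            neighbors = sorted(range(n_samples), key=lambda j: math.dist(p, nodes[j]))[1:k + 1]
            for j in neighbors:
                # Add the edge only if the RL agent can navigate between the nodes.
                if rl_policy_reaches(p, nodes[j]):
                    d = math.dist(p, nodes[j])
                    edges[i].append((j, d))
                    edges[j].append((i, d))
        return nodes, edges

    def shortest_path(edges, start, goal):
        # Dijkstra over the roadmap; at execution time the RL agent would be
        # invoked on each edge to actually drive the robot.
        dist, prev, frontier = {start: 0.0}, {}, [(0.0, start)]
        while frontier:
            d, u = heapq.heappop(frontier)
            if u == goal:
                break
            if d > dist.get(u, math.inf):
                continue
            for v, w in edges[u]:
                if d + w < dist.get(v, math.inf):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(frontier, (d + w, v))
        path, node = [], goal
        while node in prev or node == start:
            path.append(node)
            if node == start:
                break
            node = prev[node]
        return list(reversed(path)) if path and path[-1] == start else None

Roughly, the roadmap supplies the long-range structure while the policy handles dynamics and local constraints along each edge, matching the division of labor described in the abstract.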
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Language model agents (LMAs) have recently emerged as a promising paradigm for multi-step decision-making tasks, often outperforming humans and other reinforcement learning agents. Despite this promise, their performance on real-world applications, which often involve combinations of tasks, is still underexplored. In this work, we introduce a new benchmark, CompWoB: 50 new compositional web automation tasks reflecting more realistic assumptions. We show that while existing prompted LMAs (gpt-3.5-turbo or gpt-4) achieve a 94.0% average success rate on base tasks, their performance degrades to a 24.9% success rate on compositional tasks. On the other hand, transferred LMAs (finetuned only on base tasks) show a smaller generalization gap, dropping from 85.4% to 54.8%. By balancing the data distribution across tasks, we train a new model, HTML-T5++, that surpasses human-level performance (95.2%) on MiniWoB and achieves the best zero-shot performance on CompWoB (61.5%). While these results highlight the promise of small-scale finetuned and transferred models for task compositionality, their performance degrades further under instruction compositions that change the combination order. In contrast to the recent remarkable success of LMAs, our benchmark and detailed analysis emphasize the necessity of building LMAs that are robust and generalizable to task compositionality for real-world deployment.
Comment: Code: https://github.com/google-research/google-research/tree/master/compositional_rl/compwo
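To make the notion of task composition concrete, the toy harness below chains base web-automation tasks into a single sequential episode and counts a success only when every stage succeeds. The task names, per-task success probabilities, and interface are hypothetical illustrations, not the CompWoB evaluation code.

    import random
    from typing import List, Tuple

    # Illustrative base tasks: each pairs an instruction with the probability
    # that a toy agent completes it when prompted in isolation.
    BASE_TASKS: List[Tuple[str, float]] = [
        ("click-button", 0.95),
        ("enter-text", 0.93),
        ("select-date", 0.90),
    ]

    def run_compositional_episode(tasks: List[Tuple[str, float]]) -> bool:
        # A composed task succeeds only if every stage succeeds in sequence,
        # so per-stage errors compound over the composition.
        return all(random.random() < p for _, p in tasks)

    def success_rate(tasks, episodes=1000):
        return sum(run_compositional_episode(tasks) for _ in range(episodes)) / episodes

    if __name__ == "__main__":
        print("base task:", success_rate(BASE_TASKS[:1]))   # single base task
        print("composed:", success_rate(BASE_TASKS))          # three-task composition

This toy model only captures error compounding across stages; the much larger drop reported in the abstract (94.0% to 24.9%) suggests additional failure modes beyond independent per-stage errors.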