67 research outputs found
Robotic manipulation of multiple objects as a POMDP
This paper investigates manipulation of multiple unknown objects in a crowded
environment. Because of incomplete knowledge due to unknown objects and
occlusions in visual observations, object observations are imperfect and action
success is uncertain, making planning challenging. We model the problem as a
partially observable Markov decision process (POMDP), which allows a general reward-based optimization objective and takes uncertainty in temporal evolution and partial observations into account. In addition to occlusion-dependent observation and action success probabilities, our POMDP model also automatically adapts object-specific action success probabilities. To cope with
the changing system dynamics and performance constraints, we present a new
online POMDP method based on particle filtering that produces compact policies.
The approach is validated both in simulation and in physical experiments in a
scenario of moving dirty dishes into a dishwasher. The results indicate that:
1) a greedy heuristic manipulation approach is not sufficient; multi-object manipulation requires multi-step POMDP planning, and 2) online planning is beneficial since it allows adapting the system dynamics model based on actual experience.
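A minimal sketch of the two ingredients described above, a particle-filter belief update and an online lookahead planner over the particle belief. The `transition`, `obs_likelihood`, and `reward` callables, and the Monte-Carlo rollout planner, are illustrative assumptions, not the paper's implementation.

```python
import random

def update_belief(particles, action, observation, transition, obs_likelihood, n_particles=100):
    """Particle-filter belief update: propagate particles through the transition
    model, weight them by the (occlusion-dependent) observation likelihood, resample."""
    propagated = [transition(s, action) for s in particles]
    weights = [obs_likelihood(observation, s) for s in propagated]
    if sum(weights) == 0.0:          # observation inconsistent with every particle
        return propagated            # fall back to the propagated prior
    return random.choices(propagated, weights=weights, k=n_particles)

def plan(particles, actions, transition, reward, depth=3, rollouts=32, gamma=0.95):
    """Online lookahead: pick the action with the best Monte-Carlo rollout value
    under the current particle belief."""
    def rollout(state, d):
        if d == 0:
            return 0.0
        a = random.choice(actions)
        return reward(state, a) + gamma * rollout(transition(state, a), d - 1)

    best_action, best_value = None, float("-inf")
    for a in actions:
        value = 0.0
        for _ in range(rollouts):
            s = random.choice(particles)
            value += reward(s, a) + gamma * rollout(transition(s, a), depth - 1)
        value /= rollouts
        if value > best_value:
            best_action, best_value = a, value
    return best_action
```

Replanning with `plan` after every `update_belief` call is what makes the method online: the dynamics model can be adapted between planning steps.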
Prioritized offline Goal-swapping Experience Replay
In goal-conditioned offline reinforcement learning, an agent learns from
previously collected data to go to an arbitrary goal. Since the offline data
only contains a finite number of trajectories, a main challenge is how to
generate more data. Goal-swapping generates additional data by switching trajectory goals, but in doing so it also produces a large number of invalid trajectories. To address this issue, we propose prioritized goal-swapping experience replay (PGSER). PGSER uses a pre-trained Q function to assign higher priority weights to goal-swapped transitions that allow reaching the goal. In
experiments, PGSER significantly improves over baselines in a wide range of
benchmark tasks, including challenging, previously unsolved dexterous in-hand manipulation tasks.
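A minimal sketch of the goal-swapping and prioritized sampling described above, assuming transitions are stored as `(state, action, next_state, goal)` tuples and a pre-trained goal-conditioned Q function `q_fn(s, a, g)` is available. The softmax-style weighting is an illustrative choice, not necessarily the paper's exact priority rule.

```python
import math
import random

def goal_swap(transitions, goals):
    """Relabel each (s, a, s_next, g) transition with every candidate goal."""
    return [(s, a, s_next, g) for (s, a, s_next, _) in transitions for g in goals]

def prioritized_sample(swapped, q_fn, batch_size, temperature=1.0):
    """Sample goal-swapped transitions with weight exp(Q(s, a, g) / temperature),
    so swaps that still lead toward the relabelled goal are replayed more often."""
    weights = [math.exp(q_fn(s, a, g) / temperature) for (s, a, _, g) in swapped]
    return random.choices(swapped, weights=weights, k=batch_size)
```

Invalid goal swaps receive low Q values and are therefore rarely replayed, which is the prioritization the abstract refers to.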
Hybrid Search for Efficient Planning with Completeness Guarantees
Solving complex planning problems has been a long-standing challenge in
computer science. Learning-based subgoal search methods have shown promise in
tackling these problems, but they often suffer from a lack of completeness
guarantees, meaning that they may fail to find a solution even if one exists.
In this paper, we propose an efficient approach to augment a subgoal search
method to achieve completeness in discrete action spaces. Specifically, we
augment the high-level search with low-level actions to execute a multi-level
(hybrid) search, which we call complete subgoal search. This solution achieves
the best of both worlds: the practical efficiency of high-level search and the
completeness of low-level search. We apply our method to a recently proposed subgoal search algorithm and evaluate the resulting algorithm, trained on offline data, on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions on instances that the high-level search could already solve without low-level augmentation. Our approach makes it possible to apply subgoal-level planning to systems where completeness is a critical requirement.
Comment: NeurIPS 2023 Poster
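A minimal sketch of the hybrid (complete) subgoal search idea: a best-first search whose successor set combines learned subgoal proposals with all primitive low-level actions, so the search remains complete even when the subgoal generator misses a needed step. The `propose_subgoals`, `low_level_successors`, and `heuristic` interfaces, and the hashable-state assumption, are illustrative, not the evaluated algorithm.

```python
import heapq
import itertools

def hybrid_search(start, is_goal, propose_subgoals, low_level_successors, heuristic):
    """Best-first search over both learned subgoal proposals and primitive actions.

    States must be hashable; the two successor functions return iterables of states.
    """
    counter = itertools.count()              # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), next(counter), start, [start])]
    visited = set()
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in visited:
            continue
        visited.add(state)
        # High-level proposals give efficiency; low-level actions give completeness.
        for nxt in list(propose_subgoals(state)) + list(low_level_successors(state)):
            if nxt not in visited:
                heapq.heappush(frontier, (heuristic(nxt), next(counter), nxt, path + [nxt]))
    return None                               # exhaustive: no solution in the reachable space
```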
Hierarchical Imitation Learning with Vector Quantized Models
The ability to plan actions on multiple levels of abstraction enables
intelligent agents to solve complex tasks effectively. However, learning the
models for both low- and high-level planning from demonstrations has proven
challenging, especially with higher-dimensional inputs. To address this issue,
we propose to use reinforcement learning to identify subgoals in expert
trajectories by associating the magnitude of the rewards with the
predictability of low-level actions given the state and the chosen subgoal. We
build a vector-quantized generative model for the identified subgoals to
perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training set.
Comment: To appear at ICML 2023
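A minimal sketch of the vector-quantized subgoal bottleneck and the predictability-based subgoal reward described above: a continuous subgoal embedding is snapped to its nearest codebook vector so planning can search over a discrete set of subgoal codes. The codebook size, embedding dimension, and the encoder (not shown) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))   # 64 discrete subgoal codes, 16-dim embeddings

def quantize(z):
    """Snap a continuous subgoal embedding z to its nearest codebook entry."""
    dists = np.linalg.norm(codebook - z, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

def subgoal_reward(log_prob_action):
    """Subgoal-identification reward: the more predictable the expert's low-level
    action is given the state and the chosen subgoal, the higher the reward."""
    return log_prob_action

# Usage: encode a state window into a continuous embedding (encoder not shown),
# quantize it, and plan over the resulting discrete subgoal codes.
idx, z_q = quantize(rng.normal(size=16))
```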
Probabilistic approach to physical object disentangling
Physically disentangling entangled objects from each other is a problem
encountered in waste segregation or in any task that requires disassembly of
structures. Often there are no object models, and, especially with cluttered, irregularly shaped objects, the robot cannot create a model of the scene due to occlusion. One of our key insights is that, based on previous sensory input, we are only interested in moving an object out of the entanglement and around obstacles. That is, we only need to know where the robot can successfully move in order to plan the disentangling. Due to this uncertainty, we integrate information about blocked movements into a probability map. The map defines the probability of the robot successfully moving to a specific configuration. Using the failure probability of a sequence of movements as the cost, we can then plan and execute disentangling iteratively. Since our approach circumvents only previously encountered obstacles, new movements yield information about unknown obstacles that block movement, until the robot has learned to circumvent all obstacles and disentangling succeeds. In the experiments, we use a special probabilistic version of the Rapidly-exploring Random Tree (RRT) algorithm for planning and demonstrate successful disentanglement of objects in both 2-D and 3-D simulation and on a KUKA LBR 7-DOF robot. Moreover, our approach outperforms baseline methods.
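A minimal sketch of the probability map and the failure-probability cost described above, assuming movements are discretized into edges with per-edge success estimates updated from blocked and successful moves. The smoothing rule is an illustrative stand-in for the paper's probabilistic RRT machinery.

```python
import math

class SuccessMap:
    """Per-movement success probabilities estimated from previously observed moves."""

    def __init__(self, prior=0.9):
        self.prior = prior      # optimistic prior: unexplored moves are assumed to mostly succeed
        self.counts = {}        # edge -> (successes, attempts)

    def success_prob(self, edge):
        s, n = self.counts.get(edge, (0, 0))
        return (s + self.prior) / (n + 1.0)   # simple smoothing around the prior

    def record(self, edge, succeeded):
        s, n = self.counts.get(edge, (0, 0))
        self.counts[edge] = (s + (1 if succeeded else 0), n + 1)

def path_failure_probability(path_edges, smap):
    """Planning cost: probability that at least one movement in the sequence fails."""
    log_success = sum(math.log(smap.success_prob(e)) for e in path_edges)
    return 1.0 - math.exp(log_success)
```

Executing the lowest-cost plan, recording any blocked moves with `record`, and replanning gives the iterative plan-execute loop the abstract describes.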
Backpropagation Through Agents
A fundamental challenge in multi-agent reinforcement learning (MARL) is to
learn the joint policy in an extremely large search space, which grows
exponentially with the number of agents. Moreover, fully decentralized policy
factorization significantly restricts the search space, which may lead to
sub-optimal policies. In contrast, the auto-regressive joint policy can
represent a much richer class of joint policies by factorizing the joint policy
into the product of a series of conditional individual policies. While such
factorization introduces the action dependency among agents explicitly in
sequential execution, it does not take full advantage of the dependency during
learning. In particular, subsequent agents do not give preceding agents feedback about their decisions. In this paper, we propose a new framework, Back-Propagation Through Agents (BPTA), that directly accounts for both agents' own policy updates and the learning of their dependent counterparts. This is
achieved by propagating the feedback through action chains. With the proposed
framework, our Bidirectional Proximal Policy Optimisation (BPPO) outperforms
state-of-the-art methods. Extensive experiments on matrix games, StarCraft II v2, Multi-agent MuJoCo, and Google Research Football demonstrate the effectiveness of the proposed method.
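A minimal sketch of an auto-regressive joint policy in which later agents' outcomes can send gradient feedback to earlier agents through the action chain, which is the core idea behind backpropagation through agents. The deterministic continuous actions, network sizes, and toy objective are illustrative assumptions, not the BPPO algorithm.

```python
import torch
import torch.nn as nn

class AgentPolicy(nn.Module):
    """One agent's policy, conditioned on the observation and preceding agents' actions."""
    def __init__(self, obs_dim, prev_act_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + prev_act_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, prev_actions):
        return self.net(torch.cat([obs, prev_actions], dim=-1))

obs_dim, act_dim, n_agents = 8, 2, 3
agents = [AgentPolicy(obs_dim, i * act_dim, act_dim) for i in range(n_agents)]
obs = torch.randn(1, obs_dim)

# Sequential execution: each agent conditions on the actions chosen so far.
actions = []
for policy in agents:
    prev = torch.cat(actions, dim=-1) if actions else torch.zeros(1, 0)
    actions.append(policy(obs, prev))

# Toy joint objective: backward() propagates feedback from later agents' actions
# to earlier agents' parameters through the action chain.
loss = torch.cat(actions, dim=-1).pow(2).sum()
loss.backward()
```

Because each action is a differentiable function of the previous agents' actions, the backward pass is exactly the "feedback through action chains" mentioned above.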