Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image
For a robotic grasping task in which diverse unseen target objects exist in a
cluttered environment, some deep learning-based methods have achieved
state-of-the-art results using visual input directly. In contrast, actor-critic
deep reinforcement learning (RL) methods typically perform very poorly when
grasping diverse objects, especially when learning from raw images and sparse
rewards. To make these RL techniques feasible for vision-based grasping tasks,
we employ state representation learning (SRL), where we encode essential
information first for subsequent use in RL. However, typical representation
learning procedures are unsuitable for extracting pertinent information for
learning the grasping skill, because the visual inputs for representation
learning, where a robot attempts to grasp a target object in clutter, are
extremely complex. We found that preprocessing based on the disentanglement of
a raw input image is the key to effectively capturing a compact representation.
This enables deep RL to learn robotic grasping skills from highly varied and
diverse visual inputs. We demonstrate the effectiveness of this approach with
varying levels of disentanglement in a realistic simulated environment.
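The pipeline the abstract describes, encoding the raw image into a compact representation before the actor-critic ever sees it, can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the encoder here is a random linear map standing in for a pretrained disentangling encoder, and all dimensions (observation size, latent size, action count) are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
OBS_DIM = 64 * 64      # flattened raw grayscale image
LATENT_DIM = 32        # compact state representation from SRL
N_ACTIONS = 4

# SRL step: a fixed linear encoder stands in for the disentangling
# encoder; in the paper's setup this is learned before RL begins.
W_enc = rng.normal(scale=0.01, size=(LATENT_DIM, OBS_DIM))

def encode(obs):
    """Map a raw image to a compact state representation."""
    return np.tanh(W_enc @ obs.ravel())

# Actor and critic heads consume only the latent, never raw pixels,
# which is what makes the RL problem tractable in this scheme.
W_actor = rng.normal(scale=0.1, size=(N_ACTIONS, LATENT_DIM))
W_critic = rng.normal(scale=0.1, size=(1, LATENT_DIM))

def policy_logits(z):
    return W_actor @ z

def value(z):
    return float(W_critic @ z)

obs = rng.random((64, 64))   # a stand-in camera image
z = encode(obs)
logits, v = policy_logits(z), value(z)
```

The key design point is the separation of concerns: the encoder is trained on a representation-learning objective, so the RL loop only has to learn a mapping from a 32-dimensional latent rather than from thousands of raw pixels.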
Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter
When searching for objects in cluttered environments, it is often necessary to perform complex interactions to move occluding objects out of the way, fully reveal the object of interest, and make it graspable. Due to the complexity of the physics involved and the lack of accurate models of the clutter, planning and controlling precise predefined interactions with accurate outcomes is extremely hard, if not impossible. In problems where accurate (forward) models are lacking, Deep Reinforcement Learning (RL) has been shown to be a viable solution for mapping observations (e.g. images) to good interactions in the form of closed-loop visuomotor policies. However, Deep RL is sample-inefficient and fails when applied directly to the problem of unoccluding objects based on images. In this work we present a novel Deep RL procedure that combines i) teacher-aided exploration, ii) a critic with privileged information, and iii) mid-level representations, resulting in sample-efficient and effective learning for the problem of uncovering a target object occluded by a heap of unknown objects. Our experiments show that our approach trains faster and converges to more efficient uncovering solutions than baselines and ablations, and that our uncovering policies lead to an average improvement in the graspability of the target object, facilitating downstream retrieval applications.
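The "critic with privileged information" ingredient above is the asymmetric actor-critic idea: the actor sees only what is available at test time (a mid-level visual representation), while the critic, used only during training in simulation, additionally receives ground-truth simulator state. A minimal sketch follows; all dimensions and the use of random linear heads are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes, for illustration only.
REPR_DIM = 16   # mid-level visual representation (available at test time)
PRIV_DIM = 6    # privileged simulator state, e.g. object poses
N_ACTIONS = 3

W_actor = rng.normal(scale=0.1, size=(N_ACTIONS, REPR_DIM))
# The critic's input is wider: representation plus privileged state.
W_critic = rng.normal(scale=0.1, size=(1, REPR_DIM + PRIV_DIM))

def act(repr_vec):
    """Policy uses only information available at deployment."""
    return int(np.argmax(W_actor @ repr_vec))

def critic_value(repr_vec, priv_state):
    """Critic scores the state with the extra privileged input,
    giving lower-variance learning signals during training."""
    return float(W_critic @ np.concatenate([repr_vec, priv_state]))

repr_vec = rng.random(REPR_DIM)
priv = rng.random(PRIV_DIM)
a, v = act(repr_vec), critic_value(repr_vec, priv)
```

Because the critic is discarded after training, it can exploit information the robot will never observe, while the deployed policy remains purely image-driven.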