8,019 research outputs found
Deep Object-Centric Representations for Generalizable Robot Learning
Robotic manipulation in complex open-world scenarios requires both reliable
physical manipulation skills and effective and generalizable perception. In
this paper, we propose a method in which general-purpose pretrained visual models
serve as an object-centric prior for the perception system of a learned policy.
We devise an object-level attentional mechanism that can be used to determine
relevant objects from a few trajectories or demonstrations, and then
immediately incorporate those objects into a learned policy. A task-independent
meta-attention locates possible objects in the scene, and a task-specific
attention identifies which objects are predictive of the trajectories. The
scope of the task-specific attention is easily adjusted by showing
demonstrations with distractor objects or with diverse relevant objects. Our
results indicate that this approach exhibits good generalization across object
instances using very few samples, and can be used to learn a variety of
manipulation tasks using reinforcement learning.
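To make the two-stage attention concrete, here is a minimal PyTorch sketch of how a task-specific attention over object proposals could look: per-object features from a pretrained visual model (standing in for the task-independent meta-attention) are scored by a learned query, and the weighted summary is what a policy would consume. Every name here (TaskSpecificAttention, feat_dim, the toy dimensions) is our own illustrative choice, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TaskSpecificAttention(nn.Module):
    """Scores candidate objects by task relevance (illustrative sketch).

    Assumes an upstream, task-independent proposal stage (the paper's
    meta-attention) has already produced one feature vector per object,
    e.g. from a general-purpose pretrained visual model.
    """
    def __init__(self, feat_dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(feat_dim))  # learned task query

    def forward(self, object_feats: torch.Tensor) -> torch.Tensor:
        # object_feats: (num_objects, feat_dim)
        scores = object_feats @ self.query        # relevance score per object
        weights = torch.softmax(scores, dim=0)    # attention over objects
        return weights @ object_feats             # task-relevant summary vector

# Toy usage: five candidate objects with 128-d pretrained features.
attn = TaskSpecificAttention(feat_dim=128)
state = attn(torch.randn(5, 128))   # object-centric input for a policy
print(state.shape)                  # torch.Size([128])
```

Training such a query from a few demonstrations, with or without distractors, is one plausible way the scope of the attention could be narrowed or widened as the abstract describes.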
Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Manipulation of deformable objects, such as ropes and cloth, is an important
but challenging problem in robotics. We present a learning-based system where a
robot takes as input a sequence of images of a human manipulating a rope from
an initial to a goal configuration, and outputs a sequence of actions that can
reproduce the human demonstration, using only monocular images as input. To
perform this task, the robot learns a pixel-level inverse dynamics model of
rope manipulation directly from images in a self-supervised manner, using about
60K interactions with the rope collected autonomously by the robot. The human
demonstration provides a high-level plan of what to do and the low-level
inverse model is used to execute the plan. We show that by combining the high-
and low-level plans, the robot can successfully manipulate a rope into a
variety of target shapes using only a sequence of human-provided images for
direction.
Comment: 8 pages, accepted to the International Conference on Robotics and Automation (ICRA) 201
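As a rough illustration of a pixel-level inverse dynamics model, the sketch below feeds the current frame and the next (demonstrated) frame through a small CNN and regresses an action. The paper's actual model is trained on roughly 60K self-supervised rope interactions and uses its own action parameterization; the architecture, image size, and 4-d continuous action here are placeholder assumptions.

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Predicts the action taking the scene from img_t to img_next.

    Illustrative stand-in only: the real system's action space
    (e.g. discretized pick-and-push parameters) and network differ.
    """
    def __init__(self, action_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)  # infers flattened input size

    def forward(self, img_t, img_next):
        # Stack the two RGB frames along the channel axis: (B, 6, H, W).
        x = torch.cat([img_t, img_next], dim=1)
        return self.head(self.encoder(x))

model = InverseDynamicsModel()
action = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(action.shape)  # torch.Size([1, 4])
```

At execution time, each consecutive pair of frames from the human demonstration would be fed to such a model and the predicted action executed, stepping the rope through the demonstrated sequence.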
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another and thereby learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single-robot alternative.
Comment: Submitted to the IEEE International Conference on Robotics and Automation 201
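The systems pattern described, multiple robots pooling experience into one asynchronously updated global policy, can be sketched as a simple parameter-server loop. The sketch below uses Python threads and a least-squares fit as a stand-in for the global policy update; it illustrates only the distributed data flow, not Guided Policy Search itself, whose local controllers and constrained policy optimization are omitted.

```python
import threading
import numpy as np

class ParameterServer:
    """Pools experience from several robots and refits a shared policy.

    The least-squares fit is a placeholder for the GPS supervised step
    that trains the global policy on data from local controllers.
    """
    def __init__(self, obs_dim, act_dim):
        self.lock = threading.Lock()
        self.buffer = []                        # pooled (obs, action) pairs
        self.policy = np.zeros((obs_dim, act_dim))

    def push(self, obs, act):
        with self.lock:
            self.buffer.append((obs, act))

    def update(self):
        with self.lock:
            X = np.stack([o for o, _ in self.buffer])
            Y = np.stack([a for _, a in self.buffer])
        self.policy, *_ = np.linalg.lstsq(X, Y, rcond=None)

def robot_worker(server, robot_id, steps=100):
    rng = np.random.default_rng(robot_id)
    for _ in range(steps):                      # asynchronous rollouts
        obs = rng.normal(size=4)
        act = obs @ np.ones((4, 2))             # placeholder local controller
        server.push(obs, act)

server = ParameterServer(obs_dim=4, act_dim=2)
threads = [threading.Thread(target=robot_worker, args=(server, i))
           for i in range(4)]                   # four robots, as in the paper
for t in threads:
    t.start()
for t in threads:
    t.join()
server.update()
print(server.policy.shape)  # (4, 2)
```

Decoupling experience collection from the policy update is what lets slow real-world rollouts and expensive optimization proceed in parallel, which is the source of the reported utilization and training-time gains.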
K-VIL: Keypoints-based Visual Imitation Learning
Visual imitation learning provides efficient and intuitive solutions for
robotic systems to acquire novel manipulation skills. However, simultaneously
learning geometric task constraints and control policies from visual inputs
alone remains a challenging problem. In this paper, we propose an approach for
keypoint-based visual imitation (K-VIL) that automatically extracts sparse,
object-centric, and embodiment-independent task representations from a small
number of human demonstration videos. The task representation is composed of
keypoint-based geometric constraints on principal manifolds, their associated
local frames, and the movement primitives needed for task
execution. Our approach is capable of extracting such task representations from
a single demonstration video, and of incrementally updating them when new
demonstrations become available. To reproduce manipulation skills using the
learned set of prioritized geometric constraints in novel scenes, we introduce
a novel keypoint-based admittance controller. We evaluate our approach in
several real-world applications, showcasing its ability to deal with cluttered
scenes, new instances of categorical objects, and large object pose and shape
variations, as well as its efficiency and robustness in both one-shot and
few-shot imitation learning settings. Videos and source code are available at
https://sites.google.com/view/k-vil
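The abstract names a keypoint-based admittance controller; while K-VIL's controller details are in the paper, a generic admittance law conveys the idea: each keypoint is driven toward its geometric-constraint target through a virtual mass-damper-spring system that remains compliant to external forces. The gains, dimensions, and integration scheme below are our own assumptions.

```python
import numpy as np

def admittance_step(x, xd, x_target, f_ext, dt=0.01,
                    M=1.0, D=20.0, K=100.0):
    """One explicit-Euler step of the admittance law
        M * xdd = f_ext - D * xd - K * (x - x_target)
    Illustrative gains only; not K-VIL's actual controller parameters.
    """
    xdd = (f_ext - D * xd - K * (x - x_target)) / M
    xd = xd + xdd * dt
    x = x + xd * dt
    return x, xd

# Drive a 3-D keypoint toward a target pulled from a learned constraint.
x, xd = np.zeros(3), np.zeros(3)
x_target = np.array([0.10, 0.00, 0.20])
for _ in range(500):
    x, xd = admittance_step(x, xd, x_target, f_ext=np.zeros(3))
print(np.round(x, 3))  # close to x_target: compliant convergence
```

With D chosen for critical damping (here D = 2 * sqrt(K * M)), the keypoint converges without overshoot, and a nonzero f_ext simply shifts the equilibrium, which is the compliance an admittance controller provides during contact-rich manipulation.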