One-Shot Observation Learning Using Visual Activity Features
Observation learning is the process of learning a task by observing an expert
demonstrator. Our principal contribution is a one-shot learning method for
robot manipulation tasks in which only a single demonstration is required. The
key idea is to encode the demonstration in an activity space defined as part of
a previously trained activity classifier. The distance between this encoding
and equivalent encodings from trials of a robot performing the same task
provides a reward function supporting iterative learning of task completion by
the robotic manipulator. We use reinforcement learning for experiments with a
simulated robotic manipulator, and stochastic trajectory optimisation for
experiments with a real robotic manipulator. We show that the proposed method
can be used to learn tasks from a single demonstration under varying viewpoint
of observation, object properties, scene background and morphology of the
manipulator. Videos of all results, including demonstrations, can be found at:
https://tinyurl.com/s2l-stage
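As a rough illustration of the reward construction described above, the sketch below scores a robot trial by its distance to the single demonstration in a feature space produced by a pretrained activity classifier. The encoder here is a placeholder (the abstract does not specify the architecture), and all names are invented for the example.

```python
import numpy as np

def activity_encoder(frames: np.ndarray) -> np.ndarray:
    """Placeholder for the pretrained activity classifier's embedding:
    maps a clip of frames to a single activity-feature vector."""
    return frames.reshape(len(frames), -1).mean(axis=0)

def observation_reward(demo_clip: np.ndarray, trial_clip: np.ndarray) -> float:
    """Reward for iterative learning: the closer the trial's encoding is
    to the demonstration's encoding, the higher the reward."""
    e_demo = activity_encoder(demo_clip)
    e_trial = activity_encoder(trial_clip)
    return -float(np.linalg.norm(e_demo - e_trial))
```

A reinforcement-learning or trajectory-optimisation loop would then maximise this reward over repeated trials, as the abstract describes.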
High-level Reasoning and Low-level Learning for Grasping: A Probabilistic Logic Pipeline
While grasps must satisfy the grasping stability criteria, good grasps depend
on the specific manipulation scenario: the object, its properties and
functionalities, as well as the task and grasp constraints. In this paper, we
consider such information for robot grasping by leveraging manifolds and
symbolic object parts. Specifically, we introduce a new probabilistic logic
module to first semantically reason about pre-grasp configurations with respect
to the intended tasks. Further, a mapping is learned from part-related visual
features to good grasping points. The probabilistic logic module makes use of
object-task affordances and object/task ontologies to encode rules that
generalize over similar object parts and object/task categories. The use of
probabilistic logic for task-dependent grasping contrasts with current
approaches that usually learn direct mappings from visual perceptions to
task-dependent grasping points. We show the benefits of the full probabilistic
logic pipeline experimentally and on a real robot.
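To make the idea of task-dependent affordance rules concrete, here is a deliberately tiny sketch: weighted (object part, task) rules score candidate pre-grasp parts. The rules and weights are invented for illustration; the paper uses a full probabilistic logic module with ontologies, not a lookup table.

```python
# Illustrative (part, task) -> suitability rules; not the paper's program.
AFFORDANCE_RULES = {
    ("handle", "pour"):     0.9,
    ("body",   "pour"):     0.2,
    ("handle", "handover"): 0.3,
    ("body",   "handover"): 0.8,
}

def best_pregrasp(parts, task, default=0.1):
    """Rank object parts by how well grasping them suits the intended task;
    unseen (part, task) pairs fall back to a small default probability."""
    return max(parts, key=lambda p: AFFORDANCE_RULES.get((p, task), default))

print(best_pregrasp(["handle", "body"], "pour"))      # handle
print(best_pregrasp(["handle", "body"], "handover"))  # body
```

Generalisation over similar parts and object/task categories, which the ontologies provide in the paper, has no counterpart in this toy version.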
Inferring 3D Shapes of Unknown Rigid Objects in Clutter through Inverse Physics Reasoning
We present a probabilistic approach for building, on the fly, 3-D models of
unknown objects while being manipulated by a robot. We specifically consider
manipulation tasks in piles of clutter that contain previously unseen objects.
Most manipulation algorithms for performing such tasks require known geometric
models of the objects in order to grasp or rearrange them robustly. One of the
novel aspects of this work is the utilization of a physics engine for verifying
hypothesized geometries in simulation. The evidence provided by physics
simulations is used in a probabilistic framework that accounts for the fact
that mechanical properties of the objects are uncertain. We present an
efficient algorithm for inferring occluded parts of objects based on their
observed motions and mutual interactions. Experiments using a robot show that
this approach is efficient for constructing physically realistic 3-D models,
which can be useful for manipulation planning. Experiments also show that the
proposed approach significantly outperforms alternative approaches in terms of
shape accuracy.
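The following sketch illustrates the probabilistic use of physics simulation described above: each hypothesized geometry is weighted by how well its simulated motion matches the observed motion, marginalizing over an uncertain friction coefficient. The one-line `simulate_motion` is a toy stand-in for a physics-engine rollout; all names and constants are assumptions made for this example.

```python
import numpy as np

def simulate_motion(extent: float, push_force: float, friction: float) -> float:
    """Toy stand-in for a physics-engine rollout: displacement of a
    hypothesized object under a push, shrinking with friction and size."""
    return push_force / (friction * extent)

def weigh_hypotheses(extents, push_force, observed_disp,
                     friction_samples, sigma=0.05):
    """Posterior weight per shape hypothesis: average Gaussian likelihood of
    the observed displacement over sampled mechanical properties."""
    weights = []
    for extent in extents:
        lik = np.mean([
            np.exp(-(simulate_motion(extent, push_force, mu) - observed_disp) ** 2
                   / (2 * sigma ** 2))
            for mu in friction_samples])
        weights.append(lik)
    weights = np.asarray(weights)
    return weights / weights.sum()

# Three candidate extents for the occluded part; observed displacement 0.5.
print(weigh_hypotheses([1.0, 2.0, 4.0], 1.0, 0.5, [0.4, 0.5, 0.6]))
```

Hypotheses whose simulated interactions disagree with the observed motion receive vanishing weight, which is how simulation evidence constrains the occluded geometry.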
Annotation Scaffolds for Object Modeling and Manipulation
We present and evaluate an approach for human-in-the-loop specification of
shape reconstruction with annotations for basic robot-object interactions. Our
method is based on the idea of model annotation: the addition of simple cues to
an underlying object model to specify shape and delineate a simple task. The
goal is to explore reducing the complexity of CAD-like interfaces so that
novice users can quickly recover an object's shape and describe a manipulation
task that is then carried out by a robot. The object modeling and interaction
annotation capabilities are tested with a user study and compared against
results obtained using existing approaches. The approach has been analyzed
using a variety of shape comparison, grasping, and manipulation metrics, and
tested with the PR2 robot platform, where it was shown to be successful. Comment: 31 pages, 46 figures.
PointNetGPD: Detecting Grasp Configurations from Point Sets
In this paper, we propose an end-to-end grasp evaluation model to address the
challenging problem of localizing robot grasp configurations directly from the
point cloud. Compared to recent grasp evaluation metrics that are based on
handcrafted depth features and a convolutional neural network (CNN), our
proposed PointNetGPD is lightweight and can directly process the 3D point cloud
that lies within the gripper for grasp evaluation. Taking the raw point
cloud as input, our proposed grasp evaluation network can capture the complex
geometric structure of the contact area between the gripper and the object even
if the point cloud is very sparse. To further improve our proposed model, we
generate a larger-scale grasp dataset with 350k real point clouds and grasps
with the YCB object set for training. The performance of the proposed model is
quantitatively measured both in simulation and on robotic hardware. Experiments
on object grasping and clutter removal show that our proposed model generalizes
well to novel objects and outperforms state-of-the-art methods. Code and video
are available at
https://lianghongzhuo.github.io/PointNetGPD
Comment: Accepted to ICRA 2019. Hongzhuo Liang and Xiaojian Ma contributed equally to this work.
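The sketch below captures the core architectural idea stated in the abstract: a PointNet-style evaluator that consumes the raw points inside the gripper closing region and outputs a grasp-quality score. Layer sizes are illustrative assumptions, not the published network.

```python
import torch
import torch.nn as nn

class GraspPointNet(nn.Module):
    """Minimal PointNet-style grasp evaluator: a shared per-point MLP,
    order-invariant max pooling, and a small classification head."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.point_mlp = nn.Sequential(        # weights shared across points
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),       # grasp-quality logits
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, 3, N), the cloud cropped to the gripper region
        per_point = self.point_mlp(points)           # (batch, 1024, N)
        global_feat = per_point.max(dim=2).values    # robust to sparse input
        return self.head(global_feat)

logits = GraspPointNet()(torch.randn(4, 3, 512))  # 4 grasp candidates
```

Max pooling makes the score independent of point ordering and tolerant of varying point counts, which is why such evaluators cope with very sparse clouds.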
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Skilled robotic manipulation benefits from complex synergies between
non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing
can help rearrange cluttered objects to make space for arms and fingers;
likewise, grasping can help displace objects to make pushing movements more
precise and collision-free. In this work, we demonstrate that it is possible to
discover and learn these synergies from scratch through model-free deep
reinforcement learning. Our method involves training two fully convolutional
networks that map from visual observations to actions: one infers the utility
of pushes for a dense pixel-wise sampling of end effector orientations and
locations, while the other does the same for grasping. Both networks are
trained jointly in a Q-learning framework and are entirely self-supervised by
trial and error, where rewards are provided from successful grasps. In this
way, our policy learns pushing motions that enable future grasps, while
learning grasps that can leverage past pushes. During picking experiments in
both simulation and real-world scenarios, we find that our system quickly
learns complex behaviors amid challenging cases of clutter, and achieves better
grasping success rates and picking efficiencies than baseline alternatives
after only a few hours of training. We further demonstrate that our method is
capable of generalizing to novel objects. Qualitative results (videos), code,
pre-trained models, and simulation environments are available at
http://vpg.cs.princeton.edu
Comment: To appear at the International Conference on Intelligent Robots and Systems (IROS) 2018. Project webpage: http://vpg.cs.princeton.edu. Summary video: https://youtu.be/-OkyX7Zlhi
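A minimal rendering of the two-network design described above: one fully convolutional net scores pushes and another scores grasps, pixel by pixel, and the greedy policy executes the primitive and location with the highest Q value. The backbone here is deliberately tiny, and the per-orientation rotated inputs used in the paper are omitted.

```python
import torch
import torch.nn as nn

def q_head() -> nn.Module:
    """Tiny fully convolutional Q-map: one value per pixel of the heightmap."""
    return nn.Sequential(
        nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 1),
    )

push_net, grasp_net = q_head(), q_head()

obs = torch.randn(1, 4, 224, 224)            # RGB-D heightmap of the scene
q_push, q_grasp = push_net(obs), grasp_net(obs)

# Greedy action: best (primitive, pixel) across both Q maps.
q_all = torch.cat([q_push, q_grasp], dim=1)  # (1, 2, H, W)
flat = int(q_all.flatten().argmax())
primitive, pixel = divmod(flat, 224 * 224)
print("push" if primitive == 0 else "grasp", divmod(pixel, 224))
```

Both heads would be trained against the same Q-learning target, rewarded only on successful grasps, so pushing is reinforced exactly when it makes later grasps succeed.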
Support Relation Analysis for Objects in Multiple View RGB-D Images
Understanding physical relations between objects, especially their support
relations, is crucial for robotic manipulation. There has been work on
reasoning about support relations and structural stability of simple
configurations in RGB-D images. In this paper, we propose a method for
extracting more detailed physical knowledge from a set of RGB-D images taken
from the same scene but from different views using qualitative reasoning and
intuitive physical models. Rather than providing a simple contact relation
graph and approximating stability over convex shapes, our method is able to
provide a detailed supporting relation analysis based on a volumetric
representation. Specifically, true supporting relations between objects (e.g.,
if an object supports another object by touching it on the side or if the
object above contributes to the stability of the object below) are identified.
We apply our method to real-world structures captured in warehouse scenarios
and show that our method works as desired.
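As a concrete, heavily simplified instance of the support analysis described above, the snippet below tests a "supports from below" relation between two axis-aligned object volumes. The paper's volumetric reasoning also covers side support and stability contributions, which this toy test does not attempt.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned object volume: x/y footprint and vertical extent."""
    x0: float; x1: float
    y0: float; y1: float
    z0: float; z1: float

def supports_from_below(lower: Box, upper: Box, eps: float = 0.01) -> bool:
    """True if `upper` rests on `lower`'s top face with overlapping
    footprints -- a minimal contact-based support test."""
    resting = abs(upper.z0 - lower.z1) <= eps
    overlap_x = min(lower.x1, upper.x1) > max(lower.x0, upper.x0)
    overlap_y = min(lower.y1, upper.y1) > max(lower.y0, upper.y0)
    return resting and overlap_x and overlap_y

pallet = Box(0.0, 2.0, 0.0, 2.0, 0.0, 0.15)
crate = Box(0.5, 1.5, 0.5, 1.5, 0.15, 0.60)
print(supports_from_below(pallet, crate))  # True
```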
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
Contact-rich manipulation tasks in unstructured environments often require
both haptic and visual feedback. However, it is non-trivial to manually design
a robot controller that combines modalities with very different
characteristics. While deep reinforcement learning has shown success in
learning control policies for high-dimensional inputs, these algorithms are
generally intractable to deploy on real robots due to sample complexity. We use
self-supervision to learn a compact and multimodal representation of our
sensory inputs, which can then be used to improve the sample efficiency of our
policy learning. We evaluate our method on a peg insertion task, generalizing
over different geometry, configurations, and clearances, while being robust to
external perturbations. Results for simulated and real robot experiments are
presented. Comment: ICRA 2019.
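To illustrate the kind of compact multimodal representation the abstract refers to, here is a sketch that fuses an image and a force-torque reading into one latent vector, with a single self-supervised head attached. The dimensions, the choice of a wrench input, and the alignment objective are simplifying assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Fuse vision and haptics into a compact latent; an auxiliary head
    predicts whether the two modalities are time-aligned (a common
    self-supervised signal, assumed here for illustration)."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.haptic = nn.Sequential(nn.Linear(6, 64), nn.ReLU())  # F/T wrench
        self.fuse = nn.Linear(32 + 64, latent_dim)
        self.align_head = nn.Linear(latent_dim, 1)

    def forward(self, image, wrench):
        z = self.fuse(torch.cat([self.vision(image), self.haptic(wrench)], 1))
        return z, self.align_head(z)   # policy input + SSL logit

z, logit = MultimodalEncoder()(torch.randn(2, 3, 64, 64), torch.randn(2, 6))
```

The policy then consumes `z` instead of raw pixels and wrenches, which is what buys the sample-efficiency gain the abstract claims.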
Improved Adversarial Systems for 3D Object Generation and Reconstruction
This paper describes a new approach for training generative adversarial
networks (GAN) to understand the detailed 3D shape of objects. While GANs have
been used in this domain previously, they are notoriously hard to train,
especially for the complex joint data distribution over 3D objects of many
categories and orientations. Our method extends previous work by employing the
Wasserstein distance normalized with gradient penalization as a training
objective. This enables improved generation from the joint object shape
distribution. Our system can also reconstruct 3D shape from 2D images and
perform shape completion from occluded 2.5D range scans. We achieve notable
quantitative improvements in comparison to existing baselines. Comment: 10 pages, accepted at CoRL. Figures are best viewed in color, and details only appear when zoomed in.
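The training objective named above, the Wasserstein distance with gradient penalization, has a standard form; a sketch adapted to 3D voxel critics follows. The toy critic exists only to make the snippet runnable.

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: penalize the critic's gradient norm away from 1 on
    random interpolates between real and generated voxel grids."""
    eps = torch.rand(real.size(0), 1, 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Toy critic over 32^3 occupancy grids, purely for demonstration.
critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 ** 3, 1))
real = torch.rand(4, 1, 32, 32, 32)
fake = torch.rand(4, 1, 32, 32, 32)
loss = (critic(fake).mean() - critic(real).mean()
        + 10.0 * gradient_penalty(critic, real, fake))
```

The penalty keeps the critic approximately 1-Lipschitz, which is what stabilizes training on the hard joint distribution over many object categories and orientations.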
RevealNet: Seeing Behind Objects in RGB-D Scans
During 3D reconstruction, it is often the case that people cannot scan each
individual object from all views, resulting in missing geometry in the captured
scan. This missing geometry can be fundamentally limiting for many
applications, e.g., a robot needs to know the unseen geometry to perform a
precise grasp on an object. Thus, we introduce the task of semantic instance
completion: from an incomplete RGB-D scan of a scene, we aim to detect the
individual object instances and infer their complete object geometry. This will
open up new possibilities for interactions with objects in a scene, for
instance for virtual or robotic agents. We tackle this problem by introducing
RevealNet, a new data-driven approach that jointly detects object instances and
predicts their complete geometry. This enables a semantically meaningful
decomposition of a scanned scene into individual, complete 3D objects,
including hidden and unobserved object parts. RevealNet is an end-to-end 3D
neural network architecture that leverages joint color and geometry feature
learning. The fully-convolutional nature of our 3D network enables efficient
inference of semantic instance completion for 3D scans at scale of large indoor
environments in a single forward pass. We show that predicting complete object
geometry improves both 3D detection and instance segmentation performance. We
evaluate on both real and synthetic scan benchmark data for the new task, where
we outperform state-of-the-art approaches by over 15 in mAP@0.5 on ScanNet, and over 18 in mAP@0.5 on SUNCG. Comment: CVPR 2020.
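A skeletal version of the fully-convolutional idea in the abstract: a small 3D network takes a partial occupancy grid and emits per-voxel semantic logits plus completed occupancy, in one forward pass at any grid size. Channel counts are assumptions, and the real RevealNet additionally fuses color features and detects object instances.

```python
import torch
import torch.nn as nn

class SemanticCompletionNet(nn.Module):
    """Toy fully-convolutional 3D net: per-voxel class logits and
    completed occupancy (including unobserved parts) from a partial scan."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.semantics = nn.Conv3d(32, num_classes, 1)  # per-voxel class
        self.completion = nn.Conv3d(32, 1, 1)           # occupancy logits

    def forward(self, partial_scan):
        feat = self.backbone(partial_scan)  # fully convolutional: any grid size
        return self.semantics(feat), self.completion(feat)

sem, occ = SemanticCompletionNet()(torch.randn(1, 1, 32, 32, 32))
```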