9 research outputs found
Modeling Grasp Type Improves Learning-Based Grasp Planning
Different manipulation tasks require different types of grasps. For example,
holding a heavy tool like a hammer requires a multi-fingered power grasp
offering stability, while holding a pen to write requires a multi-fingered
precision grasp to impart dexterity to the object. In this paper, we propose a
probabilistic grasp planner that explicitly models grasp type for planning
high-quality precision and power grasps in real-time. We take a learning
approach in order to plan grasps of different types for previously unseen
objects when only partial visual information is available. Our work
demonstrates the first supervised learning approach to grasp planning that can
explicitly plan both power and precision grasps for a given object.
Additionally, we compare our learned grasp model with a model that does not
encode type and show that modeling grasp type improves the success rate of
generated grasps. Furthermore, we show the benefit of learning a prior over
grasp configurations to improve grasp inference with a learned classifier.
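A minimal sketch of the planning idea described above, assuming a hypothetical learned success classifier and per-type Gaussian priors over grasp configurations (the names, dimensions, and placeholder scoring are illustrative, not the authors' implementation):

    # Sketch: explicit grasp-type modeling in a probabilistic planner.
    # success_probability() stands in for a learned classifier; the priors are
    # assumed per-type Gaussians fit from training grasps of each type.
    import numpy as np

    GRASP_TYPES = ["power", "precision"]
    CONFIG_DIM = 14  # e.g., wrist pose (6) + finger preshape (8); assumed

    def success_probability(config, grasp_type, visual_features):
        """Stand-in for a learned classifier p(success | config, type, features)."""
        logit = np.dot(visual_features[:CONFIG_DIM], config)  # placeholder scoring
        logit += 0.5 if grasp_type == "power" else 0.0
        return 1.0 / (1.0 + np.exp(-logit))

    def log_prior(config, grasp_type, priors):
        """Gaussian prior over configurations, fit separately per grasp type."""
        mu, sigma = priors[grasp_type]
        return -0.5 * np.sum(((config - mu) / sigma) ** 2)

    def plan_grasp(visual_features, priors, n_samples=256, rng=np.random):
        best = None
        for grasp_type in GRASP_TYPES:
            mu, sigma = priors[grasp_type]
            for _ in range(n_samples):
                config = rng.normal(mu, sigma)  # sample from the type-specific prior
                score = (np.log(success_probability(config, grasp_type, visual_features) + 1e-9)
                         + log_prior(config, grasp_type, priors))
                if best is None or score > best[0]:
                    best = (score, grasp_type, config)
        return best[1], best[2]

    priors = {t: (np.zeros(CONFIG_DIM), np.ones(CONFIG_DIM)) for t in GRASP_TYPES}
    grasp_type, config = plan_grasp(np.random.randn(64), priors)

The key point is that the grasp type is an explicit variable scored and selected alongside the configuration, rather than being left implicit in the classifier.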
Planning Multi-Fingered Grasps as Probabilistic Inference in a Learned Deep Network
We propose a novel approach to multi-fingered grasp planning leveraging
learned deep neural network models. We train a convolutional neural network to
predict grasp success as a function of both visual information of an object and
grasp configuration. We can then formulate grasp planning as inferring the
grasp configuration which maximizes the probability of grasp success. We
efficiently perform this inference using gradient-ascent optimization inside
the neural network via the backpropagation algorithm. Our work is the first
to directly plan high-quality multi-fingered grasps in configuration space using
a deep neural network without the need for an external planner. We validate our
inference method by performing both multi-fingered and two-fingered grasps on real
robots. Our experimental results show that our planning method outperforms
existing neural-network-based planning methods, while offering several other
benefits, including being data-efficient in learning and fast enough to be
deployed in real robotic applications.
Comment: International Symposium on Robotics Research (ISRR) 2017. Project page: https://robot-learning.cs.utah.edu/project/grasp_inference . Video link: https://youtu.be/7Sg1uw_szl
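A hedged sketch of the inference step described above: the trained success-prediction network is held fixed and gradient ascent is run on the grasp configuration input via backpropagation. The network below is a small stand-in, not the architecture from the paper:

    import torch
    import torch.nn as nn

    class GraspSuccessNet(nn.Module):
        """Placeholder for the trained network p(success | visual, config)."""
        def __init__(self, visual_dim=256, config_dim=14):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(visual_dim + config_dim, 128), nn.ReLU(),
                nn.Linear(128, 1))

        def forward(self, visual_feat, config):
            return torch.sigmoid(self.mlp(torch.cat([visual_feat, config], dim=-1)))

    def infer_grasp(net, visual_feat, init_config, steps=200, lr=1e-2):
        """Maximize p(success | visual, config) with respect to config only."""
        config = init_config.clone().requires_grad_(True)
        opt = torch.optim.Adam([config], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = -torch.log(net(visual_feat, config) + 1e-9).sum()  # ascent on log-prob
            loss.backward()  # gradients flow to the config; the weights stay frozen
            opt.step()
        return config.detach()

    net = GraspSuccessNet()
    for p in net.parameters():
        p.requires_grad_(False)  # network weights fixed during planning
    planned = infer_grasp(net, torch.randn(1, 256), init_config=torch.zeros(1, 14))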
Increasing the Generalisation Capacity of Conditional VAEs
We address the problem of one-to-many mappings in supervised learning, where
a single instance has many different solutions of possibly equal cost. The
framework of conditional variational autoencoders describes a class of methods
to tackle such structured-prediction tasks by means of latent variables. We
propose to incentivise informative latent representations for increasing the
generalisation capacity of conditional variational autoencoders. To this end,
we modify the latent variable model by defining the likelihood as a function of
the latent variable only, and introduce an expressive multimodal prior to enable
the model to capture semantically meaningful features of the data. To
validate our approach, we train our model on the Cornell Robot Grasping
dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results
that show a significantly higher generalisation capability.
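A minimal sketch of the two modifications described above, under the assumption of a mixture-of-Gaussians prior p(z | x) and a decoder that conditions on the latent z only; dimensions and layer choices are illustrative:

    import torch
    import torch.nn as nn
    import torch.distributions as D

    class ModifiedCVAE(nn.Module):
        def __init__(self, x_dim=784, y_dim=64, z_dim=8, n_components=5):
            super().__init__()
            self.encoder = nn.Linear(x_dim + y_dim, 2 * z_dim)   # q(z | x, y)
            self.prior_net = nn.Linear(x_dim, n_components * (1 + 2 * z_dim))
            self.decoder = nn.Linear(z_dim, y_dim)               # p(y | z): no x input
            self.z_dim, self.k = z_dim, n_components

        def prior(self, x):
            """Expressive multimodal prior p(z | x) as a mixture of Gaussians."""
            params = self.prior_net(x)
            logits = params[..., : self.k]
            mu, log_sigma = params[..., self.k :].chunk(2, dim=-1)
            comp = D.Independent(D.Normal(mu.reshape(-1, self.k, self.z_dim),
                                          log_sigma.reshape(-1, self.k, self.z_dim).exp()), 1)
            return D.MixtureSameFamily(D.Categorical(logits=logits), comp)

        def elbo(self, x, y):
            mu, log_sigma = self.encoder(torch.cat([x, y], -1)).chunk(2, -1)
            q = D.Independent(D.Normal(mu, log_sigma.exp()), 1)
            z = q.rsample()
            recon = -((self.decoder(z) - y) ** 2).sum(-1)    # Gaussian likelihood of y given z
            kl = q.log_prob(z) - self.prior(x).log_prob(z)   # single-sample KL estimate
            return (recon - kl).mean()

    model = ModifiedCVAE()
    loss = -model.elbo(torch.rand(16, 784), torch.rand(16, 64))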
Action Image Representation: Learning Scalable Deep Grasping Policies with Zero Real World Data
This paper introduces Action Image, a new grasp proposal representation that
allows learning an end-to-end deep grasping policy. Our model achieves
grasp success on real-world objects while being trained only in
simulation, on objects with just naive domain randomization. Similar to
computer vision problems, such as object detection, Action Image builds on the
idea that object features are invariant to translation in image space.
Therefore, grasp quality is invariant to translation when evaluating the
object-gripper relationship: a successful grasp for an object depends on its
local context, but is independent of the surrounding environment. Action Image represents a
grasp proposal as an image and uses a deep convolutional network to infer grasp
quality. We show that by using an Action Image representation, trained networks
are able to extract local, salient features of grasping tasks that generalize
across different objects and environments. We show that this representation
works on a variety of inputs, including color images (RGB), depth images (D),
and combined color-depth (RGB-D). Our experimental results demonstrate that
networks utilizing an Action Image representation exhibit strong domain
transfer between training on simulated data and inference on real-world sensor
streams. Finally, our experiments show that a network trained with Action Image
improves grasp success ( vs. ) over a baseline model with the same
structure, but using actions encoded as vectors.
Comment: 7 pages, 10 figures, and 3 tables. To be published in International Conference on Robotics and Automation, 202
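A hedged sketch of the representation: a grasp proposal is rasterized into an image-sized channel, stacked with the visual input, and scored by a convolutional network. The rendering, channel layout, and network below are simplified stand-ins, not the paper's model:

    import torch
    import torch.nn as nn

    def render_action_image(grasp_px, height=224, width=224):
        """Rasterize fingertip pixel locations of a grasp proposal into one channel."""
        canvas = torch.zeros(1, height, width)
        for (u, v) in grasp_px:  # e.g., two fingertip pixels (assumed parameterization)
            canvas[0, max(0, v - 2) : v + 3, max(0, u - 2) : u + 3] = 1.0
        return canvas

    class GraspQualityNet(nn.Module):
        def __init__(self, in_channels=4):  # RGB + one action channel (assumed)
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            self.head = nn.Linear(32, 1)

        def forward(self, rgb, action_image):
            x = torch.cat([rgb, action_image], dim=1)  # action shares image space with pixels
            return torch.sigmoid(self.head(self.features(x).flatten(1)))

    rgb = torch.rand(1, 3, 224, 224)
    action = render_action_image([(100, 120), (130, 120)]).unsqueeze(0)
    quality = GraspQualityNet()(rgb, action)

Because the proposal lives in the same image space as the pixels, the convolutional features see the object-gripper relationship locally, which is what gives the representation its translation invariance.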
AdaGrasp: Learning an Adaptive Gripper-Aware Grasping Policy
This paper aims to improve robots' versatility and adaptability by allowing
them to use a large variety of end-effector tools and quickly adapt to new
tools. We propose AdaGrasp, a method to learn a single grasping policy that
generalizes to novel grippers. By training on a large collection of grippers,
our algorithm is able to acquire generalizable knowledge of how different
grippers should be used in various tasks. Given a visual observation of the
scene and the gripper, AdaGrasp infers the possible grasp poses and their grasp
scores by computing the cross convolution between the shape encodings of the
gripper and scene. Intuitively, this cross convolution operation can be
considered as an efficient way of exhaustively matching the scene geometry with
gripper geometry under different grasp poses (i.e., translations and
orientations), where a good "match" of 3D geometry will lead to a successful
grasp. We validate our methods in both simulation and real-world environments.
Our experiment shows that AdaGrasp significantly outperforms the existing
multi-gripper grasping policy method, especially when handling cluttered
environments and partial observations. Video is available at
https://youtu.be/kknTYTbORfs
Comment: ICRA 2021. Project page: https://adagrasp.cs.columbia.ed
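A hedged sketch of the cross-convolution scoring step: a learned gripper encoding is used as a convolution kernel over a learned scene encoding, so every spatial offset (candidate grasp translation) receives a match score. The encoders are omitted and replaced by placeholder tensors:

    import torch
    import torch.nn.functional as F

    def cross_convolution_scores(scene_encoding, gripper_encoding):
        """scene_encoding: (1, C, H, W); gripper_encoding: (C, h, w) -> score map."""
        kernel = gripper_encoding.unsqueeze(0)     # (1, C, h, w): one output score map
        scores = F.conv2d(scene_encoding, kernel)  # correlate gripper geometry with scene
        return scores.squeeze(0).squeeze(0)

    # Grasp orientations can be handled by repeating the match with rotated encodings.
    scene = torch.randn(1, 8, 64, 64)    # placeholder encoded observation (top-down)
    gripper = torch.randn(8, 9, 9)       # placeholder encoded gripper shape
    score_map = cross_convolution_scores(scene, gripper)
    best_px = torch.nonzero(score_map == score_map.max())[0]  # best grasp translation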
Multi-Fingered Grasp Planning via Inference in Deep Neural Networks
We propose a novel approach to multi-fingered grasp planning leveraging
learned deep neural network models. We train a voxel-based 3D convolutional
neural network to predict grasp success probability as a function of both
visual information of an object and grasp configuration. We can then formulate
grasp planning as inferring the grasp configuration which maximizes the
probability of grasp success. In addition, we learn a prior over grasp
configurations as a mixture density network conditioned on our voxel-based
object representation.
We show that, when used with the learned grasp success prediction network, this
object-conditional prior improves grasp inference compared to a learned but
object-agnostic prior or an uninformed uniform prior. Our work is the first to
directly plan high-quality multi-fingered grasps in configuration space using a
deep neural network without the need for an external planner. We validate our
inference method by performing multi-fingered grasping on a physical robot. Our
experimental results show that our planning method outperforms existing
neural-network-based grasp planning methods.
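A minimal sketch of the object-conditional prior described above: a mixture density network maps a voxel-grid encoding to a Gaussian mixture over grasp configurations, and its log-density is added to the predicted log success probability during inference. The success predictor and dimensions are placeholders, not the paper's voxel CNN:

    import torch
    import torch.nn as nn
    import torch.distributions as D

    class GraspConfigMDN(nn.Module):
        """Mixture density network p(config | voxel encoding)."""
        def __init__(self, voxel_feat_dim=128, config_dim=14, n_components=4):
            super().__init__()
            self.k, self.d = n_components, config_dim
            self.net = nn.Linear(voxel_feat_dim, n_components * (1 + 2 * config_dim))

        def forward(self, voxel_feat):
            p = self.net(voxel_feat)
            logits = p[..., : self.k]
            mu, log_sigma = p[..., self.k :].chunk(2, dim=-1)
            comp = D.Independent(D.Normal(mu.reshape(-1, self.k, self.d),
                                          log_sigma.reshape(-1, self.k, self.d).exp()), 1)
            return D.MixtureSameFamily(D.Categorical(logits=logits), comp)

    def inference_objective(success_net, mdn, voxel_feat, config):
        """log p(success | voxel, config) + log p(config | voxel), maximized over config."""
        log_success = torch.log(success_net(voxel_feat, config) + 1e-9).squeeze(-1)
        log_prior = mdn(voxel_feat).log_prob(config)
        return log_success + log_prior

    # Stand-in success predictor; in practice this is the trained grasp success network.
    success_net = lambda feat, cfg: torch.sigmoid((feat[..., :14] * cfg).sum(-1, keepdim=True))
    config = torch.zeros(1, 14, requires_grad=True)
    score = inference_objective(success_net, GraspConfigMDN(), torch.randn(1, 128), config)
    score.sum().backward()  # gradients w.r.t. config drive the grasp inference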
UniGrasp: Learning a Unified Model to Grasp with Multifingered Robotic Hands
To achieve a successful grasp, gripper attributes such as geometry and
kinematics play a role as important as the object geometry. The majority of
previous work has focused on developing grasp methods that generalize over
novel object geometry but are specific to a certain robot hand. We propose
UniGrasp, an efficient data-driven grasp synthesis method that considers both
the object geometry and gripper attributes as inputs. UniGrasp is based on a
novel deep neural network architecture that selects sets of contact points from
the input point cloud of the object. The proposed model is trained on a large
dataset to produce contact points that are in force closure and reachable by
the robot hand. By using contact points as output, we can transfer between a
diverse set of multi-fingered robotic hands. Our model produces over 90% valid
contact points in the top-10 predictions in simulation and more than 90% successful
grasps in real world experiments for various known two-fingered and
three-fingered grippers. Our model also achieves 93%, 83% and 90% successful
grasps in real world experiments for an unseen two-fingered gripper and two
unseen multi-fingered anthropomorphic robotic hands.
Comment: Accepted to IEEE Robotics and Automation Letters with ICRA 2020 option
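A hedged sketch of the contact-point idea: every object point is scored by a network conditioned on a gripper descriptor, and the best-scoring points are returned as contact candidates. The real architecture is considerably richer (staged point-set selection with force-closure supervision); this only illustrates the interface:

    import torch
    import torch.nn as nn

    class ContactPointSelector(nn.Module):
        def __init__(self, gripper_feat_dim=32):
            super().__init__()
            self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
            self.score_head = nn.Sequential(
                nn.Linear(64 + gripper_feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, points, gripper_feat, n_contacts=2):
            """points: (N, 3) object point cloud; gripper_feat: (F,) gripper descriptor."""
            per_point = self.point_mlp(points)            # per-point features (N, 64)
            g = gripper_feat.expand(points.shape[0], -1)  # broadcast the gripper code
            scores = self.score_head(torch.cat([per_point, g], -1)).squeeze(-1)
            idx = scores.topk(n_contacts).indices         # highest-scoring contact candidates
            return points[idx], scores[idx]

    cloud = torch.rand(1024, 3)     # placeholder object point cloud
    gripper_code = torch.randn(32)  # placeholder encoding of hand geometry and kinematics
    contacts, contact_scores = ContactPointSelector()(cloud, gripper_code, n_contacts=2)

Because the output is a set of contact points rather than joint angles for one specific hand, the same trained model can be paired with different grippers through their descriptors.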
Deep Dexterous Grasping of Novel Objects from a Single View
Dexterous grasping of a novel object given a single view is an open problem.
This paper makes several contributions to its solution. First, we present a
simulator for generating and testing dexterous grasps. Second, we present a data
set, generated by this simulator, of 2.4 million simulated dexterous grasps of
variations of 294 base objects drawn from 20 categories. Third, we present a
basic architecture for generation and evaluation of dexterous grasps that may
be trained in a supervised manner. Fourth, we present three different
evaluative architectures, employing ResNet-50 or VGG16 as their visual
backbone. Fifth, we train and evaluate seventeen variants of
generative-evaluative architectures on this simulated data set, showing
improvement from 69.53% grasp success rate to 90.49%. Finally, we present a
real robot implementation and evaluate the four most promising variants,
executing 196 real robot grasps in total. We show that our best architectural
variant achieves a grasp success rate of 87.8% on real novel objects seen from
a single view, improving on a baseline of 57.1%.
Comment: Submitted to IEEE Transactions on Robotics (T-RO). 14 pages
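A hedged sketch of the generative-evaluative split described above: a generative model proposes candidate grasps from the single view, an evaluative network predicts each candidate's success probability, and the top-ranked grasp is selected. Both models below are simplified placeholders (the paper's evaluative networks use ResNet-50 or VGG16 backbones):

    import torch
    import torch.nn as nn

    class EvaluativeNet(nn.Module):
        """Scores (depth image, grasp parameters) pairs."""
        def __init__(self, grasp_dim=20):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 16, 5, stride=4), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.head = nn.Sequential(nn.Linear(16 + grasp_dim, 64), nn.ReLU(),
                                      nn.Linear(64, 1))

        def forward(self, depth, grasps):
            feat = self.backbone(depth).flatten(1).expand(grasps.shape[0], -1)
            return torch.sigmoid(self.head(torch.cat([feat, grasps], -1))).squeeze(-1)

    def sample_generative_grasps(n=100, grasp_dim=20):
        """Stand-in for the learned generative grasp model."""
        return torch.randn(n, grasp_dim)

    depth_image = torch.rand(1, 1, 224, 224)   # single-view depth observation
    candidates = sample_generative_grasps()
    scores = EvaluativeNet()(depth_image, candidates)
    best_grasp = candidates[scores.argmax()]   # rank candidates, execute the best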
Learning better generative models for dexterous, single-view grasping of novel objects
This paper concerns the problem of learning to grasp dexterously, so that the
robot can then grasp novel objects seen only from a single viewpoint.
Recently, progress has been made in data-efficient learning of generative grasp
models which transfer well to novel objects. These generative grasp models are
learned from demonstration (LfD). One weakness is that, as this paper shall
show, grasp transfer under challenging single view conditions is unreliable.
Second, the number of generative model elements rises linearly with the number of
training examples. This, in turn, limits the potential of these generative
models for generalisation and continual improvement. In this paper, it is shown
how to address these problems. Several technical contributions are made: (i) a
view-based model of a grasp; (ii) a method for combining and compressing
multiple grasp models; (iii) a new way of evaluating contacts that is used both
to generate and to score grasps. Together, these both improve grasp
performance and reduce the number of models learned for grasp transfer. These
advances, in turn, also allow the introduction of autonomous training, in which
the robot learns from self-generated grasps. Evaluation on a challenging test
set shows that, with innovations (i)-(iii) deployed, grasp transfer success
rises from 55.1% to 81.6%. By adding autonomous training this rises to 87.8%.
These differences are statistically significant. In total, across all
experiments, 539 test grasps were executed on real objects.
Comment: 19 pages, 15 figures, 7 tables
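A hedged sketch of contribution (ii), combining and compressing grasp models: grasp samples from several demonstration-specific models are pooled and then compressed to a fixed budget of representative grasps, so model size no longer grows linearly with the number of training examples. Plain k-means is used here as an illustrative compression mechanism, not necessarily the paper's exact one:

    import numpy as np

    def combine_and_compress(grasp_models, budget=64, rng=np.random.default_rng(0)):
        """grasp_models: list of (n_i, d) arrays of grasp parameters; returns (budget, d)."""
        pooled = np.vstack(grasp_models)                 # combine all per-example models
        centers = pooled[rng.choice(len(pooled), budget, replace=False)]
        for _ in range(20):                              # k-means style compression
            dists = np.linalg.norm(pooled[:, None] - centers[None], axis=-1)
            assign = dists.argmin(axis=1)
            for k in range(budget):
                members = pooled[assign == k]
                if len(members):
                    centers[k] = members.mean(axis=0)
        return centers

    models = [np.random.randn(200, 10) for _ in range(5)]  # five per-example grasp models
    compressed = combine_and_compress(models, budget=64)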