Vision-based Robotic Grasping From Object Localization, Object Pose Estimation to Grasp Estimation for Parallel Grippers: A Review
This paper presents a comprehensive survey on vision-based robotic grasping.
We identify three key tasks in vision-based robotic grasping: object
localization, object pose estimation, and grasp estimation. In detail, the
object localization task covers object localization without classification,
object detection, and object instance segmentation. This task
provides the regions of the target object in the input data. The object pose
estimation task mainly refers to estimating the 6D object pose and includes
correspondence-based methods, template-based methods, and voting-based methods,
which enable the generation of grasp poses for known objects. The grasp
estimation task includes 2D planar grasp methods and 6DoF grasp methods, where
the former is constrained to grasping from a single direction. Different
combinations of these three tasks can accomplish robotic grasping: many object
pose estimation methods do not require a separate object localization step and
perform localization and pose estimation jointly, and many grasp estimation
methods require neither object localization nor pose estimation, producing
grasps in an end-to-end manner. Both traditional methods and the latest deep
learning-based methods operating on RGB-D image inputs are reviewed in detail
in this survey. Related datasets and comparisons between
state-of-the-art methods are summarized as well. In addition, challenges in
vision-based robotic grasping and future directions in addressing these
challenges are also pointed out.
Comment: This is a pre-print of an article published in Artificial Intelligence
Review. The final authenticated version is available online at:
https://doi.org/10.1007/s10462-020-09888-5. Related references are summarized
at: https://github.com/GeorgeDu/vision-based-robotic-grasping
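To make the decomposition concrete, here is a minimal Python sketch of how the three tasks could be chained for a known object; the `localize`, `estimate_pose`, and `estimate_grasp` callables and the `GraspPose` container are illustrative placeholders, not methods from the survey.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

import numpy as np


@dataclass
class GraspPose:
    position: np.ndarray      # 3D grasp point in the camera frame
    orientation: np.ndarray   # 3x3 rotation matrix of the gripper
    width: float              # gripper opening width


def grasp_pipeline(
    rgbd: np.ndarray,
    localize: Callable[[np.ndarray], List[np.ndarray]],                       # -> object regions
    estimate_pose: Optional[Callable[[np.ndarray, np.ndarray], np.ndarray]],  # -> 6D pose (4x4)
    estimate_grasp: Callable[..., GraspPose],
) -> GraspPose:
    """Compose the three tasks identified in the survey.

    Some methods skip localization (joint localization + pose estimation),
    and end-to-end methods skip both; this sketch shows the fully
    decomposed variant for known objects.
    """
    regions = localize(rgbd)                   # task 1: where are the objects?
    target = regions[0]                        # pick one target region
    if estimate_pose is not None:
        pose_6d = estimate_pose(rgbd, target)  # task 2: 6D pose of a known object
        return estimate_grasp(pose_6d)         # task 3: grasp from a known pose
    return estimate_grasp(rgbd, target)        # task 3: grasp directly from the region
```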
Combining RGB and Points to Predict Grasping Region for Robotic Bin-Picking
This paper focuses on robotic picking tasks in cluttered scenarios. Because of
the diversity of objects and the clutter created by their placement, it is
difficult to recognize objects and estimate their poses before grasping. Here,
we use U-Net, a special convolutional neural network (CNN), to combine RGB
images and depth information and predict the picking region without recognition
or pose estimation. The performance of the network with different visual inputs
was compared, including RGB, RGB-D, and RGB-Points, and we found the RGB-Points
input achieved a precision of 95.74%.
Comment: 5 pages, 6 figures
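As a rough illustration of the approach described above (not the authors' network), the following PyTorch sketch shows a small U-Net-style encoder-decoder that takes a 4-channel RGB-D tensor and outputs a per-pixel picking-region heatmap; channel counts, depth, and input size are assumptions.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class PickingRegionUNet(nn.Module):
    """Tiny U-Net: RGB-D in (4 channels), picking-region heatmap out (1 channel)."""

    def __init__(self, in_channels=4):
        super().__init__()
        self.enc1 = conv_block(in_channels, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                     # full resolution
        e2 = self.enc2(self.pool(e1))         # 1/2 resolution
        b = self.bottleneck(self.pool(e2))    # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))   # per-pixel picking probability


# Usage: 4-channel input = RGB (3) + depth (1); an "RGB-Points" input would widen in_channels.
rgbd = torch.rand(1, 4, 128, 128)
heatmap = PickingRegionUNet()(rgbd)           # shape: (1, 1, 128, 128)
```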
Suction Grasp Region Prediction using Self-supervised Learning for Object Picking in Dense Clutter
This paper focuses on robotic picking tasks in cluttered scenarios. Because of
the diversity of poses, the types of stacking, and the complicated backgrounds
in bin-picking situations, it is difficult to recognize objects and estimate
their poses before grasping them. Here, this paper combines a ResNet with a
U-Net structure, a special framework of convolutional neural networks (CNN), to
predict the picking region without recognition or pose estimation, which lets
the robotic picking system learn picking skills from scratch. At the same time,
we train the network end to end with online samples. At the end of this paper,
several experiments are conducted to demonstrate the performance of our method.
Comment: 6 pages, 7 figures, conference
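A hedged sketch of what such a self-supervised online training loop might look like is given below; the `camera.capture_rgbd` and `robot.execute_suction_at` interfaces are hypothetical placeholders, and the model is assumed to output a per-pixel suction-success heatmap in [0, 1].

```python
import random
from collections import deque

import numpy as np
import torch


def online_self_supervised_loop(model, optimizer, camera, robot,
                                buffer_size=5000, batch_size=16, steps=10000):
    """Self-supervised picking loop: the robot labels its own training data.

    Each trial: predict a suction heatmap, try the best pixel, record
    success/failure as the label for that pixel, and keep training online.
    All robot/camera calls are hypothetical placeholders.
    """
    replay = deque(maxlen=buffer_size)
    for step in range(steps):
        rgbd = camera.capture_rgbd()                       # HxWx4 numpy array (assumed)
        with torch.no_grad():
            heat = model(torch.from_numpy(rgbd).permute(2, 0, 1)[None].float())
        v, u = np.unravel_index(heat[0, 0].argmax().item(), heat.shape[-2:])
        success = robot.execute_suction_at(u, v)           # True if object lifted (assumed)
        replay.append((rgbd, (u, v), float(success)))

        if len(replay) >= batch_size:                      # one SGD step on past trials
            batch = random.sample(list(replay), batch_size)
            imgs = torch.stack([torch.from_numpy(b[0]).permute(2, 0, 1).float() for b in batch])
            preds = model(imgs)                            # (B, 1, H, W) suction heatmaps
            picked = torch.stack([preds[i, 0, b[1][1], b[1][0]] for i, b in enumerate(batch)])
            labels = torch.tensor([b[2] for b in batch])
            loss = torch.nn.functional.binary_cross_entropy(picked, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```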
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Bridging the 'reality gap' that separates simulated robotics from experiments
on hardware could accelerate robotic research through improved data
availability. This paper explores domain randomization, a simple technique for
training models on simulated images that transfer to real images by randomizing
rendering in the simulator. With enough variability in the simulator, the real
world may appear to the model as just another variation. We focus on the task
of object localization, which is a stepping stone to general robotic
manipulation skills. We find that it is possible to train a real-world object
detector that is accurate to the centimeter level and robust to distractors and
partial occlusions, using only data from a simulator with non-realistic random
textures.
To demonstrate the capabilities of our detectors, we show they can be used to
perform grasping in a cluttered environment. To our knowledge, this is the
first successful transfer of a deep neural network trained only on simulated
RGB images (without pre-training on real images) to the real world for the
purpose of robotic control.
Comment: 8 pages, 7 figures. Submitted to the 2017 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2017).
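The core of domain randomization is simply resampling rendering parameters for every generated scene. The sketch below illustrates this with a hypothetical `scene` simulator handle; all attributes and value ranges are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)


def randomize_scene(scene):
    """Randomize rendering so the real world looks like 'just another variation'.

    `scene` is a hypothetical simulator handle; every attribute set here is an
    assumed API, not a real library call.
    """
    for obj in scene.objects:
        obj.texture = rng.choice(scene.random_textures)          # non-realistic textures
        obj.position += rng.uniform(-0.02, 0.02, size=3)         # small pose jitter (m)
    scene.num_distractors = int(rng.integers(0, 10))             # clutter on the table
    scene.light_intensity = rng.uniform(0.3, 3.0)
    scene.light_position = rng.uniform(-1.0, 1.0, size=3)
    scene.camera_position += rng.uniform(-0.05, 0.05, size=3)    # camera jitter (m)
    scene.camera_fov = rng.uniform(40.0, 60.0)                   # degrees


def generate_dataset(scene, n_images=100000):
    """Yield (image, object_positions) pairs rendered under fresh randomizations."""
    for _ in range(n_images):
        randomize_scene(scene)
        yield scene.render(), scene.object_positions
```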
Learning to Grasp Without Seeing
Can a robot grasp an unknown object without seeing it? In this paper, we
present a tactile-sensing based approach to this challenging problem of
grasping novel objects without prior knowledge of their location or physical
properties. Our key idea is to combine touch based object localization with
tactile based re-grasping. To train our learning models, we created a
large-scale grasping dataset, including more than 30 RGB frames and over 2.8
million tactile samples from 7800 grasp interactions of 52 objects. To learn a
representation of tactile signals, we propose an unsupervised auto-encoding
scheme, which shows a significant improvement of 4-9% over prior methods on a
variety of tactile perception tasks. Our system consists of two steps. First,
our touch localization model sequentially 'touch-scans' the workspace and uses
a particle filter to aggregate beliefs from multiple hits of the target. It
outputs an estimate of the object's location, from which an initial grasp is
established. Next, our re-grasping model learns to progressively improve grasps
with tactile feedback based on the learned features. This network learns to
estimate grasp stability and predict adjustments for the next grasp. Re-grasping
is thus performed iteratively until our model identifies a stable grasp.
Finally, we demonstrate extensive experimental results on grasping a large set
of novel objects using tactile sensing alone. Furthermore, when applied on top
of a vision-based policy, our re-grasping model significantly boosts the
overall accuracy by 10.6%. We believe this is the first attempt at learning to
grasp with only tactile sensing and without any prior object knowledge.
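The touch-localization step can be pictured as a particle filter over the object's planar position, reweighted by each touch hit or miss. Below is a minimal numpy sketch under simplifying assumptions (a single object, a 2D workspace, Gaussian touch noise); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class TouchParticleFilter:
    """Particle filter over a single object's (x, y) location on the table."""

    def __init__(self, workspace, n_particles=1000, touch_sigma=0.02):
        lo, hi = np.asarray(workspace[0]), np.asarray(workspace[1])
        self.particles = rng.uniform(lo, hi, size=(n_particles, 2))
        self.weights = np.full(n_particles, 1.0 / n_particles)
        self.sigma = touch_sigma   # assumed touch-sensing noise (m)

    def update(self, probe_xy, hit):
        """Reweight particles after probing location `probe_xy`."""
        d = np.linalg.norm(self.particles - np.asarray(probe_xy), axis=1)
        p_hit = np.exp(-0.5 * (d / self.sigma) ** 2)           # near particles explain a hit
        likelihood = p_hit if hit else (1.0 - p_hit + 1e-6)
        self.weights *= likelihood
        self.weights /= self.weights.sum()
        # resample when the effective sample size collapses
        if 1.0 / np.sum(self.weights ** 2) < 0.5 * len(self.weights):
            idx = rng.choice(len(self.weights), size=len(self.weights), p=self.weights)
            self.particles = self.particles[idx] + rng.normal(0, 1e-3, self.particles.shape)
            self.weights.fill(1.0 / len(self.weights))

    def estimate(self):
        return self.weights @ self.particles                   # weighted mean location


# Usage: touch-scan the workspace, update on each hit/miss, then grasp at the estimate.
pf = TouchParticleFilter(workspace=((0.0, 0.0), (0.5, 0.5)))
pf.update(probe_xy=(0.25, 0.10), hit=True)
print(pf.estimate())
```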
Design and Development of an automated Robotic Pick & Stow System for an e-Commerce Warehouse
In this paper, we provide details of a robotic system that can automate the
task of picking and stowing objects from and to a rack in an e-commerce
fulfillment warehouse. The system primarily comprises four main modules: (1) a
perception module responsible for recognizing query objects and localizing them
in the 3-dimensional robot workspace; (2) a planning module that generates the
paths the robot end-effector must take to reach objects in the rack or in the
tote; (3) a calibration module that defines the physical workspace for the
robot visible through the on-board vision system; and (4) a gripping and
suction system for picking and stowing different kinds of objects. The
perception module uses a Faster Region-based Convolutional Neural Network
(Faster R-CNN) to recognize objects. We designed a novel two-finger gripper
that incorporates a pneumatic-valve-based suction effect to enhance its ability
to
pick different kinds of objects. The system was developed by the IITK-TCS team
for participation in the Amazon Picking Challenge 2016 event. The team secured
fifth place in the stowing task at the event. The purpose of this article is to
share our experiences with students and practicing engineers and enable them to
build similar systems. The overall efficacy of the system is demonstrated
through several simulation as well as real-world experiments with actual
robots.
Comment: 15 Pages, 25 Figures, 4 Tables, Journal Paper
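A minimal sketch of the perception step, under stated assumptions: torchvision's COCO-pretrained Faster R-CNN stands in for the paper's warehouse-trained recognizer, and each detection's box center is back-projected to 3D with a pinhole model using assumed intrinsics (fx, fy, cx, cy).

```python
import torch
import torchvision


def detect_and_localize(rgb, depth, fx, fy, cx, cy, score_thresh=0.8):
    """Detect objects in an RGB image and back-project box centers to 3D.

    A sketch only: the COCO-pretrained detector and the pinhole intrinsics
    are stand-ins, not the paper's trained models or calibration.
    """
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
    with torch.no_grad():
        dets = model([rgb])[0]              # rgb: float tensor (3, H, W) in [0, 1]

    objects = []
    for box, label, score in zip(dets["boxes"], dets["labels"], dets["scores"]):
        if score < score_thresh:
            continue
        u = int((box[0] + box[2]) / 2)      # box center in pixels
        v = int((box[1] + box[3]) / 2)
        z = float(depth[v, u])              # metric depth at the center (assumed valid)
        x = (u - cx) * z / fx               # pinhole back-projection into the camera frame
        y = (v - cy) * z / fy
        objects.append({"label": int(label), "xyz": (x, y, z), "score": float(score)})
    return objects
```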
A Survey on Deep Learning Methods for Robot Vision
Deep learning has allowed a paradigm shift in pattern recognition, from using
hand-crafted features together with statistical classifiers to using
general-purpose learning procedures for learning data-driven representations,
features, and classifiers together. The application of this new paradigm has
been particularly successful in computer vision, in which the development of
deep learning methods for vision applications has become a hot research topic.
Given that deep learning has already attracted the attention of the robot
vision community, the main purpose of this survey is to address the use of deep
learning in robot vision. To achieve this, a comprehensive overview of deep
learning and its usage in computer vision is given, which includes a description
of the most frequently used neural models and their main application areas.
Then, the standard methodology and tools used for designing deep-learning based
vision systems are presented. Afterwards, a review of the principal work using
deep learning in robot vision is presented, as well as current and future
trends related to the use of deep learning in robotics. This survey is intended
to be a guide for developers of robot vision systems.
Experiments on Learning Based Industrial Bin-picking with Iterative Visual Recognition
This paper shows experimental results on learning-based randomized
bin-picking combined with iterative visual recognition. We use a random
forest to predict whether or not a robot will successfully pick an object from
given depth images of the pile, taking into account collisions between a finger
and neighboring objects. For the discriminator to be accurate, we
consider estimating objects' poses by merging multiple depth images of the pile
captured from different points of view using a depth sensor attached to the
wrist. We show that, even if a robot is predicted to fail in picking an object
with a single depth image due to a large occluded area, the pick is finally
predicted as a success after merging multiple depth images. In addition, we
show that the random forest can be trained with a small number of training
samples.
Comment: This paper is to appear in Industrial Robot: An International Journal.
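A scikit-learn sketch of the discriminator idea follows: a random forest trained on simple hand-crafted features of the depth patch around a grasp candidate, labeled by past pick success or failure. The features, placeholder data, and labels are illustrative assumptions, not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def patch_features(depth_patch):
    """Hand-crafted features of the depth patch around a grasp candidate (illustrative only)."""
    return np.array([
        depth_patch.mean(), depth_patch.std(),
        depth_patch.min(), depth_patch.max(),
        np.abs(np.gradient(depth_patch)[0]).mean(),   # roughness along one axis
    ])


# X: features from (merged) depth patches around past grasp candidates; y: pick succeeded?
rng = np.random.default_rng(0)
patches = rng.uniform(0.3, 0.6, size=(200, 32, 32))   # placeholder depth patches (m)
X = np.stack([patch_features(p) for p in patches])
y = rng.integers(0, 2, size=200)                      # placeholder success labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict_proba(X[:1])[0, 1])                 # predicted pick-success probability
```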
The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints
A robot can now grasp an object more effectively than ever before, but once
it has the object, what happens next? We show that a mild relaxation of the task
and workspace constraints implicit in existing object grasping datasets can
cause neural network based grasping algorithms to fail on even a simple block
stacking task when executed under more realistic circumstances.
To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD),
where a robot interacts with 5.1 cm colored blocks to complete an
order-fulfillment style block stacking task. It contains dynamic scenes and
real time-series data in a less constrained environment than comparable
datasets. There are nearly 12,000 stacking attempts and over 2 million frames
of real data. We discuss the ways in which this dataset provides a valuable
resource for a broad range of other topics of investigation.
We find that hand-designed neural networks that work on prior datasets do not
generalize to this task. Thus, to establish a baseline for this dataset, we
demonstrate an automated search of neural network based models using a novel
multiple-input HyperTree MetaModel, and find a final model which makes
reasonable 3D pose predictions for grasping and stacking on our dataset.
The CoSTAR BSD, code, and instructions are available at
https://sites.google.com/site/costardataset.
Comment: This is a major revision refocusing the topic towards the JHU CoSTAR
Block Stacking Dataset, workspace constraints, and a comparison of HyperTrees
with hand-designed algorithms. 12 pages, 10 figures, and 3 tables
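The HyperTree search itself is beyond a short example, but the "multiple-input" idea it searches over (fusing an image stream with vector inputs to regress a 3D grasp or placement pose) can be sketched in PyTorch as below; the fixed architecture, input sizes, and pose parameterization are assumptions rather than the dataset's reference model.

```python
import torch
import torch.nn as nn


class MultiInputPoseModel(nn.Module):
    """Fuse an image branch with a vector branch to regress a gripper pose.

    Output is x, y, z plus a unit quaternion (7 values); the real HyperTree
    models are found by automated search, so this fixed design is illustrative.
    """

    def __init__(self, vector_dim=14):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.vector_branch = nn.Sequential(nn.Linear(vector_dim, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32 + 64, 128), nn.ReLU(), nn.Linear(128, 7))

    def forward(self, image, vector):
        fused = torch.cat([self.image_branch(image), self.vector_branch(vector)], dim=1)
        out = self.head(fused)
        xyz, quat = out[:, :3], nn.functional.normalize(out[:, 3:], dim=1)
        return torch.cat([xyz, quat], dim=1)


# Usage with placeholder inputs: one RGB frame plus a 14-D vector (e.g., joint angles + goal).
pose = MultiInputPoseModel()(torch.rand(1, 3, 96, 96), torch.rand(1, 14))
```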
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
We describe a learning-based approach to hand-eye coordination for robotic
grasping from monocular images. To learn hand-eye coordination for grasping, we
trained a large convolutional neural network to predict the probability that
task-space motion of the gripper will result in successful grasps, using only
monocular camera images and independently of camera calibration or the current
robot pose. This requires the network to observe the spatial relationship
between the gripper and objects in the scene, thus learning hand-eye
coordination. We then use this network to servo the gripper in real time to
achieve successful grasps. To train our network, we collected over 800,000
grasp attempts over the course of two months, using between 6 and 14 robotic
manipulators at any given time, with differences in camera placement and
hardware. Our experimental evaluation demonstrates that our method achieves
effective real-time control, can successfully grasp novel objects, and corrects
mistakes by continuous servoing.
Comment: This is an extended version of "Learning Hand-Eye Coordination for
Robotic Grasping with Large-Scale Data Collection," ISER 2016. Draft modified
to correct a typo in Algorithm 1 and add a link to the publicly available
dataset.
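The servoing described above can be summarized as: repeatedly score candidate gripper motions with the learned success predictor given the current image, and execute the most promising one. The sketch below uses simple best-of-N sampling in place of the paper's cross-entropy-method optimization; `success_cnn`, `camera`, and `robot` are hypothetical interfaces.

```python
import numpy as np
import torch


def servo_grasp(success_cnn, camera, robot, n_candidates=64, n_steps=20, stop_prob=0.9):
    """Closed-loop grasp servoing sketch.

    At each step, sample candidate end-effector motions, score them with the
    learned predictor p(success | image, motion), and execute the best one.
    All interfaces here are hypothetical placeholders.
    """
    for _ in range(n_steps):
        image = torch.from_numpy(camera.capture_rgb()).permute(2, 0, 1)[None].float()
        # candidate task-space motions: (dx, dy, dz, d_yaw), small local moves
        candidates = np.random.uniform(-0.05, 0.05, size=(n_candidates, 4))
        with torch.no_grad():
            scores = success_cnn(image.repeat(n_candidates, 1, 1, 1),
                                 torch.from_numpy(candidates).float())
        best = int(torch.argmax(scores))
        if float(scores[best]) > stop_prob:
            robot.close_gripper()                      # confident: commit to the grasp
            return True
        robot.move_relative(*candidates[best])         # otherwise keep servoing
    return False
```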