Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.edu
Comment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
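The cross-domain matching idea can be pictured with a minimal sketch: embed the
observed image and each candidate product image with the same CNN, then match
by cosine similarity. This is an illustrative stand-in under stated
assumptions, not the paper's actual two-stream model; the ImageNet-pretrained
ResNet-50 backbone and the function names are assumptions.

    # Hedged sketch of cross-domain image matching: embed the observed image
    # and each product image with the same CNN and pick the nearest product
    # by cosine similarity. The ImageNet ResNet-50 backbone stands in for
    # the paper's trained network (assumption).
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()  # keep 2048-d features, drop classifier
    backbone.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def embed(path):
        """L2-normalized feature vector for one image file."""
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            f = backbone(x).squeeze(0)
        return f / f.norm()

    def match(observed_path, product_paths):
        """Index and cosine score of the best-matching product image."""
        obs = embed(observed_path)
        sims = [torch.dot(obs, embed(p)).item() for p in product_paths]
        best = max(range(len(sims)), key=sims.__getitem__)
        return best, sims[best]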
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Comprehension of spoken natural language is an essential component for robots
to communicate with humans effectively. However, handling unconstrained spoken
instructions is challenging due to (1) complex structures including a wide
variety of expressions used in spoken language and (2) inherent ambiguity in
interpretation of human instructions. In this paper, we propose the first
comprehensive system that can handle unconstrained spoken language and is able
to effectively resolve ambiguity in spoken instructions. Specifically, we
integrate deep-learning-based object detection together with natural language
processing technologies to handle unconstrained spoken instructions, and
propose a method for robots to resolve instruction ambiguity through dialogue.
Through our experiments on both a simulated environment as well as a physical
industrial robot arm, we demonstrate the ability of our system to understand
natural instructions from human operators effectively, and how higher success
rates of the object picking task can be achieved through an interactive
clarification process.
Comment: 9 pages. International Conference on Robotics and Automation (ICRA)
2018. Accompanying videos are available at the following links:
https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and
http://youtu.be/DGJazkyw0Ws (with improvements after the ICRA-2018 submission)
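A toy sketch of one clarification turn may help picture the loop described
above: when the parsed instruction matches several detected objects with
similar confidence, the system asks a question rather than guessing. The
Candidate fields, margin threshold, and question template are all hypothetical
illustrations, not the paper's learned dialogue policy.

    # Toy sketch of one clarification turn: if two detections match the
    # spoken instruction almost equally well, ask instead of guessing.
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        label: str      # detector class, e.g. "tin can"
        location: str   # coarse region used to phrase the question
        score: float    # how well this object matches the instruction

    AMBIGUITY_MARGIN = 0.1  # assumed threshold, not a value from the paper

    def pick_or_ask(candidates):
        """Return ('pick', target) or ('ask', question) for one turn."""
        ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
        best = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else None
        if runner_up and best.score - runner_up.score < AMBIGUITY_MARGIN:
            question = (f"Do you mean the {best.label} {best.location} "
                        f"or the one {runner_up.location}?")
            return "ask", question
        return "pick", best

    # Two equally plausible referents -> the robot asks for clarification.
    print(pick_or_ask([Candidate("tin can", "on the left", 0.81),
                       Candidate("tin can", "near the box", 0.78)]))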
Interactive Perception Based on Gaussian Process Classification for House-Hold Objects Recognition and Sorting
We present an interactive perception model for object sorting based on
Gaussian Process (GP) classification that is capable of recognizing object
categories from point cloud data. In our approach, FPFH features are extracted
from point clouds to describe the local 3D shape of objects, and a
Bag-of-Words coding method is used to obtain an object-level vocabulary
representation. Multi-class Gaussian Process classification is employed to
provide a probabilistic estimate of the object's identity and plays a key role
in the interactive perception cycle by modelling perception confidence. We
show results from simulated input data on both SVM- and GP-based multi-class
classifiers to validate the recognition accuracy of our proposed perception
model. Our results demonstrate that by using a GP-based classifier, we obtain
true positive classification rates of up to 80%. Our semi-autonomous object
sorting experiments show that the proposed GP-based interactive sorting
approach outperforms random sorting by up to 30% when applied to scenes
comprising configurations of household objects.
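A minimal scikit-learn sketch shows how the classification stage could supply
both a label and a confidence to drive the interactive cycle. The FPFH
extraction and Bag-of-Words coding are stubbed out with random stand-in
histograms, and the 0.6 confidence threshold is an assumption; this is not the
paper's implementation.

    # Sketch of the classification stage only: objects are represented as
    # Bag-of-Words histograms over FPFH codewords (the point-cloud feature
    # extraction is stubbed out with stand-in data), and a multi-class GP
    # gives both a label and a confidence for the interactive cycle.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X = rng.random((60, 50))            # 60 objects x 50-word BoW histograms
    X /= X.sum(axis=1, keepdims=True)   # normalize each histogram
    y = rng.integers(0, 3, size=60)     # 3 object categories

    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
    gp.fit(X, y)

    probs = gp.predict_proba(X[:1])[0]  # per-class posterior probabilities
    label, confidence = probs.argmax(), probs.max()
    # Low confidence -> interact (re-view, re-grasp) before committing.
    needs_interaction = confidence < 0.6  # assumed threshold for illustration
    print(label, confidence, needs_interaction)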
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis
of these datasets are a useful resource for guiding insightful selection of
datasets for future research. In addition, the issues with current algorithm
evaluation vis-à-vis the limitations of the available datasets and evaluation
protocols are also highlighted, resulting in a number of recommendations for
the collection of new datasets and the use of evaluation protocols.
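One recurring protocol issue surveys of this kind point to is how train/test
splits are defined. A short sketch of a cross-subject split, in which no
performer appears in both sets, illustrates the kind of detail that makes
reported results comparable; the sample records and split here are
hypothetical, not taken from any of the surveyed datasets.

    # Cross-subject protocol sketch: hold out whole performers so no subject
    # appears in both train and test. The sample records are hypothetical.
    samples = [
        {"clip": "a01_s01_e01", "subject": 1, "action": "wave"},
        {"clip": "a01_s02_e01", "subject": 2, "action": "wave"},
        {"clip": "a02_s01_e01", "subject": 1, "action": "clap"},
        {"clip": "a02_s03_e01", "subject": 3, "action": "clap"},
    ]
    TEST_SUBJECTS = {2, 3}  # assumed split; real protocols fix this per dataset

    train = [s for s in samples if s["subject"] not in TEST_SUBJECTS]
    test = [s for s in samples if s["subject"] in TEST_SUBJECTS]
    assert not {s["subject"] for s in train} & {s["subject"] for s in test}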
Deep Learning for Object Recognition in picking tasks
Final thesis for the Erasmus Mundus Master's in Advanced Robotics, academic
year 2016-2017.
In light of current advances in deep learning, robot vision is no exception.
Many popular machine learning algorithms have already been proposed and
implemented to solve intricate computer vision problems. The same cannot yet
be said of robot vision: due to real-time constraints, limited processing
power, and the dynamic nature of the environment (such as changing
illumination), very few algorithms are able to solve the object recognition
problem at large.
The primary objective of the thesis project is to converge on an accurate
working algorithm for object recognition in a cluttered scene, and
subsequently to help the BAXTER robot pick up the correct object from the
clutter. Feature-matching algorithms usually fail to identify objects that
have no texture, hence deep learning has been employed for better performance.
The next step is to look for the object and localize it within the image
frame. Although a basic shallow Convolutional Neural Network easily identifies
the presence of an object within a frame, it is very difficult to localize the
object's position within the frame. This work primarily focuses on finding a
solution for accurate localization. The first solution that comes to mind is
to produce a bounding box surrounding the object. In the literature, YOLO is
found to provide very robust results on existing datasets, but this was not
the case when it was tried on the new objects belonging to the current thesis
project. Due to high inaccuracy and the presence of a large redundant area
within the bounding box, an algorithm was needed that would segment the object
accurately and make the picking task easier. This was done through semantic
segmentation using deep CNNs. Although time consuming, ResNet has been found
to be very efficient, as its post-processed output helps to identify items in
a significantly difficult task environment. This work has been done in light
of the upcoming Amazon Robotics Challenge, where the robot successfully
classified and distinguished everyday items in a cluttered scenario. In
addition, a performance analysis study has been done comparing YOLO and
ResNet, justifying the usage of the latter algorithm with the help of
performance metrics such as IoU and ViG.
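The IoU metric mentioned at the end can be made concrete with a short sketch.
The abstract does not give the thesis' evaluation code, so the mask-based
formulation below is an assumption chosen to fit the segmentation setting.

    # Minimal mask IoU, as one might use to compare segmentation output
    # against ground truth (the thesis' exact evaluation code is not given).
    import numpy as np

    def mask_iou(pred, gt):
        """IoU between two boolean segmentation masks of equal shape."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return inter / union if union else 1.0  # empty vs. empty: perfect match

    # Toy example: two overlapping 4x4 squares on a 10x10 grid.
    a = np.zeros((10, 10)); a[2:6, 2:6] = 1
    b = np.zeros((10, 10)); b[4:8, 4:8] = 1
    print(mask_iou(a, b))  # 4-cell overlap / 28-cell union ~= 0.143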