Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.edu
Comment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
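The matching step described in the abstract above (comparing an observed image against product images) can be illustrated with a nearest-neighbour lookup in a shared feature space. This is only a minimal sketch under stated assumptions, not the authors' framework: it assumes feature vectors have already been extracted for both domains, and the names `cosine_similarity`, `match_to_product`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_to_product(observed, product_features):
    """Return (label, score) of the product feature closest to `observed`.

    `product_features` maps a product label to its feature vector.
    Assumption: both domains were already embedded into the same space.
    """
    best_label, best_score = None, float("-inf")
    for label, feature in product_features.items():
        score = cosine_similarity(observed, feature)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy, made-up feature vectors for illustration only.
products = {"mug": [1.0, 0.0], "book": [0.0, 1.0]}
label, score = match_to_product([0.9, 0.1], products)
```

Because only product-image features are needed for the lookup table, a new object can in principle be added by embedding its product image, which is consistent with the abstract's claim that no additional training data is required for novel objects.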
What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots
For robots that have the capability to interact with the physical environment
through their end effectors, understanding the surrounding scenes is not merely
a task of image classification or object recognition. To perform actual tasks,
it is critical for the robot to have a functional understanding of the visual
scene. Here, we address the problem of localizing and recognizing functional
areas in an arbitrary indoor scene, formulated as a two-stage deep-learning-based
detection pipeline. A new scene functionality testing-bed, which is
compiled from two publicly available indoor scene datasets, is used for
evaluation. Our method is evaluated quantitatively on the new dataset,
demonstrating the ability to perform efficient recognition of functional areas
from arbitrary indoor scenes. We also demonstrate that our detection model can
be generalized onto novel indoor scenes by cross validating it with the images
from two different datasets.
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Trip hazards are a significant contributor to accidents on construction and
manufacturing sites, where over a third of Australian workplace injuries occur
[1]. Current safety inspections are labour intensive and limited by human
fallibility, making automation of trip hazard detection appealing from both a
safety and economic perspective. Trip hazards present an interesting challenge
to modern learning techniques because they are defined as much by affordance as
by object type; for example, wires on a table are not a trip hazard, but can be
if lying on the ground. To address these challenges, we conduct a comprehensive
investigation into the performance characteristics of 11 different colour and
depth fusion approaches, including 4 fusion and one non-fusion approach; using
colour and two types of depth images. Trained and tested on over 600 labelled
trip hazards over 4 floors and 2000m in an active construction
site, this approach was able to differentiate between identical objects in
different physical configurations (see Figure 1). Outperforming a colour-only
detector, our multi-modal trip detector fuses colour and depth information to
achieve a 4% absolute improvement in F1-score. These investigative results and
the extensive publicly available dataset move us one step closer to assistive
or fully automated safety inspection systems on construction sites.
Comment: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation
Letters (RA-L
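To make the "4% absolute improvement in F1-score" reported in the abstract above concrete, the sketch below computes F1 from detection counts and takes the difference between two detectors. The counts are invented purely for illustration and are not taken from the paper; `f1_score` is a hypothetical helper name.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from detection counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented counts for illustration only (NOT results from the paper).
f1_colour_only = f1_score(tp=70, fp=30, fn=30)
f1_fused = f1_score(tp=74, fp=26, fn=26)

# "Absolute" improvement is the plain difference between the two scores,
# as opposed to a relative improvement (difference divided by the baseline).
absolute_gain = f1_fused - f1_colour_only
```

With these made-up counts the two scores are about 0.70 and 0.74, so the absolute gain is about 0.04, i.e. four percentage points.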