23,348 research outputs found
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.eduComment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
The goal of this paper is to take a single 2D image of a scene and recover
the 3D structure in terms of a small set of factors: a layout representing the
enclosing surfaces as well as a set of objects represented in terms of shape
and pose. We propose a convolutional neural network-based approach to predict
this representation and benchmark it on a large dataset of indoor scenes. Our
experiments evaluate a number of practical design questions, demonstrate that
we can infer this representation, and quantitatively and qualitatively
demonstrate its merits compared to alternate representations.Comment: Project url with code: https://shubhtuls.github.io/factored3
SilNet : Single- and Multi-View Reconstruction by Learning from Silhouettes
The objective of this paper is 3D shape understanding from single and
multiple images. To this end, we introduce a new deep-learning architecture and
loss function, SilNet, that can handle multiple views in an order-agnostic
manner. The architecture is fully convolutional, and for training we use a
proxy task of silhouette prediction, rather than directly learning a mapping
from 2D images to 3D shape as has been the target in most recent work.
We demonstrate that with the SilNet architecture there is generalisation over
the number of views -- for example, SilNet trained on 2 views can be used with
3 or 4 views at test-time; and performance improves with more views.
We introduce two new synthetics datasets: a blobby object dataset useful for
pre-training, and a challenging and realistic sculpture dataset; and
demonstrate on these datasets that SilNet has indeed learnt 3D shape. Finally,
we show that SilNet exceeds the state of the art on the ShapeNet benchmark
dataset, and use SilNet to generate novel views of the sculpture dataset.Comment: BMVC 2017; Best Poste
Learning sparse representations of depth
This paper introduces a new method for learning and inferring sparse
representations of depth (disparity) maps. The proposed algorithm relaxes the
usual assumption of the stationary noise model in sparse coding. This enables
learning from data corrupted with spatially varying noise or uncertainty,
typically obtained by laser range scanners or structured light depth cameras.
Sparse representations are learned from the Middlebury database disparity maps
and then exploited in a two-layer graphical model for inferring depth from
stereo, by including a sparsity prior on the learned features. Since they
capture higher-order dependencies in the depth structure, these priors can
complement smoothness priors commonly used in depth inference based on Markov
Random Field (MRF) models. Inference on the proposed graph is achieved using an
alternating iterative optimization technique, where the first layer is solved
using an existing MRF-based stereo matching algorithm, then held fixed as the
second layer is solved using the proposed non-stationary sparse coding
algorithm. This leads to a general method for improving solutions of state of
the art MRF-based depth estimation algorithms. Our experimental results first
show that depth inference using learned representations leads to state of the
art denoising of depth maps obtained from laser range scanners and a time of
flight camera. Furthermore, we show that adding sparse priors improves the
results of two depth estimation methods: the classical graph cut algorithm by
Boykov et al. and the more recent algorithm of Woodford et al.Comment: 12 page
- …