
    Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

    This paper presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses a category-agnostic affordance prediction algorithm to select and execute among four different grasping primitive behaviors. It then recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional training data. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT-Princeton Team system that took 1st place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu
    Comment: Project webpage: http://arc.cs.princeton.edu Summary video: https://youtu.be/6fG7zwGfIk
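    The two-stage design (pixel-wise affordance prediction over several grasping primitives, then nearest-neighbor matching of the picked object against product-image descriptors) can be sketched as below. This is a minimal illustration using numpy stand-ins for the networks; the map shapes, descriptor dimension, and product labels are assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of the pipeline: affordance-based grasp selection
# followed by cross-domain recognition. Network outputs are simulated
# with random arrays; only the selection/matching logic is shown.
import numpy as np

# Four grasping primitive behaviors, as described in the abstract
# (names follow the paper's suction/grasp taxonomy; treat as assumed).
PRIMITIVES = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]

def select_grasp(affordance_maps):
    """Pick the primitive and pixel with the highest predicted affordance.

    affordance_maps: dict mapping primitive name -> (H, W) score map,
    as a category-agnostic affordance network would produce.
    """
    return max(
        ((p, np.unravel_index(np.argmax(m), m.shape), m.max())
         for p, m in affordance_maps.items()),
        key=lambda t: t[2],
    )  # -> (primitive, (row, col), score)

def recognize(observed_desc, product_descs):
    """Cross-domain matching: nearest product image in descriptor space.

    Descriptors are assumed L2-normalized embeddings from a two-stream
    network trained to align observed images with product images.
    """
    sims = {label: float(observed_desc @ d) for label, d in product_descs.items()}
    return max(sims, key=sims.get)

# Toy usage with random stand-ins for the network outputs.
rng = np.random.default_rng(0)
maps = {p: rng.random((48, 64)) for p in PRIMITIVES}
primitive, pixel, score = select_grasp(maps)

obs = rng.standard_normal(128)
obs /= np.linalg.norm(obs)
products = {}
for label in ("duct_tape", "toothbrush"):  # hypothetical product labels
    v = rng.standard_normal(128)
    products[label] = v / np.linalg.norm(v)
print(primitive, pixel, round(float(score), 3), recognize(obs, products))
```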

    Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

    The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments evaluate a number of practical design questions, demonstrate that we can infer this representation, and quantitatively and qualitatively demonstrate its merits compared to alternate representations.
    Comment: Project URL with code: https://shubhtuls.github.io/factored3
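    To make the factored representation concrete, here is a minimal data-structure sketch: a scene layout plus a list of objects, each carrying shape and pose. The specific field choices (a disparity-style layout map, voxel shape grids, quaternion rotations) are assumptions for illustration; the paper's exact parameterization may differ.

```python
# Minimal sketch of a factored 3D scene representation: layout + objects,
# each object factored into shape and pose. Field choices are assumed.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class SceneObject:
    shape_voxels: np.ndarray    # (32, 32, 32) occupancy grid for shape
    rotation_quat: np.ndarray   # (4,) unit quaternion for orientation
    translation: np.ndarray     # (3,) position in the scene frame
    scale: np.ndarray           # (3,) anisotropic scale

@dataclass
class FactoredScene:
    layout: np.ndarray          # (H, W) amodal layout of enclosing surfaces
    objects: List[SceneObject] = field(default_factory=list)

# A CNN would regress one FactoredScene per input image; here we just
# build an empty scene to show the structure.
scene = FactoredScene(layout=np.zeros((128, 128)))
scene.objects.append(SceneObject(
    shape_voxels=np.zeros((32, 32, 32)),
    rotation_quat=np.array([1.0, 0.0, 0.0, 0.0]),
    translation=np.zeros(3),
    scale=np.ones(3),
))
```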

    SilNet: Single- and Multi-View Reconstruction by Learning from Silhouettes

    The objective of this paper is 3D shape understanding from single and multiple images. To this end, we introduce a new deep-learning architecture and loss function, SilNet, that can handle multiple views in an order-agnostic manner. The architecture is fully convolutional, and for training we use a proxy task of silhouette prediction, rather than directly learning a mapping from 2D images to 3D shape as has been the target in most recent work. We demonstrate that with the SilNet architecture there is generalisation over the number of views -- for example, SilNet trained on 2 views can be used with 3 or 4 views at test-time; and performance improves with more views. We introduce two new synthetic datasets: a blobby object dataset useful for pre-training, and a challenging and realistic sculpture dataset; and demonstrate on these datasets that SilNet has indeed learnt 3D shape. Finally, we show that SilNet exceeds the state of the art on the ShapeNet benchmark dataset, and use SilNet to generate novel views of the sculpture dataset.
    Comment: BMVC 2017; Best Poster
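    The key to handling a variable number of views in an order-agnostic way is fusing per-view features with a symmetric (permutation-invariant) pooling operation. The sketch below demonstrates this property with random linear maps standing in for SilNet's convolutional encoder and decoder; those stand-ins, and the feature/output sizes, are assumptions.

```python
# Sketch of order-agnostic multi-view fusion: per-view features are
# pooled with an elementwise max, so the same network accepts 2, 3, or
# 4 views and its output is invariant to view order.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((256, 64))      # stand-in view encoder
W_dec = rng.standard_normal((64, 32 * 32))  # stand-in silhouette decoder

def encode_view(image_flat):
    return np.tanh(image_flat @ W_enc)

def predict_silhouette(views):
    feats = np.stack([encode_view(v) for v in views])
    fused = feats.max(axis=0)                # order-agnostic fusion
    logits = fused @ W_dec
    return (logits > 0).reshape(32, 32)      # binary silhouette mask

views = [rng.standard_normal(256) for _ in range(3)]
# Permuting the input views leaves the prediction unchanged:
assert np.array_equal(predict_silhouette(views),
                      predict_silhouette(views[::-1]))
```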

    Learning sparse representations of depth

    This paper introduces a new method for learning and inferring sparse representations of depth (disparity) maps. The proposed algorithm relaxes the usual assumption of a stationary noise model in sparse coding. This enables learning from data corrupted with spatially varying noise or uncertainty, typically obtained by laser range scanners or structured light depth cameras. Sparse representations are learned from the Middlebury database disparity maps and then exploited in a two-layer graphical model for inferring depth from stereo, by including a sparsity prior on the learned features. Since they capture higher-order dependencies in the depth structure, these priors can complement smoothness priors commonly used in depth inference based on Markov Random Field (MRF) models. Inference on the proposed graph is achieved using an alternating iterative optimization technique, where the first layer is solved using an existing MRF-based stereo matching algorithm, then held fixed as the second layer is solved using the proposed non-stationary sparse coding algorithm. This leads to a general method for improving solutions of state-of-the-art MRF-based depth estimation algorithms. Our experimental results first show that depth inference using learned representations leads to state-of-the-art denoising of depth maps obtained from laser range scanners and a time-of-flight camera. Furthermore, we show that adding sparse priors improves the results of two depth estimation methods: the classical graph cut algorithm by Boykov et al. and the more recent algorithm of Woodford et al.
    Comment: 12 pages
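    A minimal sketch of the non-stationary idea: replace the usual uniform data term in sparse coding with per-pixel weights that encode spatially varying noise confidence, and solve the weighted lasso with ISTA. The weighting scheme, dictionary size, and iteration count below are assumptions for illustration, not the paper's exact formulation.

```python
# Weighted ISTA for non-stationary sparse coding: minimize
#   0.5 * || sqrt(w) * (x - D z) ||^2 + lam * ||z||_1
# where w holds per-pixel confidence weights (spatially varying noise).
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_sparse_code(x, D, w, lam=0.1, n_iters=200):
    WD = w[:, None] * D
    L = np.linalg.eigvalsh(D.T @ WD).max()  # Lipschitz constant of the gradient
    step = 1.0 / L
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (w * (D @ z - x))      # gradient of the weighted data term
        z = soft_threshold(z - step * grad, step * lam)
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary atoms
x = rng.standard_normal(64)                 # a depth patch (toy data)
w = rng.uniform(0.2, 1.0, size=64)          # per-pixel noise confidence
z = weighted_sparse_code(x, D, w)
print("nonzero coefficients:", int((np.abs(z) > 1e-8).sum()))
```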