
    Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

    This paper presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses a category-agnostic affordance prediction algorithm to select and execute among four different grasping primitive behaviors. It then recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional training data. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT-Princeton Team system that took 1st place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu
    Comment: Project webpage: http://arc.cs.princeton.edu Summary video: https://youtu.be/6fG7zwGfIk
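
    The primitive-selection step described above lends itself to a compact sketch. The code below is a hedged illustration of choosing among four grasping primitives by maximizing dense, pixel-wise affordance maps; the primitive names, the `predict_affordances` stub, and the scoring scheme are assumptions for illustration, not the authors' released implementation (which is available at the project page above).

    ```python
    import numpy as np

    # Hedged sketch: each primitive gets a dense, pixel-wise affordance map,
    # and the primitive/pixel pair with the highest predicted success score
    # is executed. Primitive names are illustrative assumptions.
    PRIMITIVES = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]

    def predict_affordances(rgbd, primitive):
        """Stand-in for the category-agnostic affordance network: returns an
        HxW map of success scores in [0, 1] for the given primitive."""
        h, w = rgbd.shape[:2]
        return np.random.rand(h, w)  # placeholder for a real forward pass

    def select_grasp(rgbd):
        best_score, best_primitive, best_pixel = -1.0, None, None
        for primitive in PRIMITIVES:
            affordance = predict_affordances(rgbd, primitive)
            pixel = np.unravel_index(np.argmax(affordance), affordance.shape)
            if affordance[pixel] > best_score:
                best_score, best_primitive, best_pixel = affordance[pixel], primitive, pixel
        # In a full system, best_pixel would be back-projected to a 3D point
        # using the depth channel before the primitive is executed.
        return best_primitive, best_pixel, best_score

    primitive, pixel, score = select_grasp(np.zeros((480, 640, 4)))
    print(f"execute {primitive} at pixel {pixel} (score {score:.2f})")
    ```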

    Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions

    Comprehension of spoken natural language is an essential component for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) complex structures, including the wide variety of expressions used in spoken language, and (2) inherent ambiguity in the interpretation of human instructions. In this paper, we propose the first comprehensive system that can handle unconstrained spoken language and is able to effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep-learning-based object detection with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve instruction ambiguity through dialogue. Through experiments in both a simulated environment and on a physical industrial robot arm, we demonstrate that our system can understand natural instructions from human operators effectively, and that higher success rates of the object picking task can be achieved through an interactive clarification process.
    Comment: 9 pages. International Conference on Robotics and Automation (ICRA) 2018. Accompanying videos are available at the following links: https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and http://youtu.be/DGJazkyw0Ws (with improvements after ICRA-2018 submission)
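
    As a rough illustration of the clarification loop described above, the sketch below grounds an instruction against detected objects and asks a question only when no candidate clearly dominates. The `score_candidates` stub and the confidence threshold are hypothetical placeholders, not the paper's actual detection or language models.

    ```python
    # Hedged sketch of an interactive clarification loop: when instruction
    # grounding is ambiguous, ask the operator instead of guessing.
    def score_candidates(instruction, detections):
        """Stand-in for the detection + language grounding models: returns a
        relevance score per detected object for the spoken instruction."""
        return [(obj, 1.0 / len(detections)) for obj in detections]

    def pick_with_clarification(instruction, detections, threshold=0.6):
        # Assumes at least one detected object.
        while True:
            scored = score_candidates(instruction, detections)
            scored.sort(key=lambda pair: pair[1], reverse=True)
            best_obj, best_score = scored[0]
            # Unambiguous: a single candidate, or one that clearly dominates.
            if best_score >= threshold or len(scored) == 1:
                return best_obj
            # Ambiguous: ask a clarifying question, then re-ground.
            answer = input(f"Did you mean the {best_obj}? ")
            if answer.strip().lower().startswith("y"):
                return best_obj
            detections = [obj for obj, _ in scored[1:]]

    objects = ["red cup", "blue cup", "bottle"]
    print("picking:", pick_with_clarification("grab the cup", objects))
    ```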

    Interactive Perception Based on Gaussian Process Classification for House-Hold Objects Recognition and Sorting

    We present an interactive perception model for object sorting based on Gaussian Process (GP) classification that is capable of recognizing object categories from point cloud data. In our approach, FPFH features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probabilistic estimate of the identity of the object, and serves a key role in the interactive perception cycle by modelling perception confidence. We show results from simulated input data on both SVM- and GP-based multi-class classifiers to validate the recognition accuracy of our proposed perception model. Our results demonstrate that by using a GP-based classifier, we obtain true positive classification rates of up to 80%. Our semi-autonomous object sorting experiments show that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects.
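
    The pipeline in this abstract maps naturally onto standard tooling. Below is a minimal sketch, assuming scikit-learn for the vocabulary and the GP classifier, with FPFH extraction stubbed out (in practice produced by a point cloud library such as Open3D); the vocabulary size, kernel, and placeholder data are illustrative, not the authors' settings.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)

    def extract_fpfh(cloud):
        """Placeholder: one 33-D FPFH descriptor per point of the cloud."""
        return rng.normal(size=(len(cloud), 33))

    # 1. Build the visual vocabulary from descriptors pooled over training clouds.
    train_clouds = [rng.normal(size=(200, 3)) for _ in range(20)]
    all_desc = np.vstack([extract_fpfh(c) for c in train_clouds])
    vocab = KMeans(n_clusters=32, n_init=10, random_state=0).fit(all_desc)

    def bow_histogram(cloud):
        """Object-level representation: normalized histogram of visual words."""
        words = vocab.predict(extract_fpfh(cloud))
        hist = np.bincount(words, minlength=32).astype(float)
        return hist / hist.sum()

    # 2. Train the multi-class GP classifier on object-level histograms.
    X = np.array([bow_histogram(c) for c in train_clouds])
    y = rng.integers(0, 4, size=len(train_clouds))  # placeholder category labels
    gpc = GaussianProcessClassifier(kernel=1.0 * RBF(1.0)).fit(X, y)

    # 3. The predicted class probabilities double as the perception-confidence
    #    signal that drives the interactive sorting cycle.
    probs = gpc.predict_proba(bow_histogram(rng.normal(size=(200, 3)))[None, :])
    print("category posterior:", np.round(probs, 3))
    ```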

    RGB-D-based Action Recognition Datasets: A Survey

    Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work was reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it to provide a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used action recognition related RGB-D video datasets, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets provide a useful resource for guiding an insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis the limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.
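
    One of the evaluation-protocol issues such a survey touches on is how train/test splits are defined. A common convention for RGB-D action datasets is the cross-subject split, sketched below with illustrative subject IDs (not taken from any specific dataset reviewed here): train and test sets share no performers, so results measure generalization to unseen people.

    ```python
    # Hedged sketch of a cross-subject evaluation split for action clips.
    def cross_subject_split(samples, train_subjects):
        """samples: list of (subject_id, clip). Returns (train, test) lists."""
        train = [clip for sid, clip in samples if sid in train_subjects]
        test = [clip for sid, clip in samples if sid not in train_subjects]
        return train, test

    # Illustrative data: 8 subjects, 3 clips each.
    samples = [(sid, f"clip_{sid}_{i}") for sid in range(1, 9) for i in range(3)]
    train, test = cross_subject_split(samples, train_subjects={1, 3, 5, 7})
    print(len(train), "training clips,", len(test), "test clips")
    ```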

    Deep Learning for Object Recognition in picking tasks

    Master's thesis, Erasmus Mundus Master in Advanced Robotics. Academic year 2016-2017.
    In light of current advances in deep learning, robot vision is no exception. Many popular machine learning algorithms have already been proposed and implemented to solve intricate computer vision problems. The same has not been the case for robot vision: due to real-time constraints, the dynamic nature of the environment (such as illumination), and limited processing power, very few algorithms are able to solve the object recognition problem at large. The primary objective of this thesis project is to converge on an accurate working algorithm for object recognition in a cluttered scene and subsequently help the BAXTER robot pick up the correct object from the clutter. Feature matching algorithms usually fail to identify objects that have no texture, hence deep learning has been employed for better performance. The next step is to look for the object and localize it within the image frame. Although a basic shallow Convolutional Neural Network easily identifies the presence of an object within a frame, it is very difficult to localize the object's position within the frame. This work primarily focuses on finding a solution for accurate localization. The first solution that comes to mind is to produce a bounding box surrounding the object. In the literature, YOLO is found to provide very robust results on existing datasets, but this was not the case when it was tried on the new objects belonging to this thesis project. Due to high inaccuracy and the presence of a large redundant area within the bounding box, an algorithm was needed that would segment the object accurately and make the picking task easier. This was done through semantic segmentation using deep CNNs. Although time consuming, ResNet has been found to be very efficient, as its post-processed output helps to identify items in a significantly difficult task environment. This work has been done in light of the upcoming Amazon Robotics Challenge, where the robot successfully classified and distinguished everyday items in a cluttered scenario. In addition, a performance analysis study has been done comparing YOLO and ResNet, justifying the use of the latter algorithm with the help of performance metrics such as IoU and ViG.
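
    Of the metrics mentioned at the end, IoU (intersection over union) is standard and easy to make concrete (ViG is left as named in the thesis). Below is a minimal sketch of the metric for both output types the thesis compares: axis-aligned bounding boxes (YOLO-style) and binary masks (semantic-segmentation-style).

    ```python
    import numpy as np

    def box_iou(a, b):
        """IoU for boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def mask_iou(a, b):
        """IoU for boolean HxW segmentation masks."""
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union if union else 0.0

    # Intersection 5x5 = 25, union 175, so IoU is about 0.143.
    print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
    ```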