1,534 research outputs found

    Enabling Depth-driven Visual Attention on the iCub Humanoid Robot: Instructions for Use and New Perspectives

    Get PDF
    The importance of depth perception in the interactions that humans have within their nearby space is a well established fact. Consequently, it is also well known that the possibility of exploiting good stereo information would ease and, in many cases, enable, a large variety of attentional and interactive behaviors on humanoid robotic platforms. However, the difficulty of computing real-time and robust binocular disparity maps from moving stereo cameras often prevents from relying on this kind of cue to visually guide robots' attention and actions in real-world scenarios. The contribution of this paper is two-fold: first, we show that the Efficient Large-scale Stereo Matching algorithm (ELAS) by A. Geiger et al. 2010 for computation of the disparity map is well suited to be used on a humanoid robotic platform as the iCub robot; second, we show how, provided with a fast and reliable stereo system, implementing relatively challenging visual behaviors in natural settings can require much less effort. As a case of study we consider the common situation where the robot is asked to focus the attention on one object close in the scene, showing how a simple but effective disparity-based segmentation solves the problem in this case. Indeed this example paves the way to a variety of other similar applications

    Microsoft COCO: Common Objects in Context

    Get PDF
    We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model

    OpenPTrack: Open Source Multi-Camera Calibration and People Tracking for RGB-D Camera Networks

    Get PDF
    OpenPTrack is an open source software for multi-camera calibration and people tracking in RGB-D camera networks. It allows to track people in big volumes at sensor frame rate and currently supports a heterogeneous set of 3D sensors. In this work, we describe its user-friendly calibration procedure, which consists of simple steps with real-time feedback that allow to obtain accurate results in estimating the camera poses that are then used for tracking people. On top of a calibration based on moving a checkerboard within the tracking space and on a global optimization of cameras and checkerboards poses, a novel procedure which aligns people detections coming from all sensors in a x-y-time space is used for refining camera poses. While people detection is executed locally, in the machines connected to each sensor, tracking is performed by a single node which takes into account detections from all over the network. Here we detail how a cascade of algorithms working on depth point clouds and color, infrared and disparity images is used to perform people detection from different types of sensors and in any indoor light condition. We present experiments showing that a considerable improvement can be obtained with the proposed calibration refinement procedure that exploits people detections and we compare Kinect v1, Kinect v2 and Mesa SR4500 performance for people tracking applications. OpenPTrack is based on the Robot Operating System and the Point Cloud Library and has already been adopted in networks composed of up to ten imagers for interactive arts, education, culture and human\u2013robot interaction applications
    • …
    corecore