
    Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

    Visual scene understanding is an important capability that enables robots to act purposefully in their environment. In this paper, we propose a novel approach to object-class segmentation from multiple RGB-D views using deep learning. We train a deep neural network in a semi-supervised way to predict object-class semantics that are consistent across several viewpoints. At test time, the semantic predictions of our network can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperform single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion. Comment: the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)
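
The test-time fusion of predictions from multiple views into a keyframe can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors use, so the following assumes one common choice: treating the views as independent observations and multiplying their per-pixel class probabilities (summing log-probabilities). The function names `fuse_views` and `labels`, and the convention that `None` marks a pixel with no valid warp, are hypothetical and only serve the illustration.

```python
import math


def fuse_views(view_probs, eps=1e-9):
    """Fuse per-pixel class probabilities from several views (assumed rule).

    view_probs: list of views. Each view is a list of pixels, and each pixel
    is either None (no valid warp into the keyframe) or a list of class
    probabilities. All views are assumed to be already warped into the
    keyframe, so pixel index p refers to the same keyframe pixel in each view.
    Returns one fused distribution per pixel, or None where no view is valid.
    """
    num_pixels = len(view_probs[0])
    fused = []
    for p in range(num_pixels):
        observations = [view[p] for view in view_probs if view[p] is not None]
        if not observations:
            fused.append(None)
            continue
        num_classes = len(observations[0])
        # Summing log-probabilities multiplies the per-view likelihoods.
        log_sum = [
            sum(math.log(obs[c] + eps) for obs in observations)
            for c in range(num_classes)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(log_sum)
        unnormalised = [math.exp(value - peak) for value in log_sum]
        total = sum(unnormalised)
        fused.append([u / total for u in unnormalised])
    return fused


def labels(fused):
    """Pick the most probable class index per pixel (None if unobserved)."""
    return [
        None if dist is None else max(range(len(dist)), key=dist.__getitem__)
        for dist in fused
    ]


# Two views of a three-pixel keyframe with two classes.
view_a = [[0.6, 0.4], [0.3, 0.7], None]
view_b = [[0.9, 0.1], None, None]
fused = fuse_views([view_a, view_b])
fused_labels = labels(fused)
```

In this toy input the first pixel is seen by both views, the second by one view only, and the third by none, so the fused result is sharper than either view where they agree and falls back to the single available prediction elsewhere.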

    Creating Simplified 3D Models with High Quality Textures

    This paper presents an extension to the KinectFusion algorithm which allows creating simplified 3D models with high quality RGB textures. This is achieved through (i) creating model textures using images from an HD RGB camera that is calibrated with the Kinect depth camera, (ii) using a modified scheme to update model textures in an asymmetrical colour volume that contains a higher number of voxels than the geometry volume, (iii) simplifying the dense polygon mesh model using a quadric-based mesh decimation algorithm, and (iv) creating and mapping 2D textures to every polygon in the output 3D model. The proposed method is implemented in real time by means of GPU parallel processing. Visualization via ray casting of both geometry and colour volumes provides users with real-time feedback on the currently scanned 3D model. Experimental results show that the proposed method is capable of preserving the model texture quality even for a heavily decimated model and that, when reconstructing small objects, photorealistic RGB textures can still be reconstructed. Comment: 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Page 1 -
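
Step (iii) refers to quadric-based mesh decimation. The abstract gives no implementation details, so the following is only a sketch of the standard quadric error metric that such algorithms are usually built on, not the authors' code. Each plane ax + by + cz + d = 0 with a unit normal contributes a 4x4 matrix, and the error of a candidate vertex position is the sum of its squared distances to the accumulated planes. The function names are hypothetical.

```python
def plane_quadric(a, b, c, d):
    """Return the 4x4 quadric p p^T for the plane ax + by + cz + d = 0.

    The normal (a, b, c) is assumed to have unit length.
    """
    p = (a, b, c, d)
    return [[p[i] * p[j] for j in range(4)] for i in range(4)]


def add_quadrics(q1, q2):
    """Accumulate two quadrics by element-wise addition."""
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def quadric_error(q, vertex):
    """Evaluate v^T Q v with v = (x, y, z, 1).

    For a sum of plane quadrics this equals the sum of squared distances
    from the vertex to those planes.
    """
    v = (vertex[0], vertex[1], vertex[2], 1.0)
    return sum(v[i] * q[i][j] * v[j] for i in range(4) for j in range(4))


# Quadric of a vertex whose adjacent faces lie in the planes z = 0 and x = 0.
q = add_quadrics(plane_quadric(0.0, 0.0, 1.0, 0.0),
                 plane_quadric(1.0, 0.0, 0.0, 0.0))
error_far = quadric_error(q, (2.0, 5.0, 3.0))   # 2^2 + 3^2
error_on = quadric_error(q, (0.0, 7.0, 0.0))    # lies on both planes
```

A decimation loop would typically rank candidate edge collapses by this error and apply the cheapest first, which is why flat regions are simplified aggressively while sharp features are kept.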

    A Factor Graph Approach to Multi-Camera Extrinsic Calibration on Legged Robots

    Legged robots are becoming popular not only in research, but also in industry, where they can demonstrate their superiority over wheeled machines in a variety of applications. Whether acting as mobile manipulators or just as all-terrain ground vehicles, these machines need to precisely track the desired base and end-effector trajectories, perform Simultaneous Localization and Mapping (SLAM), and move in challenging environments, all while keeping balance. A crucial requirement for these tasks is that all onboard sensors be properly calibrated and synchronized to provide consistent signals for all the software modules they feed. In this paper, we focus on the problem of calibrating the relative pose between a set of cameras and the base link of a quadruped robot. This pose is fundamental to successfully performing sensor fusion, state estimation, mapping, and any other task requiring visual feedback. To solve this problem, we propose an approach based on factor graphs that jointly optimizes the mutual position of the cameras and the robot base using kinematics and fiducial markers. We also quantitatively compare its performance with other state-of-the-art methods on the hydraulic quadruped robot HyQ. The proposed approach is simple, modular, and independent of external devices other than the fiducial marker. Comment: To appear in "The Third IEEE International Conference on Robotic Computing (IEEE IRC 2019)"

    RGB-D Datasets Using Microsoft Kinect or Similar Sensors: A Survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.