8,164 research outputs found
Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Visual scene understanding is an important capability that enables robots to
purposefully act in their environment. In this paper, we propose a novel
approach to object-class segmentation from multiple RGB-D views using deep
learning. We train a deep neural network to predict object-class semantics that
is consistent from several view points in a semi-supervised way. At test time,
the semantics predictions of our network can be fused more consistently in
semantic keyframe maps than predictions of a network trained on individual
views. We base our network architecture on a recent single-view deep learning
approach to RGB and depth fusion for semantic object-class segmentation and
enhance it with multi-scale loss minimization. We obtain the camera trajectory
using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth
annotated frames in order to enforce multi-view consistency during training. At
test time, predictions from multiple views are fused into keyframes. We propose
and analyze several methods for enforcing multi-view consistency during
training and testing. We evaluate the benefit of multi-view consistency
training and demonstrate that pooling of deep features and fusion over multiple
views outperforms single-view baselines on the NYUDv2 benchmark for semantic
segmentation. Our end-to-end trained network achieves state-of-the-art
performance on the NYUDv2 dataset in single-view segmentation as well as
multi-view semantic fusion.Comment: the 2017 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS 2017
Creating Simplified 3D Models with High Quality Textures
This paper presents an extension to the KinectFusion algorithm which allows
creating simplified 3D models with high quality RGB textures. This is achieved
through (i) creating model textures using images from an HD RGB camera that is
calibrated with Kinect depth camera, (ii) using a modified scheme to update
model textures in an asymmetrical colour volume that contains a higher number
of voxels than that of the geometry volume, (iii) simplifying dense polygon
mesh model using quadric-based mesh decimation algorithm, and (iv) creating and
mapping 2D textures to every polygon in the output 3D model. The proposed
method is implemented in real-time by means of GPU parallel processing.
Visualization via ray casting of both geometry and colour volumes provides
users with a real-time feedback of the currently scanned 3D model. Experimental
results show that the proposed method is capable of keeping the model texture
quality even for a heavily decimated model and that, when reconstructing small
objects, photorealistic RGB textures can still be reconstructed.Comment: 2015 International Conference on Digital Image Computing: Techniques
and Applications (DICTA), Page 1 -
A Factor Graph Approach to Multi-Camera Extrinsic Calibration on Legged Robots
Legged robots are becoming popular not only in research, but also in
industry, where they can demonstrate their superiority over wheeled machines in
a variety of applications. Either when acting as mobile manipulators or just as
all-terrain ground vehicles, these machines need to precisely track the desired
base and end-effector trajectories, perform Simultaneous Localization and
Mapping (SLAM), and move in challenging environments, all while keeping
balance. A crucial aspect for these tasks is that all onboard sensors must be
properly calibrated and synchronized to provide consistent signals for all the
software modules they feed. In this paper, we focus on the problem of
calibrating the relative pose between a set of cameras and the base link of a
quadruped robot. This pose is fundamental to successfully perform sensor
fusion, state estimation, mapping, and any other task requiring visual
feedback. To solve this problem, we propose an approach based on factor graphs
that jointly optimizes the mutual position of the cameras and the robot base
using kinematics and fiducial markers. We also quantitatively compare its
performance with other state-of-the-art methods on the hydraulic quadruped
robot HyQ. The proposed approach is simple, modular, and independent from
external devices other than the fiducial marker.Comment: To appear on "The Third IEEE International Conference on Robotic
Computing (IEEE IRC 2019)
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes the advantages of the color image that provides appearance information of an object and also the depth image that is immune to the variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance to benchmark the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D-simultaneous localization and mapping, and pose estimation. We provide the insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description about the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms
- …