764 research outputs found

    SurfelMeshing: Online Surfel-Based Mesh Reconstruction

    Full text link
    We address the problem of mesh reconstruction from live RGB-D video, assuming a calibrated camera and poses provided externally (e.g., by a SLAM system). In contrast to most existing approaches, we do not fuse depth measurements in a volume but in a dense surfel cloud. We asynchronously (re)triangulate the smoothed surfels to reconstruct a surface mesh. This novel approach enables to maintain a dense surface representation of the scene during SLAM which can quickly adapt to loop closures. This is possible by deforming the surfel cloud and asynchronously remeshing the surface where necessary. The surfel-based representation also naturally supports strongly varying scan resolution. In particular, it reconstructs colors at the input camera's resolution. Moreover, in contrast to many volumetric approaches, ours can reconstruct thin objects since objects do not need to enclose a volume. We demonstrate our approach in a number of experiments, showing that it produces reconstructions that are competitive with the state-of-the-art, and we discuss its advantages and limitations. The algorithm (excluding loop closure functionality) is available as open source at https://github.com/puzzlepaint/surfelmeshing .Comment: Version accepted to IEEE Transactions on Pattern Analysis and Machine Intelligenc

    Cross-calibration of Time-of-flight and Colour Cameras

    Get PDF
    Time-of-flight cameras provide depth information, which is complementary to the photometric appearance of the scene in ordinary images. It is desirable to merge the depth and colour information, in order to obtain a coherent scene representation. However, the individual cameras will have different viewpoints, resolutions and fields of view, which means that they must be mutually calibrated. This paper presents a geometric framework for this multi-view and multi-modal calibration problem. It is shown that three-dimensional projective transformations can be used to align depth and parallax-based representations of the scene, with or without Euclidean reconstruction. A new evaluation procedure is also developed; this allows the reprojection error to be decomposed into calibration and sensor-dependent components. The complete approach is demonstrated on a network of three time-of-flight and six colour cameras. The applications of such a system, to a range of automatic scene-interpretation problems, are discussed.Comment: 18 pages, 12 figures, 3 table

    Sensing of complex buildings and reconstruction into photo-realistic 3D models

    Get PDF
    The 3D reconstruction of indoor and outdoor environments has received an interest only recently, as companies began to recognize that using reconstructed models is a way to generate revenue through location-based services and advertisements. A great amount of research has been done in the field of 3D reconstruction, and one of the latest and most promising applications is Kinect Fusion, which was developed by Microsoft Research. Its strong points are the real-time intuitive 3D reconstruction, interactive frame rate, the level of detail in the models, and the availability of the hardware and software for researchers and enthusiasts. A representative effort towards 3D reconstruction is the Point Cloud Library (PCL). PCL is a large scale, open project for 2D/3D image and point cloud processing. On December 2011, PCL made available an implementation of Kinect Fusion, namely KinFu. KinFu emulates the functionality provided in Kinect Fusion. However, both implementations have two major limitations: 1. The real-time reconstruction takes place only within a cube with a size of 3 meters per axis. The cube's position is fixed at the start of execution, and any object outside of this cube is not integrated into the reconstructed model. Therefore the volume that can be scanned is always limited by the size of the cube. It is possible to manually align many small-size cubes into a single large model, however this is a time-consuming and difficult task, especially when the meshes have complex topologies and high polygon count, as is the case with the meshes obtained from KinFu. 2. The output mesh does not have any color textures. There are some at-tempts to add color in the output point cloud; however, the resulting effect is not photo-realistic. Applying photo-realistic textures to a model can enhance the user experience, even when the model has a simple topology. The main goal of this project is to design and implement a system that captures large indoor environments and generates 3D photo-realistic large indoor models in real time. This report describes an extended version of the KinFu system. The extensions overcome the scalability and texture reconstruction limitations using commodity hardware and open-source software. The complete hardware setup used in this project is worth €2,000, which is comparable to the cost of a single professional laser scanner. The software is released under BSD license, which makes it completely free to use and commercialize. The system has been integrated into the open-source PCL project. The immediate benefits are three-fold: the system becomes a potential industry standard, it is maintained and extended by many developers around the world with no addition-al cost to the VCA group, and it can reduce the application development time by reusing numerous state-of-the-art algorithms

    Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

    Full text link
    Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive and only little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.Comment: 10 pages in Conference on Computer Vision and Pattern Recognition (CVPR), 201
    corecore