
    3D scanning of cultural heritage with consumer depth cameras

    Three-dimensional reconstruction of cultural heritage objects is an expensive and time-consuming process. Recent consumer real-time depth acquisition devices, like the Microsoft Kinect, allow very fast and simple acquisition of 3D views. However, 3D scanning with such devices is a challenging task due to the limited accuracy and reliability of the acquired data. This paper introduces a 3D reconstruction pipeline suited to using consumer depth cameras as hand-held scanners for cultural heritage objects. Several new contributions have been made to achieve this result. They include an ad hoc filtering scheme that exploits the error model of the acquired data and a novel algorithm for the extraction of salient points exploiting both depth and color data. The salient points are then used within a modified version of the ICP algorithm that exploits both geometric and color distances to precisely align the views even when geometry information alone is not sufficient to constrain the registration. The proposed method, although applicable to generic scenes, has been tuned to the acquisition of sculptures, and the experimental results indicate that it performs well in this setting.
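    To make the joint geometry-and-color alignment idea concrete, the sketch below implements one ICP iteration whose correspondences minimise a weighted sum of point and color distances before a closed-form rigid fit. The weight lambda_c and the brute-force matching are illustrative assumptions, not the paper's actual formulation.

        # One ICP iteration with a joint geometry+color correspondence
        # metric, assuming numpy arrays of 3D points and RGB colors.
        import numpy as np

        def icp_step(src_pts, src_rgb, dst_pts, dst_rgb, lambda_c=0.1):
            # Pair each source point with the destination point minimising
            # Euclidean distance in geometry plus weighted color distance.
            d_geo = np.linalg.norm(src_pts[:, None] - dst_pts[None], axis=2)
            d_col = np.linalg.norm(src_rgb[:, None] - dst_rgb[None], axis=2)
            nn = np.argmin(d_geo + lambda_c * d_col, axis=1)
            matched = dst_pts[nn]
            # Closed-form rigid alignment of the matched pairs (Kabsch/SVD).
            mu_s, mu_d = src_pts.mean(0), matched.mean(0)
            U, _, Vt = np.linalg.svd((src_pts - mu_s).T @ (matched - mu_d))
            R = (U @ Vt).T
            if np.linalg.det(R) < 0:      # avoid reflections
                Vt[-1] *= -1
                R = (U @ Vt).T
            t = mu_d - R @ mu_s
            return R, t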

    Sensing of complex buildings and reconstruction into photo-realistic 3D models

    The 3D reconstruction of indoor and outdoor environments has received interest only recently, as companies began to recognize that reconstructed models are a way to generate revenue through location-based services and advertisements. A great amount of research has been done in the field of 3D reconstruction, and one of the latest and most promising applications is Kinect Fusion, developed by Microsoft Research. Its strong points are real-time intuitive 3D reconstruction, an interactive frame rate, the level of detail in the models, and the availability of the hardware and software to researchers and enthusiasts. A representative effort towards 3D reconstruction is the Point Cloud Library (PCL), a large-scale open project for 2D/3D image and point cloud processing. In December 2011, PCL made available an implementation of Kinect Fusion, namely KinFu, which emulates the functionality provided in Kinect Fusion. However, both implementations have two major limitations:
    1. The real-time reconstruction takes place only within a cube with a size of 3 meters per axis. The cube's position is fixed at the start of execution, and any object outside of this cube is not integrated into the reconstructed model. Therefore the volume that can be scanned is always limited by the size of the cube. It is possible to manually align many small cubes into a single large model; however, this is a time-consuming and difficult task, especially when the meshes have complex topologies and high polygon counts, as is the case with meshes obtained from KinFu.
    2. The output mesh does not have any color textures. There have been some attempts to add color to the output point cloud; however, the resulting effect is not photo-realistic. Applying photo-realistic textures to a model can enhance the user experience, even when the model has a simple topology.
    The main goal of this project is to design and implement a system that captures large indoor environments and generates photo-realistic 3D models of large indoor environments in real time. This report describes an extended version of the KinFu system whose extensions overcome the scalability and texture reconstruction limitations using commodity hardware and open-source software. The complete hardware setup used in this project costs €2,000, which is comparable to the cost of a single professional laser scanner. The software is released under the BSD license, which makes it completely free to use and commercialize. The system has been integrated into the open-source PCL project. The immediate benefits are three-fold: the system becomes a potential industry standard, it is maintained and extended by many developers around the world at no additional cost to the VCA group, and it can reduce application development time by reusing numerous state-of-the-art algorithms.
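    As an illustration of the texture reconstruction step, here is a minimal sketch of projecting mesh vertices into an RGB keyframe with a pinhole model to obtain texture coordinates. The function name, the known intrinsics K and pose T_wc, and the single-keyframe simplification are assumptions for exposition, not the system's actual texturing code.

        # Assign texture coordinates by projecting world-space mesh
        # vertices into one RGB keyframe (pinhole camera model).
        import numpy as np

        def vertex_uv(verts_w, K, T_wc, width, height):
            # Transform world-space vertices into the camera frame.
            v_h = np.c_[verts_w, np.ones(len(verts_w))]     # homogeneous
            v_c = (T_wc @ v_h.T).T[:, :3]
            front = v_c[:, 2] > 0                           # in front of camera
            uv = np.full((len(verts_w), 2), np.nan)
            proj = (K @ v_c[front].T).T                     # pinhole projection
            uv[front] = proj[:, :2] / proj[:, 2:3]
            # Keep only vertices that land inside the image bounds.
            ok = front & (uv[:, 0] >= 0) & (uv[:, 0] < width) \
                       & (uv[:, 1] >= 0) & (uv[:, 1] < height)
            # Normalised texture coordinates for the visible vertices.
            return uv / [width, height], ok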

    Localization of Autonomous Vehicles in Urban Environments

    The future of applications such as last-mile delivery, infrastructure inspection and surveillance bets big on employing small autonomous drones and ground robots in cluttered urban settings where precise positioning is critical. However, when navigating close to buildings, GPS-based localisation of robotic platforms is noisy due to obscured reception and multi-path reflection. Localisation methods using introspective sensors, like monocular and stereo cameras mounted on the platforms, offer a better alternative as they are suitable for both indoor and outdoor operation. However, the inherent drift in the estimated trajectory is often evident in the 7 degrees of freedom that capture scaling, rotation and translation, and needs to be corrected. The theme of the thesis is to use a pre-existing 3D model to supplement the pose estimation from a visual navigation system, reducing incremental drift and thereby improving localisation accuracy. The novel framework developed for the monocular camera first extracts the geometric relationship between the pixels of the calibrated camera and the 3D points on the model. These geometric constraints, when used in addition to the relative pose constraints typically used in Simultaneous Localisation and Mapping (SLAM) algorithms, provide superior trajectory estimation. Further, scale drift correction is proposed using a novel Sim(3) optimisation procedure and successfully demonstrated on a unique dataset that embodies many urban localisation challenges. The techniques developed for stereo camera localisation align textured 3D stereo scans with respect to a 3D model and estimate the associated camera pose. The idea is to solve the image registration problem between the projection of the 3D scan and images whose poses are accurately known with respect to the 3D model. The 2D motion parameters are then mapped to the 3D space for camera pose estimation. Novel image registration techniques are developed which combine image edge information with traditional approaches, and successful results are shown.
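    The Sim(3) group mentioned above augments a rigid motion with a scale factor, which is what lets the optimisation absorb monocular scale drift. Below is a minimal sketch of the Sim(3) action and composition, plus a hypothetical point-to-ray residual tying a calibrated pixel to a 3D model point; the residual is an illustrative assumption, not the thesis's exact cost function.

        # Minimal Sim(3) = (scale s, rotation R, translation t) sketch.
        import numpy as np

        def sim3_apply(s, R, t, p):
            # Action on a 3D point: scaled rotation plus translation.
            return s * (R @ p) + t

        def sim3_compose(a, b):
            # Composition (a o b): apply b first, then a.
            sa, Ra, ta = a
            sb, Rb, tb = b
            return sa * sb, Ra @ Rb, sa * (Ra @ tb) + ta

        def ray_residual(pose, pixel_ray, model_point):
            # Hypothetical constraint: the model point, mapped into the
            # camera frame, should lie along the unit ray through the
            # observing pixel (cross product vanishes when aligned).
            s, R, t = pose
            p_cam = sim3_apply(s, R, t, model_point)
            p_dir = p_cam / np.linalg.norm(p_cam)
            return np.linalg.norm(np.cross(p_dir, pixel_ray))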

    High-level environment representations for mobile robots

    In most robotic applications we are faced with the problem of building a digital representation of the environment that allows the robot to autonomously complete its tasks. This internal representation can be used by the robot to plan a motion trajectory for its mobile base and/or end-effector. For most man-made environments we either do not have a digital representation or the one we have is inaccurate, so the robot must be able to build one autonomously by integrating incoming sensor measurements into an internal data structure. For this purpose, a common solution consists in solving the Simultaneous Localization and Mapping (SLAM) problem. The map obtained by solving a SLAM problem is called "metric" and describes the geometric structure of the environment. A metric map is typically made up of low-level primitives (like points or voxels). This means that even though it represents the shape of the objects in the robot workspace, it lacks the information of which object a surface belongs to. Having an object-level representation of the environment has the advantage of augmenting the set of possible tasks that a robot may accomplish. To this end, this thesis focuses on two aspects. First, we propose a formalism to represent in a uniform manner 3D scenes consisting of different geometric primitives, including points, lines and planes, and we derive a local registration and a global optimization algorithm that exploit this representation for robust estimation. Second, we present a Semantic Mapping system capable of building an object-based map that can be used for complex task planning and execution. Our system exploits effective reconstruction and recognition techniques that require no a priori information about the environment and can be used under general conditions.
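    To make the uniform-representation idea concrete, the following sketch shows how point, line and plane primitives can each contribute a residual to one joint registration cost over a rigid transform; the parametrisations and function names are illustrative assumptions, not the thesis's actual formalism.

        # Residuals for three primitive types under one rigid transform
        # T = (R, t), so a single least-squares solver can mix them.
        import numpy as np

        def transform(T, p):
            R, t = T
            return R @ p + t

        def r_point(T, p_src, p_dst):
            # Point-to-point: transformed source should coincide with target.
            return transform(T, p_src) - p_dst

        def r_line(T, p_src, line_o, line_d):
            # Point-to-line: component of the offset orthogonal to the
            # line through origin line_o with unit direction line_d.
            v = transform(T, p_src) - line_o
            return v - np.dot(v, line_d) * line_d

        def r_plane(T, p_src, plane_n, plane_d):
            # Point-to-plane: signed distance to plane n.x + d = 0.
            return np.dot(plane_n, transform(T, p_src)) + plane_d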

    Novel robust computer vision algorithms for micro autonomous systems

    People detection and tracking are an essential component of many autonomous platforms, interactive systems and intelligent vehicles used in various search and rescue operations and similar humanitarian applications. Currently, researchers are focusing on the use of vision sensors such as cameras due to their advantages over other sensor types: cameras are information rich, relatively inexpensive and easily available. Additionally, 3D information is obtained from stereo vision, or by triangulating over several frames in monocular configurations. Another method to obtain 3D data is to use RGB-D sensors (e.g. the Kinect) that provide both image and depth data; this approach has become more attractive over the past few years due to its affordable price and availability to researchers. The aim of this research was to develop robust multi-target detection and tracking algorithms for Micro Autonomous Systems (MAS) that incorporate the RGB-D sensor. The contributions are several novel, robust computer vision algorithms. A new framework for human body detection from video, adapted from the Viola-Jones framework, detects a single person. The 2D Multi-Target Detection and Tracking (MTDT) algorithm applies a Gaussian Mixture Model (GMM) to reduce noise in the pre-processing stage; blob analysis is used to detect targets, and a Kalman filter to track them. The 3D MTDT extends beyond 2D by using depth data from the RGB-D sensor in the pre-processing stage. A Bayesian model is employed to fuse multiple cues, including detection of the upper body, face, skin colour, motion and shape, while the Kalman filter provides speed and robustness in track management. Simultaneous Localisation and Mapping (SLAM) fused with 3D information was also investigated. The new framework introduces front-end and back-end processing: the front end consists of the localisation steps, pose refinement and a loop-closing system, while the back end focuses on pose-graph optimisation to eliminate errors. The proposed computer vision algorithms achieve better speed and robustness, and the frameworks produce impressive results. The new algorithms can improve performance in real-time applications including surveillance, vision-based navigation, environmental perception and vision-based control on MAS.
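    As an illustration of the track-management step, here is a minimal constant-velocity Kalman filter of the kind commonly paired with blob detections; the state layout (x, y, vx, vy) and the noise values are illustrative assumptions, not the thesis's tuned parameters.

        # Constant-velocity Kalman filter: predict a track, then update
        # it with a measured blob centroid (x, y).
        import numpy as np

        dt = 1.0
        F = np.array([[1, 0, dt, 0],
                      [0, 1, 0, dt],
                      [0, 0, 1,  0],
                      [0, 0, 0,  1]], float)   # constant-velocity motion
        H = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0]], float)    # observe position only
        Q = np.eye(4) * 1e-2                   # process noise (illustrative)
        R = np.eye(2) * 1.0                    # measurement noise (illustrative)

        def predict(x, P):
            return F @ x, F @ P @ F.T + Q

        def update(x, P, z):
            y = z - H @ x                      # innovation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
            return x + K @ y, (np.eye(4) - K @ H) @ P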

    Efficient 3D Segmentation, Registration and Mapping for Mobile Robots

    Sometimes simple is better! For certain situations and tasks, simple but robust methods can achieve the same or better results in the same or less time than related sophisticated approaches. In the context of robots operating in real-world environments, the key challenges are perceiving objects of interest and obstacles as well as building maps of the environment and localizing therein. The goal of this thesis is to carefully analyze such problem formulations, to deduce valid assumptions and simplifications, and to develop simple solutions that are both robust and fast. All approaches make use of sensors capturing 3D information, such as consumer RGB-D cameras, and comparative evaluations show the performance of the developed approaches. For identifying objects and regions of interest in manipulation tasks, a real-time object segmentation pipeline is proposed. It exploits several common assumptions of manipulation tasks, such as objects lying on horizontal support surfaces (and being well separated). It achieves real-time performance by using particularly efficient approximations in the individual processing steps, subsampling the input data where possible, and processing only relevant subsets of the data. The resulting pipeline segments 3D input data at up to 30 Hz. In order to obtain complete segmentations of the 3D input data, a second pipeline is proposed that approximates the sampled surface, smooths the underlying data, and segments the smoothed surface into coherent regions belonging to the same geometric primitive. It uses different primitive models and can reliably segment input data into planes, cylinders and spheres. A thorough comparative evaluation shows state-of-the-art performance while computing such segmentations in near real time. The second part of the thesis addresses the registration of 3D input data, i.e., consistently aligning input captured from different view poses, and presents several methods for different types of input data. For the particular application of mapping with micro aerial vehicles, where the 3D input data is particularly sparse, a pipeline is proposed that uses the same approximate surface reconstruction to exploit the measurement topology, together with a surface-to-surface registration algorithm that robustly aligns the data. Optimization of the resulting graph of determined view poses then yields globally consistent 3D maps. For sequences of RGB-D data this pipeline is extended with additional subsampling steps and an initial alignment of the data in local windows of the pose graph. In both cases, comparative evaluations show a robust and fast alignment of the input data.
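    To illustrate the support-surface assumption exploited by the segmentation pipeline, the following sketch RANSAC-fits a dominant plane and keeps the points above it as object candidates; the iteration count, thresholds and names are illustrative, not the thesis's implementation.

        # Fit the dominant plane with RANSAC, then keep points above it
        # as object candidates (the "objects on a support surface" idea).
        import numpy as np

        def fit_plane_ransac(pts, iters=200, tol=0.01,
                             rng=np.random.default_rng(0)):
            best_n, best_d, best_in = None, 0.0, 0
            for _ in range(iters):
                a, b, c = pts[rng.choice(len(pts), 3, replace=False)]
                n = np.cross(b - a, c - a)
                if np.linalg.norm(n) < 1e-9:
                    continue                    # degenerate sample
                n = n / np.linalg.norm(n)
                d = -np.dot(n, a)
                inliers = np.abs(pts @ n + d) < tol
                if inliers.sum() > best_in:
                    best_n, best_d, best_in = n, d, inliers.sum()
            return best_n, best_d

        def object_points(pts, n, d, min_height=0.01):
            # Points at least min_height above the support plane.
            if n[2] < 0:                        # orient the normal upwards
                n, d = -n, -d
            return pts[(pts @ n + d) > min_height]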