1,174 research outputs found

    Active Image-based Modeling with a Toy Drone

    Full text link
    Image-based modeling techniques can now generate photo-realistic 3D models from images. But it is up to users to provide high quality images with good coverage and view overlap, which makes the data capturing process tedious and time consuming. We seek to automate data capturing for image-based modeling. The core of our system is an iterative linear method to solve the multi-view stereo (MVS) problem quickly and plan the Next-Best-View (NBV) effectively. Our fast MVS algorithm enables online model reconstruction and quality assessment to determine the NBVs on the fly. We test our system with a toy unmanned aerial vehicle (UAV) in simulated, indoor and outdoor experiments. Results show that our system improves the efficiency of data acquisition and ensures the completeness of the final model.Comment: To be published on International Conference on Robotics and Automation 2018, Brisbane, Australia. Project Page: https://huangrui815.github.io/active-image-based-modeling/ The author's personal page: http://www.sfu.ca/~rha55

    Development of a probabilistic perception system for camera-lidar sensor fusion

    Get PDF
    La estimación de profundidad usando diferentes sensores es uno de los desafíos clave para dotar a las máquinas autónomas de sólidas capacidades de percepción robótica. Ha habido un avance sobresaliente en el desarrollo de técnicas de estimación de profundidad unimodales basadas en cámaras monoculares, debido a su alta resolución o sensores LiDAR, debido a los datos geométricos precisos que proporcionan. Sin embargo, cada uno de ellos presenta inconvenientes inherentes, como la alta sensibilidad a los cambios en las condiciones de iluminación en el caso delas cámaras y la resolución limitada de los sensores LiDAR. La fusión de sensores se puede utilizar para combinar los méritos y compensar las desventajas de estos dos tipos de sensores. Sin embargo, los métodos de fusión actuales funcionan a un alto nivel. Procesan los flujos de datos de los sensores de forma independiente y combinan las estimaciones de alto nivel obtenidas para cada sensor. En este proyecto, abordamos el problema en un nivel bajo, fusionando los flujos de sensores sin procesar, obteniendo así estimaciones de profundidad que son densas y precisas, y pueden usarse como una fuente de datos multimodal unificada para problemas de estimación de nivel superior. Este trabajo propone un modelo de campo aleatorio condicional (CRF) con múltiples potenciales de geometría y apariencia que representa a la perfección el problema de estimar mapas de profundidad densos a partir de datos de cámara y LiDAR. El modelo se puede optimizar de manera eficiente utilizando el algoritmo Conjúgate Gradient Squared (CGS). El método propuesto se evalúa y compara utilizando el conjunto de datos proporcionado por KITTI Datset. Adicionalmente, se evalúa cualitativamente el modelo, usando datos adquiridos por el autor de esté trabajoMulti-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There has been an outstanding advance in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution or LiDAR sensors due to the precise geometric data they provide. However, each of them suffers from some inherent drawbacks like high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They processes sensor data streams independently and combine the high level estimates obtained for each sensor. In this thesis, I tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems. This work proposes a Conditional Random Field (CRF) model with multiple geometry and appearance potentials that seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared (CGS) algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset. In addition, the model is qualitatively evaluated using data acquired by the author of this work.MaestríaMagíster en Ingeniería de Desarrollo de Producto

    Automatic Dense 3D Scene Mapping from Non-overlapping Passive Visual Sensors for Future Autonomous Systems

    Get PDF
    The ever increasing demand for higher levels of autonomy for robots and vehicles means there is an ever greater need for such systems to be aware of their surroundings. Whilst solutions already exist for creating 3D scene maps, many are based on active scanning devices such as laser scanners and depth cameras that are either expensive, unwieldy, or do not function well under certain environmental conditions. As a result passive cameras are a favoured sensor due their low cost, small size, and ability to work in a range of lighting conditions. In this work we address some of the remaining research challenges within the problem of 3D mapping around a moving platform. We utilise prior work in dense stereo imaging, Stereo Visual Odometry (SVO) and extend Structure from Motion (SfM) to create a pipeline optimised for on vehicle sensing. Using forward facing stereo cameras, we use state of the art SVO and dense stereo techniques to map the scene in front of the vehicle. With significant amounts of prior research in dense stereo, we addressed the issue of selecting an appropriate method by creating a novel evaluation technique. Visual 3D mapping of dynamic scenes from a moving platform result in duplicated scene objects. We extend the prior work on mapping by introducing a generalized dynamic object removal process. Unlike other approaches that rely on computationally expensive segmentation or detection, our method utilises existing data from the mapping stage and the findings from our dense stereo evaluation. We introduce a new SfM approach that exploits our platform motion to create a novel dense mapping process that exceeds the 3D data generation rate of state of the art alternatives. Finally, we combine dense stereo, SVO, and our SfM approach to automatically align point clouds from non-overlapping views to create a rotational and scale consistent global 3D model

    A multisensor SLAM for dense maps of large scale environments under poor lighting conditions

    Get PDF
    This thesis describes the development and implementation of a multisensor large scale autonomous mapping system for surveying tasks in underground mines. The hazardous nature of the underground mining industry has resulted in a push towards autonomous solutions to the most dangerous operations, including surveying tasks. Many existing autonomous mapping techniques rely on approaches to the Simultaneous Localization and Mapping (SLAM) problem which are not suited to the extreme characteristics of active underground mining environments. Our proposed multisensor system has been designed from the outset to address the unique challenges associated with underground SLAM. The robustness, self-containment and portability of the system maximize the potential applications.The multisensor mapping solution proposed as a result of this work is based on a fusion of omnidirectional bearing-only vision-based localization and 3D laser point cloud registration. By combining these two SLAM techniques it is possible to achieve some of the advantages of both approaches – the real-time attributes of vision-based SLAM and the dense, high precision maps obtained through 3D lasers. The result is a viable autonomous mapping solution suitable for application in challenging underground mining environments.A further improvement to the robustness of the proposed multisensor SLAM system is a consequence of incorporating colour information into vision-based localization. Underground mining environments are often dominated by dynamic sources of illumination which can cause inconsistent feature motion during localization. Colour information is utilized to identify and remove features resulting from illumination artefacts and to improve the monochrome based feature matching between frames.Finally, the proposed multisensor mapping system is implemented and evaluated in both above ground and underground scenarios. The resulting large scale maps contained a maximum offset error of ±30mm for mapping tasks with lengths over 100m

    Building with Drones: Accurate 3D Facade Reconstruction using MAVs

    Full text link
    Automatic reconstruction of 3D models from images using multi-view Structure-from-Motion methods has been one of the most fruitful outcomes of computer vision. These advances combined with the growing popularity of Micro Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools ubiquitous for large number of Architecture, Engineering and Construction applications among audiences, mostly unskilled in computer vision. However, to obtain high-resolution and accurate reconstructions from a large-scale object using SfM, there are many critical constraints on the quality of image data, which often become sources of inaccuracy as the current 3D reconstruction pipelines do not facilitate the users to determine the fidelity of input data during the image acquisition. In this paper, we present and advocate a closed-loop interactive approach that performs incremental reconstruction in real-time and gives users an online feedback about the quality parameters like Ground Sampling Distance (GSD), image redundancy, etc on a surface mesh. We also propose a novel multi-scale camera network design to prevent scene drift caused by incremental map building, and release the first multi-scale image sequence dataset as a benchmark. Further, we evaluate our system on real outdoor scenes, and show that our interactive pipeline combined with a multi-scale camera network approach provides compelling accuracy in multi-view reconstruction tasks when compared against the state-of-the-art methods.Comment: 8 Pages, 2015 IEEE International Conference on Robotics and Automation (ICRA '15), Seattle, WA, US

    Development of a probabilistic perception system for camera-lidar sensor fusion

    Get PDF
    La estimación de profundidad usando diferentes sensores es uno de los desafíos clave para dotar a las máquinas autónomas de sólidas capacidades de percepción robótica. Ha habido un avance sobresaliente en el desarrollo de técnicas de estimación de profundidad unimodales basadas en cámaras monoculares, debido a su alta resolución o sensores LiDAR, debido a los datos geométricos precisos que proporcionan. Sin embargo, cada uno de ellos presenta inconvenientes inherentes, como la alta sensibilidad a los cambios en las condiciones de iluminación en el caso delas cámaras y la resolución limitada de los sensores LiDAR. La fusión de sensores se puede utilizar para combinar los méritos y compensar las desventajas de estos dos tipos de sensores. Sin embargo, los métodos de fusión actuales funcionan a un alto nivel. Procesan los flujos de datos de los sensores de forma independiente y combinan las estimaciones de alto nivel obtenidas para cada sensor. En este proyecto, abordamos el problema en un nivel bajo, fusionando los flujos de sensores sin procesar, obteniendo así estimaciones de profundidad que son densas y precisas, y pueden usarse como una fuente de datos multimodal unificada para problemas de estimación de nivel superior. Este trabajo propone un modelo de campo aleatorio condicional (CRF) con múltiples potenciales de geometría y apariencia que representa a la perfección el problema de estimar mapas de profundidad densos a partir de datos de cámara y LiDAR. El modelo se puede optimizar de manera eficiente utilizando el algoritmo Conjúgate Gradient Squared (CGS). El método propuesto se evalúa y compara utilizando el conjunto de datos proporcionado por KITTI Datset. Adicionalmente, se evalúa cualitativamente el modelo, usando datos adquiridos por el autor de esté trabajoMulti-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There has been an outstanding advance in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution or LiDAR sensors due to the precise geometric data they provide. However, each of them suffers from some inherent drawbacks like high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They processes sensor data streams independently and combine the high level estimates obtained for each sensor. In this thesis, I tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems. This work proposes a Conditional Random Field (CRF) model with multiple geometry and appearance potentials that seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared (CGS) algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset. In addition, the model is qualitatively evaluated using data acquired by the author of this work.MaestríaMagíster en Ingeniería de Desarrollo de Producto

    DESIGNING AND EVALUATING A PORTABLE LIDAR-BASED SLAM SYSTEM

    Get PDF
    Mobile Mapping Technology (MMT) has evolved rapidly over the past few decades, especially in using low-cost sensors. This progress is primarily attributed to the appearance of innovative simultaneous localization and mapping (SLAM) algorithms. This article focuses on evaluating the efficiency of a new LiDAR-based portable SLAM system for mapping in dynamic real-world environments. The work proposed a technical solution based on a Livox Avia LiDAR sensor enhanced by gimbal stabilization. The system, named Portable Livox-based Mapping system (PoLiMap), is compared to other similar solutions by acquiring data from various environments, including urban sceneries, underground tunnels and forested areas, and processing them using a modified FAST-LIO-SLAM algorithm. The research presented in the article contributes to the understanding of the capabilities of PoLiMap systems under various conditions and offers significant insight into its potential applications. Accuracy evaluation results prove that the proposed MMT system can successfully tackle various demanding environments and challenge the results of other more costly state-of-the-art portable mobile laser scanning methods

    Building a dense surface map incrementally from semi-dense point cloud and RGBimages

    Full text link
    © 2015, Journal of Zhejiang University Science Editorial Office and Springer-Verlag Berlin Heidelberg. Building and using maps is a fundamental issue for bionic robots in field applications. A dense surface map, which offers rich visual and geometric information, is an ideal representation of the environment for indoor/outdoor localization, navigation, and recognition tasks of these robots. Since most bionic robots can use only small light-weight laser scanners and cameras to acquire semi-dense point cloud and RGB images, we propose a method to generate a consistent and dense surface map from this kind of semi-dense point cloud and RGB images. The method contains two main steps: (1) generate a dense surface for every single scan of point cloud and its corresponding image(s) and (2) incrementally fuse the dense surface of a new scan into the whole map. In step (1) edge-aware resampling is realized by segmenting the scan of a point cloud in advance and resampling each sub-cloud separately. Noise within the scan is reduced and a dense surface is generated. In step (2) the average surface is estimated probabilistically and the non-coincidence of different scans is eliminated. Experiments demonstrate that our method works well in both indoor and outdoor semi-structured environments where there are regularly shaped objects

    Multi-environment Georeferencing of RGB-D Panoramic Images from Portable Mobile Mapping – a Perspective for Infrastructure Management

    Get PDF
    Hochaufgelöste, genau georeferenzierte RGB-D-Bilder sind die Grundlage für 3D-Bildräume bzw. 3D Street-View-Webdienste, welche bereits kommerziell für das Infrastrukturmanagement eingesetzt werden. MMS ermöglichen eine schnelle und effiziente Datenerfassung von Infrastrukturen. Die meisten im Aussenraum eingesetzten MMS beruhen auf direkter Georeferenzierung. Diese ermöglicht in offenen Bereichen absolute Genauigkeiten im Zentimeterbereich. Bei GNSS-Abschattung fällt die Genauigkeit der direkten Georeferenzierung jedoch schnell in den Dezimeter- oder sogar in den Meterbereich. In Innenräumen eingesetzte MMS basieren hingegen meist auf SLAM. Die meisten SLAM-Algorithmen wurden jedoch für niedrige Latenzzeiten und für Echtzeitleistung optimiert und nehmen daher Abstriche bei der Genauigkeit, der Kartenqualität und der maximalen Ausdehnung in Kauf. Das Ziel dieser Arbeit ist, hochaufgelöste RGB-D-Bilder in verschiedenen Umgebungen zu erfassen und diese genau und zuverlässig zu georeferenzieren. Für die Datenerfassung wurde ein leistungsstarkes, bildfokussiertes und rucksackgetragenes MMS entwickelt. Dieses besteht aus einer Mehrkopf-Panoramakamera, zwei Multi-Beam LiDAR-Scannern und einer GNSS- und IMU-kombinierten Navigationseinheit der taktischen Leistungsklasse. Alle Sensoren sind präzise synchronisiert und ermöglichen Zugriff auf die Rohdaten. Das Gesamtsystem wurde in Testfeldern mit bündelblockbasierten sowie merkmalsbasierten Methoden kalibriert, was eine Voraussetzung für die Integration kinematischer Sensordaten darstellt. Für eine genaue und zuverlässige Georeferenzierung in verschiedenen Umgebungen wurde ein mehrstufiger Georeferenzierungsansatz entwickelt, welcher verschiedene Sensordaten und Georeferenzierungsmethoden vereint. Direkte und LiDAR SLAM-basierte Georeferenzierung liefern Initialposen für die nachträgliche bildbasierte Georeferenzierung mittels erweiterter SfM-Pipeline. Die bildbasierte Georeferenzierung führt zu einer präzisen aber spärlichen Trajektorie, welche sich für die Georeferenzierung von Bildern eignet. Um eine dichte Trajektorie zu erhalten, die sich auch für die Georeferenzierung von LiDAR-Daten eignet, wurde die direkte Georeferenzierung mit Posen der bildbasierten Georeferenzierung gestützt. Umfassende Leistungsuntersuchungen in drei weiträumigen anspruchsvollen Testgebieten zeigen die Möglichkeiten und Grenzen unseres Georeferenzierungsansatzes. Die drei Testgebiete im Stadtzentrum, im Wald und im Gebäude repräsentieren reale Bedingungen mit eingeschränktem GNSS-Empfang, schlechter Beleuchtung, sich bewegenden Objekten und sich wiederholenden geometrischen Mustern. Die bildbasierte Georeferenzierung erzielte die besten Genauigkeiten, wobei die mittlere Präzision im Bereich von 5 mm bis 7 mm lag. Die absolute Genauigkeit betrug 85 mm bis 131 mm, was einer Verbesserung um Faktor 2 bis 7 gegenüber der direkten und LiDAR SLAM-basierten Georeferenzierung entspricht. Die direkte Georeferenzierung mit CUPT-Stützung von Bildposen der bildbasierten Georeferenzierung, führte zu einer leicht verschlechterten mittleren Präzision im Bereich von 13 mm bis 16 mm, wobei sich die mittlere absolute Genauigkeit nicht signifikant von der bildbasierten Georeferenzierung unterschied. Die in herausfordernden Umgebungen erzielten Genauigkeiten bestätigen frühere Untersuchungen unter optimalen Bedingungen und liegen in derselben Grössenordnung wie die Resultate anderer Forschungsgruppen. Sie können für die Erstellung von Street-View-Services in herausfordernden Umgebungen für das Infrastrukturmanagement verwendet werden. Genau und zuverlässig georeferenzierte RGB-D-Bilder haben ein grosses Potenzial für zukünftige visuelle Lokalisierungs- und AR-Anwendungen
    corecore