2,137 research outputs found

    Fast Image and LiDAR alignment based on 3D rendering in sensor topology

    Mobile Mapping Systems are now commonly used in large urban acquisition campaigns. They are often equipped with LiDAR sensors and optical cameras, providing very large multimodal datasets. The fusion of both modalities serves different purposes such as point cloud colorization, geometry enhancement or object detection. However, this fusion task cannot be done directly as both modalities are only coarsely registered. This paper presents a fully automatic approach for LiDAR projection and optical image registration refinement based on LiDAR point cloud 3D renderings. First, a coarse 3D mesh is generated from the LiDAR point cloud using the sensor topology. Then, the mesh is rendered in the image domain. After that, a variational approach is used to align the rendering with the optical image. This method achieves high-quality results with very low computation time. Results on real data demonstrate the efficiency of the model for aligning LiDAR projections and optical images.
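
    The "LiDAR projection" step mentioned in this abstract can be illustrated with a minimal pinhole-camera sketch. This is not the paper's method (which renders a mesh and refines the alignment variationally); it only shows how 3D points map to pixels once a calibration is given. The function name `project_points` and the parameters `K`, `R`, `t` are assumptions introduced for illustration.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to pixels with a pinhole model (illustrative).

    K is a 3x3 intrinsic matrix, (R, t) maps world to camera coordinates.
    Returns Nx2 pixel coordinates and a mask of points in front of the camera.
    """
    points_cam = points_world @ R.T + t           # world -> camera frame
    depth = points_cam[:, 2]
    in_front = depth > 0                          # only these are visible
    safe_depth = np.where(in_front, depth, 1.0)   # avoid dividing by depth <= 0
    homog = points_cam @ K.T                      # apply intrinsics
    pixels = homog[:, :2] / safe_depth[:, None]   # perspective division
    return pixels, in_front

# Hypothetical calibration: 100 px focal length, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
pixels, in_front = project_points(pts, K, np.eye(3), np.zeros(3))
```

    A small error in `R` or `t` shifts every projected point, which is why a coarse calibration yields only a coarse overlay of the two modalities.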

    RGB-D And Thermal Sensor Fusion: A Systematic Literature Review

    In the last decade, the computer vision field has seen significant progress in multimodal data fusion and learning, where multiple sensors, including depth, infrared, and visual, are used to capture the environment across diverse spectral ranges. Despite these advancements, there has been no systematic and comprehensive evaluation of fusing RGB-D and thermal modalities to date. While autonomous driving using LiDAR, radar, RGB, and other sensors has garnered substantial research interest, along with the fusion of RGB and depth modalities, the integration of thermal cameras and, specifically, the fusion of RGB-D and thermal data, has received comparatively less attention. This might be partly due to the limited number of publicly available datasets for such applications. This paper provides a comprehensive review of both state-of-the-art and traditional methods used in fusing RGB-D and thermal camera data for various applications, such as site inspection, human tracking, fault detection, and others. The reviewed literature has been categorised into technical areas, such as 3D reconstruction, segmentation, object detection, available datasets, and other related topics. Following a brief introduction and an overview of the methodology, the study delves into calibration and registration techniques, then examines thermal visualisation and 3D reconstruction, before discussing the application of classic feature-based techniques as well as modern deep learning approaches. The paper concludes with a discussion of current limitations and potential future research directions. It is hoped that this survey will serve as a valuable reference for researchers looking to familiarise themselves with the latest advancements and contribute to the RGB-DT research field. Comment: 33 pages, 20 figures

    When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)

    Registration is the process that computes the transformation that aligns sets of data. Commonly, a registration process can be divided into four main steps: target selection, feature extraction, feature matching, and transform computation for the alignment. The accuracy of the result depends on multiple factors, the most significant being the quantity of input data, the presence of noise, outliers and occlusions, the quality of the extracted features, real-time requirements and the type of transformation, especially those defined by multiple parameters, such as non-rigid deformations. Recent advancements in machine learning could be a turning point for these issues, particularly with the development of deep learning (DL) techniques, which are helping to improve multiple computer vision problems through an abstract understanding of the input data. In this paper, a review of deep learning-based registration methods is presented. We classify the different papers by proposing a framework extracted from the traditional registration pipeline to analyse the strengths of the new learning-based proposals. Deep Registration Networks (DRNs) try to solve the alignment task either by replacing part of the traditional pipeline with a network or by fully solving the registration problem. The main conclusions are: 1) learning-based registration techniques cannot always be clearly classified in the traditional pipeline; 2) these approaches allow more complex inputs, such as conceptual models, as well as the traditional 3D datasets; 3) in spite of the generality of learning, the current proposals are still ad hoc solutions; and 4) this is a young topic that still requires a large effort to reach general solutions able to cope with the problems that affect traditional approaches. Comment: Submitted to Pattern Recognition
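
    The last step of the traditional pipeline named above, transform computation, has a well-known closed-form solution for the rigid case once correspondences are available. The sketch below uses the Kabsch (SVD-based) algorithm as one common choice; it is not taken from the review itself, and the function name `rigid_transform_from_matches` is an assumption for illustration.

```python
import numpy as np

def rigid_transform_from_matches(src, dst):
    """Least-squares rigid transform (R, t) with dst ~ src @ R.T + t.

    src and dst are Nx3 arrays of already-matched points (Kabsch algorithm).
    """
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t

# Synthetic check: rotate 30 degrees about z, translate, then recover.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true
R_est, t_est = rigid_transform_from_matches(src, dst)
```

    With noisy or partly wrong matches, this estimate is usually wrapped in a robust scheme such as RANSAC, which is where the outlier and occlusion factors listed in the abstract come into play.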

    Planar Shape Based Registration for Multi-modal Geometry

    We present a global registration algorithm for multi-modal geometric data, typically 3D point clouds and meshes. Existing feature-based methods and recent deep learning based approaches typically rely upon point-to-point matching strategies that often fail to deliver accurate results from defect-laden data. In contrast, we reason at the scale of planar shapes whose detection from input data offers robustness on a range of defects, from noise to outliers through heterogeneous sampling. The detected planar shapes are projected into an accumulation space from which a rotational alignment is operated. A second step then refines the result with a local continuous optimization which also estimates the scale. We demonstrate the robustness and efficacy of our algorithm on challenging real-world data. In particular, we show that our algorithm competes well against state-of-the-art methods, especially on piece-wise planar objects and scenes.
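
    Reasoning at the scale of planar shapes presupposes that a plane can be fitted to a group of points. The abstract does not say how the planar shapes are detected, so the following is only a generic least-squares plane fit via SVD, with `fit_plane` a hypothetical name; the accumulation-space alignment itself is not reproduced here.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through Nx3 points (illustrative helper).

    Returns a unit normal n and an offset d such that n . x = d on the plane.
    The sign of the normal is arbitrary.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, Vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = Vt[-1]
    offset = float(normal @ centroid)
    return normal, offset

# Points sampled on the plane z = 1.
xs, ys = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-1.0, 1.0, 5))
plane_pts = np.column_stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
normal, offset = fit_plane(plane_pts)
```

    Because a fitted normal averages over many points, it is less sensitive to noise in any single point than a point-to-point match, which is consistent with the robustness argument made in the abstract.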

    Learning to extract features for 2D – 3D multimodal registration

    The ability to capture depth information from a scene has greatly increased in recent years. 3D sensors, traditionally high-cost and low-resolution, are being democratized, and 3D scans of indoor and outdoor scenes are becoming more and more common. However, there is still a large gap between the number of captures performed with 2D and 3D sensors. Although 3D sensors provide more information about the scene, 2D sensors are still more accessible and widely used. This trade-off between availability and information leads to a multimodal scenario of mixed 2D and 3D data. This thesis explores the fundamental block of this multimodal scenario: the registration between a single 2D image and a single unorganized point cloud. An unorganized 3D point cloud is the basic representation of a 3D capture. In this representation the surveyed points are represented only by their real-world coordinates and, optionally, by their colour information. This simplistic representation brings multiple challenges to the registration, since most state-of-the-art works rely on metadata about the scene or on prior knowledge. Two different techniques are explored to perform the registration: a keypoint-based technique and an edge-based technique. The keypoint-based technique estimates the transformation by means of correspondences detected using Deep Learning, whilst the edge-based technique refines a transformation using multimodal edge detection to establish anchor points for the estimation. An extensive evaluation of the proposed methodologies is performed. Although further research is needed to achieve adequate performance, the obtained results show the potential of deep learning techniques for learning 2D and 3D similarities.
    The results also show the good performance of the proposed 2D-3D iterative refinement, comparable to the state of the art in 3D-3D registration.