57 research outputs found

    Visual-LiDAR SLAM Based on Unsupervised Multi-channel Deep Neural Networks

    Recently, deep learning techniques have been applied to solve visual or light detection and ranging (LiDAR) simultaneous localization and mapping (SLAM) problems. Supervised deep learning SLAM methods need ground truth data for training, but collecting such data is costly and labour-intensive. Unsupervised training strategies have been adopted by some visual or LiDAR SLAM methods. However, these methods exploit only a single sensor modality and so do not take advantage of the complementary strengths of LiDAR and visual data. In this paper, we propose a novel unsupervised multi-channel visual-LiDAR SLAM method (MVL-SLAM) that fuses visual and LiDAR data. Our SLAM system consists of an unsupervised multi-channel visual-LiDAR odometry (MVLO) component, a deep learning-based loop closure detection component, and a 3D mapping component. The visual-LiDAR odometry component adopts a multi-channel recurrent convolutional neural network (RCNN). Its input consists of front, left, and right view depth images generated from 360° 3D LiDAR data, together with RGB images. We use the features from a deep convolutional neural network (CNN) for the loop closure detection component. Our SLAM method does not require ground truth data for training and can directly construct environmental 3D maps from the 3D mapping component. Experiments conducted on the KITTI odometry dataset show that the rotation and translation errors are lower than those of some other unsupervised methods, including UnMono, SfmLearner, DeepSLAM, and UnDeepVO. By fusing visual and LiDAR data, MVL-SLAM achieves higher accuracy and robustness of pose estimation than other single-modal SLAM systems.

    SIVO: Semantically Informed Visual Odometry and Mapping

    Accurate localization is a requirement for any autonomous mobile robot. In recent years, cameras have proven to be a reliable, cheap, and effective sensor to achieve this goal. Visual simultaneous localization and mapping (SLAM) algorithms determine camera motion by tracking the motion of reference points from the scene. However, these references must be static, as well as viewpoint, scale, and rotation invariant, in order to ensure accurate localization. This is especially important for long-term robot operation, where the references must be stable over long durations and points must be selected carefully to bound the runtime and storage complexity of the algorithm while the robot navigates through its environment. In this thesis, we present SIVO (Semantically Informed Visual Odometry and Mapping), a novel feature selection method for visual SLAM which incorporates machine learning and neural network uncertainty into an information-theoretic approach to feature selection. The emergence of deep learning techniques has resulted in remarkable advances in scene understanding, and our method supplements traditional visual SLAM with this contextual knowledge. Our algorithm selects points which provide significant information to reduce the uncertainty of the state estimate while ensuring that the feature is repeatedly detected to be a static object with high confidence. This is done by evaluating the reduction in Shannon entropy between the current state entropy and the joint entropy of the state given the addition of the new feature, together with the classification entropy of the feature from a Bayesian neural network. Our method is evaluated against ORB SLAM2 and the ground truth of the KITTI odometry dataset. Overall, SIVO performs comparably to ORB SLAM2 (average of 0.17% translation error difference, 6.2 × 10^-5 deg/m rotation error difference) while removing 69% of the map points on average. As the reference points selected are from static objects (buildings, traffic signs, etc.), the map generated using our algorithm is suitable for long-term localization.
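The entropy-based selection described in the SIVO abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not say how the two entropies are combined, so the rule below (entropy reduction minus classification entropy, compared with a threshold) is an assumption, as are the diagonal-covariance simplification and every function and parameter name.

```python
import math


def gaussian_entropy(variances):
    """Differential entropy (nats) of a Gaussian with diagonal covariance.

    Assumption: the state covariance is diagonal, so log|Sigma| is the
    sum of the log-variances. A full SLAM state would need a real log-det.
    """
    n = len(variances)
    log_det = sum(math.log(v) for v in variances)
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + log_det)


def categorical_entropy(probs):
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def keep_feature(prior_vars, posterior_vars, class_probs, threshold):
    """Hypothetical selection rule (an assumption, not taken from the source).

    Keep the feature when the reduction in state entropy it would bring,
    minus the entropy of its semantic classification, exceeds a threshold.
    """
    entropy_reduction = gaussian_entropy(prior_vars) - gaussian_entropy(posterior_vars)
    return entropy_reduction - categorical_entropy(class_probs) > threshold
```

Under this rule, a feature that halves the variance of two state dimensions is kept when its class prediction is confident (for example probabilities 0.99 and 0.01) and rejected when the prediction is maximally uncertain (0.5 and 0.5), given a threshold of 0.5 nats.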

    Object-level dynamic SLAM

    Visual Simultaneous Localisation and Mapping (SLAM) can estimate a camera's pose in an unknown environment and reconstruct a map of it online. Despite the advances in many real-time dense SLAM systems, most still assume a static environment, which is not a valid assumption in many real-world scenarios. This thesis aims to enable dense visual SLAM to run robustly in a dynamic environment, knowing where the sensor is in the environment and, equally importantly, what and where objects are in the surrounding environment for better scene understanding. The contributions in this thesis are threefold. The first presents one of the first object-level dynamic SLAM systems that robustly tracks camera pose while detecting, tracking, and reconstructing all the objects in dynamic scenes. It can continuously fuse geometric, semantic, and motion information for each object into an octree-based volumetric representation. One of the challenges in tracking moving objects is that the object motion can easily break the illumination constancy assumption. In our second contribution, we address this issue by proposing a dense feature-metric alignment to robustly estimate camera and object poses. We show how to learn dense feature maps and feature-metric uncertainties in a self-supervised way. Together they form a probabilistic feature-metric residual, which can be efficiently solved using Gauss-Newton optimisation and easily coupled with other residuals. Up to this point, only the objects' observed geometry can be reconstructed from the sensor data. Our third contribution further incorporates a category-level shape prior into the object mapping. Conditioned on the depth measurement, the learned implicit function completes the unseen part while reconstructing the observed part accurately. It yields better reconstruction completeness and more accurate object pose estimation. These three contributions have advanced the state of the art in visual SLAM. We hope such object-level dynamic SLAM systems will help robots interact intelligently with the human-inhabited world.

    Long-Term Simultaneous Localization and Mapping in Dynamic Environments.

    One of the core competencies required for autonomous mobile robotics is the ability to use sensors to perceive the environment. From this noisy sensor data, the robot must build a representation of the environment and localize itself within this representation. This process, known as simultaneous localization and mapping (SLAM), is a prerequisite for almost all higher-level autonomous behavior in mobile robotics. By associating the robot's sensory observations as it moves through the environment, and by observing the robot's ego-motion through proprioceptive sensors, constraints are placed on the trajectory of the robot and the configuration of the environment. This results in a probabilistic optimization problem to find the most likely robot trajectory and environment configuration given all of the robot's previous sensory experience. SLAM has been well studied under the assumptions that the robot operates for a relatively short time period and that the environment is essentially static during operation. However, performing SLAM over long time periods while modeling the dynamic changes in the environment remains a challenge. The goal of this thesis is to extend the capabilities of SLAM to enable long-term autonomous operation in dynamic environments. The contribution of this thesis has three main components. First, we propose a framework for controlling the computational complexity of the SLAM optimization problem so that it does not grow unbounded with exploration time. Second, we present a method to learn visual feature descriptors that are more robust to changes in lighting, allowing for improved data association in dynamic environments. Finally, we use the proposed tools in a SLAM system that explicitly models the dynamics of the environment in the map by representing each location as a set of example views that capture how the location changes with time. We experimentally demonstrate that the proposed methods enable long-term SLAM in dynamic environments using a large, real-world vision and LIDAR dataset collected over the course of more than a year. This dataset captures a wide variety of dynamics: from short-term scene changes including moving people, cars, changing lighting, and weather conditions, to long-term dynamics including seasonal conditions and structural changes caused by construction.
    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111538/1/carlevar_1.pd

    Simultaneous Localization and Mapping (SLAM) for Autonomous Driving: Concept and Analysis

    The Simultaneous Localization and Mapping (SLAM) technique has achieved astonishing progress over the last few decades and has generated considerable interest in the autonomous driving community. With its conceptual roots in navigation and mapping, SLAM outperforms some traditional positioning and localization techniques since it can support more reliable and robust localization, planning, and control to meet some key criteria for autonomous driving. In this study the authors first give an overview of the different SLAM implementation approaches and then discuss the applications of SLAM for autonomous driving with respect to different driving scenarios, vehicle system components and the characteristics of the SLAM approaches. The authors then discuss some challenging issues and current solutions when applying SLAM for autonomous driving. Quantitative quality-analysis methods for evaluating the characteristics and performance of SLAM systems and for monitoring the risk in SLAM estimation are reviewed. In addition, this study describes a real-world road test to demonstrate a multi-sensor-based modernized SLAM procedure for autonomous driving. The numerical results show that a high-precision 3D point cloud map can be generated by the SLAM procedure with the integration of Lidar and GNSS/INS. An online localization solution with four to five cm accuracy can be achieved based on this pre-generated map and online Lidar scan matching with a tightly fused inertial system.

    Deep Learning for 3D Visual Perception

    3D visual perception refers to the set of problems that involve gathering information through a visual sensor and estimating the three-dimensional position and structure of the objects and formations around the sensor. Some capabilities, such as ego-motion estimation or map building, are essential for higher-level tasks such as autonomous driving or augmented reality. This thesis tackles several challenges in 3D perception, all of them useful from the perspective of SLAM (Simultaneous Localization and Mapping), which is itself a 3D perception problem. SLAM seeks to track the position of a device (for example a robot, a phone, or a virtual reality headset) with respect to the map that it builds simultaneously as the platform explores the environment. SLAM is a highly relevant technology in applications such as virtual reality, augmented reality, and autonomous driving. Visual SLAM is the term used for the SLAM problem solved using only visual sensors. Many of the pieces of the ideal SLAM system are now well known, mature, and in many cases present in applications. However, other pieces still pose significant research challenges. In particular, the ones we have worked on in this thesis are estimating the 3D structure around a camera from a single image, recognizing previously visited places under drastic appearance changes, high-level reconstruction, and SLAM in dynamic environments, all of them using deep neural networks. Monocular depth estimation is the task of perceiving the distance to the camera of each pixel in the image, using only the information obtained from a single image. This is an ill-posed problem, and it is therefore very difficult to infer the exact depth of points from a single image. It requires knowledge of what is being seen and of the sensor being used. For example, if we know that a car model has a certain height and we also know the type of camera used (focal length, pixel size, ...), we can say that if the car has a certain height in the image, for example 50 pixels, it is at a certain distance from the camera. To this end we present the first work able to estimate depth from a single view with reasonable performance across multiple camera types, such as a phone or a video camera. We also present how to estimate, from a single image, the structure of a room, or room layout. For this second work we take advantage of spherical images captured by a panoramic camera, using an equirectangular representation. Using these images we recover the room layout; our goal is to recognize the cues in the image that define the structure of a room. We focus on recovering the simplest version, which is the lines that separate floor, walls, and ceiling. Long-term localization and mapping requires coping with appearance changes in the environment; the effect on an image of taking it in winter or in summer can be very large. We introduce a multi-view model invariant to appearance changes that solves the place recognition problem robustly. Visual place recognition tries to identify a place we have already visited by associating visual cues seen in two images: the one taken in the past and the one taken in the present. Ideally the method should be invariant to changes in viewpoint, illumination, dynamic objects, and long-term appearance changes such as day and night, the seasons, or the weather. To provide long-term functionality we also present DynaSLAM, a SLAM system that distinguishes the static and dynamic parts of the scene. It estimates its position based only on the static parts and reconstructs the map of only the static parts. Thus, if we visit a scene again, our map is not affected by the presence of new dynamic objects or the disappearance of earlier ones. In summary, this thesis contributes to different 3D perception problems, all of which address problems in visual SLAM.
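The car-height example in the abstract above follows from the pinhole camera model: an object of real height H that spans h pixels in the image lies at a distance of roughly Z = f · H / h, where f is the focal length expressed in pixels. The sketch below works through that arithmetic. The function name and the numeric values (4 mm focal length, 0.005 mm pixel size, 1.5 m car height) are illustrative assumptions; only the 50-pixel image height comes from the abstract.

```python
def distance_from_known_height(focal_length_mm, pixel_size_mm,
                               real_height_m, image_height_px):
    """Estimate the distance to an object of known height (pinhole model).

    All names here are hypothetical. The model ignores lens distortion and
    assumes the object is roughly fronto-parallel to the image plane.
    """
    if image_height_px <= 0:
        raise ValueError("image_height_px must be positive")
    # Convert the focal length from millimetres to pixels.
    focal_length_px = focal_length_mm / pixel_size_mm
    # Similar triangles: real_height / distance = image_height / focal_length.
    return focal_length_px * real_height_m / image_height_px


# Illustrative values (assumptions): the car comes out at roughly 24 m.
distance_m = distance_from_known_height(4.0, 0.005, 1.5, 50)
```

The same 50-pixel car seen through a camera with a different focal length or pixel size would imply a different distance, which is why the abstract stresses that the camera type must be known.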

    Deep Learning for Depth, Ego-Motion, Optical Flow Estimation, and Semantic Segmentation

    Visual Simultaneous Localization and Mapping (SLAM) is crucial for robot perception. Visual odometry (VO) is one of the essential components of SLAM, which can estimate the depth map of scenes and the ego-motion of a camera in unknown environments. Most previous work in this area uses geometry-based approaches. Recently, deep learning methods have opened a new door for this area. At present, most research under deep learning frameworks focuses on improving the accuracy of estimation results and reducing the dependence on large amounts of labelled training data. This thesis explores deep learning technologies for different tasks, such as depth, ego-motion, optical flow, and semantic segmentation, under the VO framework. Firstly, a stacked generative adversarial network is proposed to estimate depth and ego-motion. It consists of a stack of GAN layers, of which the lowest layer estimates the depth and ego-motion while the higher layers estimate the spatial features. It can also capture the temporal dynamics due to the use of a recurrent representation across the layers. Secondly, focusing on the design of the internal network structure, a novel recurrent spatial-temporal network (RSTNet) is proposed to estimate depth, ego-motion, optical flow, and dynamic objects. This network can extract and retain more spatial and temporal features. The dynamic objects are detected by using the optical flow difference between full flow and rigid flow. Finally, a semantic segmentation network is proposed, producing semantic segmentation results together with depth and ego-motion estimation results. All of the proposed contributions are tested and evaluated on open public datasets, and comparisons with other methods are provided. The results show that our proposed networks outperform the state-of-the-art methods for depth, ego-motion, and dynamic object estimation.
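The idea of detecting dynamic objects from the difference between full flow and rigid flow can be sketched as a per-pixel threshold on the residual flow magnitude. This is a minimal illustration, not the RSTNet method: the abstract does not give the decision rule, so the Euclidean norm, the threshold value, the nested-list flow layout, and the function name are all assumptions.

```python
import math


def dynamic_mask(full_flow, rigid_flow, threshold=1.0):
    """Flag pixels whose full flow deviates from the rigid (camera-induced) flow.

    Both inputs are rows of (u, v) flow vectors in pixels with identical
    shapes. A pixel is marked dynamic when the norm of the flow difference
    exceeds `threshold` (a hypothetical tuning parameter).
    """
    mask = []
    for full_row, rigid_row in zip(full_flow, rigid_flow):
        mask.append([
            math.hypot(fu - ru, fv - rv) > threshold
            for (fu, fv), (ru, rv) in zip(full_row, rigid_row)
        ])
    return mask


# One row, two pixels: the first moves as the rigid scene predicts,
# the second deviates by (3, 4), i.e. a residual of 5 pixels.
full = [[(1.0, 0.0), (4.0, 3.0)]]
rigid = [[(1.0, 0.0), (1.0, -1.0)]]
mask = dynamic_mask(full, rigid, threshold=1.0)
```

Rigid flow is the motion that a static scene would show given the estimated depth and ego-motion, so any large residual points to something moving independently of the camera.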

    Pre-Trained Driving in Localized Surroundings with Semantic Radar Information and Machine Learning

    Following the signal-processing chain from radar detections to vehicle actuation, this thesis discusses a semantic radar segmentation, a radar SLAM built on top of it, and an autonomous parking function realized by combining the two. The radar segmentation of the (static) environment is achieved by a radar-specific neural network, RadarNet. This segmentation enables the development of the semantic radar graph-SLAM SERALOC. Based on the semantic radar SLAM map, an exemplary autonomous parking function is implemented in a real test vehicle. Along a recorded reference path, the function parks solely on the basis of radar perception with previously unattained positioning accuracy. In the first step, a dataset of 8.2 × 10^6 point-wise semantically labelled radar point clouds is generated over a distance of 2507.35 m. No comparable datasets with this level of annotation and radar specification are publicly available. Supervised training of the semantic segmentation network RadarNet achieves 28.97% mIoU on six classes. In addition, an automated radar labelling framework, SeRaLF, is presented, which supports radar labelling multimodally by means of reference cameras and LiDAR. For coherent mapping, a radar signal pre-filter based on an activation map is designed, which suppresses noise and other dynamic multipath reflections. A graph-SLAM front end adapted specifically to radar, with radar odometry edges between sub-maps and semantically separate NDT registration, assembles the pre-filtered semantic radar scans into a consistent metric map. Mapping accuracy and data association are thereby increased, and the first semantic radar graph-SLAM for arbitrary static environments is realized. Integrated into a real test vehicle, the interplay of the live RadarNet segmentation and the semantic radar graph-SLAM is evaluated using a purely radar-based autonomous parking function. Averaged over 42 autonomous parking manoeuvres (mean speed 3.73 km/h) with a mean manoeuvre length of 172.75 m, a median absolute pose error of 0.235 m and an end pose error of 0.2443 m are achieved, which outperforms comparable radar localization results by ≈ 50%. The map accuracy of changing, re-mapped locations over a mean mapping distance of 165 m yields a map consistency of ≈ 56% at a mean deviation of 0.163 m. For autonomous parking, a given trajectory planner and controller approach were used.
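For readers unfamiliar with the mIoU figure quoted for RadarNet, the sketch below shows one common way to compute mean intersection-over-union from a confusion matrix. It is a generic illustration, not the thesis's evaluation code: the function name is hypothetical, and skipping classes that never occur (rather than counting them as zero) is an assumed convention that the abstract does not specify.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[t][p] counts points whose true class is t and whose
    predicted class is p. Classes with no true or predicted points
    are skipped (an assumed convention).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        true_positive = confusion[c][c]
        false_negative = sum(confusion[c]) - true_positive
        false_positive = sum(confusion[r][c] for r in range(n)) - true_positive
        union = true_positive + false_positive + false_negative
        if union > 0:
            ious.append(true_positive / union)
    return sum(ious) / len(ious) if ious else 0.0


# Two-class toy example: per-class IoU is 3/5 and 5/7.
score = mean_iou([[3, 1],
                  [1, 5]])
```

Because every class contributes equally to the mean, rare classes with poor overlap can pull mIoU well below the overall point-wise accuracy.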