238 research outputs found
Present and Future of SLAM in Extreme Underground Environments
This paper reports on the state of the art in underground SLAM by discussing
different SLAM strategies and results across six teams that participated in the
three-year-long SubT competition. In particular, the paper has four main goals.
First, we review the algorithms, architectures, and systems adopted by the
teams; particular emphasis is put on lidar-centric SLAM solutions (the go-to
approach for virtually all teams in the competition), heterogeneous multi-robot
operation (including both aerial and ground robots), and real-world underground
operation (from the presence of obscurants to the need to handle tight
computational constraints). We do not shy away from discussing the dirty
details behind the different SubT SLAM systems, which are often omitted from
technical papers. Second, we discuss the maturity of the field by highlighting
what is possible with the current SLAM systems and what we believe is within
reach with some good systems engineering. Third, we outline what we believe are
fundamental open problems, that are likely to require further research to break
through. Finally, we provide a list of open-source SLAM implementations and
datasets that have been produced during the SubT challenge and related efforts,
and constitute a useful resource for researchers and practitioners.Comment: 21 pages including references. This survey paper is submitted to IEEE
Transactions on Robotics for pre-approva
G3Reg: Pyramid Graph-based Global Registration using Gaussian Ellipsoid Model
This study introduces a novel framework, G3Reg, for fast and robust global
registration of LiDAR point clouds. In contrast to conventional complex
keypoints and descriptors, we extract fundamental geometric primitives
including planes, clusters, and lines (PCL) from the raw point cloud to obtain
low-level semantic segments. Each segment is formulated as a unified Gaussian
Ellipsoid Model (GEM) by employing a probability ellipsoid to ensure the ground
truth centers are encompassed with a certain degree of probability. Utilizing
these GEMs, we then present a distrust-and-verify scheme based on a Pyramid
Compatibility Graph for Global Registration (PAGOR). Specifically, we establish
an upper bound, which can be traversed based on the confidence level for
compatibility testing to construct the pyramid graph. Gradually, we solve
multiple maximum cliques (MAC) for each level of the graph, generating numerous
transformation candidates. In the verification phase, we adopt a precise and
efficient metric for point cloud alignment quality, founded on geometric
primitives, to identify the optimal candidate. The performance of the algorithm
is extensively validated on three publicly available datasets and a
self-collected multi-session dataset, without changing any parameter settings
in the experimental evaluation. The results exhibit superior robustness and
real-time performance of the G3Reg framework compared to state-of-the-art
methods. Furthermore, we demonstrate the potential for integrating individual
GEM and PAGOR components into other algorithmic frameworks to enhance their
efficacy. To advance further research and promote community understanding, we
have publicly shared the source code.Comment: Under revie
The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection
Where am I? This is one of the most critical questions that any intelligent
system should answer to decide whether it navigates to a previously visited
area. This problem has long been acknowledged for its challenging nature in
simultaneous localization and mapping (SLAM), wherein the robot needs to
correctly associate the incoming sensory data to the database allowing
consistent map generation. The significant advances in computer vision achieved
over the last 20 years, the increased computational power, and the growing
demand for long-term exploration contributed to efficiently performing such a
complex task with inexpensive perception sensors. In this article, visual loop
closure detection, which formulates a solution based solely on appearance input
data, is surveyed. We start by briefly introducing place recognition and SLAM
concepts in robotics. Then, we describe a loop closure detection system's
structure, covering an extensive collection of topics, including the feature
extraction, the environment representation, the decision-making step, and the
evaluation process. We conclude by discussing open and new research challenges,
particularly concerning the robustness in dynamic environments, the
computational complexity, and scalability in long-term operations. The article
aims to serve as a tutorial and a position paper for newcomers to visual loop
closure detection.Comment: 25 pages, 15 figure
DeepTIO: a deep thermal-inertial odometry with visual hallucination
This is the author accepted manuscript. The final version is available from the publisher via the DOI in this recordVisual odometry shows excellent performance in a wide range of environments. However, in visually-denied scenarios (e.g. heavy smoke or darkness), pose estimates degrade or even fail. Thermal cameras are commonly used for perception and inspection when the environment has low visibility. However, their use in odometry estimation is hampered by the lack of robust visual features. In part, this is as a result of the sensor measuring the ambient temperature profile rather than scene appearance and geometry. To overcome this issue, we propose a Deep Neural Network model for thermal-inertial odometry (DeepTIO) by incorporating a visual hallucination network to provide the thermal network with complementary information. The hallucination network is taught to predict fake visual features from thermal images by using Huber loss. We also employ selective fusion to attentively fuse the features from three different modalities, i.e thermal, hallucination, and inertial features. Extensive experiments are performed in hand-held and mobile robot data in benign and smoke-filled environments, showing the efficacy of the proposed model
Event-Based Algorithms For Geometric Computer Vision
Event cameras are novel bio-inspired sensors which mimic the function of the human retina. Rather than directly capturing intensities to form synchronous images as in traditional cameras, event cameras asynchronously detect changes in log image intensity. When such a change is detected at a given pixel, the change is immediately sent to the host computer, where each event consists of the x,y pixel position of the change, a timestamp, accurate to tens of microseconds, and a polarity, indicating whether the pixel got brighter or darker. These cameras provide a number of useful benefits over traditional cameras, including the ability to track extremely fast motions, high dynamic range, and low power consumption.
However, with a new sensing modality comes the need to develop novel algorithms. As these cameras do not capture photometric intensities, novel loss functions must be developed to replace the photoconsistency assumption which serves as the backbone of many classical computer vision algorithms. In addition, the relative novelty of these sensors means that there does not exist the wealth of data available for traditional images with which we can train learning based methods such as deep neural networks.
In this work, we address both of these issues with two foundational principles. First, we show that the motion blur induced when the events are projected into the 2D image plane can be used as a suitable substitute for the classical photometric loss function. Second, we develop self-supervised learning methods which allow us to train convolutional neural networks to estimate motion without any labeled training data. We apply these principles to solve classical perception problems such as feature tracking, visual inertial odometry, optical flow and stereo depth estimation, as well as recognition tasks such as object detection and human pose estimation. We show that these solutions are able to utilize the benefits of event cameras, allowing us to operate in fast moving scenes with challenging lighting which would be incredibly difficult for traditional cameras
Visual slam in dynamic environments
El problema de localización y construcción visual simultánea de mapas (visual SLAM por sus siglas en inglés Simultaneous Localization and Mapping) consiste en localizar una cámara en un mapa que se construye de manera online. Esta tecnología permite la localización de robots en entornos desconocidos y la creación de un mapa de la zona con los sensores que lleva incorporados, es decir, sin contar con ninguna infraestructura externa. A diferencia de los enfoques de odometría en los cuales el movimiento incremental es integrado en el tiempo, un mapa permite que el sensor se localice continuamente en el mismo entorno sin acumular deriva.Asumir que la escena observada es estática es común en los algoritmos de SLAM visual. Aunque la suposición estática es válida para algunas aplicaciones, limita su utilidad en escenas concurridas del mundo real para la conducción autónoma, los robots de servicio o realidad aumentada y virtual entre otros. La detección y el estudio de objetos dinámicos es un requisito para estimar con precisión la posición del sensor y construir mapas estables, útiles para aplicaciones robóticas que operan a largo plazo.Las contribuciones principales de esta tesis son tres: 1. Somos capaces de detectar objetos dinámicos con la ayuda del uso de la segmentación semántica proveniente del aprendizaje profundo y el uso de enfoques de geometría multivisión. Esto nos permite lograr una precisión en la estimación de la trayectoria de la cámara en escenas altamente dinámicas comparable a la que se logra en entornos estáticos, así como construir mapas en 3D que contienen sólo la estructura del entorno estático y estable. 2. Logramos alucinar con imágenes realistas la estructura estática de la escena detrás de los objetos dinámicos. Esto nos permite ofrecer mapas completos con una representación plausible de la escena sin discontinuidades o vacíos ocasionados por las oclusiones de los objetos dinámicos. El reconocimiento visual de lugares también se ve impulsado por estos avances en el procesamiento de imágenes. 3. Desarrollamos un marco conjunto tanto para resolver el problema de SLAM como el seguimiento de múltiples objetos con el fin de obtener un mapa espacio-temporal con información de la trayectoria del sensor y de los alrededores. La comprensión de los objetos dinámicos circundantes es de crucial importancia para los nuevos requisitos de las aplicaciones emergentes de realidad aumentada/virtual o de la navegación autónoma. Estas tres contribuciones hacen avanzar el estado del arte en SLAM visual. Como un producto secundario de nuestra investigación y para el beneficio de la comunidad científica, hemos liberado el código que implementa las soluciones propuestas.<br /
- …