HGI-SLAM: Loop Closure With Human and Geometric Importance Features
We present Human and Geometric Importance SLAM (HGI-SLAM), a novel approach
to loop closure using salient and geometric features. Loop closure is a key
element of SLAM, with many established methods for this problem. However,
current methods are narrow, relying on either geometric or saliency-based
features. We merge their strengths into a model that outperforms either type of
method alone. Our method uses inexpensive monocular cameras and depends on
neither depth sensors nor lidar. HGI-SLAM extracts geometric and salient
features, processes them into descriptors, and optimizes them for a
bag-of-words algorithm. By running our loop closure detection in a concurrent
thread and combining it with ORB-SLAM2, our system forms a complete SLAM
framework. We present extensive
evaluations of HGI loop detection and HGI-SLAM on the KITTI and EuRoC datasets.
We also provide a qualitative analysis of our features. Our method runs in real
time, and is robust to large viewpoint changes while staying accurate in
organic environments. HGI-SLAM is an end-to-end SLAM system that only requires
monocular vision and is comparable in performance to state-of-the-art SLAM
methods.

Comment: 7 pages, 4 figures
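The pipeline the abstract describes (features, descriptors, a bag-of-words algorithm, loop detection) can be sketched minimally. The snippet below is an illustrative sketch, not the authors' implementation: descriptors are stand-in integer tuples, and quantisation into visual words is faked with hashing rather than a learned vocabulary.

```python
from collections import Counter
from math import sqrt

def bow_histogram(descriptors, vocab_size=64):
    """Quantise feature descriptors into a bag-of-words histogram.
    Each descriptor here is a tuple of numbers; a real system would
    assign each one to the nearest centre of a learned vocabulary."""
    return Counter(hash(d) % vocab_size for d in descriptors)

def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse word-count histograms."""
    dot = sum(h1[w] * h2[w] for w in h1)
    n1 = sqrt(sum(v * v for v in h1.values()))
    n2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def detect_loop(query_hist, keyframe_hists, threshold=0.8):
    """Return the index of the most similar past keyframe if its
    similarity exceeds the threshold, otherwise None (no loop)."""
    best_idx, best_sim = None, 0.0
    for i, h in enumerate(keyframe_hists):
        sim = cosine_similarity(query_hist, h)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx if best_sim >= threshold else None
```

In a full system such as the one described, this detection would run in its own thread and feed accepted loop candidates to the pose-graph optimiser.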
Development of Semantic Scene Conversion Model for Image-based Localization at Night
Developing an autonomous vehicle navigation system that is invariant to illumination change is one of the biggest challenges in vision-based localization, because the appearance of an image becomes inconsistent under different lighting conditions even at the same location. Night scene images in particular show the greatest change in appearance compared to the corresponding day scenes, and they do not carry enough information for image-based localization. To deal with illumination change, image conversion methods have been researched; however, these methods can lose the detail of objects and add fake objects to the output images. In this thesis, we propose a semantic object conversion model that exploits the change of local semantic objects, by category, at night. This enables the proposed model to preserve the detail of local semantic objects during image conversion, and it is therefore expected to perform better in image-based localization. Our model uses local semantic objects (i.e., traffic signs and street lamps) as categories. The model is composed of two phases: (1) instance segmentation and (2) semantic object conversion. Instance segmentation serves as a detector for local semantic objects. In the conversion phase, the detected local semantic objects are translated from their night-time appearance to a day-time appearance. In the evaluation, we show that models trained on a set of paired images achieve higher accuracy than models trained on unpaired images. Our proposed method is compared with pix2pix and ToDayGAN; moreover, we quantitatively evaluate the best matching score between a query image and the converted images using the ORB matching descriptor.
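The ORB-based matching score mentioned at the end of the abstract can be illustrated with a small sketch. This is not the thesis code: real ORB descriptors are 256-bit binary strings, which we stand in for with small integers, compared by Hamming distance and filtered with Lowe's ratio test.

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(d1 ^ d2).count("1")

def match_descriptors(query, reference, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test: keep a match
    only if the best distance is clearly smaller than the second best."""
    matches = []
    for qi, qd in enumerate(query):
        dists = sorted((hamming(qd, rd), ri) for ri, rd in enumerate(reference))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches

def matching_score(query, reference):
    """Fraction of query descriptors that find a confident match,
    a simple proxy for how well a converted image matches a query."""
    return len(match_descriptors(query, reference)) / max(len(query), 1)
```

A higher score for day-converted night images than for the raw night images would indicate the conversion helps localization, which is the kind of comparison the evaluation describes.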
Visual SLAM in Dynamic Environments
The problem of visual Simultaneous Localization and Mapping (visual SLAM) consists of localizing a camera within a map that is built online. This technology allows robots to localize themselves in unknown environments and to build a map of the area using only their on-board sensors, that is, without relying on any external infrastructure. Unlike odometry approaches, in which incremental motion is integrated over time, a map allows the sensor to continually relocalize within the same environment without accumulating drift.

Assuming that the observed scene is static is common in visual SLAM algorithms. Although the static assumption holds for some applications, it limits their usefulness in crowded real-world scenes for autonomous driving, service robots, or augmented and virtual reality, among others. Detecting and studying dynamic objects is a requirement for accurately estimating the sensor pose and building stable maps that are useful for robotic applications operating over the long term.

The main contributions of this thesis are threefold: 1. We detect dynamic objects by combining semantic segmentation from deep learning with multi-view geometry approaches. This allows us to estimate the camera trajectory in highly dynamic scenes with an accuracy comparable to that achieved in static environments, and to build 3D maps that contain only the static, stable structure of the environment. 2. We hallucinate, with realistic images, the static structure of the scene behind dynamic objects. This allows us to deliver complete maps with a plausible representation of the scene, free of the gaps and discontinuities caused by occlusions from dynamic objects. Visual place recognition also benefits from these advances in image processing. 3. We develop a joint framework that solves both the SLAM problem and multi-object tracking, in order to obtain a spatio-temporal map with information about the sensor trajectory and its surroundings. Understanding the surrounding dynamic objects is of crucial importance for the new requirements of emerging augmented/virtual reality applications and autonomous navigation. These three contributions advance the state of the art in visual SLAM. As a by-product of our research, and for the benefit of the scientific community, we have released the code implementing the proposed solutions.
Towards Quantitative Endoscopy with Vision Intelligence
In this thesis, we work on topics related to quantitative endoscopy with vision-based intelligence. Specifically, our work revolves around video reconstruction in endoscopy, where many challenges exist, such as texture scarcity, illumination variation, and multimodality, which prevent prior works from operating effectively and robustly. To this end, we propose to combine the expressivity of deep learning approaches with the rigor and accuracy of non-linear optimization algorithms, and we develop a series of methods that confront these challenges on the way towards quantitative endoscopy. We first propose a retrospective sparse reconstruction method that estimates a high-accuracy, high-density point cloud and a highly complete camera trajectory from a monocular endoscopic video with state-of-the-art performance. To enable this, a deep image feature descriptor is developed in place of a hand-crafted local descriptor, boosting feature matching performance in a typical sparse reconstruction algorithm. A retrospective surface reconstruction pipeline is then proposed to estimate a textured surface model from a monocular endoscopic video, involving self-supervised depth and descriptor learning and a surface fusion technique. We show that the proposed method outperforms a popular dense reconstruction method and that the estimated reconstructions are in good agreement with surface models obtained from CT scans. To align video-reconstructed surface models with pre-operative imaging such as CT, we introduce a global point cloud registration algorithm that is robust to the resolution mismatch that often arises in such multi-modal scenarios. Specifically, a geometric feature descriptor is developed in which a novel network normalization technique helps a 3D network produce more consistent and distinctive geometric features for samples with different resolutions. The proposed geometric descriptor achieves state-of-the-art performance in our evaluation. Last but not least, a real-time SLAM system that estimates surface geometry and camera trajectory from a monocular endoscopic video is developed, using deep representations for geometry and appearance together with non-linear factor graph optimization. We show that the proposed SLAM system performs favorably compared with a state-of-the-art feature-based SLAM system.
The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection
Where am I? This is one of the most critical questions that any intelligent
system must answer to decide whether it is navigating through a previously visited
area. This problem has long been acknowledged for its challenging nature in
simultaneous localization and mapping (SLAM), wherein the robot needs to
correctly associate the incoming sensory data to the database allowing
consistent map generation. The significant advances in computer vision achieved
over the last 20 years, the increased computational power, and the growing
demand for long-term exploration contributed to efficiently performing such a
complex task with inexpensive perception sensors. In this article, visual loop
closure detection, which formulates a solution based solely on appearance input
data, is surveyed. We start by briefly introducing place recognition and SLAM
concepts in robotics. Then, we describe a loop closure detection system's
structure, covering an extensive collection of topics, including feature
extraction, environment representation, the decision-making step, and the
evaluation process. We conclude by discussing open and new research challenges,
particularly concerning robustness in dynamic environments, computational
complexity, and scalability in long-term operations. The article
aims to serve as a tutorial and a position paper for newcomers to visual loop
closure detection.

Comment: 25 pages, 15 figures
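As a toy illustration of the decision-making step this survey covers, the following sketch assumes per-frame appearance similarity scores have already been computed by the earlier pipeline stages; the threshold and the exclusion window for recent frames are hypothetical parameters, not values from the article.

```python
def decide_loop(similarities, current_idx, threshold=0.75, exclude_recent=30):
    """Decision-making step of a loop closure detection system:
    pick the most similar past frame, ignoring recent neighbours
    (which are trivially similar to the current frame), and accept
    the candidate only if its similarity exceeds a threshold."""
    candidates = [
        (sim, idx)
        for idx, sim in enumerate(similarities)
        if idx <= current_idx - exclude_recent
    ]
    if not candidates:
        return None  # too early in the run to close any loop
    best_sim, best_idx = max(candidates)
    return best_idx if best_sim >= threshold else None
```

Real systems typically add temporal-consistency and geometric-verification checks on top of this thresholding, which is part of why the decision step deserves its own section in the survey.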
Visual and Camera Sensors
This book includes 13 papers published in the Special Issue "Visual and Camera Sensors" of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors.
Visual place recognition for improved open and uncertain navigation
Visual place recognition, a fundamental element of robotic navigation, localises a query place image by comparing it against a reference database of known place images.
Recent work focuses on using deep learning to learn image descriptors for this task
that are invariant to appearance changes from dynamic lighting, weather and seasonal
conditions. However, these descriptors require greater computational resources than
are available on robotic hardware; have few SLAM frameworks designed to utilise
them; return a relative comparison between image descriptors that is difficult to
interpret; cannot provide appearance invariance in other navigation tasks such as
scene classification; and are unable to identify query images from an open
environment that have no true match in the reference database. This thesis addresses these
challenges with three contributions. The first is a lightweight visual place recognition
descriptor combined with a probabilistic filter to address a subset of the visual SLAM
problem in real-time. The second contribution combines visual place recognition and
scene classification for appearance invariant scene classification, which is extended
to recognise unknown scene classes when navigating an open environment. The final
contribution uses comparisons between query and reference image descriptors to
classify whether they result in a true or false positive localisation, and whether a
true match for the query image exists in the reference database.

Funded by the Edinburgh Centre for Robotics and the Engineering and Physical Sciences Research Council (EPSRC).
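The first contribution pairs a lightweight descriptor with a probabilistic filter. A minimal discrete Bayes filter over places along a reference route might look like the sketch below; the motion and measurement models here are illustrative assumptions, not the thesis design.

```python
def predict(belief, stay=0.5, move=0.5):
    """Motion model: probability mass either stays at a place or
    advances to the next place along the (cyclic) reference route."""
    n = len(belief)
    new = [0.0] * n
    for i, p in enumerate(belief):
        new[i] += stay * p
        new[(i + 1) % n] += move * p
    return new

def update(belief, likelihoods):
    """Measurement update: weight the belief by per-place similarity
    likelihoods from the place recognition descriptor, then renormalise."""
    weighted = [p * l for p, l in zip(belief, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted] if total > 0 else belief
```

Filtering single-image matches this way suppresses spurious high-similarity frames, since a wrong match is unlikely to be supported by the belief propagated from previous observations.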
Visual Place Recognition: A Tutorial
Localization is an essential capability for mobile robots. A rapidly growing
field of research in this area is Visual Place Recognition (VPR), which is the
ability to recognize previously seen places in the world based solely on
images. This work is the first tutorial paper on visual place
recognition. It unifies the terminology of VPR and complements prior research
in two important directions: 1) It provides a systematic introduction for
newcomers to the field, covering topics such as the formulation of the VPR
problem, a general-purpose algorithmic pipeline, an evaluation methodology for
VPR approaches, and the major challenges for VPR and how they may be addressed.
2) As a contribution for researchers acquainted with the VPR problem, it
examines the intricacies of different VPR problem types regarding input, data
processing, and output. The tutorial also discusses the subtleties behind the
evaluation of VPR algorithms, e.g., the evaluation of a VPR system that has to
find all matching database images per query, as opposed to just a single match.
Practical code examples in Python illustrate to prospective practitioners and
researchers how VPR is implemented and evaluated.

Comment: IEEE Robotics & Automation Magazine (RAM)
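The evaluation subtlety mentioned above, scoring a VPR system that must find all matching database images per query rather than a single best match, can be sketched as pair-wise precision and recall over a similarity matrix. The function below is an illustrative example in the spirit of the tutorial's Python snippets, not its actual code.

```python
def evaluate_vpr(similarity, ground_truth, threshold):
    """Precision/recall for VPR when every matching database image per
    query must be retrieved: each (query, database) pair whose similarity
    clears the threshold counts as a prediction, judged against the
    ground-truth set of true (query, database) match pairs."""
    tp = fp = fn = 0
    for q, row in enumerate(similarity):
        for d, sim in enumerate(row):
            predicted = sim >= threshold
            actual = (q, d) in ground_truth
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Sweeping the threshold and recording (precision, recall) pairs yields the precision-recall curves commonly used to compare VPR approaches.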