65 research outputs found

    HGI-SLAM: Loop Closure With Human and Geometric Importance Features

    Full text link
    We present Human and Geometric Importance SLAM (HGI-SLAM), a novel approach to loop closure using salient and geometric features. Loop closure is a key element of SLAM, with many established methods for this problem. However, current methods are narrow, using either geometric or salient based features. We merge their successes into a model that outperforms both types of methods alone. Our method utilizes inexpensive monocular cameras and does not depend on depth sensors nor Lidar. HGI-SLAM utilizes geometric and salient features, processes them into descriptors, and optimizes them for a bag of words algorithm. By using a concurrent thread and combing our loop closure detection with ORB-SLAM2, our system is a complete SLAM framework. We present extensive evaluations of HGI loop detection and HGI-SLAM on the KITTI and EuRoC datasets. We also provide a qualitative analysis of our features. Our method runs in real time, and is robust to large viewpoint changes while staying accurate in organic environments. HGI-SLAM is an end-to-end SLAM system that only requires monocular vision and is comparable in performance to state-of-the-art SLAM methods.Comment: 7 pages, 4 figure

    Development of Semantic Scene Conversion Model for Image-based Localization at Night

    Get PDF
    Developing an autonomous vehicle navigation system invariant to illumination change is one of the biggest challenges in vision-based localization field due to the fact that the appearance of an image becomes inconsistent under different light conditions even with the same location. In particular, the night scene images have greatest change in appearance compared to the according day scenes. Moreover, the night images do not have enough information in Image-based localization. To deal with illumination change, image conversion methods have been researched. However, these methods could lose the detail of objects and add fake objects into the output images. In this thesis, we proposed the semantic objects conversion model using the change of local semantic objects by categories at night. This enables the proposed model to obtain the detail of local semantic objects in image conversion. As a result, it is expected that the proposed model has a better result in image-based localization. Our model uses local semantic objects (i.e., traffic signs and street lamps) as categories. The model is composed of two phases as (1) instance segmentation and (2) semantic objects conversion. Instance segmentation is utilized as a detector for local semantic objects. In translation phase, the detected local semantic objects are translated from the appearance of the night image to day image. In evaluation, we prove that models using a set of paired images show higher accuracy compared to the models using a set of unpaired images. Our proposed method will be compared with pix2pix and ToDayGAN. Moreover, the result quantitatively evaluates the best matching score with a query image and the converted images using ORB matching descriptor

    Visual slam in dynamic environments

    Get PDF
    El problema de localización y construcción visual simultánea de mapas (visual SLAM por sus siglas en inglés Simultaneous Localization and Mapping) consiste en localizar una cámara en un mapa que se construye de manera online. Esta tecnología permite la localización de robots en entornos desconocidos y la creación de un mapa de la zona con los sensores que lleva incorporados, es decir, sin contar con ninguna infraestructura externa. A diferencia de los enfoques de odometría en los cuales el movimiento incremental es integrado en el tiempo, un mapa permite que el sensor se localice continuamente en el mismo entorno sin acumular deriva.Asumir que la escena observada es estática es común en los algoritmos de SLAM visual. Aunque la suposición estática es válida para algunas aplicaciones, limita su utilidad en escenas concurridas del mundo real para la conducción autónoma, los robots de servicio o realidad aumentada y virtual entre otros. La detección y el estudio de objetos dinámicos es un requisito para estimar con precisión la posición del sensor y construir mapas estables, útiles para aplicaciones robóticas que operan a largo plazo.Las contribuciones principales de esta tesis son tres: 1. Somos capaces de detectar objetos dinámicos con la ayuda del uso de la segmentación semántica proveniente del aprendizaje profundo y el uso de enfoques de geometría multivisión. Esto nos permite lograr una precisión en la estimación de la trayectoria de la cámara en escenas altamente dinámicas comparable a la que se logra en entornos estáticos, así como construir mapas en 3D que contienen sólo la estructura del entorno estático y estable. 2. Logramos alucinar con imágenes realistas la estructura estática de la escena detrás de los objetos dinámicos. Esto nos permite ofrecer mapas completos con una representación plausible de la escena sin discontinuidades o vacíos ocasionados por las oclusiones de los objetos dinámicos. El reconocimiento visual de lugares también se ve impulsado por estos avances en el procesamiento de imágenes. 3. Desarrollamos un marco conjunto tanto para resolver el problema de SLAM como el seguimiento de múltiples objetos con el fin de obtener un mapa espacio-temporal con información de la trayectoria del sensor y de los alrededores. La comprensión de los objetos dinámicos circundantes es de crucial importancia para los nuevos requisitos de las aplicaciones emergentes de realidad aumentada/virtual o de la navegación autónoma. Estas tres contribuciones hacen avanzar el estado del arte en SLAM visual. Como un producto secundario de nuestra investigación y para el beneficio de la comunidad científica, hemos liberado el código que implementa las soluciones propuestas.<br /

    Towards Quantitative Endoscopy with Vision Intelligence

    Get PDF
    In this thesis, we work on topics related to quantitative endoscopy with vision-based intelligence. Specifically, our works revolve around the topic of video reconstruction in endoscopy, where many challenges exist, such as texture scarceness, illumination variation, multimodality, etc., and these prevent prior works from working effectively and robustly. To this end, we propose to combine the strength of expressivity of deep learning approaches and the rigorousness and accuracy of non-linear optimization algorithms to develop a series of methods to confront such challenges towards quantitative endoscopy. We first propose a retrospective sparse reconstruction method that can estimate a high-accuracy and density point cloud and high-completeness camera trajectory from a monocular endoscopic video with state-of-the-art performance. To enable this, replacing the role of a hand-crafted local descriptor, a deep image feature descriptor is developed to boost the feature matching performance in a typical sparse reconstruction algorithm. A retrospective surface reconstruction pipeline is then proposed to estimate a textured surface model from a monocular endoscopic video, where self-supervised depth and descriptor learning and surface fusion technique is involved. We show that the proposed method performs superior to a popular dense reconstruction method and the estimate reconstructions are in good agreement with the surface models obtained from CT scans. To align video-reconstructed surface models with pre-operative imaging such as CT, we introduce a global point cloud registration algorithm that is robust to resolution mismatch that often happens in such multi-modal scenarios. Specifically, a geometric feature descriptor is developed where a novel network normalization technique is used to help a 3D network produce more consistent and distinctive geometric features for samples with different resolutions. The proposed geometric descriptor achieves state-of-the-art performance, based on our evaluation. Last but not least, a real-time SLAM system that estimates a surface geometry and camera trajectory from a monocular endoscopic video is developed, where deep representations for geometry and appearance and non-linear factor graph optimization are used. We show that the proposed SLAM system performs favorably compared with a state-of-the-art feature-based SLAM system

    The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection

    Full text link
    Where am I? This is one of the most critical questions that any intelligent system should answer to decide whether it navigates to a previously visited area. This problem has long been acknowledged for its challenging nature in simultaneous localization and mapping (SLAM), wherein the robot needs to correctly associate the incoming sensory data to the database allowing consistent map generation. The significant advances in computer vision achieved over the last 20 years, the increased computational power, and the growing demand for long-term exploration contributed to efficiently performing such a complex task with inexpensive perception sensors. In this article, visual loop closure detection, which formulates a solution based solely on appearance input data, is surveyed. We start by briefly introducing place recognition and SLAM concepts in robotics. Then, we describe a loop closure detection system's structure, covering an extensive collection of topics, including the feature extraction, the environment representation, the decision-making step, and the evaluation process. We conclude by discussing open and new research challenges, particularly concerning the robustness in dynamic environments, the computational complexity, and scalability in long-term operations. The article aims to serve as a tutorial and a position paper for newcomers to visual loop closure detection.Comment: 25 pages, 15 figure

    Visual and Camera Sensors

    Get PDF
    This book includes 13 papers published in Special Issue ("Visual and Camera Sensors") of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors

    Visual place recognition for improved open and uncertain navigation

    Get PDF
    Visual place recognition localises a query place image by comparing it against a reference database of known place images, a fundamental element of robotic navigation. Recent work focuses on using deep learning to learn image descriptors for this task that are invariant to appearance changes from dynamic lighting, weather and seasonal conditions. However, these descriptors: require greater computational resources than are available on robotic hardware, have few SLAM frameworks designed to utilise them, return a relative comparison between image descriptors which is difficult to interpret, cannot be used for appearance invariance in other navigation tasks such as scene classification and are unable to identify query images from an open environment that have no true match in the reference database. This thesis addresses these challenges with three contributions. The first is a lightweight visual place recognition descriptor combined with a probabilistic filter to address a subset of the visual SLAM problem in real-time. The second contribution combines visual place recognition and scene classification for appearance invariant scene classification, which is extended to recognise unknown scene classes when navigating an open environment. The final contribution uses comparisons between query and reference image descriptors to classify whether they result in a true, or false positive localisation and whether a true match for the query image exists in the reference database.Edinburgh Centre for Robotics and Engineering and Physical Sciences Research Council (EPSRC) fundin

    Visual Place Recognition: A Tutorial

    Full text link
    Localization is an essential capability for mobile robots. A rapidly growing field of research in this area is Visual Place Recognition (VPR), which is the ability to recognize previously seen places in the world based solely on images. This present work is the first tutorial paper on visual place recognition. It unifies the terminology of VPR and complements prior research in two important directions: 1) It provides a systematic introduction for newcomers to the field, covering topics such as the formulation of the VPR problem, a general-purpose algorithmic pipeline, an evaluation methodology for VPR approaches, and the major challenges for VPR and how they may be addressed. 2) As a contribution for researchers acquainted with the VPR problem, it examines the intricacies of different VPR problem types regarding input, data processing, and output. The tutorial also discusses the subtleties behind the evaluation of VPR algorithms, e.g., the evaluation of a VPR system that has to find all matching database images per query, as opposed to just a single match. Practical code examples in Python illustrate to prospective practitioners and researchers how VPR is implemented and evaluated.Comment: IEEE Robotics & Automation Magazine (RAM
    corecore