
Place and Object Recognition for Real-time Visual Mapping

Author: Gálvez López Dorian
Tardós Solano Juan Domingo
Publication venue: Universidad de Zaragoza, Prensas de la Universidad
Publication date: 01/01/2013

This work addresses two of the main difficulties in current simultaneous localization and mapping (SLAM) systems: recognizing previously visited places in order to close loops in the trajectory and build accurate maps, and recognizing objects in order to enrich maps with high-level structures and improve interaction between robots and people. In visual SLAM, the features extracted from the images of a video sequence accumulate over time, which makes two aspects of loop detection harder: rejecting incorrect loops detected between places that look very similar, and keeping the execution time low and feasible on long trajectories. In this work we propose a technique based on visual vocabularies and bags of words to detect loops robustly and efficiently, focusing on two main ideas: 1) exploiting the sequential nature of video images, and 2) making the whole process run at video frame rate. To benefit from the sequential origin of the images, we present a normalized similarity score that measures the resemblance between images and increases the distinctiveness of correct detections. In addition, we group loop-candidate image matches so that images actually taken from the same place do not compete with each other. Finally, we add a temporal constraint that checks the consistency between consecutive detections. Efficiency is achieved by using inverted and direct indexes and binary features. An inverted index speeds up the comparison between place images, and a direct index speeds up the computation of point correspondences between them.
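The scoring pipeline described above (inverted index, normalized score, grouping of neighbouring matches) can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes sparse bag-of-words vectors stored as dicts, uses the L1 score common in bag-of-words systems, and assumes the normalization divides a candidate's score by the query's score against the previous image. All names are hypothetical.

```python
from collections import defaultdict


def l1_score(v1, v2):
    """L1 similarity of two sparse bag-of-words vectors (dicts word -> weight).

    s = 1 - 0.5 * || v1/|v1|_1 - v2/|v2|_1 ||_1, which lies in [0, 1].
    """
    n1 = sum(abs(w) for w in v1.values())
    n2 = sum(abs(w) for w in v2.values())
    if n1 == 0 or n2 == 0:
        return 0.0
    words = set(v1) | set(v2)
    diff = sum(abs(v1.get(w, 0.0) / n1 - v2.get(w, 0.0) / n2) for w in words)
    return 1.0 - 0.5 * diff


class InvertedIndexDB:
    """Image database whose inverted index maps each word to the images containing it."""

    def __init__(self):
        self.images = []                  # image id -> bag-of-words vector
        self.inverted = defaultdict(set)  # word id -> ids of images that contain it

    def add(self, bow):
        image_id = len(self.images)
        self.images.append(bow)
        for word in bow:
            self.inverted[word].add(image_id)
        return image_id

    def query(self, bow, exclude=()):
        """Score only the images that share at least one word with the query."""
        candidates = set()
        for word in bow:
            candidates |= self.inverted.get(word, set())
        candidates -= set(exclude)  # e.g. very recent images of the same trajectory
        return sorted(((l1_score(bow, self.images[i]), i) for i in candidates), reverse=True)


def normalized_score(s_candidate, s_previous):
    """Divide a candidate's score by the query's score against the previous image."""
    return s_candidate / s_previous if s_previous > 0 else 0.0


def group_into_islands(matches, max_gap=1):
    """Group (score, image_id) matches whose ids are at most max_gap apart.

    Returns (total_score, ids) pairs, best first, so that consecutive images of
    the same place add up instead of competing with each other.
    """
    islands = []
    for score, image_id in sorted(matches, key=lambda m: m[1]):
        if islands and image_id - islands[-1][1][-1] <= max_gap:
            islands[-1][0] += score
            islands[-1][1].append(image_id)
        else:
            islands.append([score, [image_id]])
    return sorted(((total, ids) for total, ids in islands), reverse=True)
```

Because the inverted index is consulted first, images that share no word with the query are never scored, which is where the speed-up over comparing against every stored image comes from.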
For the first time, this work uses binary features to detect loops, yielding a solution that remains viable even for tens of thousands of images. Loops are verified by checking the geometric consistency of the matched scenes, using several robust methods that work with both single and multiple cameras. We present competitive results with no false positives on several sequences, with images acquired at both high and low frame rates, with frontal and lateral cameras, and using the same vocabulary and the same configuration. With binary descriptors, the complete system requires 22 milliseconds per image on a sequence of 26,300 images, an order of magnitude faster than other current techniques. An algorithm similar to the one used for place recognition can be applied to object recognition in visual SLAM. Detecting objects in this context is particularly difficult because the locations, poses and scales at which an object can appear in an image are potentially infinite, which makes objects hard to distinguish. Moreover, this complexity multiplies when the comparison must be made against several 3D objects. Our effort in this work is aimed at: 1) building the first visual SLAM system able to place real 3D objects in the map, and 2) addressing the scalability problems that arise when dealing with multiple objects and multiple views of them. We present the first monocular SLAM system that recognizes 3D objects, inserts them into the map and refines their position in 3D space as the map is built, even when the objects leave the camera's field of view. This is achieved in real time with object models composed of three-dimensional information and multiple images representing several viewpoints of the object.
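Binary descriptors are cheap to compare because the distance between two of them is the Hamming distance: an XOR followed by a bit count. The sketch below assumes descriptors stored as equal-length `bytes` objects (for example 32 bytes for a 256-bit descriptor such as BRIEF; the abstract only says "binary features"), and the `max_distance` threshold is a hypothetical value, not one from the thesis.

```python
def hamming(d1: bytes, d2: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the bits differ; counting those 1s gives the distance.
    return bin(int.from_bytes(d1, "big") ^ int.from_bytes(d2, "big")).count("1")


def match_binary(query, train, max_distance=64):
    """Brute-force nearest-neighbour matching of binary descriptors.

    Returns (query_index, train_index, distance) for every query descriptor whose
    closest train descriptor lies within max_distance bits.
    """
    matches = []
    for qi, q in enumerate(query):
        best = None  # (train_index, distance) of the closest descriptor so far
        for ti, t in enumerate(train):
            d = hamming(q, t)
            if best is None or d < best[1]:
                best = (ti, d)
        if best is not None and best[1] <= max_distance:
            matches.append((qi, best[0], best[1]))
    return matches
```

Compared with floating-point descriptors, this replaces a Euclidean distance over many floats with integer bit operations, which is one reason binary features make loop detection feasible at video rate.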
We then focus on the scalability of the 3D object recognition stage. We present a fast technique for segmenting images into regions of interest in order to detect small or distant objects. Next, we propose replacing the independent-view object model with a model based on a single bag of words of binary features associated with 3D points. We also build a database that incorporates inverted and direct indexes, exploiting them to quickly retrieve both candidate objects and point correspondences, as in the loop detection case. The experimental results show that our system runs in real time in a desktop setting with a hand-held camera and in a room with a camera mounted on an autonomous robot. The improvements to the recognition process yield satisfactory results, with no false detections and an average execution time of 28 milliseconds per image with a database of 20 3D objects.
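Both the loop detector and the object database use a direct index to speed up point correspondences. A minimal sketch, assuming every feature has already been assigned to a node of the vocabulary tree at some fixed level (node ids and function names are hypothetical): features are only compared when they fall under the same node, which prunes most brute-force comparisons.

```python
from collections import defaultdict


def build_direct_index(feature_nodes):
    """Direct index: vocabulary node id -> indices of the features under that node.

    feature_nodes[i] is the node (at a chosen vocabulary level) that feature i
    falls into.
    """
    index = defaultdict(list)
    for feature_idx, node in enumerate(feature_nodes):
        index[node].append(feature_idx)
    return dict(index)


def candidate_pairs(direct_a, direct_b):
    """Feature pairs worth comparing: only those that share a vocabulary node."""
    pairs = []
    for node, feats_a in direct_a.items():
        for i in feats_a:
            for j in direct_b.get(node, []):
                pairs.append((i, j))
    return pairs
```

For two images with three features each, brute force would compare 9 pairs; if their node assignments are `[5, 7, 5]` and `[7, 9, 5]`, only 3 pairs remain as candidates, and descriptor distances (for example Hamming distances) are computed just for those.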

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Universidad de Zaragoza

Scalable Life-long Visual Place Recognition

Author: Doan Anh-Dzung
Publication date: 01/01/2022

Visual place recognition (VPR) is the task of using visual inputs to determine whether a mobile robot is revisiting a previously observed place or exploring a new region. To perform convincingly, a practical VPR algorithm must be robust against appearance changes caused not only by short-term (e.g., weather, lighting) and long-term (e.g., seasons, vegetation growth) environmental variations, but also by "less cyclical" changes (construction and roadworks, updated signage, facades and billboards, etc.). Such appearance changes invariably occur in real life, and handling them is the research gap this thesis sets out to fill. To this end, we first investigate probabilistic frameworks that effectively exploit the temporal information in visual data captured as video. Inspired by the Bayes filter, we propose two VPR methods that perform filtering on discrete and continuous domains, respectively, using the temporal information efficiently to improve VPR accuracy under appearance changes. Because the appearance of operating environments changes continuously and indefinitely, a promising way for VPR to cope with appearance changes is to keep accumulating images so that new changes are incorporated into the internal representation of the environment. This demands a VPR technique that scales to an ever-growing dataset. To this end, inspired by Hidden Markov Models (HMMs), we develop novel VPR techniques that can be efficiently updated and compressed, so that recognizing new queries can exploit all available data (including recent changes) without suffering linear growth in time and space complexity. Another approach to the scalability issue in VPR is map summarization, which keeps only informative 3D points in a topometric map, according to predefined constraints. In this thesis, we add the timestamp as another constraint.
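Filtering over a discrete domain, as mentioned in the abstract, can be illustrated with a generic histogram (discrete Bayes) filter over N previously mapped places. This is a sketch of the general technique, not the thesis's formulation: the banded motion model (`p_stay`, `p_advance`, `p_skip`) and the per-place likelihoods are hypothetical; in practice the likelihoods would come from image similarity scores.

```python
def normalize(p):
    total = sum(p)
    return [x / total for x in p] if total > 0 else [1.0 / len(p)] * len(p)


def predict(belief, p_stay=0.3, p_advance=0.6, p_skip=0.1):
    """Motion step: the robot stays, moves one place ahead, or skips one place."""
    n = len(belief)
    out = [0.0] * n
    for i, b in enumerate(belief):
        out[i] += p_stay * b
        if i + 1 < n:
            out[i + 1] += p_advance * b
        if i + 2 < n:
            out[i + 2] += p_skip * b
    return normalize(out)  # mass pushed past the last place is dropped, then renormalized


def update(belief, likelihood):
    """Measurement step: weight each place by how well the query image matches it."""
    return normalize([b * l for b, l in zip(belief, likelihood)])


def bayes_filter(likelihoods, n_places):
    """Run predict/update over a sequence of per-frame likelihood vectors."""
    belief = [1.0 / n_places] * n_places
    history = []
    for lik in likelihoods:
        belief = update(predict(belief), lik)
        history.append(belief)
    return history
```

The temporal information enters through `predict`: a single ambiguous frame cannot move the belief far from where the previous frames placed it, which is what makes sequence-based filtering more robust to appearance changes than matching each frame independently.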
With the timestamp as a constraint, we formulate a repeatability predictor (RP) as a regressor that predicts the repeatability of an interest point as a function of time. We show that the RP can be used to significantly alleviate the degradation of VPR accuracy caused by map summarization. The contributions of this thesis not only fill a gap in current VPR research but, more importantly, also enable a wide range of applications, such as self-driving cars, autonomous robots and augmented reality.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
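The abstract does not say which regressor the RP uses. Purely to illustrate "repeatability as a function of time" combined with a point budget for map summarization, the sketch below swaps in ordinary least-squares linear regression per 3D point, clipped to [0, 1]; the observation histories, the budget and all names are hypothetical.

```python
def fit_repeatability(times, observed):
    """Least-squares line through (time, observed) pairs; observed is 0 or 1."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(observed) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        return 0.0, mean_y  # all observations at one time: constant prediction
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, observed))
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def predict_repeatability(model, t):
    slope, intercept = model
    return min(1.0, max(0.0, intercept + slope * t))  # clip to a probability range


def summarize_map(histories, query_time, budget):
    """Keep the `budget` points with the highest predicted repeatability at query_time.

    histories maps point id -> (times, observed) from previous traversals.
    """
    scores = {
        pid: predict_repeatability(fit_repeatability(ts, ys), query_time)
        for pid, (ts, ys) in histories.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:budget])
```

A point that stopped being re-detected in recent traversals gets a falling trend and is dropped first, whereas a plain "most often observed" summary would still keep it; this is the intuition behind adding time as a summarization constraint.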

Adelaide Research & Scholarship

Distributed scene reconstruction from multiple mobile platforms

Author: Cavestany Pedro
Publication venue: Cranfield University
Publication date: 01/05/2015

Recent research on mobile robotics has produced new designs that provide household robots with omnidirectional motion. The image sensors embedded in these devices motivate applying 3D vision techniques to them for navigation and mapping. In addition, distributed low-cost sensing systems acting as a single entity have recently emerged as an efficient alternative to expensive mobile equipment. In this work we present an implementation of a visual reconstruction method, structure from motion (SfM), on a low-budget, omnidirectional mobile platform, and extend this method to distributed 3D scene reconstruction with several instances of such a platform. Our approach overcomes the challenges posed by the platform. The unprecedented levels of noise produced by the image compression typical of the platform are handled by our feature filtering methods, which ensure suitable populations of feature matches for epipolar geometry estimation through strict quality-based feature selection. The robust pose estimation algorithms implemented, together with a novel feature tracking system, enable our incremental SfM approach to handle, in a novel way, the ill-conditioned inter-image configurations caused by omnidirectional motion. The feature tracking system efficiently manages the feature scarcity caused by noise and outputs high-quality feature tracks, which allow robust 3D mapping of a scene even when, due to noise, the tracks are shorter than is usually assumed for stable 3D reconstruction. Distributed reconstruction from multiple SfM instances is achieved by applying loop-closing techniques. Our multiple-reconstruction system merges individual 3D structures and resolves the global scale problem with minimal overlap, whereas in the literature 3D mapping is obtained from overlapping stretches of sequences. The performance of this system is demonstrated in the two-session case.
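Each monocular SfM reconstruction has its own arbitrary scale, so merging two of them requires estimating their relative scale. The sketch below assumes that matched 3D points between the two reconstructions have already been found (for example through loop closing) and uses the ratio of RMS spreads around the centroids, a simplification of full similarity alignment (such as Umeyama's method, which also recovers rotation and translation). It is not the author's method; names are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def relative_scale(points_a, points_b):
    """Scale s such that points_b ~ s * R * points_a + t, for matched 3D points.

    Uses the ratio of RMS distances to the centroid, which is exact when the two
    point sets differ by a noise-free similarity transform.
    """
    if len(points_a) != len(points_b) or len(points_a) < 2:
        raise ValueError("need at least two matched point pairs")
    ca, cb = centroid(points_a), centroid(points_b)
    spread_a = sum(math.dist(p, ca) ** 2 for p in points_a)
    spread_b = sum(math.dist(p, cb) ** 2 for p in points_b)
    if spread_a == 0:
        raise ValueError("points_a are degenerate (all identical)")
    return math.sqrt(spread_b / spread_a)
```

Rotation and translation do not change distances to the centroid, so they cancel out of the ratio; with noisy matches the estimate degrades gracefully rather than exactly, which is why a robust alignment would normally follow.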
The noise management, the stability against ill-conditioned configurations and the robustness of our SfM system are validated in a number of experiments and compared with state-of-the-art approaches. Possible future research areas are also discussed.

Cranfield CERES

Topological Localization using Wi-Fi and Vision merged into FABMAP framework

Author: Dalibard Sébastien
Garcia Nicolas
Joly Cyril
Moutarde Fabien
Nowakowski Mathieu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/09/2017

This paper introduces a topological localization algorithm that uses visual and Wi-Fi data. Its main contribution is a novel way of merging data from these sensors. By adapting Wi-Fi signatures to the FABMAP algorithm, it develops an early-fusion framework that solves the global localization and kidnapped-robot problems. The resulting algorithm is tested and compared with FABMAP visual localization on data acquired by a Pepper robot in an office building. Several constraints were applied during acquisition to make the experiment representative of real-life scenarios. Without any tuning, early fusion outperforms visual localization by a significant margin: 94% of estimated localizations are less than 5 m from the ground truth, compared with 81% for visual localization.
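FABMAP treats each observation as a binary vector recording which vocabulary words are present. One way to make a Wi-Fi scan "suited to FABMAP" is to turn each (access point, signal-strength band) pair into a word and concatenate the result with the visual word vector (early fusion). The RSSI band edges, the encoding and the function names below are hypothetical illustrations; the paper's actual encoding is not given in the abstract.

```python
def wifi_words(scan, vocabulary, bin_edges=(-80, -70, -60)):
    """Turn a Wi-Fi scan into a binary word-presence vector.

    scan: dict BSSID -> RSSI in dBm. vocabulary: list of BSSIDs known from the map.
    Each known access point contributes len(bin_edges) + 1 words, one per signal
    strength band; exactly one of them is set when the access point is heard.
    """
    n_bins = len(bin_edges) + 1
    vector = [0] * (len(vocabulary) * n_bins)
    for ap_idx, bssid in enumerate(vocabulary):
        if bssid not in scan:
            continue  # unheard access points leave all their words at 0
        band = sum(scan[bssid] >= edge for edge in bin_edges)  # 0 = weakest band
        vector[ap_idx * n_bins + band] = 1
    return vector


def early_fusion(visual_words, wifi_vector):
    """Early fusion: concatenate both binary vectors into one observation."""
    return list(visual_words) + list(wifi_vector)
```

Because the fused vector has the same binary form as a purely visual observation, the rest of the FABMAP machinery can, in principle, be reused unchanged, which is the appeal of fusing before rather than after localization.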

Crossref

HAL-MINES ParisTech

User-oriented markerless augmented reality framework based on 3D reconstruction and loop closure detection

Author: Gao Yuqing
Publication date: 01/07/2017

An augmented reality (AR) system needs to track the user's view to register augmentations accurately. The present research proposes a conceptual markerless AR framework based on natural features, whose process is divided into two stages: an offline database training session for application developers, and an online AR tracking and display session for end users. In the offline session, two 3D reconstruction approaches, RGBD-SLAM and SfM, are integrated into the development framework to build the reference template of a target environment. The thesis presents the performance and applicable conditions of these two methods, so that application developers can choose the one that suits their development needs. A general development user interface, including a simple GUI tool for configuring augmentations, is provided to developers. The proposal also applies a Bag of Words strategy to enable rapid "loop-closure detection" in the online session, efficiently querying the trained database with the user's current view to locate the user's pose. The rendering and display of augmentations are currently implemented in an OpenGL window, a result of the research that merits further detailed investigation and development.
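Querying the trained database with the user's view via Bag of Words can be sketched as a ranking problem. This sketch assumes each image has already been quantized into a list of visual word ids and weights words with TF-IDF and cosine similarity, a common choice; the abstract does not specify the weighting, and the function names are hypothetical. It scans the whole database for clarity; a real system would use an inverted index.

```python
import math
from collections import Counter


def idf_weights(database):
    """Inverse document frequency for every word in a list of word-id lists."""
    n = len(database)
    df = Counter(word for words in database for word in set(words))
    return {word: math.log(n / count) for word, count in df.items()}


def tfidf(words, idf):
    counts = Counter(words)
    total = len(words)
    # Words never seen in the database get weight 0.
    return {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}


def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_match(query_words, database):
    """Index of the database image most similar to the query, or None."""
    idf = idf_weights(database)
    q = tfidf(query_words, idf)
    scores = [cosine(q, tfidf(words, idf)) for words in database]
    best = max(range(len(scores)), key=scores.__getitem__, default=None)
    if best is None or scores[best] == 0:
        return None  # nothing in the database shares informative words
    return best
```

The retrieved image only gives a coarse location; the user's pose would then be refined geometrically from point correspondences with that image's part of the reconstruction.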

University of Birmingham Research Archive, E-theses Repository