    Increasing the Efficiency of 6-DoF Visual Localization Using Multi-Modal Sensory Data

    Localization is a key requirement for mobile robot autonomy and human-robot interaction. Vision-based localization is accurate and flexible; however, it incurs a high computational burden, which limits its application on many resource-constrained platforms. In this paper, we address the problem of performing real-time localization in large-scale 3D point cloud maps of ever-growing size. While most systems using multi-modal information reduce localization time by employing side-channel information in a coarse manner (e.g. WiFi for a rough prior position estimate), we propose to interweave the map with rich sensory data. This multi-modal approach achieves two key goals simultaneously. First, it enables us to harness additional sensory data to localize against a map covering a vast area in real time; second, it allows us to roughly localize devices which are not equipped with a camera. The key to our approach is a localization policy based on a sequential Monte Carlo estimator. The localizer uses this policy to attempt point-matching only in nodes where it is likely to succeed, significantly increasing the efficiency of the localization process. The proposed multi-modal localization system is evaluated extensively in a large museum building. The results show that our multi-modal approach not only increases the localization accuracy but also significantly reduces computational time. Comment: Presented at IEEE-RAS International Conference on Humanoid Robots (Humanoids) 201
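
The node-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the map is a graph of discrete nodes, that a side-channel sensor (for example WiFi) yields a positive likelihood per node, and that point-matching is then attempted only in the nodes holding the most particles. The names `smc_step`, `nodes_to_match`, `neighbors` and `likelihood` are hypothetical.

```python
import random
from collections import Counter

def smc_step(particles, neighbors, likelihood, rng):
    """One predict-update-resample cycle of a particle filter over map nodes.

    particles  : list of node ids, one per particle
    neighbors  : dict mapping node id -> list of adjacent node ids (assumed map graph)
    likelihood : dict mapping node id -> positive likelihood of the current
                 side-channel reading at that node (assumed sensor model)
    """
    # Predict: each particle either stays or moves to an adjacent node.
    moved = [rng.choice([p] + neighbors[p]) for p in particles]
    # Update: weight each particle by the sensor likelihood at its node.
    weights = [likelihood[p] for p in moved]
    # Resample: draw a new set with probability proportional to weight.
    return rng.choices(moved, weights=weights, k=len(moved))

def nodes_to_match(particles, k):
    """Return the k nodes holding the most particles (candidates for point-matching)."""
    return [node for node, _ in Counter(particles).most_common(k)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical corridor of five nodes; the reading fits node 3 best.
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    likelihood = {0: 0.01, 1: 0.01, 2: 0.1, 3: 1.0, 4: 0.1}
    particles = [rng.randrange(5) for _ in range(500)]
    for _ in range(10):
        particles = smc_step(particles, neighbors, likelihood, rng)
    print(nodes_to_match(particles, 2))
```

Restricting the expensive visual matching to the top-ranked nodes is what would save computation in such a scheme; the choice of `k` trades robustness against cost.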

    User-oriented markerless augmented reality framework based on 3D reconstruction and loop closure detection

    An augmented reality (AR) system needs to track the user's view to register augmentations accurately. The present research proposes a conceptual markerless, natural feature-based AR framework whose process is divided into two stages: an offline database training session for application developers, and an online AR tracking and display session for end users. In the offline session, two types of 3D reconstruction application, RGBD-SLAM and SfM, are integrated into the development framework for building the reference template of a target environment. The performance and applicable conditions of these two methods are presented in the present thesis, and application developers can choose which method to apply according to their development needs. A general development user interface is provided to the developer for interaction, including a simple GUI tool for augmentation configuration. The present proposal also applies a Bag of Words strategy to enable rapid "loop-closure detection" in the online session, efficiently querying the trained database with the application user's view to locate the user pose. The rendering and display of augmentations is currently implemented within an OpenGL window, which is one result of the research that merits further detailed investigation and development.
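
As a rough illustration of the Bag of Words query mentioned in the abstract above, the sketch below retrieves the stored view most similar to a query view. It assumes image features have already been quantized into integer visual-word ids, and it uses plain term-frequency vectors with cosine similarity; practical systems usually add TF-IDF weighting and a vocabulary tree. The function names and the database layout are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter

def bow_vector(word_ids):
    """Build an L2-normalized term-frequency vector from visual-word ids."""
    counts = Counter(word_ids)
    if not counts:
        return {}
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def similarity(a, b):
    """Cosine similarity of two normalized sparse vectors (dicts)."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def best_match(query_words, database):
    """Return the key of the database view most similar to the query view.

    database : dict mapping a view name -> list of visual-word ids (assumed layout)
    """
    q = bow_vector(query_words)
    return max(database, key=lambda name: similarity(q, bow_vector(database[name])))

if __name__ == "__main__":
    # Hypothetical trained database with two reference views.
    database = {"view_a": [1, 1, 2, 3], "view_b": [7, 8, 9, 9]}
    print(best_match([1, 2, 2, 3], database))
```

The retrieved view would then serve as the starting point for estimating the user pose against the reference template.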

    Topological local-metric framework for mobile robots navigation: a long term perspective

    © 2018, Springer Science+Business Media, LLC, part of Springer Nature. Long-term mapping and localization are primary components of real-world mobile robot deployment, where the crucial challenges are robustness and stability. In this paper, we introduce a topological local-metric framework (TLF) that aims to deal with environmental changes and erroneous measurements while achieving constant complexity. TLF organizes the sensor data collected by the robot in a topological graph whose geometry is encoded only in the edges, i.e. the relative poses between adjacent nodes, relaxing global consistency to local consistency. The TLF is therefore more robust to the unavoidable erroneous measurements arising from sensor information matching, since the error is confined to a local neighborhood. Because TLF has no global coordinate frame, we further propose localization and navigation algorithms that switch across multiple local metric coordinate frames. In addition, a lifelong memorizing mechanism is presented to record environmental changes in the TLF with constant complexity, as no global optimization is required. In experiments, the framework and algorithms are evaluated on 21-session data collected by stereo cameras, which are sensitive to illumination, and compared with a state-of-the-art globally consistent framework. The results demonstrate that TLF achieves localization accuracy similar to that of the globally consistent framework, but with higher robustness and lower cost. Localization performance also improves across sessions because of the memorizing mechanism. Finally, equipped with TLF, the robot navigates autonomously in a 1 km session.
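
To make the edge-only geometry of the abstract above concrete, the following sketch stores a relative pose on each edge and obtains the pose of one node in another node's local frame by chaining those edges, with no global coordinate frame involved. It is a simplified illustration under stated assumptions: poses are planar SE(2) triples `(x, y, theta)` rather than full 3D poses, and the names `compose`, `invert`, `pose_along_path` and `edges` are hypothetical.

```python
import math

def compose(a, b):
    """Compose SE(2) poses: b is given in a's frame; return b in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            math.atan2(math.sin(at + bt), math.cos(at + bt)))  # wrap angle to (-pi, pi]

def invert(p):
    """Inverse of an SE(2) pose, used to traverse an edge backwards."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)

def pose_along_path(path, edges):
    """Pose of the last node of `path` expressed in the frame of the first node.

    edges : dict mapping (i, j) -> pose of node j in the frame of node i
    """
    pose = (0.0, 0.0, 0.0)
    for i, j in zip(path, path[1:]):
        rel = edges[(i, j)] if (i, j) in edges else invert(edges[(j, i)])
        pose = compose(pose, rel)
    return pose

if __name__ == "__main__":
    # Hypothetical graph: each edge moves 1 m forward, then turns left 90 degrees.
    step = (1.0, 0.0, math.pi / 2)
    edges = {(0, 1): step, (1, 2): step, (2, 3): step}
    print(pose_along_path([0, 1, 2, 3], edges))
```

Because only adjacent relative poses are chained, an erroneous edge affects only the paths that pass through it, which is the locality property the abstract emphasizes.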

    Active Mapping and Robot Exploration: A Survey

    Simultaneous localization and mapping addresses the problem of building a map of the environment without any prior information, based on the data obtained from one or more sensors. In most situations, the robot is driven by a human operator, but some systems are capable of navigating autonomously while mapping, which is called active simultaneous localization and mapping. This strategy focuses on actively calculating the trajectories to explore the environment while building a map with minimum error. In this paper, a comprehensive review of the research work developed in this field is provided, targeting the most relevant contributions in indoor mobile robotics. This research was funded by the ELKARTEK project ELKARBOT KK-2020/00092 of the Basque Government.

    Probabilistic techniques in semantic mapping for mobile robotics

    Semantic maps are representations of the world that allow a robot to understand not only the spatial aspects of its workplace but also the meaning of its elements (objects, rooms, etc.) and how humans interact with them (e.g. functionalities, events and relations). To achieve this, a semantic map augments purely spatial representations, such as geometric or topological maps, with meta-information about the types of elements and relations that can be found in the working environment. This meta-information, called semantic or common-sense knowledge, is typically encoded in Knowledge Bases. An example of this kind of information could be: "refrigerators are large, rectangular objects, usually placed in kitchens, which can contain perishable food and medication". Encoding and handling this semantic knowledge allows the robot to reason about the information gathered from a given workplace, as well as to infer new information in order to efficiently execute high-level tasks such as "hello robot! take the medication to grandma, please". This thesis proposes the use of probabilistic techniques to build and maintain semantic maps, which offers three main advantages over traditional approaches: i) it allows uncertainty to be handled (arising from the robot's imprecise sensors and from the models employed), ii) it provides coherent representations of the environment by exploiting the contextual relations between the observed elements (e.g. refrigerators are usually found in kitchens) from a holistic point of view, and iii) it produces certainty values that reflect how accurately the robot understands its environment. Specifically, the contributions presented can be grouped into two main topics.
The first set of contributions addresses the problem of object and/or room recognition, since semantic mapping systems must rely on dependable recognition algorithms to build valid representations. To this end, the use of Probabilistic Graphical Models (PGMs) has been explored in order to exploit the contextual relations between objects and/or rooms while handling the uncertainty inherent in the recognition problem, along with the use of Knowledge Bases to improve their performance in different ways, e.g. detecting incoherent results, providing prior information, reducing the complexity of probabilistic inference algorithms, generating synthetic training samples, enabling learning from past experiences, etc. The second group of contributions accommodates the probabilistic results coming from the developed recognition algorithms in a new semantic representation, called the Multiversal Semantic Map (MvSmap). This map manages multiple interpretations of the robot's workspace, called universes, which are annotated with the probability of being the correct ones according to the robot's current knowledge. This approach thus provides a grounded belief about the accuracy of the robot's understanding of its environment, which allows it to operate in a more efficient and coherent way. The proposed probabilistic algorithms have been thoroughly tested and compared with other current and innovative approaches using state-of-the-art datasets.
Additionally, this thesis contributes two datasets, UMA-Offices and Robot@Home, which contain sensory information captured in different office and home environments, as well as two software tools, the Undirected Probabilistic Graphical Models in C++ (UPGMpp) library and the Object Labeling Toolkit (OLT), for working with Probabilistic Graphical Models and processing datasets respectively.

    Dense Visual Simultaneous Localisation and Mapping in Collaborative and Outdoor Scenarios

    Dense visual simultaneous localisation and mapping (SLAM) systems can produce 3D reconstructions that are digital facsimiles of the physical space they describe. Systems that can produce dense maps with this level of fidelity in real time provide foundational spatial reasoning capabilities for many downstream tasks in autonomous robotics. Over the past 15 years, mapping small-scale indoor environments, such as desks and buildings, with a single slow-moving, hand-held sensor has been one of the central focuses of dense visual SLAM research. However, most dense visual SLAM systems exhibit a number of limitations which mean they cannot be directly applied in collaborative or outdoor settings. The contribution of this thesis is to address these limitations with the development of new systems and algorithms for collaborative dense mapping, efficient dense fusion and outdoor operation with fast camera motion and wide field of view (FOV) cameras. We use ElasticFusion, a state-of-the-art dense SLAM system, as our starting point, and each of these contributions is implemented as a novel extension to the system. We first present a collaborative dense SLAM system that allows a number of cameras starting with unknown initial relative positions to maintain local maps with the original ElasticFusion algorithm. Visual place recognition across local maps results in constraints that allow maps to be aligned into a common global reference frame, facilitating collaborative mapping and tracking of multiple cameras within a shared map. Within dense fusion-based SLAM systems, the standard approach is to fuse every frame into the dense model without considering whether the information contained within the frame is already captured by the dense map and therefore redundant. As the number of cameras or the scale of the map increases, this approach becomes inefficient.
In our second contribution, we address this inefficiency by introducing a novel information theoretic approach to keyframe selection that allows the system to avoid processing redundant information. We implement the procedure within ElasticFusion, demonstrating a marked reduction in the number of frames required by the system to estimate an accurate, denoised surface reconstruction. Before dense SLAM techniques can be applied in outdoor scenarios we must first address their reliance on active depth cameras, and their lack of suitability to fast camera motion. In our third contribution we present an outdoor dense SLAM system. The system overcomes the need for an active sensor by employing neural network-based depth inference to predict the geometry of the scene as it appears in each image. To address the issue of camera tracking during fast motion we employ a hybrid architecture, combining elements of both dense and sparse SLAM systems to perform camera tracking and to achieve globally consistent dense mapping. Automotive applications present a particularly important setting for dense visual SLAM systems. Such applications are characterised by their use of wide FOV cameras and are therefore not accurately modelled by the standard pinhole camera model. The fourth contribution of this thesis is to extend the above hybrid sparse-dense monocular SLAM system to cater for large FOV fisheye imagery. This is achieved by reformulating the mapping pipeline in terms of the Kannala-Brandt fisheye camera model. To estimate depth, we introduce a new version of the PackNet depth estimation neural network (Guizilini et al., 2020) adapted for fisheye inputs. To demonstrate the effectiveness of our contributions, we present experimental results, computed by processing the synthetic ICL-NUIM dataset of Handa et al. (2014) as well as the real-world TUM-RGBD dataset of Sturm et al. (2012). 
For outdoor SLAM we show the results of our system processing the autonomous driving KITTI and KITTI-360 datasets of Geiger et al. (2012a) and Liao et al. (2021) respectively.
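
The Kannala-Brandt fisheye model named in the abstract above can be sketched as a projection function. This is an illustration only, using the commonly used four-coefficient polynomial form in which the image radius depends on the angle from the optical axis; it is not the thesis's mapping pipeline, and the function name `project_kb` and all intrinsic values are hypothetical.

```python
import math

def project_kb(point, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point in the camera frame to pixel coordinates.

    Uses r(theta) = theta * (1 + k1*theta^2 + k2*theta^4 + k3*theta^6 + k4*theta^8),
    where theta is the angle between the ray and the optical axis.
    """
    X, Y, Z = point
    rho = math.hypot(X, Y)          # distance of the point from the optical axis
    if rho < 1e-12:
        # Point on the optical axis (assumed in front of the camera).
        return (cx, cy)
    theta = math.atan2(rho, Z)      # also valid for rays beyond 90 degrees (Z <= 0)
    t2 = theta * theta
    r = theta * (1.0 + k[0] * t2 + k[1] * t2 ** 2 + k[2] * t2 ** 3 + k[3] * t2 ** 4)
    return (fx * r * X / rho + cx, fy * r * Y / rho + cy)

if __name__ == "__main__":
    # Hypothetical intrinsics and distortion coefficients.
    print(project_kb((1.0, 0.5, 2.0), 300.0, 300.0, 640.0, 480.0,
                     k=(-0.01, 0.002, 0.0, 0.0)))
```

Unlike the pinhole model, where the image radius grows with tan(theta) and diverges at 90 degrees, the radius here grows roughly linearly with theta, which is why this kind of model suits wide FOV imagery.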

    Robotic Maintenance and ROS - Appearance Based SLAM and Navigation With a Mobile Robot Prototype

    Robotic maintenance has been a topic in several master's theses and specialization projects at the Department of Engineering Cybernetics (ITK) at NTNU over many years. This thesis continues on the same topic, with special focus on camera-based mapping and navigation in conjunction with automated maintenance, and automated maintenance in general. The objective of this thesis is to implement one or more functionalities based on camera-based sensors in a mobile autonomous robot. This is accomplished by acquiring knowledge of existing solutions and future requirements within automated maintenance. A mobile robot prototype has been configured to run ROS (Robot Operating System), a middleware framework that is suited to the development of robotic systems. The system uses RTAB-Map (Real-Time Appearance Based Mapping) to map the surroundings and a built-in navigation stack in ROS to navigate autonomously toward simple goals in the map. The method uses a Kinect for Xbox 360 as the main sensor and a 2D laser scanner for mapping and odometry. Functional concepts have also been developed for two support functions: an Android application for remote control over Bluetooth and a remote control center (OCS) developed in Qt. The remote control center is a skeletal implementation that is able to remotely control the robot via WiFi, as well as to display video from the robot's camera. Test results, obtained from both live and simulated trials, indicate that the robot is able to build 3D and 2D maps of the surroundings. The method has weaknesses related to its ability to find visual features. Laser-based odometry can be misled when the environment is changing and when there are few unique features. Further testing has demonstrated that the robot can navigate autonomously, but there is still room for improvement. Better results can be achieved with a new mobile platform and further tuning of the system.
In conclusion, ROS works well as a development tool for robots, and the current system is suitable for further development. RTAB-Map's suitability for use in an industrial installation is still uncertain and requires further testing.

    Submap Matching for Stereo-Vision Based Indoor/Outdoor SLAM

    Autonomous robots operating in semi- or unstructured environments, e.g. during search and rescue missions, require methods for online on-board creation of maps to support path planning and obstacle avoidance. Perception based on stereo cameras is well suited for mixed indoor/outdoor environments. The creation of full 3D maps in GPS-denied areas, however, is still a challenging task for current robot systems, in particular due to depth errors resulting from stereo reconstruction. State-of-the-art 6D SLAM approaches employ graph-based optimization on the relative transformations between keyframes or local submaps. To achieve loop closures, correct data association is crucial, in particular for sensor input received at different points in time. To address this challenge, we propose a novel method for submap matching. It is based on robust keypoints, which we derive from local obstacle classification. By describing geometrical 3D features, we achieve invariance to changing viewpoints and varying light conditions. We performed experiments in indoor, outdoor and mixed environments. In all three scenarios we achieved a final 3D position error of less than 0.23% of the full trajectory. In addition, we compared our approach with a 3D RBPF SLAM from previous work, achieving an improvement of at least 27% in mean 2D localization accuracy in different scenarios.

    Automatic and adaptable registration of live RGBD video streams sharing partially overlapping views

    In this thesis, we introduce DeReEs-4V, an algorithm for unsupervised and automatic registration of two video frames captured by depth-sensing cameras. DeReEs-4V receives two RGBD video streams from two depth-sensing cameras arbitrarily located in an indoor space whose captured scenes overlap by at least 25%. The motivation of this research is to employ multiple depth-sensing cameras to enlarge the field of view and acquire more complete and accurate 3D information about the environment. A typical way to combine multiple views from different cameras is through manual calibration. However, this process is time-consuming and may require some technical knowledge. Moreover, calibration has to be repeated when the location or position of the cameras changes. In this research, we demonstrate how DeReEs-4V registration can be used to find the transformation of the view of one camera with respect to the other at interactive rates. Our algorithm automatically finds the 3D transformation to match the views from two cameras, requires no human intervention, and is robust to camera movements while capturing. To validate this approach, a thorough examination of the system performance under different scenarios is presented. The system presented here supports any application that might benefit from the wider field of view provided by the combined scene from both cameras, including applications in 3D telepresence, gaming, people tracking, videoconferencing and computer vision.