1,625 research outputs found

    Dense Visual Simultaneous Localisation and Mapping in Collaborative and Outdoor Scenarios

    Get PDF
    Dense visual simultaneous localisation and mapping (SLAM) systems can produce 3D reconstructions that are digital facsimiles of the physical space they describe. Systems that can produce dense maps with this level of fidelity in real time provide foundational spatial reasoning capabilities for many downstream tasks in autonomous robotics. Over the past 15 years, mapping small scale, indoor environments, such as desks and buildings, with a single slow moving, hand-held sensor has been one of the central focuses of dense visual SLAM research. However, most dense visual SLAM systems exhibit a number of limitations which mean they cannot be directly applied in collaborative or outdoors settings. The contribution of this thesis is to address these limitations with the development of new systems and algorithms for collaborative dense mapping, efficient dense alternation and outdoors operation with fast camera motion and wide field of view (FOV) cameras. We use ElasticFusion, a state-of-the-art dense SLAM system, as our starting point where each of these contributions is implemented as a novel extension to the system. We first present a collaborative dense SLAM system that allows a number of cameras starting with unknown initial relative positions to maintain local maps with the original ElasticFusion algorithm. Visual place recognition across local maps results in constraints that allow maps to be aligned into a common global reference frame, facilitating collaborative mapping and tracking of multiple cameras within a shared map. Within dense alternation based SLAM systems, the standard approach is to fuse every frame into the dense model without considering whether the information contained within the frame is already captured by the dense map and therefore redundant. As the number of cameras or the scale of the map increases, this approach becomes inefficient. In our second contribution, we address this inefficiency by introducing a novel information theoretic approach to keyframe selection that allows the system to avoid processing redundant information. We implement the procedure within ElasticFusion, demonstrating a marked reduction in the number of frames required by the system to estimate an accurate, denoised surface reconstruction. Before dense SLAM techniques can be applied in outdoor scenarios we must first address their reliance on active depth cameras, and their lack of suitability to fast camera motion. In our third contribution we present an outdoor dense SLAM system. The system overcomes the need for an active sensor by employing neural network-based depth inference to predict the geometry of the scene as it appears in each image. To address the issue of camera tracking during fast motion we employ a hybrid architecture, combining elements of both dense and sparse SLAM systems to perform camera tracking and to achieve globally consistent dense mapping. Automotive applications present a particularly important setting for dense visual SLAM systems. Such applications are characterised by their use of wide FOV cameras and are therefore not accurately modelled by the standard pinhole camera model. The fourth contribution of this thesis is to extend the above hybrid sparse-dense monocular SLAM system to cater for large FOV fisheye imagery. This is achieved by reformulating the mapping pipeline in terms of the Kannala-Brandt fisheye camera model. To estimate depth, we introduce a new version of the PackNet depth estimation neural network (Guizilini et al., 2020) adapted for fisheye inputs. To demonstrate the effectiveness of our contributions, we present experimental results, computed by processing the synthetic ICL-NUIM dataset of Handa et al. (2014) as well as the real-world TUM-RGBD dataset of Sturm et al. (2012). For outdoor SLAM we show the results of our system processing the autonomous driving KITTI and KITTI-360 datasets of Geiger et al. (2012a) and Liao et al. (2021) respectively

    Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade

    Full text link
    Camera pose estimation is an important problem in computer vision. Common techniques either match the current image against keyframes with known poses, directly regress the pose, or establish correspondences between keypoints in the image and points in the scene to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but have traditionally needed to be trained offline on the target scene, preventing relocalisation in new environments. Recently, we showed how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. The adapted forests achieved relocalisation performance that was on par with that of offline forests, and our approach was able to estimate the camera pose in close to real time. In this paper, we present an extension of this work that achieves significantly better relocalisation performance whilst running fully in real time. To achieve this, we make several changes to the original approach: (i) instead of accepting the camera pose hypothesis without question, we make it possible to score the final few hypotheses using a geometric approach and select the most promising; (ii) we chain several instantiations of our relocaliser together in a cascade, allowing us to try faster but less accurate relocalisation first, only falling back to slower, more accurate relocalisation as necessary; and (iii) we tune the parameters of our cascade to achieve effective overall performance. These changes allow us to significantly improve upon the performance our original state-of-the-art method was able to achieve on the well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional contributions, we present a way of visualising the internal behaviour of our forests and show how to entirely circumvent the need to pre-train a forest on a generic scene.Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorshi

    Collaborative Dense SLAM

    Get PDF
    In this paper, we present a new system for live collaborative dense surface reconstruction. Cooperative robotics, multi participant augmented reality and human-robot interaction are all examples of situations where collaborative mapping can be leveraged for greater agent autonomy. Our system builds on ElasticFusion to allow a number of cameras starting with unknown initial relative positions to maintain local maps utilising the original algorithm. Carrying out visual place recognition across these local maps the system can identify when two maps overlap in space, providing an inter-map constraint from which the system can derive the relative poses of the two maps. Using these resulting pose constraints, our system performs map merging, allowing multiple cameras to fuse their measurements into a single shared reconstruction. The advantage of this approach is that it avoids replication of structures subsequent to loop closures, where multiple cameras traverse the same regions of the environment. Furthermore, it allows cameras to directly exploit and update regions of the environment previously mapped by other cameras within the system. We provide both quantitative and qualitative analyses using the syntethic ICL-NUIM dataset and the realworld Freiburg dataset including the impact of multi-camera mapping on surface reconstruction accuracy, camera pose estimation accuracy and overall processing time. We also include qualitative results in the form of sample reconstructions of room sized environments with up to 3 cameras undergoing intersecting and loopy trajectories

    Collaborative Dense SLAM

    Get PDF
    In this paper, we present a new system for live collaborative dense surface reconstruction. Cooperative robotics, multi participant augmented reality and human-robot interaction are all examples of situations where collaborative mapping can be leveraged for greater agent autonomy. Our system builds on ElasticFusion to allow a number of cameras starting with unknown initial relative positions to maintain local maps utilising the original algorithm. Carrying out visual place recognition across these local maps the system can identify when two maps overlap in space, providing an inter-map constraint from which the system can derive the relative poses of the two maps. Using these resulting pose constraints, our system performs map merging, allowing multiple cameras to fuse their measurements into a single shared reconstruction. The advantage of this approach is that it avoids replication of structures subsequent to loop closures, where multiple cameras traverse the same regions of the environment. Furthermore, it allows cameras to directly exploit and update regions of the environment previously mapped by other cameras within the system. We provide both quantitative and qualitative analyses using the syntethic ICL-NUIM dataset and the realworld Freiburg dataset including the impact of multi-camera mapping on surface reconstruction accuracy, camera pose estimation accuracy and overall processing time. We also include qualitative results in the form of sample reconstructions of room sized environments with up to 3 cameras undergoing intersecting and loopy trajectories

    Sparse octree algorithms for scalable dense volumetric tracking and mapping

    Get PDF
    This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM), the task of localising an agent within an unknown environment and at the same time building a representation of it. In particular, we tackle the fundamental scalability limitations of dense volumetric SLAM systems. We do so by proposing a highly efficient hierarchical data-structure based on octrees together with a set of algorithms to support the most compute-intensive operations in typical volumetric reconstruction pipelines. We employ our hierarchical representation in a novel dense pipeline based on occupancy probabilities. Crucially, the complete space representation encoded by the octree enables to demonstrate a fully integrated system in which tracking, mapping and occupancy queries can be performed seamlessly on a single coherent representation. While achieving accuracy either at par or better than the current state-of-the-art, we demonstrate run-time performance of at least an order of magnitude better than currently available hierarchical data-structures. Finally, we introduce a novel multi-scale reconstruction system that exploits our octree hierarchy. By adaptively selecting the appropriate scale to match the effective sensor resolution in both integration and rendering, we demonstrate better reconstruction results and tracking accuracy compared to single-resolution grids. Furthermore, we achieve much higher computational performance by propagating information up and down the tree in a lazy fashion, which allow us to reduce the computational load when updating distant surfaces. We have released our software as an open-source library, named supereight, which is freely available for the benefit of the wider community. One of the main advantages of our library is its flexibility. By carefully providing a set of algorithmic abstractions, supereight enables SLAM practitioners to freely experiment with different map representations with no intervention on the back-end library code and crucially, preserving performance. Our work has been adopted by robotics researchers in both academia and industry.Open Acces

    Distributed scene reconstruction from multiple mobile platforms

    Get PDF
    Recent research on mobile robotics has produced new designs that provide house-hold robots with omnidirectional motion. The image sensor embedded in these devices motivates the application of 3D vision techniques on them for navigation and mapping purposes. In addition to this, distributed cheapsensing systems acting as unitary entity have recently been discovered as an efficient alternative to expensive mobile equipment. In this work we present an implementation of a visual reconstruction method, structure from motion (SfM), on a low-budget, omnidirectional mobile platform, and extend this method to distributed 3D scene reconstruction with several instances of such a platform. Our approach overcomes the challenges yielded by the plaform. The unprecedented levels of noise produced by the image compression typical of the platform is processed by our feature filtering methods, which ensure suitable feature matching populations for epipolar geometry estimation by means of a strict quality-based feature selection. The robust pose estimation algorithms implemented, along with a novel feature tracking system, enable our incremental SfM approach to novelly deal with ill-conditioned inter-image configurations provoked by the omnidirectional motion. The feature tracking system developed efficiently manages the feature scarcity produced by noise and outputs quality feature tracks, which allow robust 3D mapping of a given scene even if - due to noise - their length is shorter than what it is usually assumed for performing stable 3D reconstructions. The distributed reconstruction from multiple instances of SfM is attained by applying loop-closing techniques. Our multiple reconstruction system merges individual 3D structures and resolves the global scale problem with minimal overlaps, whereas in the literature 3D mapping is obtained by overlapping stretches of sequences. The performance of this system is demonstrated in the 2-session case. The management of noise, the stability against ill-configurations and the robustness of our SfM system is validated on a number of experiments and compared with state-of-the-art approaches. Possible future research areas are also discussed

    Distributed Robotic Vision for Calibration, Localisation, and Mapping

    Get PDF
    This dissertation explores distributed algorithms for calibration, localisation, and mapping in the context of a multi-robot network equipped with cameras and onboard processing, comparing against centralised alternatives where all data is transmitted to a singular external node on which processing occurs. With the rise of large-scale camera networks, and as low-cost on-board processing becomes increasingly feasible in robotics networks, distributed algorithms are becoming important for robustness and scalability. Standard solutions to multi-camera computer vision require the data from all nodes to be processed at a central node which represents a significant single point of failure and incurs infeasible communication costs. Distributed solutions solve these issues by spreading the work over the entire network, operating only on local calculations and direct communication with nearby neighbours. This research considers a framework for a distributed robotic vision platform for calibration, localisation, mapping tasks where three main stages are identified: an initialisation stage where calibration and localisation are performed in a distributed manner, a local tracking stage where visual odometry is performed without inter-robot communication, and a global mapping stage where global alignment and optimisation strategies are applied. In consideration of this framework, this research investigates how algorithms can be developed to produce fundamentally distributed solutions, designed to minimise computational complexity whilst maintaining excellent performance, and designed to operate effectively in the long term. Therefore, three primary objectives are sought aligning with these three stages

    R3^3SGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems

    Full text link
    Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is hard to achieve with conventional hardware, making the use of embedded devices such as FPGAs attractive for low-power applications. However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA context, the accuracy of SGM has been improved by More Global Matching (MGM), which also helps tackle the streaking artifacts that afflict SGM. In this paper, we propose a novel, resource-efficient method that is inspired by MGM's techniques for improving depth quality, but which can be implemented to run in real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI and Middlebury), we show that in comparison to other real-time capable stereo approaches, we can achieve a state-of-the-art balance between accuracy, power efficiency and speed, making our approach highly desirable for use in real-time systems with limited power.Comment: Accepted in FPT 2018 as Oral presentation, 8 pages, 6 figures, 4 table

    Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

    Get PDF
    Simultaneous Localization and Mapping (SLAM)consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues, that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved
    • …
    corecore