
    Delta Descriptors: Change-Based Place Representation for Robust Visual Localization

    Visual place recognition is challenging because many factors can cause the appearance of a place to change, from day-night cycles to seasonal shifts to atmospheric conditions. In recent years a wide range of approaches has been developed to address this challenge, including deep-learnt image descriptors, domain translation, and sequential filtering, each with shortcomings such as limited generality and sensitivity to velocity. In this paper we propose a novel descriptor derived from tracking changes in any learned global descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the offsets induced in the original descriptor matching space in an unsupervised manner by considering temporal differences across places observed along a route. Like all other approaches, Delta Descriptors have a shortcoming - volatility on a frame-to-frame basis - which can be overcome by combining them with sequential filtering methods. Using two benchmark datasets, we first demonstrate the high performance of Delta Descriptors in isolation, before showing new state-of-the-art performance when combined with sequence-based matching. We also present results demonstrating the approach working with four different underlying descriptor types, along with two other beneficial properties of Delta Descriptors compared to existing techniques: increased inherent robustness to variations in camera motion and a reduced rate of performance degradation as dimensionality reduction is applied. Source code is made available at https://github.com/oravus/DeltaDescriptors. Comment: 8 pages and 7 figures. Published in IEEE Robotics and Automation Letters (RA-L), 2020.
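
    As a rough illustration of the change-based idea described above (not the authors' exact formulation), the sketch below turns a sequence of global image descriptors into difference-based descriptors by subtracting rolling means over a short temporal window; the window length `win` and the box-filter smoothing are assumptions made here for clarity.

        import numpy as np

        def delta_descriptors(descriptors, win=5):
            # descriptors: (T, D) array of global image descriptors along a route.
            # Returns (T, D) change-based descriptors (a minimal sketch, not the
            # published formulation): smoothed temporal differences, L2-normalised.
            descriptors = np.asarray(descriptors, dtype=np.float64)
            T, D = descriptors.shape
            kernel = np.ones(win) / win
            # Box-filter smoothing of each descriptor dimension over time.
            smoothed = np.vstack([
                np.convolve(descriptors[:, d], kernel, mode="same") for d in range(D)
            ]).T
            # Difference between the smoothed descriptor `win` frames ahead and the
            # current one: encodes how the place appearance changes along the route.
            delta = np.roll(smoothed, -win, axis=0) - smoothed
            norms = np.linalg.norm(delta, axis=1, keepdims=True) + 1e-12
            return delta / norms

    Matching would then proceed by nearest-neighbour search (e.g. cosine distance) between query and reference delta descriptors, optionally followed by the sequence-based filtering the abstract describes.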

    LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics

    Human visual scene understanding is so remarkable that we are able to recognize a revisited place when entering it from the opposite direction to that in which it was first visited, even in the presence of extreme variations in appearance. This capability is especially apparent during driving: a human driver can recognize where they are when travelling in the reverse direction along a route for the first time, without having to turn back and look. The difficulty of this problem exceeds any addressed in past appearance- and viewpoint-invariant visual place recognition (VPR) research, in part because large parts of the scene are not commonly observable from opposite directions. Consequently, as shown in this paper, the precision-recall performance of current state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders of magnitude below what would be usable in a closed-loop system. Current engineered solutions predominantly rely on panoramic camera or LIDAR sensing setups; these are eminently suitable engineering solutions, but clearly very different to how humans navigate, which also has implications for how naturally humans could interact and communicate with the navigation system. In this paper we develop a suite of novel semantic- and appearance-based techniques to enable, for the first time, high-performance place recognition in this challenging scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of images using the convolutional feature maps from a state-of-the-art dense semantic segmentation network. Then, to verify the spatial semantic arrangement of the top matching candidates, we develop a novel approach for mining semantically salient keypoint correspondences. Comment: Accepted for Robotics: Science and Systems (RSS) 2018. Source code now available at https://github.com/oravus/lost
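
    The core ingredient, aggregating dense segmentation features per semantic class, can be sketched as below. This is only an illustrative class-wise pooling of convolutional feature maps under assumed shapes; the published LoST descriptor and its keypoint-correspondence stage involve more than this.

        import numpy as np

        def semantic_descriptor(feature_map, label_map, classes=(0, 1, 2)):
            # feature_map: (C, H, W) dense activations from a segmentation network.
            # label_map:   (H, W) per-pixel predicted class ids.
            # Returns a single L2-normalised vector built by pooling the activations
            # over the pixels belonging to each semantic class of interest.
            C, H, W = feature_map.shape
            parts = []
            for c in classes:
                mask = (label_map == c)
                if mask.any():
                    parts.append(feature_map[:, mask].mean(axis=1))
                else:
                    parts.append(np.zeros(C))   # class absent in this image
            desc = np.concatenate(parts)
            return desc / (np.linalg.norm(desc) + 1e-12)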

    Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions

    In recent years there has been significant improvement in the capability of Visual Place Recognition (VPR) methods, building on the success of both hand-crafted and learnt visual features, temporal filtering, and the use of semantic scene information. The wide range of approaches and the relatively recent growth of interest in the field have meant that many different datasets and assessment methodologies have been proposed, often with a focus only on precision-recall type metrics, making comparison difficult. In this paper we present a comprehensive approach to evaluating the performance of 10 state-of-the-art, recently-developed VPR techniques, which utilizes three standardized metrics: (a) matching performance, (b) matching time, and (c) memory footprint. Together this analysis provides an up-to-date and widely encompassing snapshot of the various strengths and weaknesses of contemporary approaches to the VPR problem. The aim of this work is to help move this particular research field towards a more mature and unified approach to the problem, enabling better comparison and hence more progress to be made in future research.
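
    A toy version of such an evaluation, measuring the three metrics for a plain nearest-neighbour descriptor matcher, might look as follows; the ground-truth tolerance, cosine matching and timing method are assumptions here, not the paper's exact protocol.

        import time
        import numpy as np

        def evaluate_vpr(query_desc, ref_desc, gt_index, tol=1):
            # query_desc: (Q, D), ref_desc: (R, D), gt_index: (Q,) ground-truth
            # reference index for each query. Returns (matching accuracy,
            # mean matching time per query in seconds, map memory in bytes).
            t0 = time.perf_counter()
            q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
            r = ref_desc / np.linalg.norm(ref_desc, axis=1, keepdims=True)
            predictions = np.argmax(q @ r.T, axis=1)      # cosine nearest neighbour
            match_time = (time.perf_counter() - t0) / len(query_desc)
            accuracy = np.mean(np.abs(predictions - np.asarray(gt_index)) <= tol)
            memory = ref_desc.nbytes                      # footprint of the reference map
            return accuracy, match_time, memory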

    SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition

    Visual Place Recognition (VPR) is the task of matching current visual imagery from a camera to images stored in a reference map of the environment. While initial VPR systems used simple direct image methods or hand-crafted visual features, recent work has focused on learning more powerful visual features and further improving performance through either some form of sequential matcher / filter or a hierarchical matching process. In both cases the performance of the initial single-image-based system is still far from perfect, putting significant pressure on the sequence matching or (in the case of hierarchical systems) pose refinement stages. In this paper we present a novel hybrid system that creates a high-performance initial match hypothesis generator using short learnt sequential descriptors, which enables selective control of sequential score aggregation using single-image learnt descriptors. Sequential descriptors are generated using a temporal convolutional network dubbed SeqNet, encoding short image sequences using 1-D convolutions, which are then matched against the corresponding temporal descriptors from the reference dataset to provide an ordered list of place match hypotheses. We then perform selective sequential score aggregation using shortlisted single-image learnt descriptors from a separate pipeline to produce an overall place match hypothesis. Comprehensive experiments on challenging benchmark datasets demonstrate the proposed method outperforming recent state-of-the-art methods using the same amount of sequential information. Source code and supplementary material can be found at https://github.com/oravus/seqNet. Comment: Accepted for publication in IEEE RA-L 2021; includes supplementary material.
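
    A minimal sketch of a sequential descriptor of this kind, assuming PyTorch and made-up layer sizes rather than the published SeqNet architecture, is a single 1-D convolution over a short window of single-image descriptors:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TemporalDescriptorEncoder(nn.Module):
            # Encodes a (batch, seq_len, in_dim) stack of single-image descriptors
            # into one (batch, out_dim) sequential descriptor via 1-D convolution.
            def __init__(self, in_dim=4096, out_dim=4096, seq_len=5):
                super().__init__()
                self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=seq_len)

            def forward(self, seq):
                x = seq.transpose(1, 2)          # -> (batch, in_dim, seq_len)
                x = self.conv(x).squeeze(-1)     # -> (batch, out_dim)
                return F.normalize(x, dim=-1)    # unit length for cosine matching

    Nearest-neighbour search over such sequential descriptors would then shortlist candidate places, with single-image descriptors used afterwards for the selective score aggregation the abstract describes.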

    Visual sequence-based place recognition for changing conditions and varied viewpoints

    Correctly identifying previously-visited locations is essential for robotic place recognition and localisation. This thesis presents training-free solutions to vision-based place recognition under changing environmental conditions and camera viewpoints. Using vision as a primary sensor, the proposed approaches combine image segmentation and rescaling techniques over sequences of visual imagery to enable successful place recognition across a range of challenging environments where prior techniques have failed.

    2D Visual Place Recognition for Domestic Service Robots at Night

    Domestic service robots such as lawn mowing and vacuum cleaning robots are the most numerous consumer robots in existence today. While early versions employed random exploration, recent systems fielded by most of the major manufacturers have utilized range-based and visual sensors and user-placed beacons to enable robots to map and localize. However, active range and visual sensing solutions have the disadvantages of being intrusive, expensive, or only providing a 1D scan of the environment, while the requirement for beacon placement imposes other practical limitations. In this paper we present a passive and potentially cheap vision-based solution to 2D localization at night that combines easily obtainable day-time maps with low-resolution contrast-normalized image matching algorithms, image sequence-based matching in two dimensions, place match interpolation, and recent advances in conventional low-light camera technology. In a range of experiments over a domestic lawn and in a lounge room, we demonstrate that the proposed approach enables 2D localization at night, and analyse the effect on performance of varying odometry noise levels, place match interpolation and sequence matching length. Finally we benchmark the new low-light camera technology and show how it can enable robust place recognition even in an environment lit only by a moonless sky, raising the tantalizing possibility of being able to apply all conventional vision algorithms even in the darkest of nights.
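
    The low-resolution, contrast-normalized matching component mentioned above can be illustrated with patch-wise normalization followed by a sum-of-absolute-differences comparison; the patch size and exact normalization below are assumptions, not the paper's implementation.

        import numpy as np

        def normalise_patches(img, patch=8):
            # Local contrast normalisation of a low-resolution grayscale image:
            # each patch is shifted to zero mean and scaled to unit variance,
            # reducing the effect of global illumination changes (e.g. at night).
            img = img.astype(np.float64)
            out = np.empty_like(img)
            H, W = img.shape
            for y in range(0, H, patch):
                for x in range(0, W, patch):
                    block = img[y:y + patch, x:x + patch]
                    out[y:y + patch, x:x + patch] = (block - block.mean()) / (block.std() + 1e-6)
            return out

        def image_difference(img_a, img_b, patch=8):
            # Mean absolute difference between two contrast-normalised images of the
            # same size; lower values indicate a more likely place match.
            return np.abs(normalise_patches(img_a, patch) - normalise_patches(img_b, patch)).mean()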

    Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions

    Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes as well as weather and seasonal variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera pose estimates. In this paper, we introduce the first benchmark datasets specifically designed for analyzing the impact of such factors on visual localization. Using carefully created ground truth poses for query images taken under a wide variety of conditions, we evaluate the impact of various factors on 6DOF camera pose estimation accuracy through extensive experiments with state-of-the-art localization approaches. Based on our results, we draw conclusions about the difficulty of different conditions, showing that long-term localization is far from solved, and propose promising avenues for future work, including sequence-based localization approaches and the need for better local features. Our benchmark is available at visuallocalization.net. Comment: Accepted to CVPR 2018 as a spotlight.
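
    The accuracy side of such a benchmark reduces to comparing an estimated camera pose against ground truth; a minimal sketch of the usual position/orientation error computation (pose convention assumed here, not taken from the paper) is:

        import numpy as np

        def pose_errors(R_est, t_est, R_gt, t_gt):
            # R_*: (3, 3) rotation matrices, t_*: (3,) translations, assumed to be
            # expressed in the same (e.g. camera-to-world) convention.
            # Returns (position error in metres, orientation error in degrees),
            # the quantities a localisation benchmark typically thresholds on.
            t_err = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
            R_rel = np.asarray(R_est).T @ np.asarray(R_gt)
            cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
            r_err = np.degrees(np.arccos(cos_angle))
            return t_err, r_err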

    Skyline-based localisation for aggressively manoeuvring robots using UV sensors and spherical harmonics

    Place recognition is a key capability for navigating robots. While significant advances have been achieved on large, stable platforms such as robot cars, achieving robust performance on rapidly manoeuvring platforms in outdoor natural conditions remains a challenge, with few systems able to deal with both variable conditions and the large tilt variations caused by rough terrain. Taking inspiration from biology, we propose a novel combination of sensory modality and image processing to obtain a significant improvement in the robustness of sequence-based image matching for place recognition. We use a UV-sensitive fisheye lens camera to segment sky from ground, providing illumination invariance, and encode the resulting binary images using spherical harmonics to enable rotation-invariant image matching. In combination, these methods also provide substantial pitch and roll invariance, as the spherical harmonics of the sky shape are minimally affected provided the sky remains visible. We evaluate the performance of our method against a leading appearance-invariant technique (SeqSLAM) and a leading viewpoint-invariant technique (FAB-MAP 2.0) on three new outdoor datasets encompassing variable robot heading, tilt, and lighting conditions in both forested and urban environments. The system demonstrates improved condition- and tilt-invariance, enabling robust place recognition during aggressive zigzag manoeuvring along bumpy trails and at tilt angles of up to 60 degrees.
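
    The rotation-invariant encoding can be illustrated by projecting a binary sky mask, sampled on a spherical grid, onto spherical harmonics and keeping only the power per degree, which is unchanged by rotations of the camera; the grid resolution, maximum degree and SciPy-based numerical integration below are assumptions for illustration, not the paper's pipeline.

        import numpy as np
        from scipy.special import sph_harm

        def sky_power_spectrum(sky_mask, l_max=8):
            # sky_mask: (n_polar, n_azimuth) boolean grid of sky vs. ground, sampled
            # over the sphere. Returns the spherical-harmonic power per degree l,
            # a signature invariant to heading, pitch and roll of the camera.
            n_theta, n_phi = sky_mask.shape
            theta = np.linspace(0.0, np.pi, n_theta)                    # polar angle
            phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)    # azimuth
            PHI, THETA = np.meshgrid(phi, theta)
            d_area = np.sin(THETA) * (np.pi / n_theta) * (2 * np.pi / n_phi)
            f = sky_mask.astype(np.float64)
            power = []
            for l in range(l_max + 1):
                p = 0.0
                for m in range(-l, l + 1):
                    Y = sph_harm(m, l, PHI, THETA)       # SciPy order: (m, l, azimuth, polar)
                    c = np.sum(f * np.conj(Y) * d_area)  # projection coefficient
                    p += np.abs(c) ** 2
                power.append(p)
            return np.asarray(power)

    Two views of the same place can then be compared by the distance between their power spectra, regardless of how the camera was rotated when each view was captured.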

    The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection

    Where am I? This is one of the most critical questions that any intelligent system should answer in order to decide whether it is navigating within a previously visited area. This problem has long been acknowledged for its challenging nature in simultaneous localization and mapping (SLAM), wherein the robot needs to correctly associate the incoming sensory data with the database to allow consistent map generation. The significant advances in computer vision achieved over the last 20 years, the increased computational power, and the growing demand for long-term exploration have contributed to efficiently performing such a complex task with inexpensive perception sensors. In this article, visual loop closure detection, which formulates a solution based solely on appearance input data, is surveyed. We start by briefly introducing place recognition and SLAM concepts in robotics. Then, we describe the structure of a loop closure detection system, covering an extensive collection of topics, including feature extraction, environment representation, the decision-making step, and the evaluation process. We conclude by discussing open and new research challenges, particularly concerning robustness in dynamic environments, computational complexity, and scalability in long-term operations. The article aims to serve as a tutorial and a position paper for newcomers to visual loop closure detection. Comment: 25 pages, 15 figures.

    CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning

    Visual navigation tasks in real-world environments often require both self-motion and place recognition feedback. While deep reinforcement learning has shown success in solving these perception and decision-making problems in an end-to-end manner, such algorithms require large amounts of experience to learn navigation policies from high-dimensional data, which is generally impractical for real robots due to sample complexity. In this paper, we address these problems with two main contributions. First, we leverage place recognition and deep learning techniques combined with goal destination feedback to generate compact, bimodal image representations that can then be used to learn control policies effectively from a small amount of experience. Second, we present an interactive framework, CityLearn, that enables, for the first time, training and deployment of navigation algorithms across city-sized, realistic environments with extreme visual appearance changes. CityLearn features more than 10 benchmark datasets, often used in visual place recognition and autonomous driving research, including over 100 recorded traversals across 60 cities around the world. We evaluate our approach on two CityLearn environments, training our navigation policy on a single traversal. Results show our method can be over 2 orders of magnitude faster than when using raw images, and can also generalize across extreme visual changes including day-to-night and summer-to-winter transitions. Comment: Preprint version of the article accepted to ICRA 2020.