
    Feature Map Filtering: Improving Visual Place Recognition with Convolutional Calibration

    Convolutional Neural Networks (CNNs) have recently been shown to excel at performing visual place recognition under changing appearance and viewpoint. Previously, place recognition has been improved by intelligently selecting relevant spatial keypoints within a convolutional layer and also by selecting the optimal layer to use. Rather than extracting features out of a particular layer, or a particular set of spatial keypoints within a layer, we propose the extraction of features using a subset of the channel dimensionality within a layer. Each feature map learns to encode a different set of weights that activate for different visual features within the set of training images. We propose a method of calibrating a CNN-based visual place recognition system, which selects the subset of feature maps that best encodes the visual features that are consistent between two different appearances of the same location. Using just 50 calibration images, all collected at the beginning of the current environment, we demonstrate a significant and consistent recognition improvement across multiple layers for two different neural networks. We evaluate our proposal on three datasets with different types of appearance changes: afternoon to morning, winter to summer, and night to day. Additionally, the dimensionality reduction approach improves the computational processing speed of the recognition system. Comment: Accepted to the Australasian Conference on Robotics and Automation 2018.
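    The calibration step described here lends itself to a compact sketch. Below is a minimal NumPy illustration of the idea as we read it: score each feature map (channel) by how consistently it activates across two appearances of the same places, then keep the most consistent subset. The function name, the plain L2 consistency score, and the 50% keep ratio are our assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def select_feature_maps(acts_a, acts_b, keep_ratio=0.5):
    """Pick the channels that respond most consistently across appearances.

    acts_a, acts_b: (N, C, H, W) activations of one conv layer for N
    calibration image pairs showing the same places in two conditions.
    Returns the indices of the channels to retain at deployment time.
    """
    a = acts_a.reshape(acts_a.shape[0], acts_a.shape[1], -1)
    b = acts_b.reshape(acts_b.shape[0], acts_b.shape[1], -1)
    # Low mean distance between conditions = appearance-consistent channel.
    per_channel = np.linalg.norm(a - b, axis=2).mean(axis=0)
    n_keep = max(1, int(keep_ratio * per_channel.size))
    return np.argsort(per_channel)[:n_keep]

# Toy usage: 50 calibration pairs, a 256-channel layer, 14x14 spatial maps.
rng = np.random.default_rng(0)
day = rng.random((50, 256, 14, 14)).astype(np.float32)
night = day + 0.1 * rng.standard_normal((50, 256, 14, 14)).astype(np.float32)
print(select_feature_maps(day, night).shape)  # (128,) retained channels
```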

    Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

    Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an always changing world. This paper introduces Patch-NetVLAD, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals. Unlike the fixed spatial neighborhood regime of existing local keypoint features, our method enables aggregation and matching of deep-learned local features defined over the feature-space grid. We further introduce a multi-scale fusion of patch features that have complementary scales (i.e. patch sizes) via an integral feature space and show that the fused features are highly invariant to both condition (season, structure, and illumination) and viewpoint (translation and rotation) changes. Patch-NetVLAD outperforms both global and local feature descriptor-based methods with comparable compute, achieving state-of-the-art visual place recognition results on a range of challenging real-world datasets, including winning the Facebook Mapillary Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user requirements, with a speed-optimised version operating over an order of magnitude faster than the state-of-the-art. By combining superior performance with improved computational efficiency in a configurable framework, Patch-NetVLAD is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems. Comment: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021).
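    The integral feature space is the part of this abstract that is easy to sketch. The toy NumPy code below shows how a summed-area table over a dense descriptor grid yields the mean descriptor of any patch in constant time per patch, which is what makes multi-scale extraction cheap. In the actual method the grid entries are NetVLAD residual aggregates rather than the random features used here, and all names are illustrative.

```python
import numpy as np

def patch_descriptors(feats, patch_size, stride=1):
    """Mean descriptor of every patch via an integral feature space.

    feats: (H, W, D) dense descriptor grid. A summed-area table makes the
    cost of each patch descriptor independent of the patch size.
    """
    H, W, D = feats.shape
    ii = np.zeros((H + 1, W + 1, D), dtype=feats.dtype)
    ii[1:, 1:] = feats.cumsum(axis=0).cumsum(axis=1)  # integral grid
    p, out = patch_size, []
    for y in range(0, H - p + 1, stride):
        for x in range(0, W - p + 1, stride):
            s = ii[y + p, x + p] - ii[y, x + p] - ii[y + p, x] + ii[y, x]
            d = s / (p * p)
            out.append(d / (np.linalg.norm(d) + 1e-12))  # L2-normalise
    return np.stack(out)

# Toy usage: complementary patch sizes over one 30x40x128 grid.
rng = np.random.default_rng(0)
grid = rng.random((30, 40, 128)).astype(np.float32)
print([patch_descriptors(grid, p, stride=2).shape for p in (2, 5, 8)])
```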

    GeoAdapt: Self-Supervised Test-Time Adaptation in LiDAR Place Recognition Using Geometric Priors

    LiDAR place recognition approaches based on deep learning suffer from significant performance degradation when there is a shift between the distribution of training and test datasets, often requiring re-training the networks to achieve peak performance. However, obtaining accurate ground truth data for new training data can be prohibitively expensive, especially in complex or GPS-deprived environments. To address this issue, we propose GeoAdapt, which introduces a novel auxiliary classification head to generate pseudo-labels for re-training on unseen environments in a self-supervised manner. GeoAdapt uses geometric consistency as a prior to improve the robustness of our generated pseudo-labels against domain shift, improving the performance and reliability of our Test-Time Adaptation approach. Comprehensive experiments show that GeoAdapt significantly boosts place recognition performance across moderate to severe domain shifts, and is competitive with fully supervised test-time adaptation approaches. Our code is available at https://github.com/csiro-robotics/GeoAdapt. Comment: Accepted to IEEE Robotics and Automation Letters (RA-L), November 2023.
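    GeoAdapt's actual mechanism is an auxiliary classification head trained alongside the descriptor; the short sketch below only illustrates the surrounding idea of pseudo-labelling with a geometric prior: candidate matches proposed by the pre-trained descriptors are kept as pseudo-positives only when a geometric consistency check on the raw point clouds agrees. All names, the thresholding scheme, and the stand-in geometry check are assumptions for illustration.

```python
import numpy as np

def pseudo_labels(descs, clouds, sim_thresh, geo_check):
    """Generate pseudo-labels for self-supervised re-training.

    descs:  (N, D) L2-normalised global descriptors from the pre-trained net.
    clouds: N raw point clouds used by the geometric prior.
    A pair is pseudo-positive only if the descriptors agree AND the
    geometry check confirms the two scans are consistent.
    """
    sims = descs @ descs.T
    triplets = []
    for i in range(len(descs)):
        for j in range(i + 1, len(descs)):
            if sims[i, j] < sim_thresh:
                continue  # descriptor similarity too low to propose a match
            label = 1 if geo_check(clouds[i], clouds[j]) else 0
            triplets.append((i, j, label))
    return triplets

# Toy usage with a crude centroid-distance stand-in for real registration.
rng = np.random.default_rng(0)
descs = rng.standard_normal((20, 64))
descs /= np.linalg.norm(descs, axis=1, keepdims=True)
clouds = [rng.random((100, 3)) for _ in range(20)]
crude_check = lambda a, b: np.linalg.norm(a.mean(0) - b.mean(0)) < 0.1
print(len(pseudo_labels(descs, clouds, 0.1, crude_check)))
```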

    Communications Biophysics

    Contains reports on four research projects.
    National Institutes of Health (Grant 5 P01 NS13126-02)
    National Institutes of Health (Grant 5 K04 NS00113-03)
    National Institutes of Health (Grant 2 R01 NS11153-02A1)
    National Science Foundation (Grant BNS77-16861)
    National Institutes of Health (Grant 5 R01 NS10916-03)
    National Institutes of Health (Fellowship 1 F32 NS05327)
    National Institutes of Health (Grant 5 R01 NS12846-02)
    National Institutes of Health (Fellowship 1 F32 NS05266)
    Edith E. Sturgis Foundation
    National Institutes of Health (Grant 1 R01 NS11680-01)
    National Institutes of Health (Grant 2 R01 NS11080-04)
    National Institutes of Health (Grant 5 T32 GM07301-03)
    National Institutes of Health (Grant 5 T01 GM01555-10)

    Appearance and viewpoint invariant visual place recognition using multi-scale and multi-modality systems

    No full text
    This thesis provides solutions to the problem of visual place recognition, that is, the ability for an autonomous system to recognise where it is in the world using images. The thesis shows how novel ensemble and multi-scale methods can be combined with modern artificial intelligence techniques to provide robust and reliable place recognition capabilities. It provides autonomous systems with enhanced positioning capabilities that can facilitate navigation even in challenging environmental conditions, such as at night or during adverse weather.

    Hierarchical Multi-Process Fusion for Visual Place Recognition

    No full text
    Combining multiple complementary techniques together has long been regarded as a way to improve performance. In visual localization, multi-sensor fusion, multi-process fusion of a single sensing modality, and even combinations of different localization techniques have been shown to result in improved performance. However, merely fusing together different localization techniques does not account for the varying performance characteristics of different localization techniques. In this paper we present a novel, hierarchical localization system that explicitly benefits from three varying characteristics of localization techniques: the distribution of their localization hypotheses, their appearance- and viewpoint-invariant properties, and the resulting differences in where in an environment each system works well and fails. We show how two techniques deployed hierarchically work better than in parallel fusion, how combining two different techniques works better than two levels of a single technique, even when the single technique has superior individual performance, and develop two- and three-tier hierarchical structures that progressively improve localization performance. Finally, we develop a stacked hierarchical framework where localization hypotheses from techniques with complementary characteristics are concatenated at each layer, significantly improving retention of the correct hypothesis through to the final localization stage. Using two challenging datasets, we show the proposed system outperforming state-of-the-art techniques.
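    The hierarchical structure reduces to a short control loop: each tier re-ranks only the shortlist passed down by the tier above it. The sketch below uses the same cosine scorer three times purely as a stand-in; in the paper each tier would be a distinct localization technique chosen for its hypothesis distribution and invariance properties. Names and shortlist sizes are illustrative assumptions.

```python
import numpy as np

def hierarchical_localise(query, refs, tiers, shortlist=(100, 10, 1)):
    """Coarse-to-fine localisation: each tier re-ranks the previous shortlist.

    tiers: scoring functions f(query, refs, candidates) -> scores, ordered
    so broad, invariance-heavy techniques come first and precise ones last.
    """
    candidates = np.arange(len(refs))
    for score_fn, k in zip(tiers, shortlist):
        scores = score_fn(query, refs, candidates)
        candidates = candidates[np.argsort(-scores)[:k]]  # keep best k
    return candidates[0]  # final place hypothesis

# Toy usage: one cosine scorer stands in for three distinct techniques.
rng = np.random.default_rng(0)
refs = rng.standard_normal((1000, 128))
query = refs[42] + 0.1 * rng.standard_normal(128)
cosine = lambda q, R, cand: R[cand] @ q / (
    np.linalg.norm(R[cand], axis=1) * np.linalg.norm(q))
print(hierarchical_localise(query, refs, [cosine, cosine, cosine]))  # 42
```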

    Filter early, match late: Improving network-based visual place recognition

    No full text
    CNNs have excelled at performing place recognition over time, particularly when the neural network is optimized for localization in the current environmental conditions. In this paper we investigate the concept of feature map filtering, where, rather than using all the activations within a convolutional tensor, only the most useful activations are used. Since specific feature maps encode different visual features, the objective is to remove feature maps that detract from the ability to recognize a location across appearance changes. Our key innovation is to filter the feature maps in an early convolutional layer, but then continue to run the network and extract a feature vector using a later, more viewpoint invariant layer in the same network. Our approach improves the condition and viewpoint invariance of a pre-trained network, using as little as a single training image pair from the deployment environment. An exhaustive experimental analysis is performed to determine the full scope of causality between early layer filtering and late layer extraction. For validity, we use three datasets: Oxford RobotCar, Nordland, and Gardens Point, achieving overall superior performance to NetVLAD. The work provides a number of new avenues for exploring CNN optimizations, without requiring any re-training of the network weights.
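    The "filter early, extract late" mechanics map naturally onto forward hooks. Below is a minimal PyTorch sketch, assuming a stand-in three-conv backbone in place of the real pre-trained network: one hook zeroes the feature maps rejected by calibration in an early layer, and a second hook captures a pooled descriptor from a later layer of the same forward pass. The channel indices and layer choices are purely illustrative.

```python
import torch
import torch.nn as nn

# Stand-in backbone; in practice this would be a pre-trained network.
net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),     # early layer: filter here
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),  # later layer: extract here
)

# Feature maps rejected by the calibration step (hypothetical indices).
drop = torch.tensor([3, 10, 17])

def filter_early(module, inputs, output):
    # Zero the distracting feature maps, then let the forward pass continue.
    output[:, drop] = 0.0
    return output

captured = {}
def grab_late(module, inputs, output):
    # Global-average-pool the later, more viewpoint-tolerant layer.
    captured["desc"] = output.mean(dim=(2, 3))

net[0].register_forward_hook(filter_early)
net[4].register_forward_hook(grab_late)

with torch.no_grad():
    net(torch.randn(1, 3, 64, 64))
print(captured["desc"].shape)  # torch.Size([1, 256])
```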

    Multi-process fusion: Visual place recognition using multiple image processing methods

    No full text
    Typical attempts to improve the capability of visual place recognition techniques include the use of multi-sensor fusion and the integration of information over time from image sequences. These approaches can improve performance but have disadvantages, including the need for multiple physical sensors and calibration processes, both for multiple sensors and for tuning the image matching sequence length. In this letter, we address these shortcomings with a novel 'multi-sensor' fusion approach applied to multiple image processing methods for a single visual image stream, combined with a dynamic sequence matching length technique and an automatic weighting scheme. In contrast to conventional single method approaches, our approach reduces the performance requirements of a single image processing methodology, instead requiring that within the suite of image processing methods, at least one performs well in any particular environment. In comparison to static sequence length techniques, the dynamic sequence matching technique enables reduced localization latencies through analysis of recognition quality metrics when re-entering familiar locations. We evaluate our approach on multiple challenging benchmark datasets, achieving superior performance to two state-of-the-art visual place recognition systems across environmental changes including winter to summer, afternoon to morning, and night to day. Across the four benchmark datasets our proposed approach achieves an average F1 score of 0.95, compared to 0.87 for NetVLAD and 0.46 for SeqSLAM. We provide source code for the multi-fusion method and present analysis explaining how superior performance is achieved despite the multiple, disparate, image processing methods all being applied to a single source of imagery, rather than to multiple separate sensors.
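    A minimal sketch of the fusion-plus-sequence idea follows, assuming each image processing method has already produced a query-versus-reference difference matrix for the same single image stream. The range-normalisation, equal default weights, and fixed sequence length are simplifying assumptions; the paper's automatic weighting and dynamic sequence length are noted in comments rather than implemented.

```python
import numpy as np

def fuse_methods(diff_matrices, weights=None):
    """Fuse per-method difference matrices computed from ONE image stream.

    Each (n_query, n_ref) matrix comes from a different image processing
    method applied to the same images. Range-normalise first so no single
    method dominates; the paper weights methods automatically instead.
    """
    mats = [(m - m.min()) / (np.ptp(m) + 1e-12) for m in diff_matrices]
    w = np.full(len(mats), 1.0 / len(mats)) if weights is None else np.asarray(weights)
    return sum(wi * mi for wi, mi in zip(w, mats))

def sequence_match(fused, q_idx, seq_len):
    """Score each reference as the endpoint of a matched image sequence.

    In the dynamic variant, seq_len would be adapted online from
    recognition quality metrics rather than fixed in advance.
    """
    n_ref = fused.shape[1]
    scores = np.full(n_ref, np.inf)
    steps = np.arange(seq_len)
    for r in range(seq_len - 1, n_ref):
        rows = q_idx - seq_len + 1 + steps
        cols = r - seq_len + 1 + steps
        scores[r] = fused[rows, cols].mean()
    return int(np.argmin(scores))  # lowest accumulated difference wins

# Toy usage with two random stand-in difference matrices.
rng = np.random.default_rng(0)
d_cnn, d_hog = rng.random((80, 80)), rng.random((80, 80))
fused = fuse_methods([d_cnn, d_hog])
print(sequence_match(fused, q_idx=40, seq_len=5))
```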

    Boosting Performance of a Baseline Visual Place Recognition Technique by Predicting the Maximally Complementary Technique

    No full text
    One recent promising approach to the Visual Place Recognition (VPR) problem has been to fuse the place recognition estimates of multiple complementary VPR techniques using methods such as shared representative appearance learning (SRAL) and multi-process fusion. These approaches come with a substantial practical limitation: they require all potential VPR methods to be brute-force run before they are selectively fused. The obvious solution to this limitation is to predict the viable subset of methods ahead of time, but this is challenging because it requires a predictive signal within the imagery itself that is indicative of high performance methods. Here we propose an alternative approach that instead starts with a known single base VPR technique, and learns to predict the most complementary additional VPR technique to fuse with it, that results in the largest improvement in performance. The key innovation here is to use a dimensionally reduced difference vector between the query image and the top-retrieved reference image using this baseline technique as the predictive signal of the most complementary additional technique, both during training and inference. We demonstrate that our approach can train a single network to select performant, complementary technique pairs across datasets which span multiple modes of transportation (train, car, walking) as well as to generalise to unseen datasets, outperforming multiple baseline strategies for manually selecting the best technique pairs based on the same training data.
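    The prediction pipeline can be sketched end to end. Below, synthetic stand-ins replace the real training signals: each difference vector would be the query descriptor minus the top-retrieved reference descriptor under the baseline technique, and each label the partner technique with the largest measured fusion gain on that training query. The paper trains a network; a PCA-plus-logistic-regression stand-in is used here, so everything below is an assumption-laden illustration rather than the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, n_techniques = 500, 4096, 4

# Placeholder training data (random stand-ins for real descriptors/labels).
diff_vectors = rng.standard_normal((n, d)).astype(np.float32)
best_partner = rng.integers(0, n_techniques, size=n)

# Dimensionally reduce the difference vector, then learn the mapping
# from difference vector to most complementary partner technique.
pca = PCA(n_components=64).fit(diff_vectors)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(diff_vectors), best_partner)

# Inference: choose which complementary technique to run BEFORE running it,
# avoiding brute-force evaluation of every candidate method.
query_diff = rng.standard_normal((1, d)).astype(np.float32)
print("fuse with technique", clf.predict(pca.transform(query_diff))[0])
```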