Feature Map Filtering: Improving Visual Place Recognition with Convolutional Calibration
Convolutional Neural Networks (CNNs) have recently been shown to excel at
performing visual place recognition under changing appearance and viewpoint.
Previously, place recognition has been improved by intelligently selecting
relevant spatial keypoints within a convolutional layer and also by selecting
the optimal layer to use. Rather than extracting features out of a particular
layer, or a particular set of spatial keypoints within a layer, we propose the
extraction of features using a subset of the channel dimensionality within a
layer. Each feature map is learnt with a different set of weights and therefore
activates for different visual features within the set of training images. We
propose a method of calibrating a CNN-based visual place recognition system,
which selects the subset of feature maps that best encodes the visual features
that are consistent between two different appearances of the same location.
Using just 50 calibration images, all collected at the start of the traverse
through the current environment, we demonstrate a significant and consistent recognition
improvement across multiple layers for two different neural networks. We
evaluate our proposal on three datasets with different types of appearance
changes - afternoon to morning, winter to summer and night to day.
Additionally, the dimensionality reduction approach improves the computational
processing speed of the recognition system.
Comment: Accepted to the Australasian Conference on Robotics and Automation 201
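As a rough illustration of the calibration step described above, the minimal Python sketch below ranks feature-map channels by how consistently they activate across the two appearances of the same calibration places, keeping only the top fraction. The function name, the (n, c, h, w) activation layout, and the cosine-similarity consistency score are all assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

def select_feature_maps(ref_feats, query_feats, keep_ratio=0.5):
    """Rank conv channels by cross-appearance consistency; keep the best.

    ref_feats, query_feats: arrays of shape (n, c, h, w) holding one
    layer's activations for the same n calibration places under two
    appearance conditions (e.g. day vs. night). Hypothetical interface.
    """
    n, c, h, w = ref_feats.shape
    scores = np.empty(c)
    for ch in range(c):
        a = ref_feats[:, ch].reshape(n, -1)
        b = query_feats[:, ch].reshape(n, -1)
        num = (a * b).sum(axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
        scores[ch] = (num / den).mean()  # mean per-place cosine similarity
    # Retain the channels most consistent between the two appearances;
    # dropping the rest shrinks the descriptor and speeds up matching.
    keep = np.argsort(scores)[::-1][: max(1, int(c * keep_ratio))]
    return np.sort(keep)
```

Because the retained indices are fixed after calibration, the same channel subset can be applied to every subsequent query, which is where the dimensionality and speed gains would come from.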
CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning
Visual navigation tasks in real-world environments often require both
self-motion and place recognition feedback. While deep reinforcement learning
has shown success in solving these perception and decision-making problems in
an end-to-end manner, these algorithms require large amounts of experience to
learn navigation policies from high-dimensional data, which is generally
impractical for real robots due to sample complexity. In this paper, we address
these problems with two main contributions. We first leverage place recognition
and deep learning techniques combined with goal destination feedback to
generate compact, bimodal image representations that can then be used to
effectively learn control policies from a small amount of experience. Second,
we present an interactive framework, CityLearn, that for the first time enables
training and deployment of navigation algorithms across city-sized, realistic
environments with extreme visual appearance changes. CityLearn features more
than 10 benchmark datasets, often used in visual place recognition and
autonomous driving research, including over 100 recorded traversals across 60
cities around the world. We evaluate our approach on two CityLearn
environments, training our navigation policy on a single traversal. Results
show our method can be over 2 orders of magnitude faster than when using raw
images, and can also generalize across extreme visual changes including day to
night and summer to winter transitions.
Comment: Preprint version of article accepted to ICRA 202
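A minimal sketch of how a compact, bimodal observation might be assembled for policy learning is shown below. The random-projection binary code and all names are illustrative stand-ins for the paper's learned place-recognition representation, and the goal encoding is assumed to be a simple one-hot vector.

```python
import numpy as np

class CompactObservation:
    """Bimodal observation: compact visual code + goal destination feedback.

    Illustrative only: the projection is a stand-in for any learned or
    fixed dimensionality reduction of a place-recognition embedding.
    """
    def __init__(self, descriptor_dim, n_places, code_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection compressing the visual descriptor.
        self.proj = rng.standard_normal((code_dim, descriptor_dim))
        self.n_places = n_places

    def __call__(self, place_descriptor, goal_id):
        visual = np.sign(self.proj @ place_descriptor)  # compact visual code
        goal = np.zeros(self.n_places)
        goal[goal_id] = 1.0                             # goal feedback
        # A low-dimensional policy input in place of raw pixels, which is
        # what makes learning from little experience plausible.
        return np.concatenate([visual, goal])
```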
Fast, Compact and Highly Scalable Visual Place Recognition through Sequence-based Matching of Overloaded Representations
Visual place recognition algorithms trade off three key characteristics:
their storage footprint, their computational requirements, and their resultant
performance, often expressed in terms of recall rate. Significant prior work
has investigated highly compact place representations, sub-linear computational
scaling, and sub-linear storage scaling techniques, but these methods have
always involved a significant compromise in one or more of these regards and
have only been demonstrated on relatively small datasets. In this paper we present a novel
place recognition system which enables for the first time the combination of
ultra-compact place representations, near sub-linear storage scaling and
extremely lightweight compute requirements. Our approach exploits the
inherently sequential nature of much spatial data in the robotics domain and
inverts the typical target criteria through intentionally coarse scalar
quantization-based hashing, which produces more collisions that are then
resolved by sequence-based matching. For the first time, we show how effective
place recognition rates can be achieved on a new, very large 10-million-place
dataset, requiring only 8 bytes of storage per place and 37K unitary operations
to achieve over 50% recall when matching a sequence of 100 frames, where a
conventional state-of-the-art approach both consumes 1300 times more compute
and fails catastrophically. We present an analysis investigating the
effectiveness of our hashing-overload approach under varying quantized vector
lengths, compare near-miss matches with the actual match selections, and
characterise the effect of variance re-scaling of the data on quantization.
Comment: 8 pages, 4 figures, Accepted for oral presentation at the 2020 IEEE
International Conference on Robotics and Automation
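The core mechanism, intentionally coarse hashing resolved by sequence voting, can be sketched in a few lines of Python. Everything here is an assumption for illustration: descriptors are taken to be variance re-scaled into roughly [-1, 1], the hash keeps only the first few dimensions at one byte each, and collisions are resolved by voting over the implied start of the matching reference sequence.

```python
import numpy as np
from collections import defaultdict

def coarse_hash(desc, n_bytes=8):
    """Intentionally coarse scalar quantization: n_bytes one-byte codes.
    Collisions are expected by design."""
    clipped = np.clip(desc[:n_bytes], -1.0, 1.0)
    return bytes(((clipped + 1.0) * 127.5).astype(np.uint8))

def build_database(ref_descs):
    """Map each hash to every reference place that produced it."""
    table = defaultdict(list)
    for place_id, d in enumerate(ref_descs):
        table[coarse_hash(d)].append(place_id)
    return table

def sequence_match(table, query_seq, n_ref_places):
    """Vote for the reference start index consistent across the query
    sequence; a long sequence disambiguates the hash collisions."""
    votes = np.zeros(n_ref_places, dtype=np.int64)
    for t, d in enumerate(query_seq):
        for place_id in table.get(coarse_hash(d), []):
            start = place_id - t  # implied start of the matching sequence
            if 0 <= start < n_ref_places:
                votes[start] += 1
    best = int(votes.argmax())
    return best, int(votes[best])
```

Note how this inverts the usual goal of hashing: a coarser quantizer stores less (8 bytes per place, as in the abstract) and collides more, but the per-frame ambiguity is cheap to resolve once votes accumulate along a long query sequence.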
Intelligent Reference Curation for Visual Place Recognition via Bayesian Selective Fusion
A key challenge in visual place recognition (VPR) is recognizing places
despite drastic visual appearance changes due to factors such as time of day,
season, weather or lighting conditions. Numerous approaches based on
deep-learnt image descriptors, sequence matching, domain translation, and
probabilistic localization have had success in addressing this challenge, but
most rely on the availability of carefully curated representative reference
images of the possible places. In this paper, we propose a novel approach,
dubbed Bayesian Selective Fusion, for actively selecting and fusing informative
reference images to determine the best place match for a given query image. The
selective element of our approach avoids the counterproductive fusion of every
reference image and enables the dynamic selection of informative reference
images in environments with changing visual conditions (such as indoors with
flickering lights, outdoors during sunshowers or over the day-night cycle). The
probabilistic element of our approach provides a means of fusing multiple
reference images that accounts for their varying uncertainty via a novel
training-free likelihood function for VPR. On difficult query images from two
benchmark datasets, we demonstrate that our approach matches or exceeds the
performance of several alternative fusion approaches, as well as
state-of-the-art techniques that are provided with prior (unfair) knowledge of
the best reference images. Our approach is well suited for long-term robot
autonomy where dynamic visual environments are commonplace since it is
training-free, descriptor-agnostic, and complements existing techniques such as
sequence matching.
Comment: 8 pages, 10 figures, accepted in the IEEE Robotics and Automation Letters
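A minimal sketch of selective, Bayesian-style fusion is given below. Both the peak-to-mean selection rule and the exponential likelihood are assumptions made for illustration, not the paper's training-free likelihood function; similarity scores are assumed non-negative.

```python
import numpy as np

def bayesian_selective_fusion(score_matrix, informativeness=1.5):
    """Fuse only informative reference sets into a posterior over places.

    score_matrix: (n_refs, n_places) non-negative similarities between
    the query and each candidate place, one row per reference image set.
    """
    n_refs, n_places = score_matrix.shape
    posterior = np.full(n_places, 1.0 / n_places)    # uniform place prior
    for scores in score_matrix:
        # Selection: skip reference sets whose score distribution is too
        # flat to be informative under the current visual conditions.
        if scores.max() / (scores.mean() + 1e-8) < informativeness:
            continue
        likelihood = np.exp(scores - scores.max())   # numerically stable
        posterior *= likelihood                      # Bayes update
        posterior /= posterior.sum()
    return int(posterior.argmax()), posterior
```

Skipping uninformative reference sets is what would prevent the counterproductive fusion of every reference image that the abstract warns about.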