Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition
When a human drives a car along a road for the first time, they later
recognize where they are on the return journey typically without needing to
look in their rear-view mirror or turn around to look back, despite significant
viewpoint and appearance change. Such navigation capabilities are typically
attributed to our semantic visual understanding of the environment [1] beyond
geometry to recognizing the types of places we are passing through such as
"passing a shop on the left" or "moving through a forested area". Humans are in
effect using place categorization [2] to perform specific place recognition
even when the viewpoint is 180 degrees reversed. Recent advances in deep neural
networks have enabled high-performance semantic understanding of visual places
and scenes, opening up the possibility of emulating what humans do. In this
work, we develop a novel methodology for using the semantics-aware higher-order
layers of deep neural networks for recognizing specific places from within a
reference database. To further improve the robustness to appearance change, we
develop a descriptor normalization scheme that builds on the success of
normalization schemes for pure appearance-based techniques such as SeqSLAM [3].
Using two different datasets, one road-based and one pedestrian-based, we
evaluate the performance of the system in performing place recognition on
reverse traversals of a route with a limited field of view camera and no
turn-back-and-look behaviours, and compare to existing state-of-the-art
techniques and vanilla off-the-shelf features. The results demonstrate
significant improvements over the existing state of the art, especially for
extreme perceptual challenges that involve both great viewpoint change and
environmental appearance change. We also provide experimental analyses of the
contributions of the various system components.
Comment: 9 pages, 11 figures, ICRA 201
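The abstract above does not spell out the descriptor normalization scheme, so the following is only a minimal sketch of one common reading: standardize each descriptor dimension within a traversal (zero mean, unit variance) before comparing reference and query descriptors by cosine distance. The function names `standardize`, `cosine_distance`, and `match` are hypothetical, and the per-traversal standardization is an assumption for illustration, not the authors' published method.

```python
import math


def standardize(descriptors):
    """Standardize each dimension across one traversal (assumed scheme)."""
    n = len(descriptors)
    dims = len(descriptors[0])
    means = [sum(d[j] for d in descriptors) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((d[j] - means[j]) ** 2 for d in descriptors) / n
        # Fall back to 1.0 for constant dimensions to avoid division by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [[(d[j] - means[j]) / stds[j] for j in range(dims)]
            for d in descriptors]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def match(reference, query):
    """For each query descriptor, return the index of the closest reference."""
    ref_n = standardize(reference)
    qry_n = standardize(query)
    return [min(range(len(ref_n)), key=lambda i: cosine_distance(q, ref_n[i]))
            for q in qry_n]


# Toy data: the query traversal is the reference with a global gain and offset,
# a crude stand-in for a change in environmental appearance.
reference = [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0], [1.0, 1.0, 5.0]]
query = [[12.0, 10.0, 20.0], [10.0, 12.0, 20.0], [12.0, 12.0, 20.0]]
matches = match(reference, query)
```

In this toy case the gain and offset cancel after standardization, which is the intuition behind normalizing each traversal separately before matching.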
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics
Human visual scene understanding is so remarkable that we are able to
recognize a revisited place when entering it from the direction opposite to the
one in which it was first visited, even in the presence of extreme variations in
appearance. This
capability is especially apparent during driving: a human driver can recognize
where they are when travelling in the reverse direction along a route for the
first time, without having to turn back and look. The difficulty of this
problem exceeds any addressed in past appearance- and viewpoint-invariant
visual place recognition (VPR) research, in part because large parts of the
scene are not commonly observable from opposite directions. Consequently, as
shown in this paper, the precision-recall performance of current
state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders
of magnitude below what would be usable in a closed-loop system. Current
engineered solutions predominantly rely on panoramic camera or LIDAR sensing
setups; an eminently suitable engineering solution but one that is clearly very
different to how humans navigate, which also has implications for how naturally
humans could interact and communicate with the navigation system. In this paper
we develop a suite of novel semantic- and appearance-based techniques to enable,
for the first time, high-performance place recognition in this challenging
scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of
images using the convolutional feature maps from a state-of-the-art dense
semantic segmentation network. Then, to verify the spatial semantic arrangement
of the top matching candidates, we develop a novel approach for mining
semantically-salient keypoint correspondences.
Comment: Accepted for Robotics: Science and Systems (RSS) 2018. Source code
now available at https://github.com/oravus/lost
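The abstract above says only that the descriptor is built from the convolutional feature maps of a dense semantic segmentation network; it does not give the construction. As a rough illustration of the general idea of semantically pooled features, the sketch below sums per-pixel feature vectors separately for each semantic class, concatenates the per-class sums, and L2-normalizes the result. The function name `semantic_pooled_descriptor`, the sum pooling, and the toy inputs are all assumptions; this should not be read as the LoST implementation (see the linked repository for that).

```python
import math


def semantic_pooled_descriptor(features, labels, num_classes):
    """Pool per-pixel features by semantic class (hypothetical sketch).

    features: list of per-pixel feature vectors (all the same length)
    labels:   list of per-pixel class ids in range(num_classes)
    """
    dims = len(features[0])
    descriptor = []
    for cls in range(num_classes):
        pooled = [0.0] * dims
        for feat, label in zip(features, labels):
            if label == cls:
                for j in range(dims):
                    pooled[j] += feat[j]
        # Concatenate the per-class pooled vectors in class order.
        descriptor.extend(pooled)
    norm = math.sqrt(sum(x * x for x in descriptor))
    if norm == 0.0:
        return descriptor
    return [x / norm for x in descriptor]


# Toy input: three "pixels" with 2-D features and two semantic classes.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 0]
descriptor = semantic_pooled_descriptor(features, labels, num_classes=2)
```

Keeping one block per class means two images are compared class against class (for example, road against road), which is one way a descriptor could stay meaningful when the viewpoint is reversed.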
Feature Map Filtering: Improving Visual Place Recognition with Convolutional Calibration
Convolutional Neural Networks (CNNs) have recently been shown to excel at
performing visual place recognition under changing appearance and viewpoint.
Previously, place recognition has been improved by intelligently selecting
relevant spatial keypoints within a convolutional layer and also by selecting
the optimal layer to use. Rather than extracting features out of a particular
layer, or a particular set of spatial keypoints within a layer, we propose the
extraction of features using a subset of the channel dimensionality within a
layer. Each feature map learns to encode a different set of weights that
activate for different visual features within the set of training images. We
propose a method of calibrating a CNN-based visual place recognition system,
which selects the subset of feature maps that best encodes the visual features
that are consistent between two different appearances of the same location.
Using just 50 calibration images, all collected at the beginning of the current
environment, we demonstrate a significant and consistent recognition
improvement across multiple layers for two different neural networks. We
evaluate our proposal on three datasets with different types of appearance
changes: afternoon to morning, winter to summer, and night to day.
Additionally, the dimensionality reduction approach improves the computational
processing speed of the recognition system.
Comment: Accepted to the Australasian Conference on Robotics and Automation 201
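The calibration idea in the abstract above (keep only the feature maps that respond consistently across two appearances of the same place) can be illustrated with a small sketch. The selection criterion used here, mean absolute difference of each channel's activation over paired calibration images, is a simple stand-in chosen for illustration; the abstract does not state the actual criterion. The names `select_consistent_channels` and `filter_descriptor` and the toy numbers are hypothetical.

```python
def select_consistent_channels(appearance_a, appearance_b, keep):
    """Pick the `keep` channels whose activations differ least across pairs.

    appearance_a[i] and appearance_b[i] are per-channel activations of the
    same location under two appearances (assumed calibration pairs).
    """
    num_pairs = len(appearance_a)
    num_channels = len(appearance_a[0])
    scores = []
    for ch in range(num_channels):
        mean_diff = sum(abs(a[ch] - b[ch])
                        for a, b in zip(appearance_a, appearance_b)) / num_pairs
        scores.append((mean_diff, ch))
    scores.sort()  # smallest cross-appearance difference first
    return sorted(ch for _, ch in scores[:keep])


def filter_descriptor(descriptor, channels):
    """Keep only the selected channels of a descriptor."""
    return [descriptor[ch] for ch in channels]


# Toy calibration set: two locations, three channels. Channel 1 changes a lot
# between appearances, so it should be dropped.
morning = [[1.0, 5.0, 2.0], [0.5, 1.0, 3.0]]
afternoon = [[1.1, 0.0, 2.0], [0.4, 6.0, 3.1]]
channels = select_consistent_channels(morning, afternoon, keep=2)
filtered = filter_descriptor([9.0, 8.0, 7.0], channels)
```

Because the filtered descriptor has fewer channels than the original, every later comparison touches fewer values, which is consistent with the speed-up the abstract reports for dimensionality reduction.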