Delta Descriptors: Change-Based Place Representation for Robust Visual Localization
Visual place recognition is challenging because there are so many factors
that can cause the appearance of a place to change, from day-night cycles to
seasonal change to atmospheric conditions. In recent years a wide range of
approaches has been developed to address this challenge, including deep-learnt
image descriptors, domain translation, and sequential filtering, each with
shortcomings such as limited generality and sensitivity to camera velocity. In
this paper we
propose a novel descriptor derived from tracking changes in any learned global
descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the
offsets induced in the original descriptor matching space in an unsupervised
manner by considering temporal differences across places observed along a
route. Like all other approaches, Delta Descriptors have a shortcoming:
frame-to-frame volatility, which can be overcome by combining them
with sequential filtering methods. Using two benchmark datasets, we first
demonstrate the high performance of Delta Descriptors in isolation, before
showing new state-of-the-art performance when combined with sequence-based
matching. We also present results demonstrating the approach working with four
different underlying descriptor types, and two other beneficial properties of
Delta Descriptors in comparison to existing techniques: their increased
inherent robustness to variations in camera motion and a reduced rate of
performance degradation as dimensional reduction is applied. Source code is
made available at https://github.com/oravus/DeltaDescriptors.
Comment: 8 pages and 7 figures. Published in 2020 IEEE Robotics and Automation Letters (RA-L).
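Below is a minimal sketch of the core idea as the abstract describes it: unsupervised temporal differencing of a smoothed time series of global descriptors. The box filter, the window length, and all names are our illustrative choices; the authoritative implementation is in the linked repository.

```python
import numpy as np

def delta_descriptors(descs, window=8):
    """Change-based descriptors from a (num_frames, dim) time series of
    global image descriptors observed along a route.

    Each dimension is smoothed with a rolling mean over `window` frames,
    and the delta descriptor at frame t is the difference between the
    smoothed descriptor half a window ahead and half a window behind.
    `window` is a free parameter of this sketch, not a published value.
    """
    T, D = descs.shape
    half = window // 2
    kernel = np.ones(window) / window
    # Smooth each descriptor dimension along the time axis (box filter).
    smoothed = np.stack(
        [np.convolve(descs[:, d], kernel, mode="same") for d in range(D)],
        axis=1,
    )
    # Signed temporal difference across the window.
    delta = np.zeros_like(descs, dtype=np.float64)
    delta[half:T - half] = smoothed[2 * half:T] - smoothed[:T - 2 * half]
    # L2-normalise so cosine similarity is meaningful downstream.
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    return delta / np.maximum(norms, 1e-12)
```

Matching then proceeds by cosine similarity between query and reference delta descriptors, optionally followed by the sequence-based filtering the abstract recommends.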
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics
Human visual scene understanding is so remarkable that we are able to
recognize a revisited place when entering it from the opposite direction to
that in which it was first visited, even in the presence of extreme variations
in appearance. This
capability is especially apparent during driving: a human driver can recognize
where they are when travelling in the reverse direction along a route for the
first time, without having to turn back and look. The difficulty of this
problem exceeds any addressed in past appearance- and viewpoint-invariant
visual place recognition (VPR) research, in part because large parts of the
scene are not commonly observable from opposite directions. Consequently, as
shown in this paper, the precision-recall performance of current
state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders
of magnitude below what would be usable in a closed-loop system. Current
engineered solutions predominantly rely on panoramic camera or LIDAR sensing
setups: an eminently suitable engineering solution, but one that is clearly
very different from how humans navigate, which also has implications for how
naturally humans could interact and communicate with the navigation system. In this paper
we develop a suite of novel semantic- and appearance-based techniques to enable
for the first time high-performance place recognition in this challenging
scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of
images using the convolutional feature maps from a state-of-the-art dense
semantic segmentation network. Then, to verify the spatial semantic arrangement
of the top matching candidates, we develop a novel approach for mining
semantically-salient keypoint correspondences.
Comment: Accepted for Robotics: Science and Systems (RSS) 2018. Source code now available at https://github.com/oravus/lost
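As a rough, hedged illustration of the Local Semantic Tensor idea, per-class aggregation of dense segmentation features: the class ids, the mean pooling, and the function names below are our assumptions; the authors' actual implementation is in the linked repository.

```python
import numpy as np

def lost_style_descriptor(feature_map, label_map, classes=(0, 1, 2)):
    """Aggregate dense convolutional features per semantic class.

    feature_map: (C, H, W) feature maps from a dense semantic
        segmentation network.
    label_map:   (H, W) per-pixel class predictions from the same network.
    classes:     placeholder class ids (e.g. road, building, vegetation);
        not the ids used in the paper.

    For each class, the feature vectors of pixels predicted as that class
    are averaged into one C-dim vector; the normalised vectors are
    concatenated to form the image descriptor.
    """
    C, H, W = feature_map.shape
    flat = feature_map.reshape(C, -1)   # (C, H*W)
    labels = label_map.reshape(-1)      # (H*W,)
    parts = []
    for c in classes:
        mask = labels == c
        v = flat[:, mask].mean(axis=1) if mask.any() else np.zeros(C)
        parts.append(v / (np.linalg.norm(v) + 1e-12))
    return np.concatenate(parts)
```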
Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions
In recent years there has been significant improvement in the capability of
Visual Place Recognition (VPR) methods, building on the success of both
hand-crafted and learnt visual features, temporal filtering and usage of
semantic scene information. The breadth of approaches and the relatively
recent growth of interest in the field have meant that a wide range of datasets
and assessment methodologies have been proposed, often focused only on
precision-recall-type metrics, making comparison difficult. In this paper we
present a comprehensive approach to evaluating the performance of 10
state-of-the-art recently-developed VPR techniques, which utilizes three
standardized metrics: (a) Matching Performance, (b) Matching Time, and (c)
Memory Footprint. Together this analysis provides an up-to-date and widely
encompassing snapshot of the various strengths and weaknesses of contemporary
approaches to the VPR problem. The aim of this work is to help move this
particular research field towards a more mature and unified approach to the
problem, enabling better comparison and hence more progress to be made in
future research.
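To make the three metrics concrete, here is a toy harness for a single descriptor-based technique; the cosine-similarity matcher, the frame tolerance, and all names are our illustrative choices rather than the paper's protocol.

```python
import time
import numpy as np

def evaluate_vpr(query_descs, ref_descs, gt_matches, tolerance=1):
    """Measure the three standardized metrics for one technique:
    matching performance, matching time, and memory footprint.
    Descriptor extraction time is deliberately excluded here."""
    t0 = time.perf_counter()
    q = query_descs / np.linalg.norm(query_descs, axis=1, keepdims=True)
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    predictions = np.argmax(q @ r.T, axis=1)  # nearest neighbour per query
    per_query_time = (time.perf_counter() - t0) / len(query_descs)

    correct = np.abs(predictions - np.asarray(gt_matches)) <= tolerance
    return {
        "matching_performance": float(correct.mean()),  # fraction correct
        "matching_time_s": per_query_time,              # seconds per query
        "memory_footprint_bytes": ref_descs.nbytes,     # reference map size
    }
```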
SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition
Visual Place Recognition (VPR) is the task of matching current visual imagery
from a camera to images stored in a reference map of the environment. While
initial VPR systems used simple direct image methods or hand-crafted visual
features, recent work has focused on learning more powerful visual features and
further improving performance through either some form of sequential
matcher/filter or a hierarchical matching process. In both cases the performance of the
initial single-image based system is still far from perfect, putting
significant pressure on the sequence matching or (in the case of hierarchical
systems) pose refinement stages. In this paper we present a novel hybrid system
that creates a high-performance initial match hypothesis generator using short
learnt sequential descriptors, which enable selective control of sequential
score aggregation using single-image learnt descriptors. Sequential descriptors are
generated using a temporal convolutional network dubbed SeqNet, encoding short
image sequences using 1-D convolutions, which are then matched against the
corresponding temporal descriptors from the reference dataset to provide an
ordered list of place match hypotheses. We then perform selective sequential
score aggregation using shortlisted single image learnt descriptors from a
separate pipeline to produce an overall place match hypothesis. Comprehensive
experiments on challenging benchmark datasets demonstrate the proposed method
outperforming recent state-of-the-art methods using the same amount of
sequential information. Source code and supplementary material can be found at
https://github.com/oravus/seqNet.
Comment: Accepted for publication in IEEE RA-L 2021; includes supplementary material.
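A minimal PyTorch sketch of the sequential-descriptor stage as the abstract describes it: a 1-D convolution across time over a short sequence of single-image descriptors. The layer sizes are illustrative assumptions; the published architecture and training code are in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqDescriptor(nn.Module):
    """Encode a short sequence of single-image descriptors into one
    sequential descriptor via a temporal 1-D convolution. Dimensions
    are placeholders, not the published configuration."""

    def __init__(self, in_dim=4096, out_dim=4096, seq_len=5):
        super().__init__()
        # Each image descriptor is one "time step" of the 1-D signal.
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=seq_len)

    def forward(self, x):
        # x: (batch, seq_len, in_dim) -> (batch, in_dim, seq_len)
        x = x.transpose(1, 2)
        s = self.conv(x).squeeze(-1)   # (batch, out_dim)
        return F.normalize(s, dim=1)   # unit length for cosine matching
```

Matching the resulting sequence descriptors against the reference map yields the shortlist of hypotheses, which the single-image pipeline then re-scores.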
Visual sequence-based place recognition for changing conditions and varied viewpoints
Correctly identifying previously-visited locations is essential for robotic place recognition and localisation. This thesis presents training-free solutions to vision-based place recognition under changing environmental conditions and camera viewpoints. Using vision as a primary sensor, the proposed approaches combine image segmentation and rescaling techniques over sequences of visual imagery to enable successful place recognition over a range of challenging environments where prior techniques have failed.
2D Visual Place Recognition for Domestic Service Robots at Night
Domestic service robots such as lawn mowing and vacuum cleaning robots are
the most numerous consumer robots in existence today. While early versions
employed random exploration, recent systems fielded by most of the major
manufacturers have utilized range-based and visual sensors and user-placed
beacons to enable robots to map and localize. However, active range and visual
sensing solutions have the disadvantages of being intrusive, expensive, or only
providing a 1D scan of the environment, while the requirement for beacon
placement imposes other practical limitations. In this paper we present a
passive and potentially cheap vision-based solution to 2D localization at night
that combines easily obtainable day-time maps with low resolution
contrast-normalized image matching algorithms, image sequence-based matching in
two dimensions, place match interpolation, and recent advances in conventional
low light camera technology. In a range of experiments over a domestic lawn and
in a lounge room, we demonstrate that the proposed approach enables 2D
localization at night, and analyse the effect on performance of varying
odometry noise levels, place match interpolation and sequence matching length.
Finally we benchmark the new low light camera technology and show how it can
enable robust place recognition even in an environment lit only by a moonless
sky, raising the tantalizing possibility of being able to apply all
conventional vision algorithms, even in the darkest of nights.
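The low-resolution, contrast-normalized matching referred to above is commonly implemented as patch normalization followed by a sum-of-absolute-differences comparison; a hedged sketch, with the patch size and function names our own.

```python
import numpy as np

def patch_normalise(img, patch=8):
    """Contrast-normalise a low-resolution grayscale image in
    non-overlapping patches (zero mean, unit variance per patch),
    reducing sensitivity to illumination change."""
    out = img.astype(np.float64).copy()
    H, W = img.shape
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            block = out[y:y + patch, x:x + patch]
            std = block.std()
            out[y:y + patch, x:x + patch] = (
                (block - block.mean()) / std if std > 0 else 0.0
            )
    return out

def image_difference(a, b, patch=8):
    """Mean absolute difference between two patch-normalised images;
    lower values indicate a more likely place match."""
    return np.abs(patch_normalise(a, patch) - patch_normalise(b, patch)).mean()
```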
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing conditions, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net.
Comment: Accepted to CVPR 2018 as a spotlight.
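For concreteness, 6DOF pose accuracy in a benchmark of this kind reduces to a position error and an orientation error against ground truth, binned by thresholds; a sketch, with the threshold values in the comment indicative only.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Position error (metres) and orientation error (degrees) between
    an estimated and a ground-truth camera pose. Benchmarks of this
    kind then count the fraction of queries within thresholds such as
    0.25 m / 2 deg (regimes vary by condition)."""
    t_err = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
    # Rotation angle of the relative rotation R_est^T R_gt.
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err
```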
Skyline-based localisation for aggressively manoeuvring robots using UV sensors and spherical harmonics
Place recognition is a key capability for navigating robots. While significant advances have been achieved on large, stable platforms such as robot cars, achieving robust performance on rapidly manoeuvring platforms in outdoor natural conditions remains a challenge, with few systems able to deal with both variable conditions and large tilt variations caused by rough terrain. Taking inspiration from biology, we propose a novel combination of sensory modality and image processing to obtain a significant improvement in the robustness of sequence-based image matching for place recognition. We use a UV-sensitive fisheye lens camera to segment sky from ground, providing illumination invariance, and encode the resulting binary images using spherical harmonics to enable rotation-invariant image matching. In combination, these methods also produce substantial pitch and roll invariance, as the spherical harmonics for the sky shape are minimally affected, provided the sky remains visible. We evaluate the performance of our method against a leading appearance-invariant technique (SeqSLAM) and a leading viewpoint-invariant technique (FAB-MAP 2.0) on three new outdoor datasets encompassing variable robot heading, tilt, and lighting conditions in both forested and urban environments. The system demonstrates improved condition- and tilt-invariance, enabling robust place recognition during aggressive zigzag manoeuvring along bumpy trails and at tilt angles of up to 60 degrees.
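To see why spherical harmonics give rotation-invariant matching: rotating the sphere only mixes coefficients within each degree l, so the per-degree power spectrum is unchanged. The sketch below, with an equiangular grid, band limit, and quadrature scheme of our own choosing, computes that spectrum for a binary sky mask.

```python
import numpy as np
from scipy.special import sph_harm

def sh_power_spectrum(sky_mask, l_max=8):
    """Rotation-invariant descriptor of a binary sky/ground image on an
    equiangular (polar x azimuth) grid: the power of its spherical-
    harmonic expansion per degree l. `l_max` is illustrative."""
    n_theta, n_phi = sky_mask.shape
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta   # polar angle
    phi = np.arange(n_phi) * 2.0 * np.pi / n_phi           # azimuth
    PHI, THETA = np.meshgrid(phi, theta)
    # Quadrature weight per cell: sin(theta) * dtheta * dphi.
    dA = np.sin(THETA) * (np.pi / n_theta) * (2.0 * np.pi / n_phi)
    f = sky_mask.astype(np.float64)
    power = np.zeros(l_max + 1)
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            # scipy convention: sph_harm(order m, degree l, azimuth, polar)
            Ylm = sph_harm(m, l, PHI, THETA)
            c = np.sum(f * np.conj(Ylm) * dA)
            power[l] += float(np.abs(c) ** 2)
    return power
```

Places are then compared by the distance between their power spectra, optionally inside a sequence matcher for additional robustness.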
The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on Visual Loop Closure Detection
Where am I? This is one of the most critical questions that any intelligent
system should answer to decide whether it is navigating through a previously
visited area. This problem has long been acknowledged for its challenging nature in
simultaneous localization and mapping (SLAM), wherein the robot needs to
correctly associate the incoming sensory data with its database, allowing
consistent map generation. The significant advances in computer vision achieved
over the last 20 years, the increased computational power, and the growing
demand for long-term exploration contributed to efficiently performing such a
complex task with inexpensive perception sensors. In this article, visual loop
closure detection, which formulates a solution based solely on appearance input
data, is surveyed. We start by briefly introducing place recognition and SLAM
concepts in robotics. Then, we describe a loop closure detection system's
structure, covering an extensive collection of topics, including the feature
extraction, the environment representation, the decision-making step, and the
evaluation process. We conclude by discussing open and new research challenges,
particularly concerning the robustness in dynamic environments, the
computational complexity, and scalability in long-term operations. The article
aims to serve as a tutorial and a position paper for newcomers to visual loop
closure detection.
Comment: 25 pages, 15 figures.
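The stages the survey covers (feature extraction, environment representation, decision making) can be collapsed into a few lines for the appearance-only case; a bare-bones sketch with placeholder thresholds to be tuned per descriptor and environment.

```python
import numpy as np

def detect_loop_closures(descs, sim_threshold=0.85, min_gap=50):
    """Appearance-only loop closure detection over a stream of global
    image descriptors: compare each frame against the database of all
    sufficiently old frames and accept the best match if its cosine
    similarity clears a threshold."""
    descs = descs / np.linalg.norm(descs, axis=1, keepdims=True)
    closures = []
    for t in range(min_gap + 1, len(descs)):
        candidates = descs[:t - min_gap]   # exclude temporally near frames
        sims = candidates @ descs[t]
        best = int(np.argmax(sims))
        if sims[best] >= sim_threshold:
            closures.append((t, best, float(sims[best])))
    return closures
```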
CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning
Visual navigation tasks in real-world environments often require both
self-motion and place recognition feedback. While deep reinforcement learning
has shown success in solving these perception and decision-making problems in
an end-to-end manner, these algorithms require large amounts of experience to
learn navigation policies from high-dimensional data, which is generally
impractical for real robots due to sample complexity. In this paper, we address
these problems with two main contributions. We first leverage place recognition
and deep learning techniques combined with goal destination feedback to
generate compact, bimodal image representations that can then be used to
effectively learn control policies from a small amount of experience. Second,
we present an interactive framework, CityLearn, that enables for the first time
training and deployment of navigation algorithms across city-sized, realistic
environments with extreme visual appearance changes. CityLearn features more
than 10 benchmark datasets, often used in visual place recognition and
autonomous driving research, including over 100 recorded traversals across 60
cities around the world. We evaluate our approach on two CityLearn
environments, training our navigation policy on a single traversal. Results
show our method can be over 2 orders of magnitude faster than when using raw
images, and can also generalize across extreme visual changes including day to
night and summer to winter transitions.
Comment: Preprint version of article accepted to ICRA 2020.
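A hedged sketch of the kind of compact, bimodal observation the abstract describes: a place-recognition response over a reference traversal concatenated with goal-destination feedback, standing in for raw images as policy input. The paper's exact encoding may differ; everything here is illustrative.

```python
import numpy as np

def bimodal_observation(frame_desc, ref_descs, goal_index):
    """Build a compact policy observation from (a) where the agent
    appears to be, via place-recognition scores against the reference
    traversal, and (b) where it should go, via a one-hot goal vector."""
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    q = frame_desc / np.linalg.norm(frame_desc)
    place_scores = r @ q                 # similarity to each reference frame
    goal = np.zeros(len(ref_descs))
    goal[goal_index] = 1.0               # goal-destination feedback
    return np.concatenate([place_scores, goal])
```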