Deep Learning Features at Scale for Visual Place Recognition
The success of deep learning techniques in the computer vision domain has
triggered a range of initial investigations into their utility for visual place
recognition, all using generic features from networks that were trained for
other types of recognition tasks. In this paper, we train, at large scale, two
CNN architectures for the specific place recognition task and employ a
multi-scale feature encoding method to generate condition- and
viewpoint-invariant features. To enable this training to occur, we have
developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of
place appearance change at thousands of different places, as opposed to the
semantic place type datasets currently available. This new dataset enables us
to set up a training regime that interprets place recognition as a
classification problem. We comprehensively evaluate our trained networks on
several challenging benchmark place recognition datasets and demonstrate that
they achieve an average 10% increase in performance over other place
recognition algorithms and pre-trained CNNs. By analyzing the network responses
and their differences from pre-trained networks, we provide insights into what
a network learns when training for place recognition, and what these results
signify for future research in this area.

Comment: 8 pages, 10 figures. Accepted by International Conference on Robotics and Automation (ICRA) 2017. This is the submitted version. The final published version may be slightly different.
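As a sketch of what interpreting "place recognition as a classification problem" looks like in practice, the snippet below trains a standard CNN with one class per physical place and then reuses an intermediate activation as the place descriptor. This is a minimal PyTorch illustration, not the paper's code: the backbone choice, NUM_PLACES, and the train_step helper are assumptions, and the paper's multi-scale feature encoding step is omitted.

```python
# Minimal sketch of the classification-based training regime described above.
# All names (NUM_PLACES, backbone choice) are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_PLACES = 2543  # hypothetical number of distinct places (classes)

# Standard CNN backbone with its head resized so each place is one class.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_PLACES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, place_ids):
    """One optimisation step: images labelled by place identity."""
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, place_ids)
    loss.backward()
    optimizer.step()
    return loss.item()

# At deployment the classifier head is discarded and an intermediate
# activation serves as the place descriptor.
feature_extractor = nn.Sequential(*list(model.children())[:-1])
feature_extractor.eval()

def describe(image_batch):
    with torch.no_grad():
        return feature_extractor(image_batch).flatten(1)  # (B, 512) descriptors
```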
Look No Further: Adapting the Localization Sensory Window to the Temporal Characteristics of the Environment
Many localization algorithms use a spatiotemporal window of sensory
information in order to recognize spatial locations, and the length of this
window is often a sensitive parameter that must be tuned to the specifics of
the application. This letter presents a general method for environment-driven
variation of the length of the spatiotemporal window based on searching for the
most significant localization hypothesis, to use as much context as is
appropriate but not more. We evaluate this approach on benchmark datasets using
visual and Wi-Fi sensor modalities and a variety of sensory comparison
front-ends under in-order and out-of-order traversals of the environment. Our
results show that the system greatly reduces the maximum distance traveled
without localization compared to a fixed-length approach while achieving
competitive localization accuracy, and our proposed method achieves this
performance without deployment-time tuning.

Comment: Pre-print of article appearing in 2017 IEEE Robotics and Automation Letters. v2: incorporated reviewer feedback.
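A rough sketch of the idea, under stated assumptions: given per-frame similarity scores against map locations, grow the spatiotemporal window one observation at a time and stop as soon as one hypothesis is significantly stronger than the rest. The best-to-second-best ratio test and the significance threshold below are illustrative stand-ins for the letter's actual hypothesis-significance criterion.

```python
# Illustrative environment-driven sensory window. Assumes a per-frame
# similarity matrix sim[t, j] between recent observations and map locations,
# aligned so that accumulating over frames reinforces the true location.
import numpy as np

def localize_adaptive(sim, max_window=50, significance=1.5):
    """Grow the window until the top localization hypothesis is significant.

    sim: (T, M) array, sim[t, j] = similarity of the t-th most recent
         observation to map location j.
    Returns (map_index, window_length) or (None, max_window).
    """
    accum = np.zeros(sim.shape[1])
    for w in range(min(max_window, sim.shape[0])):
        accum += sim[w]                      # add one more frame of context
        order = np.argsort(accum)[::-1]
        best, runner_up = accum[order[0]], accum[order[1]]
        if runner_up > 0 and best / runner_up >= significance:
            return order[0], w + 1           # hypothesis, window length used
    return None, max_window                  # no significant hypothesis found
```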
Automated Map Reading: Image Based Localisation in 2-D Maps Using Binary Semantic Descriptors
We describe a novel approach to image based localisation in urban
environments using semantic matching between images and a 2-D map. It contrasts
with the vast majority of existing approaches which use image to image database
matching. We use highly compact binary descriptors to represent semantic
features at locations, significantly increasing scalability compared with
existing methods and having the potential for greater invariance to variable
imaging conditions. The approach is also more akin to human map reading, making
it more suited to human-system interaction. The binary descriptors indicate the
presence or absence of semantic features relating to buildings and road junctions
in discrete viewing directions. We use CNN classifiers to detect the features
in images and match descriptor estimates with a database of location tagged
descriptors derived from the 2-D map. In isolation, the descriptors are not
sufficiently discriminative, but when concatenated sequentially along a route,
their combination becomes highly distinctive and allows localisation even when
using imperfect classifiers. Performance is further improved by taking into
account left or right turns over a route. Experimental results obtained using
Google StreetView and OpenStreetMap data show that the approach has
considerable potential, achieving localisation accuracy of around 85% using
routes corresponding to approximately 200 meters.

Comment: 8 pages, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 2018.
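A minimal sketch of the matching scheme described above, assuming a hypothetical 4-bit layout (one presence bit per viewing direction) for each location's descriptor: per-location codes are concatenated along a route, and the route is localised by nearest Hamming distance against routes derived from the 2-D map. The route names and bit patterns are invented for illustration.

```python
# Route localisation with concatenated binary semantic descriptors.
# Bit layout and example data are assumptions for illustration.

def route_descriptor(location_bits):
    """Concatenate per-location 4-bit codes along a route into one string."""
    return "".join(location_bits)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def localise(query_route, map_routes):
    """Return the map route closest to the query in Hamming distance."""
    return min(map_routes, key=lambda key: hamming(query_route, map_routes[key]))

# Hypothetical example: 3 locations per route, 4 bits each.
map_routes = {
    "route_A": route_descriptor(["1010", "0110", "1111"]),
    "route_B": route_descriptor(["0001", "0001", "0100"]),
}
query = route_descriptor(["1010", "0111", "1111"])  # noisy CNN detections
print(localise(query, map_routes))                  # -> "route_A"
```

Because matching tolerates bit errors, a single misfired classifier flips only a few bits of a long route descriptor, which is why the concatenated codes stay distinctive even with imperfect detectors.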
Multi-Session Visual SLAM for Illumination Invariant Localization in Indoor Environments
For robots navigating using only a camera, illumination changes in indoor
environments can cause localization failures during autonomous navigation. In
this paper, we present a multi-session visual SLAM approach to create a map
made of multiple variations of the same locations in different illumination
conditions. The multi-session map can then be used at any hour of the day for
improved localization capability. The approach presented is independent of the
visual features used, and this is demonstrated by comparing localization
performance between multi-session maps created using the RTAB-Map library with
SURF, SIFT, BRIEF, FREAK, BRISK, KAZE, DAISY and SuperPoint visual features.
The approach is tested on six mapping and six localization sessions recorded at
30-minute intervals during sunset using a Google Tango phone in a real
apartment.

Comment: 6 pages, 5 figures.
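The paper's pipeline is built on the RTAB-Map library; the standalone sketch below only mirrors the feature-independence idea using OpenCV, matching a query image against keyframes pooled from several mapping sessions. The session_maps structure and the scoring are assumptions for illustration, and the detector line can be swapped (e.g. cv2.ORB_create() with cv2.NORM_HAMMING) without touching the matching logic.

```python
# Feature-agnostic localization against a multi-session map (OpenCV sketch;
# the paper itself uses RTAB-Map, not this code).
import cv2

detector = cv2.SIFT_create()           # swap for cv2.ORB_create(), etc.
matcher = cv2.BFMatcher(cv2.NORM_L2)   # NORM_HAMMING for binary features

def localize(query_img, session_maps, ratio=0.75):
    """Return the (session, frame) whose features best match the query image.

    session_maps: {session_id: [(frame_id, descriptors), ...]} built from
    mapping sessions recorded under different illumination conditions.
    """
    _, query_desc = detector.detectAndCompute(query_img, None)
    best, best_score = None, 0
    for session_id, frames in session_maps.items():
        for frame_id, desc in frames:
            # Lowe's ratio test keeps only distinctive correspondences.
            good = [m for m, *rest in matcher.knnMatch(query_desc, desc, k=2)
                    if rest and m.distance < ratio * rest[0].distance]
            if len(good) > best_score:
                best, best_score = (session_id, frame_id), len(good)
    return best
```

Keeping one keyframe set per illumination condition means a query taken at any hour only needs to match the session recorded under the most similar lighting, which is the intuition behind the multi-session map.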