Compact Environment-Invariant Codes for Robust Visual Place Recognition
Robust visual place recognition (VPR) requires scene representations that are
invariant to various environmental challenges such as seasonal changes and
variations due to ambient lighting conditions during day and night. Moreover, a
practical VPR system necessitates compact representations of environmental
features. To satisfy these requirements, in this paper we suggest a
modification to the existing pipeline of VPR systems to incorporate supervised
hashing. The modified system learns (in a supervised setting) compact binary
codes from image feature descriptors. These binary codes acquire robustness to
the visual variations encountered during the training phase, thereby making
the system adaptive to severe environmental changes. Also, incorporating
supervised hashing makes VPR computationally more efficient and easy to
implement on simple hardware. This is because binary embeddings can be learned
over simple-to-compute features and the distance computation is also in the
low-dimensional Hamming space of binary codes. We have performed experiments on
several challenging data sets covering seasonal, illumination and viewpoint
variations. We also compare two widely used supervised hashing methods,
CCAITQ and MLH, and show that this new pipeline outperforms or closely matches
the state-of-the-art deep learning VPR methods that are based on
high-dimensional features extracted from pre-trained deep convolutional neural
networks.
Comment: Conference on Computer and Robot Vision (CRV) 201
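The efficiency claim above rests on distances being computed in Hamming space. A toy sketch of why that is cheap (an illustration only; CCAITQ and MLH learn the codes themselves, which is not shown here):

```python
import numpy as np

def hamming_distance(codes, query):
    """Hamming distances between a query code and a database of binary codes.

    codes: (N, B) uint8 array of bit-packed binary codes
    query: (B,) uint8 array
    """
    # XOR leaves a 1 bit wherever the codes differ; popcount sums those bits.
    xor = np.bitwise_xor(codes, query)
    # np.unpackbits expands each byte into its 8 bits so we can count them.
    return np.unpackbits(xor, axis=1).sum(axis=1)

# Toy database of three 16-bit (2-byte) codes.
db = np.array([[0b10101010, 0b11110000],
               [0b10101010, 0b11110001],
               [0b00000000, 0b00000000]], dtype=np.uint8)
q = np.array([0b10101010, 0b11110000], dtype=np.uint8)
print(hamming_distance(db, q))  # [0 1 8]
```

In practice the XOR-plus-popcount pattern maps to single machine instructions, which is what makes retrieval over millions of compact codes feasible on simple hardware.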
Self-localization from Images with Small Overlap
With the recent success of visual features from deep convolutional neural
networks (DCNN) in visual robot self-localization, it has become important and
practical to address more general self-localization scenarios. In this paper,
we address the scenario of self-localization from images with small overlap. We
explicitly introduce a localization difficulty index as a decreasing function
of view overlap between query and relevant database images and investigate
performance versus difficulty for challenging cross-view self-localization
tasks. We then reformulate the self-localization as a scalable
bag-of-visual-features (BoVF) scene retrieval and present an efficient solution
called PCA-NBNN, aiming to facilitate fast and yet discriminative
correspondence between partially overlapping images. The proposed approach
adopts recent findings in discriminativity preserving encoding of DCNN features
using principal component analysis (PCA) and cross-domain scene matching using
naive Bayes nearest neighbor distance metric (NBNN). We experimentally
demonstrate that the proposed PCA-NBNN framework frequently achieves comparable
results to previous DCNN features and that the BoVF model is significantly more
efficient. We further address an important alternative scenario of
"self-localization from images with NO overlap" and report the result.
Comment: 8 pages, 9 figures, Draft of a paper submitted to an International
Conference
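The NBNN distance used above can be sketched in a few lines: each query descriptor votes with its distance to the nearest descriptor pooled from a candidate class, and the class with the smallest total wins (a minimal illustration of the metric, not the paper's PCA-NBNN pipeline):

```python
import numpy as np

def nbnn_classify(query_desc, class_desc):
    """Naive Bayes Nearest Neighbor: image-to-class distance.

    query_desc: (M, D) local descriptors of the query image
    class_desc: dict mapping class name -> (N_c, D) pooled class descriptors
    Returns the class minimizing sum over query descriptors of
    the squared distance to their nearest class descriptor.
    """
    scores = {}
    for name, descs in class_desc.items():
        # Pairwise squared distances between query and class descriptors.
        d2 = ((query_desc[:, None, :] - descs[None, :, :]) ** 2).sum(-1)
        # Nearest class descriptor per query descriptor, summed.
        scores[name] = d2.min(axis=1).sum()
    return min(scores, key=scores.get)
```

Because distances are computed image-to-class rather than image-to-image, NBNN avoids quantization loss, which is why it pairs well with PCA-compressed DCNN descriptors.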
Vision-based Pose Estimation for Augmented Reality : A Comparison Study
Augmented reality aims to enrich our real world by inserting 3D virtual
objects. In order to accomplish this goal, it is important that virtual
elements are rendered and aligned in the real scene in an accurate and visually
acceptable way. The solution of this problem can be related to a pose
estimation and 3D camera localization. This paper presents a survey on
different approaches of 3D pose estimation in augmented reality and gives
classification of key-points-based techniques. The study given in this paper
may help both developers and researchers in the field of augmented reality.
Comment: IEEE International Conference on Pattern Analysis and Intelligent
Systems PAIS'201
Point-cloud-based place recognition using CNN feature extraction
This paper proposes a novel point-cloud-based place recognition system that
adopts a deep learning approach for feature extraction. Using a
convolutional neural network pre-trained on color images to extract features
from a range image, without fine-tuning on extra range images, we observe
significant improvement compared to using hand-crafted features. The
resulting system is illumination invariant, rotation invariant and robust
against moving objects that are unrelated to the place identity. Apart from the
system itself, we also bring to the community a new place recognition dataset
containing both point cloud and grayscale images covering a full
environmental view. In addition, the dataset is organized in such a way that it
facilitates experimental validation with respect to rotation invariance or
robustness against unrelated moving objects separately.
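The range image that feeds the pre-trained CNN can be obtained by a standard spherical projection of the point cloud. A minimal numpy sketch (resolution, field of view, and the function itself are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

def pointcloud_to_range_image(points, h=32, w=64,
                              fov_up=np.deg2rad(15), fov_down=np.deg2rad(-15)):
    """Project an (N, 3) point cloud into an h x w range image.

    Each point is mapped by its azimuth (column) and elevation (row);
    the pixel value is the range. Collisions keep the last point (toy).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                    # elevation angle
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int).clip(0, h - 1)
    img = np.zeros((h, w))
    img[v, u] = r
    return img
```

A single-channel image like this can then be replicated across three channels to match the input expected by a CNN pre-trained on color images.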
Scalable Change Retrieval Using Deep 3D Neural Codes
We present a novel scalable framework for image change detection (ICD) from
an on-board 3D imagery system. We argue that existing ICD systems are
constrained by the time required to align a given query image with individual
reference image coordinates. We utilize an invariant coordinate system (ICS) to
replace the time-consuming image alignment with an offline pre-processing
procedure. Our key contribution is an extension of the traditional image
comparison-based ICD tasks to setups of the image retrieval (IR) task. We
replace each component of the 3D ICD system, i.e., (1) image modeling, (2)
image alignment, and (3) image differencing, with significantly efficient
variants from the bag-of-words (BoW) IR paradigm. Further, we train a deep 3D
feature extractor in an unsupervised manner, using a Siamese network and
automatically collected training data. We conducted experiments on
a challenging cross-season ICD task using a publicly available dataset and
thereby validate the efficacy of the proposed approach.
Comment: 5 pages, 1 figure, technical report
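The BoW retrieval component that replaces exhaustive image alignment can be sketched minimally: hard-assign local descriptors to a visual vocabulary and compare L2-normalized histograms (a toy illustration; the codebook and descriptors are hypothetical, not the paper's learned 3D features):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assign (M, D) local descriptors to the nearest of K codebook
    words and return an L2-normalized K-bin histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

# Retrieval then reduces to ranking database histograms by dot product
# (cosine similarity) against the query histogram.
```

Because histograms are sparse and fixed-length, an inverted index over the vocabulary makes this scalable, which is the efficiency the BoW IR paradigm buys over per-pair image alignment.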
VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change
Visual Place Recognition (VPR) is the process of recognising a previously
visited place using visual information, often under varying appearance
conditions and viewpoint changes and with computational constraints. VPR is a
critical component of many autonomous navigation systems ranging from
autonomous vehicles to drones. While the concept of place recognition has been
around for many years, VPR research has grown rapidly as a field over the past
decade due to both improving camera hardware technologies and its suitability
for application of deep learning-based techniques. With this growth, however,
has come field fragmentation, a lack of standardisation, and a disconnect
between current performance metrics and the actual utility of a VPR technique
at deployment. In this paper, we address these key challenges through a
new comprehensive open-source evaluation framework, dubbed 'VPR-Bench'.
VPR-Bench introduces two much-needed capabilities for researchers: firstly,
quantification of viewpoint and illumination variation, replacing what has
largely been assessed qualitatively in the past, and secondly, new metrics
'Extended precision' (EP), 'Performance-Per-Compute-Unit' (PCU) and 'Number of
Prospective Place Matching Candidates' (NPPMC). These new metrics address
the limitations of traditional Precision-Recall curves by providing measures
that are more informative to the wide range of potential VPR applications.
Mechanistically, we develop new unified templates that facilitate the
implementation, deployment and evaluation of a wide range of VPR techniques and
datasets. We incorporate the most comprehensive combination of state-of-the-art
VPR techniques and datasets to date into VPR-Bench and demonstrate how it
provides a rich range of previously inaccessible insights, such as the nuanced
relationship between viewpoint invariance, different types of VPR techniques
and datasets.
Comment: Currently under review, 25 pages, 16 figures
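Of the new metrics, Extended Precision has a compact published definition: EP = (P@R_min + R@P100) / 2, i.e. the mean of precision at the lowest recall point and the highest recall achieved at 100% precision (zero if precision never reaches 100%). The helper below is an illustrative sketch of that formula, not VPR-Bench code:

```python
import numpy as np

def extended_precision(scores, labels):
    """EP = (P@R_min + R@P100) / 2 over a ranked list of match candidates.

    scores: match confidences; labels: 1 for a correct match, 0 otherwise.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tp = np.cumsum(y)
    precision = tp / (np.arange(len(y)) + 1)
    recall = tp / y.sum()
    p_r0 = precision[0]                      # precision at minimum recall
    perfect = precision == 1.0
    r_p100 = recall[perfect].max() if perfect.any() else 0.0
    return (p_r0 + r_p100) / 2
```

The two endpoints capture complementary deployment regimes: P@R_min reflects the single best match a loop-closure system would act on, while R@P100 reflects how far the system can go before its first false positive.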
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state of the art results. In retrospect of
what has been achieved so far, the survey discusses open challenges and
directions for future research.
Comment: Accepted by IJC
From handcrafted to deep local features
This paper presents an overview of the evolution of local features from
handcrafted to deep-learning-based methods, followed by a discussion of several
benchmarks and papers evaluating such local features. Our investigations are
motivated by 3D reconstruction problems, where the precise location of the
features is important. As we describe these methods, we highlight and explain
the challenges of feature extraction and potential ways to overcome them. We
first present handcrafted methods, followed by methods based on classical
machine learning and finally we discuss methods based on deep-learning. This
largely chronologically-ordered presentation will help the reader to fully
understand the topic of image and region description in order to make best use
of it in modern computer vision applications. In particular, understanding
handcrafted methods and their motivation can help to understand modern
approaches and how machine learning is used to improve the results. We also
provide references to most of the relevant literature and code.
Comment: Preprint
Large scale visual place recognition with sub-linear storage growth
Robotic and animal mapping systems share many of the same objectives and
challenges, but differ in one key aspect: where much of the research in robotic
mapping has focused on solving the data association problem, the grid cell
neurons underlying maps in the mammalian brain appear to intentionally break
data association by encoding many locations with a single grid cell neuron. One
potential benefit of this intentional aliasing is both sub-linear map storage
and computational requirements growth with environment size, which we
demonstrated in a previous proof-of-concept study that detected and encoded
mutually complementary co-prime pattern frequencies in the visual map data. In
this research, we solve several of the key theoretical and practical
limitations of that prototype model and achieve significantly better sub-linear
storage growth, a substantial reduction in storage requirements per map location,
scalability to large datasets on standard compute equipment and improved
robustness to environments with visually challenging appearance change. These
improvements are achieved through several innovations including a flexible
user-driven choice mechanism for the periodic patterns underlying the new
encoding method, a parallelized chunking technique that splits the map into
sub-sections processed in parallel and a novel feature selection approach that
selects only the image information most relevant to the encoded temporal
patterns. We evaluate our techniques on two large benchmark datasets,
comparing against the previous state-of-the-art system, and provide a
detailed analysis of system performance with respect to parameters such as
the required precision performance and the number of cyclic patterns encoded.
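The co-prime aliasing idea can be illustrated with a toy residue code: a location index is stored only as its remainders modulo a few co-prime periods (so storage grows with the sum of the periods), yet it remains uniquely recoverable by the Chinese Remainder Theorem for any index below their product. This is a sketch of the principle, not the paper's encoder:

```python
from math import prod

def encode(location, periods):
    """Intentional aliasing: each 'cell' fires for many locations, like a
    grid cell; a location keeps only its residues modulo co-prime periods."""
    return [location % p for p in periods]

def decode(residues, periods):
    """Recover the location via the Chinese Remainder Theorem
    (unique for locations below prod(periods))."""
    M = prod(periods)
    x = 0
    for r, p in zip(residues, periods):
        Mp = M // p
        x += r * Mp * pow(Mp, -1, p)   # pow(..., -1, p): modular inverse
    return x % M

periods = [7, 11, 13]   # storage ~ sum = 31 cells; capacity = 7*11*13 = 1001
```

The sub-linear scaling follows directly: capacity grows with the product of the periods while storage grows only with their sum.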
Place recognition survey: An update on deep learning approaches
Autonomous Vehicles (AV) are becoming more capable of navigating in complex
environments with dynamic and changing conditions. A key component that enables
these intelligent vehicles to overcome such conditions and become more
autonomous is the sophistication of the perception and localization systems. As
part of the localization system, place recognition has benefited from recent
developments in other perception tasks such as place categorization or object
recognition, namely with the emergence of deep learning (DL) frameworks. This
paper surveys recent approaches and methods used in place recognition,
particularly those based on deep learning. The contributions of this work are
twofold: surveying recent sensors such as 3D LiDARs and RADARs, applied in
place recognition; and categorizing the various DL-based place recognition
works into supervised, unsupervised, semi-supervised, parallel, and
hierarchical categories. First, this survey introduces key place recognition
concepts to contextualize the reader. Then, sensor characteristics are
addressed. This survey proceeds by elaborating on the various DL-based works,
presenting summaries for each framework. Some lessons learned from this survey
include: the importance of NetVLAD for supervised end-to-end learning; the
advantages of unsupervised approaches in place recognition, namely for
cross-domain applications; and the increasing tendency of recent works to seek
not only higher performance but also higher efficiency.
Comment: Under review in IEEE Transactions on Intelligent Vehicles. This work
was submitted on 13/01/2021 to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. Upon acceptance of the article by IEEE, the preprint
article will be replaced with the accepted version.
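The survey singles out NetVLAD for supervised end-to-end learning. Its core aggregation step can be sketched in numpy: soft-assign local features to cluster centroids, accumulate residuals, then intra-normalize and flatten. The learned assignment parameters are replaced here by fixed centroids and a softness constant, so this is an illustration of the mechanism, not the trained layer:

```python
import numpy as np

def netvlad_aggregate(features, centroids, alpha=1.0):
    """VLAD-style aggregation with soft assignment.

    features: (N, D) local descriptors; centroids: (K, D) cluster centers.
    Returns a single L2-normalized (K*D,) image descriptor.
    """
    # Soft assignment = softmax over negative squared distances to centroids
    # (shifted per row for numerical stability).
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, K)
    a = np.exp(-alpha * (d2 - d2.min(axis=1, keepdims=True)))
    a /= a.sum(axis=1, keepdims=True)
    # Accumulate assignment-weighted residuals per cluster.
    resid = features[:, None, :] - centroids[None, :, :]                # (N, K, D)
    vlad = (a[:, :, None] * resid).sum(axis=0)                          # (K, D)
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12         # intra-norm
    v = vlad.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```

Because every step is differentiable, the full network can be trained end-to-end with a ranking loss on place-labelled images, which is the property the survey credits for its influence.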