Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition
When a human drives a car along a road for the first time, they later
recognize where they are on the return journey typically without needing to
look in their rear-view mirror or turn around to look back, despite significant
viewpoint and appearance change. Such navigation capabilities are typically
attributed to our semantic visual understanding of the environment [1], which
extends beyond geometry to recognizing the types of places we are passing
through, such as "passing a shop on the left" or "moving through a forested
area". Humans are in
effect using place categorization [2] to perform specific place recognition
even when the viewpoint is 180 degrees reversed. Recent advances in deep neural
networks have enabled high-performance semantic understanding of visual places
and scenes, opening up the possibility of emulating what humans do. In this
work, we develop a novel methodology for using the semantics-aware higher-order
layers of deep neural networks for recognizing specific places from within a
reference database. To further improve the robustness to appearance change, we
develop a descriptor normalization scheme that builds on the success of
normalization schemes for pure appearance-based techniques such as SeqSLAM [3].
Using two different datasets, one road-based and one pedestrian-based, we
evaluate the performance of the system in performing place recognition on
reverse traversals of a route with a limited-field-of-view camera and no
turn-back-and-look behaviours, and compare it to existing state-of-the-art
techniques and vanilla off-the-shelf features. The results demonstrate
significant improvements over the existing state of the art, especially for
extreme perceptual challenges that involve both great viewpoint change and
environmental appearance change. We also provide experimental analyses of the
contributions of the various system components. Comment: 9 pages, 11 figures, ICRA 201
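The descriptor normalization idea above can be illustrated with a minimal Python sketch. It assumes one simple SeqSLAM-inspired variant: each descriptor dimension is standardized over a whole traversal before cosine matching, so a consistent per-dimension appearance offset or scale between traversals cancels out. The function names and this exact scheme are illustrative assumptions, not the paper's method.

```python
import numpy as np


def normalize_descriptors(descs: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each dimension over one traversal, then L2-normalize each row.

    descs: (num_frames, dim) array of per-image descriptors (e.g. CNN features).
    This is an assumed, SeqSLAM-inspired normalization, not the paper's exact scheme.
    """
    mean = descs.mean(axis=0, keepdims=True)
    std = descs.std(axis=0, keepdims=True)
    z = (descs - mean) / (std + eps)  # removes per-dimension offset and scale
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def match(query: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """For each query frame, return the index of the most similar reference frame."""
    q = normalize_descriptors(query)
    r = normalize_descriptors(ref)
    similarity = q @ r.T  # cosine similarity, shape (num_query, num_ref)
    return similarity.argmax(axis=1)
```

For example, a query traversal equal to ref * 3.0 + 5.0 (a uniform global appearance change) still matches every frame to its counterpart, because the standardization removes the shift and scale before comparison.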
Fast, Compact and Highly Scalable Visual Place Recognition through Sequence-based Matching of Overloaded Representations
Visual place recognition algorithms trade off three key characteristics:
their storage footprint, their computational requirements, and their resultant
performance, often expressed in terms of recall rate. Significant prior work
has investigated highly compact place representations, sub-linear computational
scaling and sub-linear storage scaling techniques, but has always involved a
significant compromise in one or more of these regards, and has only been
demonstrated on relatively small datasets. In this paper we present a novel
place recognition system which enables for the first time the combination of
ultra-compact place representations, near sub-linear storage scaling and
extremely lightweight compute requirements. Our approach exploits the
inherently sequential nature of much spatial data in the robotics domain and
inverts the typical target criteria, through intentionally coarse scalar
quantization-based hashing that leads to more collisions but is resolved by
sequence-based matching. For the first time, we show how effective place
recognition rates can be achieved on a new, very large 10-million-place dataset,
requiring only 8 bytes of storage per place and 37K unitary operations to
achieve over 50% recall for matching a sequence of 100 frames, where a
conventional state-of-the-art approach both consumes 1300 times more compute
and fails catastrophically. We present an analysis investigating the effectiveness
of our hashing overload approach under varying quantized vector lengths,
compare near-miss matches with the actual match selections, and characterise
the effect of variance re-scaling of data on quantization. Comment: 8 pages, 4 figures, Accepted for oral presentation at the 2020 IEEE
International Conference on Robotics and Automation
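A minimal sketch of coarse quantization-based hashing resolved by sequence matching follows. It assumes 64-dimensional descriptors quantized to one bit per dimension (8 bytes per place, in line with the storage figure quoted above), Hamming distance between codes, and a constant-velocity diagonal search; the paper's actual quantizer, bit depth and matcher may differ, and the helper names are hypothetical.

```python
import numpy as np


def quantize(descs: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Coarse 1-bit scalar quantization per dimension, packed into bytes.

    A 64-D descriptor becomes 8 bytes; many places will collide on similar codes.
    """
    bits = (descs - mean) > 0
    return np.packbits(bits, axis=1)  # shape (n, dim // 8), dtype uint8


def hamming(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between packed codes a (m, B) and b (n, B)."""
    xor = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


def sequence_match(query_codes: np.ndarray, ref_codes: np.ndarray) -> int:
    """Return the reference index aligned with the last query frame.

    Assumes query and reference were traversed at the same speed, so the
    match lies on a diagonal of the (query x reference) distance matrix.
    """
    dist = hamming(query_codes, ref_codes)  # (L, N)
    L, N = dist.shape
    rows = np.arange(L)
    best_end, best_score = -1, np.inf
    for end in range(L - 1, N):
        # Sum distances along the diagonal that ends at (L - 1, end).
        score = dist[rows, end - (L - 1) + rows].sum()
        if score < best_score:
            best_end, best_score = end, score
    return best_end
```

Individual codes collide often, but summing distances over a whole sequence makes the true alignment stand out; this is the trade-off the abstract describes. The pairwise Hamming matrix here is dense and would need blocking or indexing at the 10-million-place scale.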
SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition
Visual Place Recognition (VPR) is the task of matching current visual imagery
from a camera to images stored in a reference map of the environment. While
initial VPR systems used simple direct image methods or hand-crafted visual
features, recent work has focused on learning more powerful visual features and
further improving performance through either some form of sequential
matcher/filter or a hierarchical matching process. In both cases the performance of the
initial single-image based system is still far from perfect, putting
significant pressure on the sequence matching or (in the case of hierarchical
systems) pose refinement stages. In this paper we present a novel hybrid system
that creates a high performance initial match hypothesis generator using short
learnt sequential descriptors, which enable selective control of sequential score
aggregation using single image learnt descriptors. Sequential descriptors are
generated using a temporal convolutional network dubbed SeqNet, encoding short
image sequences using 1-D convolutions, which are then matched against the
corresponding temporal descriptors from the reference dataset to provide an
ordered list of place match hypotheses. We then perform selective sequential
score aggregation using shortlisted single image learnt descriptors from a
separate pipeline to produce an overall place match hypothesis. Comprehensive
experiments on challenging benchmark datasets demonstrate the proposed method
outperforming recent state-of-the-art methods using the same amount of
sequential information. Source code and supplementary material can be found at
https://github.com/oravus/seqNet. Comment: Accepted for publication in IEEE RA-L 2021; includes supplementar
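The sequential-descriptor idea above can be sketched in NumPy. This sketch assumes a single 1-D temporal convolution (random weights stand in for trained ones), average pooling over time and L2 normalization to form the sequence descriptor. It then shortlists the top-k reference sequences and re-ranks them by summed single-image similarity. Layer sizes, k and all function names are illustrative assumptions rather than the released SeqNet code.

```python
import numpy as np


def temporal_conv1d(seq: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """'Valid' 1-D convolution over time (cross-correlation, as in DL frameworks).

    seq: (L, D) single-image descriptors; weights: (K, D, D_out) filter bank.
    Returns an (L - K + 1, D_out) array.
    """
    K = weights.shape[0]
    steps = seq.shape[0] - K + 1
    return np.stack([np.einsum("kd,kdo->o", seq[t:t + K], weights) for t in range(steps)])


def seq_descriptor(seq: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Temporal convolution, average pooling over time, then L2 normalization."""
    pooled = temporal_conv1d(seq, weights).mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)


def hierarchical_match(q_seq_desc, ref_seq_descs, q_frames, ref_frames, k=5):
    """Shortlist by sequence descriptor, then re-rank by summed single-image similarity.

    ref_seq_descs[i] describes the reference sequence ending at frame i.
    """
    L = len(q_frames)
    shortlist = np.argsort(-(ref_seq_descs @ q_seq_desc))[:k]
    best, best_score = -1, -np.inf
    for i in shortlist:
        if i < L - 1:  # not enough reference frames before i for a full sequence
            continue
        score = float(np.sum(ref_frames[i - L + 1:i + 1] * q_frames))
        if score > best_score:
            best, best_score = int(i), score
    return best
```

The design mirrors the two-stage structure in the abstract: the cheap sequence descriptor narrows the search, and the more expensive per-frame comparison is applied only to the shortlist.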
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics
Human visual scene understanding is so remarkable that we are able to
recognize a revisited place when entering it from the direction opposite to the
one in which it was first visited, even in the presence of extreme variations in appearance. This
capability is especially apparent during driving: a human driver can recognize
where they are when travelling in the reverse direction along a route for the
first time, without having to turn back and look. The difficulty of this
problem exceeds any addressed in past appearance- and viewpoint-invariant
visual place recognition (VPR) research, in part because large parts of the
scene are not commonly observable from opposite directions. Consequently, as
shown in this paper, the precision-recall performance of current
state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders
of magnitude below what would be usable in a closed-loop system. Current
engineered solutions predominantly rely on panoramic camera or LIDAR sensing
setups: an eminently suitable engineering solution, but one that is clearly very
different from how humans navigate, which also has implications for how naturally
humans could interact and communicate with the navigation system. In this paper
we develop a suite of novel semantic- and appearance-based techniques to enable
for the first time high performance place recognition in this challenging
scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of
images using the convolutional feature maps from a state-of-the-art dense
semantic segmentation network. Then, to verify the spatial semantic arrangement
of the top matching candidates, we develop a novel approach for mining
semantically-salient keypoint correspondences. Comment: Accepted for Robotics: Science and Systems (RSS) 2018. Source code
now available at https://github.com/oravus/lost
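The sketch below makes the idea of a semantics-conditioned image descriptor concrete. It pools dense convolutional features separately within each semantic class predicted by a segmentation network and concatenates the per-class results. This class-wise mean pooling is a deliberately simplified stand-in, not the LoST formulation from the paper; the array shapes and function name are assumptions.

```python
import numpy as np


def class_pooled_descriptor(features: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Pool dense features per semantic class and concatenate the results.

    features: (H, W, C) convolutional feature map from a segmentation network.
    labels:   (H, W) integer class map (e.g. road, building, vegetation) at the
              same resolution. Classes absent from the image contribute zeros.
    Returns an L2-normalized vector of length num_classes * C.
    """
    flat_feats = features.reshape(-1, features.shape[-1])
    flat_labels = labels.reshape(-1)
    pooled = np.zeros((num_classes, flat_feats.shape[1]))
    for c in range(num_classes):
        mask = flat_labels == c
        if mask.any():
            pooled[c] = flat_feats[mask].mean(axis=0)
    desc = pooled.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)
```

Grouping features by semantic class rather than by image position is what makes such a descriptor less sensitive to the mirrored layout seen from the opposite direction. The keypoint-based verification step described in the abstract is not covered here.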
A Hierarchical Dual Model of Environment- and Place-Specific Utility for Visual Place Recognition
Visual Place Recognition (VPR) approaches have typically attempted to match
places by identifying visual cues, image regions or landmarks that have high
"utility" in identifying a specific place. But this concept of utility is not
singular; rather, it can take a range of forms. In this paper, we present a
novel approach to deduce two key types of utility for VPR: the utility of
visual cues 'specific' to an environment and to a particular place. We employ
contrastive learning principles to estimate both the environment- and
place-specific utility of Vector of Locally Aggregated Descriptors (VLAD)
clusters in an unsupervised manner, which is then used to guide local feature
matching through keypoint selection. By combining these two utility measures,
our approach achieves state-of-the-art performance on three challenging
benchmark datasets, while simultaneously reducing the required storage and
compute time. We provide further analysis demonstrating that unsupervised
cluster selection yields semantically meaningful results, that finer-grained
categorization often has higher utility for VPR than high-level
semantic categorization (e.g. building, road), and characterise how these two
utility measures vary across different places and environments. Source code is
made publicly available at https://github.com/Nik-V9/HEAPUtil. Comment: Accepted to IEEE Robotics and Automation Letters (RA-L) and IROS 202
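A minimal sketch of VLAD aggregation with a per-cluster utility score follows. VLAD itself (hard assignment, summed residuals, intra-normalization, global L2 normalization) is standard. The environment_utility heuristic scores a cluster as less useful when its residuals look alike across all reference images. It is a hypothetical stand-in for the contrastive utility estimation described above, not the HEAPUtil implementation.

```python
import numpy as np


def vlad_residuals(local_descs: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Sum of residuals to the nearest centroid, per cluster: shape (K, D)."""
    d2 = ((local_descs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)  # hard assignment of each local feature
    residuals = np.zeros_like(centroids, dtype=float)
    for k in range(len(centroids)):
        members = local_descs[assign == k]
        if len(members):
            residuals[k] = (members - centroids[k]).sum(axis=0)
    return residuals


def vlad(local_descs: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """VLAD vector with intra-normalization followed by global L2 normalization."""
    res = vlad_residuals(local_descs, centroids)
    res = res / np.maximum(np.linalg.norm(res, axis=1, keepdims=True), 1e-12)
    v = res.ravel()
    return v / max(np.linalg.norm(v), 1e-12)


def environment_utility(ref_residuals: np.ndarray) -> np.ndarray:
    """Hypothetical utility per cluster from (N, K, D) reference residuals.

    Utility = 1 - mean pairwise cosine similarity of that cluster's residuals
    across the N reference images (clusters that look the same everywhere score ~0).
    """
    n, k, _ = ref_residuals.shape
    util = np.empty(k)
    for c in range(k):
        r = ref_residuals[:, c, :]
        r = r / np.maximum(np.linalg.norm(r, axis=1, keepdims=True), 1e-12)
        sim = r @ r.T
        util[c] = 1.0 - (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return util
```

Keypoints assigned to low-utility clusters could then be dropped before local feature matching, which is one way the storage and compute savings mentioned above could arise.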
PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT
Disaster summarization approaches provide an overview of the important
information posted during disaster events on social media platforms such as
Twitter. However, the type of information posted varies significantly across
disasters depending on several factors such as location, type, and severity.
Verifying the effectiveness of disaster summarization approaches is still
difficult due to the lack of a broad spectrum of datasets with ground-truth
summaries. Existing approaches for ground-truth summary
generation (ground-truth for extractive summarization) rely on the wisdom and
intuition of the annotators. Annotators are provided with a complete set of
input tweets from which a subset of tweets is selected by the annotators for
the summary. This process requires immense human effort and significant time.
Additionally, this intuition-based selection of the tweets might lead to a high
variance in summaries generated across annotators. Therefore, to handle these
challenges, we propose a hybrid (semi-automated) approach (PORTRAIT) where we
partly automate the ground-truth summary generation procedure. This approach
reduces the effort and time of the annotators while ensuring the quality of the
created ground-truth summary. We validate the effectiveness of PORTRAIT on 5
disaster events through quantitative and qualitative comparisons of
ground-truth summaries generated by existing intuitive approaches, a
semi-automated approach, and PORTRAIT. We prepare and release the ground-truth
summaries for 5 disaster events which consist of both natural and man-made
disaster events belonging to 4 different countries. Finally, we provide a study
of the performance of various state-of-the-art summarization approaches on
the ground-truth summaries generated by PORTRAIT using ROUGE-N F1-scores.
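Because the evaluation above relies on ROUGE-N F1-scores, a minimal sketch of the metric may help. Clipped n-gram overlap between a candidate and a reference summary gives precision and recall, which combine into F1. This version assumes lower-cased whitespace tokenization and a single reference; official ROUGE toolkits add options such as stemming and multiple references, so scores from this sketch are not directly comparable to published numbers.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 between one candidate summary and one reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # counts clipped to the smaller multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, rouge_n_f1("the cat", "the cat sat on the mat") gives 0.5: both candidate unigrams appear in the reference (precision 1), but they cover only 2 of its 6 unigrams (recall 1/3).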
Delta Descriptors: Change-Based Place Representation for Robust Visual Localization
Visual place recognition is challenging because there are so many factors
that can cause the appearance of a place to change, from day-night cycles to
seasonal change to atmospheric conditions. In recent years a large range of
approaches have been developed to address this challenge including deep-learnt
image descriptors, domain translation, and sequential filtering, each with
shortcomings such as limited generality or velocity sensitivity. In this paper we
propose a novel descriptor derived from tracking changes in any learned global
descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the
offsets induced in the original descriptor matching space in an unsupervised
manner by considering temporal differences across places observed along a
route. Like all other approaches, Delta Descriptors have a shortcoming, namely
volatility on a frame-to-frame basis, which can be overcome by combining them
with sequential filtering methods. Using two benchmark datasets, we first
demonstrate the high performance of Delta Descriptors in isolation, before
showing new state-of-the-art performance when combined with sequence-based
matching. We also present results demonstrating the approach working with four
different underlying descriptor types, and two other beneficial properties of
Delta Descriptors in comparison to existing techniques: their increased
inherent robustness to variations in camera motion and a reduced rate of
performance degradation as dimensionality reduction is applied. Source code is
made available at https://github.com/oravus/DeltaDescriptors. Comment: 8 pages and 7 figures. Published in 2020 IEEE Robotics and Automation
Letters (RA-L)
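The change-based representation above can be sketched as a difference of moving averages along the route. This sketch assumes that the delta at frame t is the mean of the next window descriptors minus the mean of the previous window descriptors, followed by L2 normalization. The paper's exact windowing and smoothing may differ, and delta_descriptors is a hypothetical name. Because only temporal differences are kept, any constant offset shared by all descriptors of a traversal cancels out.

```python
import numpy as np


def delta_descriptors(descs: np.ndarray, window: int) -> np.ndarray:
    """Change-based descriptors from a (T, D) sequence of global descriptors.

    Row j corresponds to frame t = window + j and equals
    mean(descs[t:t+window]) - mean(descs[t-window:t]), L2-normalized.
    Returns an array of shape (T - 2 * window + 1, D).
    """
    T = descs.shape[0]
    # csum[i] holds the sum of descs[0:i], so window means are O(1) lookups.
    csum = np.vstack([np.zeros((1, descs.shape[1])), np.cumsum(descs, axis=0)])
    t = np.arange(window, T - window + 1)
    future = (csum[t + window] - csum[t]) / window
    past = (csum[t] - csum[t - window]) / window
    delta = future - past
    return delta / np.maximum(np.linalg.norm(delta, axis=1, keepdims=True), 1e-12)
```

The offset cancellation illustrates the unsupervised mitigation of descriptor-space offsets described above. The frame-to-frame volatility of such deltas is why the abstract pairs them with sequence-based matching.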
OntoDSumm: Ontology based Tweet Summarization for Disaster Events
The huge popularity of social media platforms like Twitter attracts a large
fraction of users to share real-time information and short situational messages
during disasters. A summary of these tweets is required by the government
organizations, agencies, and volunteers for efficient and quick disaster
response. However, the huge influx of tweets makes it difficult to manually get
a precise overview of ongoing events. To handle this challenge, several tweet
summarization approaches have been proposed. In most of the existing
literature, tweet summarization is broken into a two-step process: the first
step categorizes tweets, and the second chooses representative tweets from
each category. Both supervised and unsupervised approaches to the first step
exist in the literature. Supervised approaches require large amounts of
labelled data, which is costly and time-consuming to obtain. Unsupervised
approaches, on the other hand, fail to cluster tweets properly because of
overlapping keywords, vocabulary size, a lack of understanding of semantic
meaning, etc. For the second step, existing approaches apply ranking methods
that are very generic and fail to compute the proper importance of a tweet
with respect to a disaster. Both problems can be handled far better with
proper domain knowledge. In this paper, we exploit existing domain knowledge
in the form of an ontology in both steps and propose a novel disaster
summarization method, OntoDSumm. We evaluate the proposed method against 4
state-of-the-art methods using 10 disaster datasets. Evaluation results
reveal that OntoDSumm outperforms existing methods by approximately 2-66% in
terms of ROUGE-1 F1 score.
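The two-step pipeline above (categorize tweets, then pick representative tweets per category) can be sketched with a toy keyword ontology. The ONTOLOGY dictionary, the term-overlap scoring used for both categorization and ranking, and the function names are all hypothetical simplifications for illustration; OntoDSumm's actual ontology and ranking are richer.

```python
import re
from typing import Optional

# Hypothetical toy ontology: category -> set of domain terms.
ONTOLOGY = {
    "infrastructure": {"bridge", "road", "power", "building"},
    "casualties": {"dead", "injured", "killed", "missing"},
    "relief": {"donate", "shelter", "food", "rescue"},
}


def terms(text: str) -> set[str]:
    """Lower-cased alphabetic tokens of a tweet."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(tweet: str, ontology: dict[str, set[str]]) -> Optional[str]:
    """Step 1: assign the category whose terms overlap the tweet most (None if no overlap)."""
    scores = {cat: len(terms(tweet) & vocab) for cat, vocab in ontology.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def summarize(tweets: list[str], ontology: dict[str, set[str]], per_category: int = 1) -> list[str]:
    """Step 2: keep the top-ranked tweets from each category, ranked by term overlap."""
    buckets: dict[str, list[str]] = {cat: [] for cat in ontology}
    for tweet in tweets:
        cat = categorize(tweet, ontology)
        if cat is not None:
            buckets[cat].append(tweet)
    summary = []
    for cat, items in buckets.items():
        ranked = sorted(items, key=lambda t: len(terms(t) & ontology[cat]), reverse=True)
        summary.extend(ranked[:per_category])
    return summary
```

Using the same domain vocabulary for both steps reflects the abstract's point that domain knowledge can address categorization and ranking together. Tweets that match no ontology term are simply left out of the summary.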
- …