VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition
LiDAR-based place recognition plays a crucial role in Simultaneous
Localization and Mapping (SLAM) and LiDAR localization.
Despite the emergence of various deep learning-based and hand-crafted
methods, rotation-induced place recognition failure remains a critical
challenge.
Existing studies address this limitation through specific training strategies
or network structures.
However, the former does not produce satisfactory results, while the latter
focuses mainly on the reduced problem of SO(2) rotation invariance. Methods
targeting SO(3) rotation invariance suffer from limitations in discrimination
capability.
In this paper, we propose a new method that employs Vector Neurons Network
(VNN) to achieve SO(3) rotation invariance.
We first extract rotation-equivariant features from neighboring points and
map low-dimensional features to a high-dimensional space through VNN.
Afterwards, we calculate the Euclidean and cosine distances in the
rotation-equivariant feature space to obtain rotation-invariant feature descriptors.
Finally, we aggregate the features using GeM pooling to obtain global
descriptors.
To address the significant information loss when formulating
rotation-invariant descriptors, we propose computing distances between features
at different layers within the Euclidean space neighborhood.
This greatly improves the discriminability of the point cloud descriptors
while ensuring computational efficiency.
Experimental results on public datasets show that our approach significantly
outperforms other baseline methods implementing rotation invariance, while
achieving comparable results with current state-of-the-art place recognition
methods that do not consider rotation issues.
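The key property behind this abstract is that distances and angles between rotation-equivariant vector features are themselves rotation-invariant, and these invariants can then be aggregated with GeM pooling. A minimal NumPy sketch of that idea follows; the function names and shapes are illustrative assumptions, not the paper's actual VNN architecture:

```python
import numpy as np

def rotation_invariant_descriptor(vec_feats_a, vec_feats_b):
    """Rotation-invariant descriptor from two sets of equivariant features.

    vec_feats_a, vec_feats_b: (C, 3) rotation-equivariant vector features
    (e.g. from a Vector Neurons layer). Since ||Ra - Rb|| = ||a - b|| and
    <Ra, Rb> = <a, b> for any rotation R, the Euclidean distances and
    cosine similarities below are invariant to a shared rotation.
    """
    eucl = np.linalg.norm(vec_feats_a - vec_feats_b, axis=-1)
    cos = np.sum(vec_feats_a * vec_feats_b, axis=-1) / (
        np.linalg.norm(vec_feats_a, axis=-1)
        * np.linalg.norm(vec_feats_b, axis=-1) + 1e-8)
    return np.concatenate([eucl, cos])

def gem_pool(point_descs, p=3.0):
    """Generalized-mean (GeM) pooling of per-point descriptors (N, D) -> (D,)."""
    return np.mean(np.abs(point_descs) ** p, axis=0) ** (1.0 / p)
```

Applying the same random rotation to both feature sets leaves the descriptor unchanged, which is exactly the SO(3) invariance the abstract targets.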
Data-Efficient Large Scale Place Recognition with Graded Similarity Supervision
Visual place recognition (VPR) is a fundamental task of computer vision for visual localization. Existing methods are trained using image pairs that either depict the same place or not. Such a binary indication does not consider continuous relations of similarity between images of the same place taken from different positions, determined by the continuous nature of camera pose. The binary similarity induces a noisy supervision signal into the training of VPR methods, which stall in local minima and require expensive hard-mining algorithms to guarantee convergence. Motivated by the fact that two images of the same place only partially share visual cues due to camera pose differences, we deploy an automatic re-annotation strategy to re-label VPR datasets. We compute graded similarity labels for image pairs based on available localization metadata. Furthermore, we propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks. We demonstrate that the new labels and GCL allow us to dispense with hard-pair mining and to train image descriptors that perform better in VPR by nearest-neighbor search, obtaining results superior or comparable to methods that require expensive hard-pair mining and re-ranking techniques.
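One way to read the GCL idea is as a contrastive loss whose binary same-place label is replaced by a graded similarity in [0, 1]. The sketch below grafts that graded label onto the classic margin-based contrastive loss; this is an illustrative assumption about the form, not the paper's exact formulation:

```python
import numpy as np

def generalized_contrastive_loss(desc_a, desc_b, similarity, margin=1.0):
    """Contrastive loss with a graded similarity label.

    desc_a, desc_b: descriptor vectors of two images.
    similarity: graded ground-truth similarity in [0, 1] (e.g. derived
    from camera-pose overlap), replacing the usual binary 0/1 label.
    With similarity in {0, 1} this reduces to the standard
    margin-based contrastive loss.
    """
    d = np.linalg.norm(desc_a - desc_b)
    attract = similarity * d ** 2            # pull graded-similar pairs together
    repel = (1.0 - similarity) * max(0.0, margin - d) ** 2  # push dissimilar apart
    return attract + repel
```

Because the attraction and repulsion terms are weighted continuously, partially overlapping views contribute a proportionally weaker pull, which is the mechanism the abstract credits for removing the need for hard-pair mining.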
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
Visual Place Recognition is a challenging task for robotics and autonomous
systems, which must deal with the twin problems of appearance and viewpoint
change in an always changing world. This paper introduces Patch-NetVLAD, which
provides a novel formulation for combining the advantages of both local and
global descriptor methods by deriving patch-level features from NetVLAD
residuals. Unlike the fixed spatial neighborhood regime of existing local
keypoint features, our method enables aggregation and matching of deep-learned
local features defined over the feature-space grid. We further introduce a
multi-scale fusion of patch features that have complementary scales (i.e. patch
sizes) via an integral feature space and show that the fused features are
highly invariant to both condition (season, structure, and illumination) and
viewpoint (translation and rotation) changes. Patch-NetVLAD outperforms both
global and local feature descriptor-based methods with comparable compute,
achieving state-of-the-art visual place recognition results on a range of
challenging real-world datasets, including winning the Facebook Mapillary
Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user
requirements, with a speed-optimised version operating over an order of
magnitude faster than the state-of-the-art. By combining superior performance
with improved computational efficiency in a configurable framework,
Patch-NetVLAD is well suited to enhance both stand-alone place recognition
capabilities and the overall performance of SLAM systems.
Comment: Accepted to the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2021).
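The "integral feature space" mentioned in the Patch-NetVLAD abstract echoes the classic integral-image trick: after one cumulative-sum pass over a dense feature map, the aggregate of any patch, at any scale, costs four lookups. A minimal sketch of that mechanism, with hypothetical names (the actual Patch-NetVLAD pipeline additionally involves NetVLAD residuals):

```python
import numpy as np

def integral_map(feat):
    """Build an integral map over a dense (H, W, D) feature grid."""
    return np.cumsum(np.cumsum(feat, axis=0), axis=1)

def patch_sum(ii, top, left, size):
    """Sum of features over a size x size patch in O(1) via the integral map."""
    b, r = top + size - 1, left + size - 1
    s = ii[b, r].copy()
    if top > 0:
        s -= ii[top - 1, r]
    if left > 0:
        s -= ii[b, left - 1]
    if top > 0 and left > 0:
        s += ii[top - 1, left - 1]
    return s
```

This is what makes the multi-scale fusion cheap: patch descriptors for several patch sizes reuse the same precomputed map instead of re-summing overlapping regions.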