45 research outputs found

    Generalized Contrastive Optimization of Siamese Networks for Place Recognition

    Full text link
    Visual place recognition is a challenging task in computer vision and a key component of camera-based localization and navigation systems. Recently, Convolutional Neural Networks (CNNs) achieved high results and good generalization capabilities. They are usually trained using pairs or triplets of images labeled as either similar or dissimilar, in a binary fashion. In practice, the similarity between two images is not binary, but rather continuous. Furthermore, training these CNNs is computationally complex and involves costly pair and triplet mining strategies. We propose a Generalized Contrastive loss (GCL) function that relies on image similarity as a continuous measure, and use it to train a siamese CNN. Furthermore, we propose three techniques for automatic annotation of image pairs with labels indicating their degree of similarity, and deploy them to re-annotate the MSLS, TB-Places, and 7Scenes datasets. We demonstrate that siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts. Our models trained on MSLS outperform the state-of-the-art methods, including NetVLAD, and generalize well on the Pittsburgh, TokyoTM and Tokyo 24/7 datasets. Furthermore, training a siamese network using the GCL function does not require complex pair mining. We release the source code at https://github.com/marialeyvallina/generalized_contrastive_loss.Comment: Under revie

    LiDAR-Based Place Recognition For Autonomous Driving: A Survey

    Full text link
    LiDAR-based place recognition (LPR) plays a pivotal role in autonomous driving, which assists Simultaneous Localization and Mapping (SLAM) systems in reducing accumulated errors and achieving reliable localization. However, existing reviews predominantly concentrate on visual place recognition (VPR) methods. Despite the recent remarkable progress in LPR, to the best of our knowledge, there is no dedicated systematic review in this area. This paper bridges the gap by providing a comprehensive review of place recognition methods employing LiDAR sensors, thus facilitating and encouraging further research. We commence by delving into the problem formulation of place recognition, exploring existing challenges, and describing relations to previous surveys. Subsequently, we conduct an in-depth review of related research, which offers detailed classifications, strengths and weaknesses, and architectures. Finally, we summarize existing datasets, commonly used evaluation metrics, and comprehensive evaluation results from various methods on public datasets. This paper can serve as a valuable tutorial for newcomers entering the field of place recognition and for researchers interested in long-term robot localization. We pledge to maintain an up-to-date project on our website https://github.com/ShiPC-AI/LPR-Survey.Comment: 26 pages,13 figures, 5 table

    SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data

    Full text link
    Place recognition is an important component for autonomous vehicles to achieve loop closing or global localization. In this paper, we tackle the problem of place recognition based on sequential 3D LiDAR scans obtained by an onboard LiDAR sensor. We propose a transformer-based network named SeqOT to exploit the temporal and spatial information provided by sequential range images generated from the LiDAR data. It uses multi-scale transformers to generate a global descriptor for each sequence of LiDAR range images in an end-to-end fashion. During online operation, our SeqOT finds similar places by matching such descriptors between the current query sequence and those stored in the map. We evaluate our approach on four datasets collected with different types of LiDAR sensors in different environments. The experimental results show that our method outperforms the state-of-the-art LiDAR-based place recognition methods and generalizes well across different environments. Furthermore, our method operates online faster than the frame rate of the sensor. The implementation of our method is released as open source at: https://github.com/BIT-MJY/SeqOT.Comment: Submitted to IEEE Transactions on Industrial Electronic

    Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

    Full text link
    Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an always changing world. This paper introduces Patch-NetVLAD, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals. Unlike the fixed spatial neighborhood regime of existing local keypoint features, our method enables aggregation and matching of deep-learned local features defined over the feature-space grid. We further introduce a multi-scale fusion of patch features that have complementary scales (i.e. patch sizes) via an integral feature space and show that the fused features are highly invariant to both condition (season, structure, and illumination) and viewpoint (translation and rotation) changes. Patch-NetVLAD outperforms both global and local feature descriptor-based methods with comparable compute, achieving state-of-the-art visual place recognition results on a range of challenging real-world datasets, including winning the Facebook Mapillary Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user requirements, with a speed-optimised version operating over an order of magnitude faster than the state-of-the-art. By combining superior performance with improved computational efficiency in a configurable framework, Patch-NetVLAD is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems.Comment: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021

    Intelligent System for Depression Scale Estimation with Facial Expressions and Case Study in Industrial Intelligence

    Get PDF
    As a mental disorder, depression has affected people's lives, works, and so on. Researchers have proposed various industrial intelligent systems in the pattern recognition field for audiovisual depression detection. This paper presents an end‐to‐end trainable intelligent system to generate high‐level representations over the entire video clip. Specifically, a three‐dimensional (3D) convolutional neural network equipped with a module spatiotemporal feature aggregation module (STFAM) is trained from scratch on audio/visual emotion challenge (AVEC)2013 and AVEC2014 data, which can model the discriminative patterns closely related to depression. In the STFAM, channel and spatial attention mechanism and an aggregation method, namely 3D DEP‐NetVLAD, are integrated to learn the compact characteristic based on the feature maps. Extensive experiments on the two databases (i.e., AVEC2013 and AVEC2014) are illustrated that the proposed intelligent system can efficiently model the underlying depression patterns and obtain better performances over the most video‐based depression recognition approaches. Case studies are presented to describes the applicability of the proposed intelligent system for industrial intelligence.Peer reviewe

    Leveraging BEV Representation for 360-degree Visual Place Recognition

    Full text link
    This paper investigates the advantages of using Bird's Eye View (BEV) representation in 360-degree visual place recognition (VPR). We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion, which bridges visual cues and spatial awareness. Our method extracts image features using standard convolutional networks and combines the features according to pre-defined 3D grid spatial points. To alleviate the mechanical and time misalignments between cameras, we further introduce deformable attention to learn the compensation. Upon the BEV feature representation, we then employ the polar transform and the Discrete Fourier transform for aggregation, which is shown to be rotation-invariant. In addition, the image and point cloud cues can be easily stated in the same coordinates, which benefits sensor fusion for place recognition. The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets, including on-the-road and off-the-road scenarios. The experimental results verify the hypothesis that BEV can benefit VPR by its superior performance compared to baseline methods. To the best of our knowledge, this is the first trial of employing BEV representation in this task

    Deep Image Retrieval: A Survey

    Get PDF
    In recent years a vast amount of visual content has been generated and shared from various fields, such as social media platforms, medical images, and robotics. This abundance of content creation and sharing has introduced new challenges. In particular, searching databases for similar content, i.e.content based image retrieval (CBIR), is a long-established research area, and more efficient and accurate methods are needed for real time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated the process of intelligent search. In this survey we organize and review recent CBIR works that are developed based on deep learning algorithms and techniques, including insights and techniques from recent papers. We identify and present the commonly-used benchmarks and evaluation methods used in the field. We collect common challenges and propose promising future directions. More specifically, we focus on image retrieval with deep learning and organize the state of the art methods according to the types of deep network structure, deep features, feature enhancement methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, aiming to promote a global view of the field of instance-based CBIR.Comment: 20 pages, 11 figure

    SelFLoc: Selective Feature Fusion for Large-scale Point Cloud-based Place Recognition

    Full text link
    Point cloud-based place recognition is crucial for mobile robots and autonomous vehicles, especially when the global positioning sensor is not accessible. LiDAR points are scattered on the surface of objects and buildings, which have strong shape priors along different axes. To enhance message passing along particular axes, Stacked Asymmetric Convolution Block (SACB) is designed, which is one of the main contributions in this paper. Comprehensive experiments demonstrate that asymmetric convolution and its corresponding strategies employed by SACB can contribute to the more effective representation of point cloud feature. On this basis, Selective Feature Fusion Block (SFFB), which is formed by stacking point- and channel-wise gating layers in a predefined sequence, is proposed to selectively boost salient local features in certain key regions, as well as to align the features before fusion phase. SACBs and SFFBs are combined to construct a robust and accurate architecture for point cloud-based place recognition, which is termed SelFLoc. Comparative experimental results show that SelFLoc achieves the state-of-the-art (SOTA) performance on the Oxford and other three in-house benchmarks with an improvement of 1.6 absolute percentages on mean average recall@1
    corecore