389 research outputs found

    Leveraging BEV Representation for 360-degree Visual Place Recognition

    Full text link
    This paper investigates the advantages of using Bird's Eye View (BEV) representation in 360-degree visual place recognition (VPR). We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion, which bridges visual cues and spatial awareness. Our method extracts image features using standard convolutional networks and combines the features according to pre-defined 3D grid spatial points. To alleviate the mechanical and time misalignments between cameras, we further introduce deformable attention to learn the compensation. Upon the BEV feature representation, we then employ the polar transform and the Discrete Fourier transform for aggregation, which is shown to be rotation-invariant. In addition, the image and point cloud cues can be easily stated in the same coordinates, which benefits sensor fusion for place recognition. The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets, including on-the-road and off-the-road scenarios. The experimental results verify the hypothesis that BEV can benefit VPR by its superior performance compared to baseline methods. To the best of our knowledge, this is the first trial of employing BEV representation in this task

    HDPV-SLAM: Hybrid Depth-augmented Panoramic Visual SLAM for Mobile Mapping System with Tilted LiDAR and Panoramic Visual Camera

    Full text link
    This paper proposes a novel visual simultaneous localization and mapping (SLAM) system called Hybrid Depth-augmented Panoramic Visual SLAM (HDPV-SLAM), that employs a panoramic camera and a tilted multi-beam LiDAR scanner to generate accurate and metrically-scaled trajectories. RGB-D SLAM was the design basis for HDPV-SLAM, which added depth information to visual features. It aims to solve the two major issues hindering the performance of similar SLAM systems. The first obstacle is the sparseness of LiDAR depth, which makes it difficult to correlate it with the extracted visual features of the RGB image. A deep learning-based depth estimation module for iteratively densifying sparse LiDAR depth was suggested to address this issue. The second issue pertains to the difficulties in depth association caused by a lack of horizontal overlap between the panoramic camera and the tilted LiDAR sensor. To surmount this difficulty, we present a hybrid depth association module that optimally combines depth information estimated by two independent procedures, feature-based triangulation and depth estimation. During a phase of feature tracking, this hybrid depth association module aims to maximize the use of more accurate depth information between the triangulated depth with visual features tracked and the deep learning-based corrected depth. We evaluated the efficacy of HDPV-SLAM using the 18.95 km-long York University and Teledyne Optech (YUTO) MMS dataset. The experimental results demonstrate that the two proposed modules contribute substantially to the performance of HDPV-SLAM, which surpasses that of the state-of-the-art (SOTA) SLAM systems.Comment: 8 pages, 3 figures, To be published in IEEE International Conference on Automation Science and Engineering (CASE) 202

    PanoDepth - Panoramic Monocular Depth Perception Model and Framework

    Get PDF
    Depth perception has become a heavily researched area as companies and researchers are striving towards the development of self-driving cars. Self-driving cars rely on perceiving the surrounding area, which heavily depends on technology capable of providing the system with depth perception capabilities. In this paper, we explore developing a single camera (monocular) depth prediction model that is trained on panoramic depth images. Our model makes novel use of transfer learning efficient encoder models, pre-training on a larger dataset of flat depth images, and optimizing the model for use with a Jetson Nano. Additionally, we present a training and optimization framework to make developing and testing new monocular depth perception models easier and faster. While the model failed to achieve a high frame rate, the framework and models developed are a promising starting place for future work

    Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video

    Full text link
    We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video. Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation. In contrast to previous approaches for applying convolutional neural networks to panoramic imagery, we use the cylindrical panoramic projection which allows for the use of the traditional CNN layers such as convolutional filters and max pooling without modification. Our evaluation of synthetic and real data shows that unsupervised learning of depth and ego-motion on cylindrical panoramic images can produce high-quality depth maps and that an increased field-of-view improves ego-motion estimation accuracy. We also introduce Headcam, a novel dataset of panoramic video collected from a helmet-mounted camera while biking in an urban setting.Comment: Accepted to IEEE AIVR 201
    corecore