Leveraging BEV Representation for 360-degree Visual Place Recognition
This paper investigates the advantages of using Bird's Eye View (BEV)
representation in 360-degree visual place recognition (VPR). We propose a novel
network architecture that utilizes the BEV representation in feature
extraction, feature aggregation, and vision-LiDAR fusion, which bridges visual
cues and spatial awareness. Our method extracts image features using standard
convolutional networks and combines the features at pre-defined 3D spatial grid
points. To alleviate the mechanical and temporal misalignments between cameras,
we further introduce deformable attention to learn to compensate for them.
On top of the BEV feature representation, we then employ the polar transform and
the Discrete Fourier transform for aggregation, which is shown to be
rotation-invariant. In addition, image and point cloud cues can easily be
expressed in the same coordinate frame, which benefits sensor fusion for place
recognition. The proposed BEV-based method is evaluated in ablation and
comparative studies on two datasets, including on-the-road and off-the-road
scenarios. The experimental results support the hypothesis that BEV benefits
VPR, as the proposed method outperforms the baseline methods. To the best of our
knowledge, this is the first attempt to employ a BEV representation for this
task.
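As a rough illustration of the rotation-invariant aggregation described above, the following minimal NumPy sketch shows why the DFT magnitude taken along the angular axis of a polar-resampled BEV feature map is insensitive to yaw rotation; the grid sizes, sampling scheme, and descriptor layout are assumptions, not the authors' implementation:

import numpy as np

def polar_transform(bev, num_rings=32, num_angles=64):
    """Resample a square BEV feature map (C, H, W) onto a polar grid (C, rings, angles)."""
    c, h, w = bev.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii = np.linspace(0.0, min(cy, cx), num_rings)
    angles = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    ys = cy + radii[:, None] * np.sin(angles)[None, :]
    xs = cx + radii[:, None] * np.cos(angles)[None, :]
    # Nearest-neighbour sampling keeps the sketch short; bilinear would be smoother.
    ys = np.clip(np.round(ys).astype(int), 0, h - 1)
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    return bev[:, ys, xs]                       # (C, num_rings, num_angles)

def rotation_invariant_descriptor(bev):
    # A yaw rotation of the BEV is (up to sampling) a circular shift along the
    # angular axis, and the DFT magnitude is unchanged by circular shifts.
    polar = polar_transform(bev)
    spectrum = np.fft.fft(polar, axis=-1)       # DFT along the angular dimension
    return np.abs(spectrum).reshape(-1)         # rotation-insensitive descriptor

# Usage: descriptors of the same place observed at different headings stay close.
desc = rotation_invariant_descriptor(np.random.rand(8, 64, 64).astype(np.float32))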
HDPV-SLAM: Hybrid Depth-augmented Panoramic Visual SLAM for Mobile Mapping System with Tilted LiDAR and Panoramic Visual Camera
This paper proposes a novel visual simultaneous localization and mapping
(SLAM) system called Hybrid Depth-augmented Panoramic Visual SLAM (HDPV-SLAM),
which employs a panoramic camera and a tilted multi-beam LiDAR scanner to
generate accurate, metrically scaled trajectories. HDPV-SLAM follows the design
of RGB-D SLAM, augmenting visual features with depth information. It aims
to solve the two major issues hindering the performance of similar SLAM
systems. The first obstacle is the sparsity of the LiDAR depth, which makes it
difficult to associate with the visual features extracted from the RGB image. To
address this issue, we propose a deep learning-based depth estimation module that
iteratively densifies the sparse LiDAR depth. The second issue pertains to
the difficulty of depth association caused by the limited horizontal overlap
between the panoramic camera and the tilted LiDAR sensor. To overcome this
difficulty, we present a hybrid depth association module that optimally combines
depth information estimated by two independent procedures: feature-based
triangulation and learning-based depth estimation. During feature tracking, this
module selects, for each tracked feature, the more accurate of the triangulated
depth and the deep learning-based corrected depth. We evaluated the efficacy
of HDPV-SLAM using the 18.95 km-long York University and Teledyne Optech (YUTO)
MMS dataset. The experimental results demonstrate that the two proposed modules
contribute substantially to the performance of HDPV-SLAM, which surpasses that
of the state-of-the-art (SOTA) SLAM systems.
Comment: 8 pages, 3 figures. To be published in the IEEE International Conference
on Automation Science and Engineering (CASE) 202
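The hybrid depth association idea can be sketched as a per-feature choice between the two depth sources; the parallax-based selection rule below is an assumption made for illustration, not the criterion used by HDPV-SLAM:

import numpy as np

def hybrid_depth(triangulated, learned, parallax_deg, min_parallax_deg=2.0):
    """Per-feature depth selection.

    triangulated : (N,) depth from feature-based triangulation (NaN if unavailable)
    learned      : (N,) depth from the learning-based, LiDAR-guided estimation
    parallax_deg : (N,) parallax angle of each tracked feature
    """
    # Trust triangulation only when it exists and has enough parallax; otherwise
    # fall back to the densified, learning-based depth.
    use_tri = np.isfinite(triangulated) & (parallax_deg > min_parallax_deg)
    return np.where(use_tri, triangulated, learned)

# Toy usage: the second and third features lack reliable triangulation.
tri = np.array([4.8, np.nan, 12.1])
net = np.array([5.0, 7.3, 11.5])
par = np.array([3.1, 0.4, 1.2])
print(hybrid_depth(tri, net, par))   # [ 4.8  7.3 11.5]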
PanoDepth - Panoramic Monocular Depth Perception Model and Framework
Depth perception has become a heavily researched area as companies and researchers strive towards the development of self-driving cars. Self-driving cars rely on perceiving their surroundings, which in turn depends on technology that provides the system with depth perception capabilities. In this paper, we explore developing a single-camera (monocular) depth prediction model trained on panoramic depth images. Our model makes novel use of transfer learning with efficient encoder models, pre-training on a larger dataset of flat depth images, and optimization for deployment on a Jetson Nano. Additionally, we present a training and optimization framework that makes developing and testing new monocular depth perception models easier and faster. While the model failed to achieve a high frame rate, the framework and models developed are a promising starting point for future work.
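A minimal sketch of the transfer-learning setup described above, assuming an ImageNet-pretrained MobileNetV2 encoder from torchvision and a small upsampling decoder; the specific encoder and decoder layout are illustrative assumptions, not the paper's architecture:

import torch
import torch.nn as nn
from torchvision import models

class PanoDepthSketch(nn.Module):
    """Pretrained efficient encoder plus a light decoder; layout is illustrative only."""
    def __init__(self):
        super().__init__()
        backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
        self.encoder = backbone.features           # ImageNet-pretrained trunk (transfer learning)
        self.decoder = nn.Sequential(              # small decoder, cheap enough for an edge device
            nn.Conv2d(1280, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, rgb):                        # rgb: (B, 3, H, W), H and W divisible by 32
        return self.decoder(self.encoder(rgb))     # (B, 1, H, W) dense depth prediction

# Pre-train on flat (perspective) depth images, then fine-tune on panoramic depth data.
model = PanoDepthSketch().eval()
with torch.no_grad():
    depth = model(torch.randn(1, 3, 256, 512))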
Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video
We introduce a convolutional neural network model for unsupervised learning
of depth and ego-motion from cylindrical panoramic video. Panoramic depth
estimation is an important technology for applications such as virtual reality,
3D modeling, and autonomous robotic navigation. In contrast to previous
approaches for applying convolutional neural networks to panoramic imagery, we
use the cylindrical panoramic projection which allows for the use of the
traditional CNN layers such as convolutional filters and max pooling without
modification. Our evaluation on synthetic and real data shows that unsupervised
learning of depth and ego-motion on cylindrical panoramic images can produce
high-quality depth maps and that an increased field-of-view improves ego-motion
estimation accuracy. We also introduce Headcam, a novel dataset of panoramic
video collected from a helmet-mounted camera while biking in an urban setting.
Comment: Accepted to IEEE AIVR 201
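One reason standard layers suit cylindrical panoramas is that rows have uniform scale and the 360-degree wrap-around only needs to be respected at the left and right borders. The circular padding below is an illustrative assumption (the abstract states only that unmodified layers are used):

import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 32, kernel_size=3)          # ordinary, unmodified conv layer
pool = nn.MaxPool2d(2)                          # ordinary max pooling

pano = torch.randn(1, 3, 128, 512)              # (B, C, H, W) cylindrical panorama
x = F.pad(pano, (1, 1, 0, 0), mode="circular")  # wrap left/right: 360-degree continuity
x = F.pad(x, (0, 0, 1, 1))                      # zero-pad top/bottom as usual
features = pool(conv(x))                        # (1, 32, 64, 256)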