Lidar Point Cloud Guided Monocular 3D Object Detection
Monocular 3D object detection is a challenging task in the self-driving and
computer vision communities. As a common practice, most previous works use
manually annotated 3D box labels, which are expensive to produce. In
this paper, we find, counterintuitively, that precise and carefully annotated
labels may be unnecessary in monocular 3D detection. Using rough labels
obtained by randomly disturbing the ground truth, a detector can achieve
accuracy very close to that of a detector trained on ground-truth labels. We
delve into the underlying mechanism and empirically find that, with respect to
label accuracy, the 3D location part of the label matters far more than the
other parts. Motivated by the
conclusions above and considering the precise LiDAR 3D measurement, we propose
a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D
object detection (LPCG). This framework is capable of either reducing the
annotation costs or considerably boosting the detection accuracy without
introducing extra annotation costs. Specifically, it generates pseudo labels
from unlabeled LiDAR point clouds. Thanks to accurate LiDAR 3D measurements in
3D space, such pseudo labels can replace manually annotated labels in the
training of monocular 3D detectors, since their 3D location information is
precise. LPCG can be applied to any monocular 3D detector to fully exploit
the massive unlabeled data available in a self-driving system. As a result, on
the KITTI benchmark, we take first place on both monocular 3D and BEV
(bird's-eye-view) detection by a significant margin. On the Waymo benchmark,
our method using 10% of the labeled data achieves accuracy comparable to the
baseline detector using 100% of the labeled data. The code is released at
https://github.com/SPengLiang/LPCG.
Comment: ECCV 202
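The label-disturbance experiment described above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code; the KITTI-style label fields (`location`, `dimensions`, `rotation_y`) and the per-field noise scales are assumptions:

```python
import random

def disturb_label(box, loc_sigma=0.0, dim_sigma=0.1, rot_sigma=0.1):
    """Randomly perturb a 3D box label (illustrative sketch).

    box: dict with KITTI-style fields -- 'location' (x, y, z in metres),
    'dimensions' (h, w, l in metres), and 'rotation_y' (yaw in radians).
    The paper's finding suggests detectors tolerate noise in dimensions
    and rotation far better than noise in the 3D location.
    """
    noisy = dict(box)
    noisy["location"] = tuple(
        v + random.gauss(0.0, loc_sigma) for v in box["location"])
    noisy["dimensions"] = tuple(
        v + random.gauss(0.0, dim_sigma) for v in box["dimensions"])
    noisy["rotation_y"] = box["rotation_y"] + random.gauss(0.0, rot_sigma)
    return noisy
```

With `loc_sigma=0`, only the non-location parts are disturbed, matching the regime in which the paper reports near-ground-truth accuracy.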
Monocular 3D Object Detection via Ego View-to-Bird’s Eye View Translation
The advanced development of autonomous agents like self-driving cars can be attributed to computer vision, a branch of artificial intelligence that enables software to understand the content of images and video. These autonomous agents require a three-dimensional model of their surroundings in order to operate reliably in the real world. Despite the significant progress of 2D object detectors, they have a critical limitation in location-sensitive applications, as they do not provide accurate physical information about objects in 3D space. 3D object detection is a promising topic that can provide relevant solutions and improve existing 2D-based applications. Due to advancements in deep learning methods and relevant datasets, the task of 3D scene understanding has evolved greatly in the past few years. 3D object detection and localization are crucial in autonomous driving tasks such as obstacle avoidance, path planning, and motion control. Traditionally, there have been successful methods for 3D object detection, but they rely on highly expensive 3D LiDAR sensors for accurate depth information. On the other hand, 3D object detection from single monocular images is inexpensive but lacks accuracy. The primary reason for this disparity in performance is that monocular image-based methods attempt to infer 3D information from 2D images. In this work, we try to bridge the performance gap observed with single-image input by introducing different mapping strategies between the 2D image data and its corresponding 3D representation and using them to perform object detection in 3D. The performance of the proposed method is evaluated on the popular KITTI 3D object detection benchmark dataset.
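Mapping strategies between 2D image data and a 3D representation typically rest on the pinhole camera model. A minimal back-projection sketch, illustrative only and not the thesis's actual method, with the intrinsics (`fx`, `fy`, `cx`, `cy`) assumed known from calibration:

```python
def pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth z into camera
    coordinates via the pinhole model:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy.
    Monocular methods must estimate `depth` itself, which is the
    main source of their accuracy gap relative to LiDAR.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

Given a per-pixel depth estimate, applying this to every pixel yields a pseudo point cloud that can be rendered into a bird's-eye-view grid for detection.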
Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR
There have been attempts to detect 3D objects by fusing stereo camera
images with LiDAR sensor data, or by using LiDAR for pre-training and only
monocular images for testing, but there have been fewer attempts to use only
monocular image sequences, due to their low accuracy. In addition, when depth
is predicted from monocular images alone, only scale-inconsistent depth can be
recovered, which is why researchers are reluctant to use monocular images alone.
Therefore, we propose a method for predicting absolute depth and detecting 3D
objects using only monocular image sequences by enabling end-to-end learning of
detection networks and depth prediction networks. As a result, the proposed
method surpasses other existing methods in performance on the KITTI 3D dataset.
Even when monocular images and 3D LiDAR are used together during training in an
attempt to improve performance, ours exhibits the best performance compared
to other methods using the same input. In addition, end-to-end learning not
only improves depth prediction performance but also enables absolute depth
prediction, because our network exploits the fact that 3D objects of a given
class, such as cars, have approximately known sizes.
Comment: Accepted for the 2022 IEEE International Conference on Multisensor
Fusion and Integration (MFI 2022)
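The absolute-scale argument above reduces to similar triangles: if an object class has a roughly known real-world height, its apparent height in pixels pins down its metric depth. A minimal sketch, with hypothetical numbers (the ~1.5 m car height and the focal length are illustrative assumptions, not values from the paper):

```python
def absolute_depth(pixel_height, real_height_m, focal_px):
    """Similar triangles: z = f * H / h, where H is the object's real
    height (metres), h its apparent height in the image (pixels), and
    f the focal length (pixels). A class-level size prior like H thus
    resolves the scale ambiguity of monocular depth.
    """
    return focal_px * real_height_m / pixel_height
```

For example, a car of assumed height 1.5 m appearing 100 px tall under a 700 px focal length would sit about 10.5 m from the camera; the network can exploit this constraint implicitly during end-to-end training.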