Monocular 3D object detection is a challenging task for the self-driving and computer vision communities. As a common practice, most previous works rely on manually annotated 3D box labels, whose annotation process is expensive. In this paper, we make the interesting and counterintuitive finding that such precise and careful annotation may be unnecessary in monocular 3D detection: using rough labels that are randomly disturbed, the detector achieves accuracy very close to that of a detector trained with ground-truth labels. We delve into the underlying mechanism and empirically find that, regarding label accuracy, the 3D location part of the label matters much more than the other parts.
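To make the disturbance experiment concrete, the following is a minimal sketch of how such rough labels could be produced. The box layout (x, y, z, h, w, l, yaw), the Gaussian noise model, and the noise magnitudes are illustrative assumptions, not the exact settings used in the paper.

```python
import numpy as np

def disturb_labels(boxes, loc_noise=0.0, dim_noise=0.1, yaw_noise=0.1, rng=None):
    # Randomly perturb 3D box labels laid out as (x, y, z, h, w, l, yaw).
    # Keeping loc_noise small mirrors the finding that the 3D location
    # fields are the ones whose accuracy actually matters.
    rng = np.random.default_rng() if rng is None else rng
    noisy = boxes.copy()
    noisy[:, :3] += rng.normal(0.0, loc_noise, (len(boxes), 3))         # location
    noisy[:, 3:6] *= 1.0 + rng.normal(0.0, dim_noise, (len(boxes), 3))  # dimensions
    noisy[:, 6] += rng.normal(0.0, yaw_noise, len(boxes))               # orientation
    return noisy
```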
Motivated by the conclusions above, and considering the precision of LiDAR 3D measurements, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). The framework can either reduce the annotation cost or considerably boost detection accuracy without introducing extra annotation cost. Specifically, it generates pseudo labels from unlabeled LiDAR point clouds. Thanks to the accuracy of LiDAR measurements in 3D space, such pseudo labels carry precise 3D location information and can therefore replace manually annotated labels when training monocular 3D detectors. LPCG can be applied to any monocular 3D detector to fully exploit the massive unlabeled data available in a self-driving system.
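As an illustration of the pseudo-labeling step, below is a minimal sketch assuming a pretrained LiDAR 3D detector wrapped as a callable `lidar_detector(points) -> (boxes, scores)` and a KITTI-style data layout; the helper names, score threshold, and placeholder 2D fields are hypothetical and do not describe the released implementation.

```python
from pathlib import Path
import numpy as np

def save_kitti_label(path, boxes, cls="Car"):
    # Write 3D boxes as KITTI-format label lines. Fields LiDAR alone cannot
    # recover (truncation, occlusion, alpha, 2D box) get placeholder values;
    # per the finding above, only the 3D fields need to be precise.
    with open(path, "w") as f:
        for x, y, z, h, w, l, ry in boxes:
            f.write(f"{cls} 0 0 -10 0 0 50 50 "
                    f"{h:.2f} {w:.2f} {l:.2f} {x:.2f} {y:.2f} {z:.2f} {ry:.2f}\n")

def generate_pseudo_labels(lidar_detector, lidar_dir, out_dir, score_thresh=0.7):
    # Run the pretrained LiDAR detector on every unlabeled point cloud and
    # keep its confident 3D boxes as training labels for a monocular detector.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for bin_file in sorted(Path(lidar_dir).glob("*.bin")):
        # KITTI stores point clouds as float32 (x, y, z, reflectance) records.
        points = np.fromfile(str(bin_file), dtype=np.float32).reshape(-1, 4)
        boxes, scores = lidar_detector(points)  # boxes: (N, 7) = x,y,z,h,w,l,yaw
        save_kitti_label(out / f"{bin_file.stem}.txt", boxes[scores >= score_thresh])
```

The monocular detector is then trained on these pseudo labels exactly as it would be on manual annotations.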
As a result, on the KITTI benchmark, we take first place on both monocular 3D and BEV (bird's-eye-view) detection with a significant margin. On the Waymo benchmark, our method using 10% of the labeled data achieves accuracy comparable to the baseline detector using 100% of the labeled data. The code is released at https://github.com/SPengLiang/LPCG.

Comment: ECCV 2022