In this paper, we propose a novel training strategy called SupFusion, which
provides auxiliary feature-level supervision for effective LiDAR-Camera
fusion and significantly boosts detection performance. Our strategy uses a
data enhancement method named Polar Sampling, which densifies sparse objects,
to train an assistant model that generates high-quality features as the
supervision. These features are then used to train the LiDAR-Camera fusion
model, where the fusion feature is optimized to mimic the generated
high-quality features. Furthermore, we propose a simple yet effective deep
fusion module, which consistently achieves superior performance compared with
previous fusion methods when trained with the SupFusion strategy. Our proposal
thus offers the following advantages. Firstly, SupFusion introduces auxiliary
feature-level supervision which could boost LiDAR-Camera detection performance
without introducing extra inference costs. Secondly, the proposed deep fusion
module can consistently improve the detector's abilities. The proposed
SupFusion strategy and deep fusion module are plug-and-play, and we conduct
extensive experiments to demonstrate their effectiveness. Specifically, we gain
around 2% 3D mAP
improvements on the KITTI benchmark based on multiple LiDAR-Camera 3D detectors.

Comment: Accepted to ICCV202
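
The auxiliary feature-level supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the loss, so everything here is an assumption made for illustration: the mean-squared-error distance, the `aux_weight` coefficient, the flat-list feature representation, and the function names `feature_mimic_loss` and `training_loss` are all hypothetical. The sketch only shows why the supervision adds no inference cost: the assistant features enter the training objective alone.

```python
def feature_mimic_loss(fusion_feat, assistant_feat):
    """Mean squared error between two flat feature vectors.

    Assumption: an L2-style distance; the abstract does not name the loss.
    """
    if len(fusion_feat) != len(assistant_feat) or not fusion_feat:
        raise ValueError("feature vectors must be non-empty and equally sized")
    squared = ((f - a) ** 2 for f, a in zip(fusion_feat, assistant_feat))
    return sum(squared) / len(fusion_feat)


def training_loss(detection_loss, fusion_feat, assistant_feat, aux_weight=1.0):
    """Detection loss plus a weighted auxiliary feature-level term.

    The assistant features are treated as fixed targets and are used only
    during training; at inference the detector runs without them.
    aux_weight is a hypothetical balancing coefficient.
    """
    return detection_loss + aux_weight * feature_mimic_loss(fusion_feat, assistant_feat)
```

In a real detector the features would be multi-channel tensors and the loss would be computed in a deep learning framework; the flat lists above only keep the example self-contained.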