SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection
LiDAR and camera are two essential sensors for 3D object detection in
autonomous driving. LiDAR provides accurate and reliable 3D geometry
information while the camera provides rich texture with color. Despite the
increasing popularity of fusing these two complementary sensors, the challenge
remains in how to effectively fuse 3D LiDAR point cloud with 2D camera images.
Recent methods focus either on point-level fusion, which paints the LiDAR point
cloud with camera features in the perspective view, or on bird's-eye view
(BEV)-level fusion, which unifies multi-modality features in the BEV representation. In this
paper, we rethink these previous fusion strategies and analyze their
information loss and influences on geometric and semantic features. We present
SemanticBEVFusion to deeply fuse camera features with LiDAR features in a
unified BEV representation while maintaining per-modality strengths for 3D
object detection. Our method achieves state-of-the-art performance on the
large-scale nuScenes dataset, especially for challenging distant objects. The
code will be made publicly available.
Comment: The first two authors contributed equally to this work.
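To make the idea of BEV-level fusion concrete, here is a minimal sketch in plain Python. It assumes both modalities have already been projected onto BEV grids of the same height and width, and it fuses them by concatenating the per-cell feature vectors. The abstract does not say which fusion operator SemanticBEVFusion uses, so the concatenation, the function name `fuse_bev`, and the nested-list grid layout are all illustrative assumptions rather than the authors' method.

```python
from typing import List

# Hypothetical layout: grid[row][col] is the feature vector of one BEV cell.
Grid = List[List[List[float]]]


def fuse_bev(lidar_bev: Grid, camera_bev: Grid) -> Grid:
    """Concatenate per-cell features from two BEV grids of equal size.

    Assumption: channel concatenation stands in for whatever fusion
    operator a real detector would apply at this stage.
    """
    if len(lidar_bev) != len(camera_bev):
        raise ValueError("BEV grids must have the same height")
    fused: Grid = []
    for lidar_row, camera_row in zip(lidar_bev, camera_bev):
        if len(lidar_row) != len(camera_row):
            raise ValueError("BEV grids must have the same width")
        # List addition appends the camera channels after the LiDAR channels.
        fused.append([l + c for l, c in zip(lidar_row, camera_row)])
    return fused


# Toy 1 x 2 grid: two LiDAR channels and one camera channel per cell.
lidar = [[[1.0, 2.0], [3.0, 4.0]]]
camera = [[[9.0], [8.0]]]
fused = fuse_bev(lidar, camera)
```

Each fused cell keeps the LiDAR channels untouched and appends the camera channels, so a downstream head can still read either modality on its own.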
Fully Sparse 3D Object Detection
As the perception range of LiDAR increases, LiDAR-based 3D object detection
becomes a dominant task in long-range perception for autonomous
driving. The mainstream 3D object detectors usually build dense feature maps in
the network backbone and prediction head. However, the computational and
spatial costs of the dense feature map are quadratic in the perception range,
which makes these detectors hard to scale to the long-range setting. To enable efficient
long-range LiDAR-based object detection, we build a fully sparse 3D object
detector (FSD). The computational and spatial cost of FSD is roughly linear in
the number of points and independent of the perception range. FSD is built upon
the general sparse voxel encoder and a novel sparse instance recognition (SIR)
module. SIR first groups the points into instances and then applies
instance-wise feature extraction and prediction. In this way, SIR resolves the
issue of missing center features, which hinders the design of a fully sparse
architecture for all center-based or anchor-based detectors. Moreover, SIR
avoids the time-consuming neighbor queries in previous point-based methods by
grouping points into instances. We conduct extensive experiments on the
large-scale Waymo Open Dataset to reveal the working mechanism of FSD, and
state-of-the-art performance is reported. To demonstrate the superiority of FSD
in long-range detection, we also conduct experiments on the Argoverse 2 dataset,
which has a much larger perception range than the Waymo Open Dataset.
On such a large perception range, FSD achieves state-of-the-art
performance and is 2.4× faster than its dense counterpart. Code will be
released at https://github.com/TuSimple/SST.
Comment: NeurIPS 202
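The two scaling claims in this abstract can be illustrated with a small sketch in plain Python. The first function counts the cells of a square dense BEV grid, which grows quadratically with the perception range. The second groups per-point features by instance ID and max-pools each group in a single pass over the points. This is not the FSD or SIR implementation: the function names, the max-pooling choice, the 0.5 m cell size, and the example ranges are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def dense_bev_cells(range_m: float, cell_m: float) -> int:
    """Cell count of a square dense BEV grid covering [-range_m, range_m]."""
    side = math.ceil(2 * range_m / cell_m)
    return side * side


def pool_by_instance(
    features: Sequence[Sequence[float]],
    instance_ids: Sequence[int],
) -> Dict[int, List[float]]:
    """Group point features by instance ID, then max-pool each group.

    Assumption: max-pooling stands in for a learned instance-wise
    feature extractor. Work is proportional to the number of points.
    """
    if len(features) != len(instance_ids):
        raise ValueError("need exactly one instance ID per point")
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for feat, inst in zip(features, instance_ids):
        groups[inst].append(feat)
    # zip(*feats) iterates channel by channel across the points of a group.
    return {
        inst: [max(channel) for channel in zip(*feats)]
        for inst, feats in groups.items()
    }


# Doubling the (hypothetical) range quadruples the dense grid size.
cells_near = dense_bev_cells(50.0, 0.5)
cells_far = dense_bev_cells(100.0, 0.5)

# Three points, two of which belong to instance 0.
pooled = pool_by_instance([[1.0, 5.0], [3.0, 2.0], [0.0, 7.0]], [0, 0, 1])
```

The pooled result has one feature vector per instance regardless of how far away the instance is, which is the intuition behind a cost that depends on the number of points rather than on the perception range.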