This paper presents novel hybrid architectures that combine grid- and
point-based processing to improve the detection performance and orientation
estimation of radar-based object detection networks. Purely grid-based
detection models operate on a bird's-eye-view (BEV) projection of the input
point cloud. These approaches lose fine-grained information because of
the discrete grid resolution. This applies in particular to radar
object detection, where relatively coarse grid resolutions are commonly used to
account for the sparsity of radar point clouds. In contrast, point-based models
are not affected by this problem as they process point clouds without
discretization. However, they generally exhibit worse detection performance
than grid-based methods.
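
The discretization loss described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `render_bev_grid`, the 0.5 m cell size, the grid extent, and the two example detections are all hypothetical values chosen for illustration. The sketch renders 2D points into a BEV count grid and shows that two nearby points fall into the same cell, so their exact relative offset is no longer recoverable from the grid alone.

```python
import numpy as np


def render_bev_grid(points, cell_size, x_range, y_range):
    """Scatter 2D points into a BEV count grid (hypothetical helper).

    Returns the grid and the integer cell index of every in-range point.
    """
    points = np.asarray(points, dtype=float)
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))

    # Discretize continuous coordinates to integer cell indices.
    ix = np.floor((points[:, 0] - x_range[0]) / cell_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell_size).astype(int)

    # Keep only points that lie inside the grid extent.
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    cells = np.stack([ix[inside], iy[inside]], axis=1)

    # Accumulate point counts per cell; np.add.at handles repeated indices.
    grid = np.zeros((nx, ny), dtype=np.int32)
    np.add.at(grid, (cells[:, 0], cells[:, 1]), 1)
    return grid, cells


# Two assumed radar detections that are a few decimeters apart.
points = np.array([[1.1, 2.1], [1.4, 2.3]])
grid, cells = render_bev_grid(
    points, cell_size=0.5, x_range=(0.0, 4.0), y_range=(0.0, 4.0)
)

# Exact offset between the points, available only before rendering.
relative_offset = points[1] - points[0]
```

After rendering, both detections share one cell index in `cells`, while `relative_offset` still holds their exact displacement. A point-based stage that runs before rendering can use this kind of information; the grid alone cannot.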
We show that a point-based model can extract neighborhood features,
leveraging the exact relative positions of points, before grid rendering. This
has significant benefits for a subsequent grid-based convolutional detection
backbone. In experiments on the public nuScenes dataset, our hybrid architecture
achieves improvements in terms of detection performance (19.7% higher mAP for
the car class than the next-best radar-only submission) and orientation estimates
(11.5% relative orientation improvement) over networks from previous
literature.

Comment: (c) 2022 IEEE. Personal use of this material is permitted. Permission
from IEEE must be obtained for all other uses, in any current or future
media, including reprinting/republishing this material for advertising or
promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.