Accurate moving object segmentation is an essential task for autonomous
driving. It provides useful information for many downstream tasks, such
as collision avoidance, path planning, and static map construction. How to
effectively exploit the spatial-temporal information is a critical question for
3D LiDAR moving object segmentation (LiDAR-MOS). In this work, we propose a
novel deep neural network exploiting both spatial-temporal information and
different representation modalities of LiDAR scans to improve LiDAR-MOS
performance. Specifically, we first use a range image-based dual-branch
structure to process separately the spatial and temporal information obtained
from sequential LiDAR scans, and then fuse the two branches using
motion-guided attention modules. We also use a point refinement module based on 3D
sparse convolution to fuse the information from both LiDAR range image and
point cloud representations and reduce the artifacts on the borders of the
objects. We verify the effectiveness of our proposed approach on the LiDAR-MOS
benchmark of SemanticKITTI. Our method significantly outperforms
state-of-the-art methods in terms of LiDAR-MOS IoU. Benefiting from the devised
coarse-to-fine architecture, our method operates online at sensor frame rate.
The implementation of our method is available as open source at:
https://github.com/haomo-ai/MotionSeg3D

Comment: Accepted by IROS 2022.
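
To make the range image representation and the IoU metric mentioned above concrete, the following is a minimal sketch and not the authors' implementation. It shows a standard spherical projection of a LiDAR point cloud into a range image, and the intersection-over-union of the moving class computed from binary per-point labels. The function names and the sensor parameters (64 rows, 2048 columns, a vertical field of view from +3 to -25 degrees) are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def project_to_range_image(points, height=64, width=2048,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherically project (x, y, z) points into a height x width range image.

    All default parameters are illustrative assumptions, not values from the paper.
    Pixels that receive no point keep the value -1.0. If several points land
    in the same pixel, the closest one is kept.
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    image = [[-1.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)     # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)   # vertical angle
        # Map the angles to pixel coordinates and clamp to the image bounds.
        u = int(0.5 * (1.0 - yaw / math.pi) * width)
        v = int((1.0 - (pitch - fov_down) / fov) * height)
        u = min(max(u, 0), width - 1)
        v = min(max(v, 0), height - 1)
        if image[v][u] < 0.0 or r < image[v][u]:
            image[v][u] = r
    return image


def moving_iou(pred, gt):
    """IoU of the moving class: TP / (TP + FP + FN), with 1 = moving, 0 = static."""
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    denom = tp + fp + fn
    # Returning 0.0 when no moving point is predicted or labeled is a convention chosen here.
    return tp / denom if denom > 0 else 0.0
```

For example, `moving_iou([1, 1, 0, 0], [1, 0, 1, 0])` counts one true positive, one false positive, and one false negative, giving an IoU of 1/3.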