Due to its robust and precise distance measurements, LiDAR plays an important
role in scene understanding for autonomous driving. Training deep neural
networks (DNNs) on LiDAR data requires large-scale point-wise annotations,
which are time-consuming and expensive to obtain. As an alternative,
simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with
automatically generated labels and transfers the learned model to real
scenarios. Existing SRDA methods for LiDAR point cloud segmentation mainly
employ a multi-stage pipeline and focus on feature-level alignment. They
require prior knowledge of real-world statistics and ignore the pixel-level
dropout noise gap and the spatial feature gap between different domains. In
this paper, we propose a novel end-to-end framework, named ePointDA, to address
the above issues. Specifically, ePointDA consists of three modules:
self-supervised dropout noise rendering, statistics-invariant and
spatially-adaptive feature alignment, and transferable segmentation learning.
Jointly optimizing these modules enables ePointDA to bridge the domain shift at
the pixel level by explicitly rendering dropout noise for synthetic LiDAR and at
the feature level by spatially aligning the features between different domains,
without requiring real-world statistics. Extensive experiments adapting
from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrate the
superiority of ePointDA for LiDAR point cloud segmentation.

Comment: Accepted by AAAI 202