In this paper, we introduce a deep encoder-decoder network, named SalsaNet,
for efficient semantic segmentation of 3D LiDAR point clouds. SalsaNet segments
the road, i.e., the drivable free space, and vehicles in the scene by employing the
Bird's-Eye-View (BEV) image projection of the point cloud. To overcome the lack
of annotated point cloud data, in particular for the road segments, we
introduce an auto-labeling process which transfers automatically generated
labels from the camera to LiDAR. We also explore the role of image-like
projections of LiDAR data in semantic segmentation by comparing BEV with the
spherical-front-view projection and show that SalsaNet is projection-agnostic.
We perform quantitative and qualitative evaluations on the KITTI dataset, which
demonstrate that the proposed SalsaNet outperforms other state-of-the-art
semantic segmentation networks in terms of accuracy and computation time. Our
code and data are publicly available at
https://gitlab.com/aksoyeren/salsanet.git
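
To make the BEV image projection mentioned above concrete, the following is a minimal sketch of how a point cloud could be rasterized into a top-down grid. It is not taken from the SalsaNet code: the function name bev_project, the grid extents, the cell size, and the choice of a single maximum-height channel are all illustrative assumptions, and the encoding actually used by the network may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, sensor frame


def bev_project(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 50.0),    # assumed forward extent
    y_range: Tuple[float, float] = (-12.0, 12.0),  # assumed lateral extent
    cell: float = 0.5,                             # assumed cell size in metres
) -> List[List[float]]:
    """Rasterize points into a top-down grid holding the max height per cell.

    Hypothetical helper for illustration only. Cells that receive no
    points are set to 0.0.
    """
    rows = int(round((x_range[1] - x_range[0]) / cell))
    cols = int(round((y_range[1] - y_range[0]) / cell))
    empty = float("-inf")
    grid = [[empty] * cols for _ in range(rows)]

    for x, y, z in points:
        # Discard points that fall outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Map metric coordinates to integer cell indices, clamped so that
        # floating-point rounding can never index past the last cell.
        r = min(int((x - x_range[0]) / cell), rows - 1)
        c = min(int((y - y_range[0]) / cell), cols - 1)
        # Keep the highest point seen so far in this cell.
        if z > grid[r][c]:
            grid[r][c] = z

    # Replace the "no point" marker with 0.0 so the grid is a plain image.
    return [[0.0 if v == empty else v for v in row] for row in grid]


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.3), (1.2, 0.1, 1.5), (60.0, 0.0, 2.0)]
    image = bev_project(cloud)
    print(len(image), len(image[0]), image[2][24])
```

In this sketch the first two points land in the same cell and only the higher one is kept, while the third lies beyond the assumed forward extent and is dropped. A practical encoding would typically add further channels per cell, such as point count or reflectance statistics.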