State-of-the-art lidar panoptic segmentation (LPS) methods follow a bottom-up,
segmentation-centric approach, building on semantic segmentation networks and
obtaining object instances via clustering. In this paper, we
re-think this approach and propose a surprisingly simple yet effective
detection-centric network for both LPS and tracking. Our network is modular by
design and optimized for all aspects of both the panoptic segmentation and
tracking tasks. One of the core components of our network is the object instance
detection branch, which we train using point-level (modal) annotations, as
available in segmentation-centric datasets. In the absence of amodal (cuboid)
annotations, we regress modal centroids and object extent using
trajectory-level supervision that provides information about object size, which
cannot be inferred from single scans due to occlusions and the sparse nature of
the lidar data. We obtain fine-grained instance segments by learning to
associate lidar points with detected centroids. We evaluate our method on
several 3D/4D LPS benchmarks and observe that our model establishes a new
state-of-the-art among open-sourced models, outperforming recent query-based
models.

Comment: IROS 2023. Code at https://github.com/abhinavagarwalla/most-lp
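The abstract describes obtaining instance segments by associating lidar points with detected object centroids. The paper learns this association; as a rough illustration of the idea only, the following hypothetical sketch assigns each point to its nearest detected centroid within a distance cutoff, leaving far-away points as "stuff" (all names and the cutoff value are illustrative assumptions, not the authors' implementation).

```python
# Simplified, hypothetical sketch of point-to-centroid association.
# The actual method learns this association; here a plain Euclidean
# nearest-centroid lookup with a distance cutoff stands in for it.
import numpy as np

def assign_points_to_centroids(points, centroids, max_dist=2.0):
    """Return an instance id per point (-1 for points left as 'stuff').

    points:    (P, 3) array of lidar point coordinates.
    centroids: (N, 3) array of detected (modal) object centroids.
    max_dist:  points farther than this from every centroid stay unassigned.
    """
    if len(centroids) == 0:
        return np.full(len(points), -1, dtype=np.int64)
    # Pairwise distances between every point and every centroid: shape (P, N).
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    instance_ids = dists.argmin(axis=1)
    instance_ids[dists.min(axis=1) > max_dist] = -1  # too far from any object
    return instance_ids

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-10, 10, size=(1000, 3))       # toy lidar points
    ctrs = np.array([[2.0, 3.0, 0.0], [-5.0, -1.0, 0.5]])  # toy detections
    ids = assign_points_to_centroids(pts, ctrs)
    print("assigned points:", int((ids >= 0).sum()), "of", len(pts))
```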