In this work, we introduce panoramic panoptic segmentation as the most
holistic form of scene understanding for standard camera-based input, both in
terms of Field of View (FoV) and image-level understanding. A complete
understanding of its surroundings provides a mobile agent with the maximum
amount of information, which is essential for any intelligent vehicle to make
informed decisions in a safety-critical dynamic environment such as real-world
traffic.
To overcome the lack of annotated panoramic images, we propose a framework
that allows model training on standard pinhole images and transfers the
learned features to the panoramic domain in a cost-minimizing way. The domain
shift from pinhole to panoramic images is non-trivial, as large objects and
surfaces are heavily distorted close to the image border regions and thus look
different across the two domains. Using our proposed method with dense
contrastive learning, we achieve significant improvements over a
non-adapted approach.
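The abstract does not spell out the loss formulation; as a rough, hypothetical
illustration of dense contrastive learning, a pixel-level InfoNCE objective
can be written over spatially corresponding feature locations of two augmented
views. All names below are illustrative rather than taken from the paper:

import torch
import torch.nn.functional as F

def dense_info_nce(feat_a, feat_b, temperature=0.1):
    # feat_a, feat_b: (B, C, H, W) feature maps of the same images under
    # two augmentations, assumed spatially aligned; corresponding
    # locations are positives, all other locations act as negatives.
    b, c, h, w = feat_a.shape
    za = F.normalize(feat_a.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    zb = F.normalize(feat_b.flatten(2).transpose(1, 2), dim=-1)
    # Cosine similarity between every pair of locations: (B, HW, HW).
    logits = torch.bmm(za, zb.transpose(1, 2)) / temperature
    # The positive for location i in view A is location i in view B.
    target = torch.arange(h * w, device=feat_a.device).expand(b, -1)
    return F.cross_entropy(logits.reshape(b * h * w, h * w),
                           target.reshape(-1))

# Example: loss over 32x32 feature maps for a batch of two images.
loss = dense_info_nce(torch.randn(2, 128, 32, 32),
                      torch.randn(2, 128, 32, 32))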
Depending on the efficient panoptic segmentation architecture, we improve by
3.5-6.5% measured in Panoptic Quality (PQ, defined below) over non-adapted
models on our established Wild Panoramic Panoptic Segmentation (WildPPS)
dataset. Furthermore, our efficient framework does not need access to images
of the target domain, which makes it a feasible domain generalization approach
suitable for limited hardware settings. As additional contributions, we
publish WildPPS, the first panoramic panoptic image dataset, to foster
progress in surrounding perception, and we explore a novel training procedure
combining supervised and contrastive training.
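For reference, the Panoptic Quality metric reported above follows the standard
definition of Kirillov et al. (CVPR 2019), which matches predicted and
ground-truth segments (p, g) as true positives (TP) when IoU(p, g) > 0.5, with
FP and FN denoting unmatched predicted and ground-truth segments:

% Panoptic Quality: average IoU over matched segments, penalized by
% unmatched segments; it factors into segmentation quality (SQ) and
% recognition quality (RQ).
\[
  \mathrm{PQ}
  = \frac{\sum_{(p,g) \in \mathit{TP}} \mathrm{IoU}(p,g)}
         {|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}
  = \underbrace{\frac{\sum_{(p,g) \in \mathit{TP}} \mathrm{IoU}(p,g)}
                     {|\mathit{TP}|}}_{\text{SQ}}
    \times
    \underbrace{\frac{|\mathit{TP}|}
                     {|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}|
                      + \tfrac{1}{2}|\mathit{FN}|}}_{\text{RQ}}
\]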
Comment: Accepted to IEEE Transactions on Intelligent Transportation Systems
(T-ITS). Extended version of arXiv:2103.00868. The project is at
https://github.com/alexanderjaus/PP