Unifying Training and Inference for Panoptic Segmentation
We present an end-to-end network to bridge the gap between training and
inference pipeline for panoptic segmentation, a task that seeks to partition an
image into semantic regions for "stuff" and object instances for "things". In
contrast to recent works, our network exploits a parametrised, yet lightweight
panoptic segmentation submodule, powered by an end-to-end learnt dense instance
affinity, to capture the probability that any pair of pixels belong to the same
instance. This panoptic submodule gives rise to a novel propagation mechanism
for panoptic logits and enables the network to output a coherent panoptic
segmentation map for both "stuff" and "thing" classes, without any
post-processing. Reaping the benefits of end-to-end training, our full system
sets new records on the popular street scene dataset, Cityscapes, achieving
61.4 PQ with a ResNet-50 backbone using only the fine annotations. On the
challenging COCO dataset, our ResNet-50-based network also delivers
state-of-the-art accuracy of 43.4 PQ. Moreover, our network works flexibly with
and without object mask cues, performing competitively under both settings,
which is of interest for applications with limited computation budgets.
Comment: CVPR 202
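The dense instance affinity described above captures, for any pair of pixels, the probability that they belong to the same instance, and is used to propagate panoptic logits. A minimal sketch of that propagation idea (names, shapes, and the row-normalised weighting are our assumptions, not the paper's exact formulation):

```python
import numpy as np

def propagate_logits(logits, affinity):
    """Propagate per-pixel panoptic logits with a dense pairwise affinity.

    logits:   (N, C) array of logits for N pixels over C panoptic channels.
    affinity: (N, N) array; affinity[i, j] approximates the probability that
              pixels i and j belong to the same instance.
    """
    # Row-normalise so each pixel aggregates a weighted average of the
    # logits of its likely co-instance pixels.
    weights = affinity / affinity.sum(axis=1, keepdims=True)
    return weights @ logits

# Toy example: 4 pixels, 2 instance channels; pixels 0-1 and 2-3 form pairs.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
affinity = np.array([[1.0, 0.9, 0.0, 0.0],
                     [0.9, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.9],
                     [0.0, 0.0, 0.9, 1.0]])
refined = propagate_logits(logits, affinity)
# Pixel 1 inherits evidence for channel 0 from its high-affinity partner, pixel 0.
```

This illustrates how a confident pixel can pull an ambiguous neighbour toward the same instance without any post-processing step.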
Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network
We present a single network method for panoptic segmentation. This method
combines the predictions from a jointly trained semantic and instance
segmentation network using heuristics. Joint training is the first step towards
an end-to-end panoptic segmentation network and is faster and more memory
efficient than training and predicting with two networks, as done in previous
work. The architecture consists of a ResNet-50 feature extractor shared by the
semantic segmentation and instance segmentation branch. For instance
segmentation, a Mask R-CNN type of architecture is used, while the semantic
segmentation branch is augmented with a Pyramid Pooling Module. Results for
this method are submitted to the COCO and Mapillary Joint Recognition Challenge
2018. Our approach achieves a PQ score of 17.6 on the Mapillary Vistas
validation set and 27.2 on the COCO test-dev set.
Comment: Technical report
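Both abstracts report results in PQ (panoptic quality), the standard panoptic segmentation metric: matched prediction/ground-truth segment pairs (IoU > 0.5) are true positives, and PQ averages their IoU while penalising unmatched segments. A minimal sketch of the standard formula (the function name and argument layout are ours):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = (sum of IoUs over true positives) / (TP + FP/2 + FN/2).

    matched_ious: IoUs of predicted/ground-truth segment pairs with IoU > 0.5
                  (this threshold makes the matching unique).
    num_fp: predicted segments with no ground-truth match.
    num_fn: ground-truth segments with no predicted match.
    """
    tp = len(matched_ious)
    if tp + num_fp + num_fn == 0:
        return 0.0
    return sum(matched_ious) / (tp + 0.5 * num_fp + 0.5 * num_fn)

# Two matched segments (IoUs 0.8 and 0.6), one false positive, one false negative:
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)  # 1.4 / 3
```

The reported scores (e.g. 27.2 PQ) are this quantity averaged over classes and scaled to a percentage.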
ConsInstancy: learning instance representations for semi-supervised panoptic segmentation of concrete aggregate particles
We present a semi-supervised method for panoptic segmentation based on ConsInstancy regularisation, a novel strategy for semi-supervised learning. It leverages completely unlabelled data by enforcing consistency between predicted instance representations and semantic segmentations during training, in order to improve segmentation performance. To this end, we also propose new types of instance representations that can be predicted in a single forward pass through a fully convolutional network (FCN), delivering a convenient and simple-to-train framework for panoptic segmentation. More specifically, we propose the prediction of a three-dimensional instance orientation map as an intermediate representation and two complementary distance transform maps as the final representation, providing unique instance representations for panoptic segmentation. We test our method on two challenging data sets of hardened and fresh concrete, the latter introduced by the authors in this paper, and demonstrate the effectiveness of our approach, outperforming state-of-the-art methods for semi-supervised segmentation. In particular, we show that by leveraging completely unlabelled data in our semi-supervised approach, the achieved overall accuracy (OA) is increased by up to 5% compared to entirely supervised training using only labelled data. Furthermore, we exceed the OA achieved by state-of-the-art semi-supervised methods by up to 1.5%.
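A distance transform map of the kind mentioned above assigns each foreground pixel its distance to the nearest background pixel, so values peak inside each particle and fall to zero at boundaries, giving a per-instance cue an FCN can regress. A toy sketch (brute-force stand-in for a real Euclidean distance transform such as `scipy.ndimage.distance_transform_edt`; the complementary-map comment is our assumption, not the paper's exact definition):

```python
import numpy as np

def distance_to_background(mask):
    """For every True (foreground) pixel, the Euclidean distance to the
    nearest False (background) pixel. Brute force, for illustration only."""
    fg = np.argwhere(mask)
    bg = np.argwhere(~mask)
    out = np.zeros(mask.shape, dtype=float)
    for y, x in fg:
        out[y, x] = np.sqrt(((bg - [y, x]) ** 2).sum(axis=1)).min()
    return out

# Toy instance map: two 'particles' (ids 1 and 2) on background 0.
labels = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
])
inner = np.zeros(labels.shape)
for inst_id in (1, 2):
    m = labels == inst_id
    inner[m] = distance_to_background(m)[m]
# 'inner' is positive inside each particle and zero on the background; a
# complementary map (e.g. per-instance maximum minus 'inner') would instead
# highlight boundaries, and together the two maps separate touching instances.
```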
LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment
3D panoptic segmentation is a challenging perception task that requires both
semantic segmentation and instance segmentation. In this task, we notice that
images could provide rich texture, color, and discriminative information, which
can complement LiDAR data for evident performance improvement, but their fusion
remains a challenging problem. To this end, we propose LCPS, the first
LiDAR-Camera Panoptic Segmentation network. In our approach, we conduct
LiDAR-Camera fusion in three stages: 1) an Asynchronous Compensation Pixel
Alignment (ACPA) module that calibrates the coordinate misalignment caused by
asynchronous problems between sensors; 2) a Semantic-Aware Region Alignment
(SARA) module that extends the one-to-one point-pixel mapping to one-to-many
semantic relations; 3) a Point-to-Voxel feature Propagation (PVP) module that
integrates both geometric and semantic fusion information for the entire point
cloud. Our fusion strategy improves PQ by about 6.9% over the LiDAR-only
baseline on the NuScenes dataset. Extensive quantitative and qualitative
experiments further demonstrate the effectiveness of our novel framework. The
code will be released at https://github.com/zhangzw12319/lcps.git.
Comment: Accepted as ICCV 2023 paper
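The one-to-one point-pixel mapping that the SARA module extends starts from the standard projection of LiDAR points into the camera image via the sensor extrinsics and intrinsics. A minimal pinhole-projection sketch (the calibration matrices here are illustrative assumptions, not the LCPS calibration):

```python
import numpy as np

def project_points(points_lidar, T_cam_from_lidar, K):
    """Project (N, 3) LiDAR points into image pixel coordinates.

    T_cam_from_lidar: (4, 4) rigid transform from LiDAR to camera frame.
    K:                (3, 3) camera intrinsic matrix.
    Returns (N, 2) pixel coordinates and a mask of points in front of the camera.
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])   # homogeneous (N, 4)
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]          # points in camera frame
    in_front = cam[:, 2] > 0                            # only z > 0 is visible
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                         # perspective divide
    return uv, in_front

# Identity extrinsics and a simple pinhole intrinsic matrix (assumed values):
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
pts = np.array([[0.0, 0.0, 10.0],    # straight ahead -> principal point
                [1.0, 0.0, 10.0]])   # offset right -> shifted in u
uv, valid = project_points(pts, T, K)
```

Asynchronous sensors break this mapping because the point cloud and image are captured at slightly different times, which is the coordinate misalignment the ACPA module is described as compensating.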