Panoptic Segmentation
We propose and study a task we name panoptic segmentation (PS). Panoptic
segmentation unifies the typically distinct tasks of semantic segmentation
(assign a class label to each pixel) and instance segmentation (detect and
segment each object instance). The proposed task requires generating a coherent
scene segmentation that is rich and complete, an important step toward
real-world vision systems. While early work in computer vision addressed
related image/scene parsing tasks, these are not currently popular, possibly
due to lack of appropriate metrics or associated recognition challenges. To
address this, we propose a novel panoptic quality (PQ) metric that captures
performance for all classes (stuff and things) in an interpretable and unified
manner. Using the proposed metric, we perform a rigorous study of both human
and machine performance for PS on three existing datasets, revealing
interesting insights about the task. The aim of our work is to revive the
interest of the community in a more unified view of image segmentation.
Comment: Accepted to CVPR 2019.
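For reference, the per-class PQ metric can be written as PQ = (Σ_{(p,g)∈TP} IoU(p,g)) / (|TP| + ½|FP| + ½|FN|), which factors into segmentation quality (the average IoU of matched segments) times recognition quality (an F1-like detection term). Below is a minimal Python sketch of this per-class computation, assuming segments have already been matched at IoU > 0.5 as the paper prescribes; the names are illustrative.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Per-class PQ, given IoUs of matched (prediction, ground-truth) pairs.

    Matching at IoU > 0.5 guarantees each segment is matched at most once,
    so matched_ious enumerates the true positives (TP); num_fp and num_fn
    count unmatched predicted and ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq                              # PQ = SQ x RQ

# Two matches (IoU 0.8 and 0.6), one false positive, one false negative:
# PQ = 0.7 * (2 / 3) ~= 0.467
print(panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1))
```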
A SAM-based Solution for Hierarchical Panoptic Segmentation of Crops and Weeds Competition
Panoptic segmentation in agriculture is an advanced computer vision technique
that provides a comprehensive understanding of field composition. It
facilitates various tasks such as crop and weed segmentation, plant panoptic
segmentation, and leaf instance segmentation, all aimed at addressing
challenges in agriculture. Exploring the application of panoptic segmentation
in agriculture, the 8th Workshop on Computer Vision in Plant Phenotyping and
Agriculture (CVPPA) hosted the challenge of hierarchical panoptic segmentation
of crops and weeds using the PhenoBench dataset. To tackle the tasks presented
in this competition, we propose an approach that combines the effectiveness of
the Segment Anything Model (SAM) for instance segmentation with prompt input
from object detection models. Specifically, we integrated two notable
approaches in object detection, namely DINO and YOLO-v8. Our best-performing
model achieved a PQ+ score of 81.33 based on the evaluation metrics of the
competition.
Comment: Technical report of the NYCU-WEED team for the challenge of hierarchical panoptic segmentation of crops and weeds using the PhenoBench dataset at the 8th Workshop on Computer Vision in Plant Phenotyping and Agriculture (CVPPA), International Conference on Computer Vision (ICCV) 2023.
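As a rough illustration of this detector-prompted SAM pattern, the sketch below feeds detector boxes into the public segment-anything predictor API. The detect_boxes stub and the checkpoint path are placeholder assumptions, not the team's actual configuration or training setup.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def detect_boxes(image):
    """Placeholder for a detector (e.g. DINO or YOLOv8) returning
    (x0, y0, x1, y1) boxes in pixel coordinates."""
    raise NotImplementedError  # plug in your detector here

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # path is a placeholder
predictor = SamPredictor(sam)

def segment_instances(image):
    predictor.set_image(image)              # image: HxWx3 uint8 RGB array
    instance_masks = []
    for box in detect_boxes(image):
        masks, scores, _ = predictor.predict(
            box=np.asarray(box),            # one box prompt per detection
            multimask_output=False,         # single best mask per prompt
        )
        instance_masks.append(masks[0])     # boolean HxW mask
    return instance_masks
```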
PVO: Panoptic Visual Odometry
We present PVO, a novel panoptic visual odometry framework to achieve more
comprehensive modeling of the scene motion, geometry, and panoptic segmentation
information. Our PVO models visual odometry (VO) and video panoptic
segmentation (VPS) in a unified view, which makes the two tasks mutually
beneficial. Specifically, we introduce a panoptic update module into the VO
Module with the guidance of image panoptic segmentation. This Panoptic-Enhanced
VO Module can alleviate the impact of dynamic objects in the camera pose
estimation with a panoptic-aware dynamic mask. On the other hand, the
VO-Enhanced VPS Module also improves the segmentation accuracy by fusing the
panoptic segmentation result of the current frame on the fly into the adjacent
frames, using geometric information such as camera pose, depth, and optical
flow obtained from the VO Module. These two modules contribute to each other
through recurrent iterative optimization. Extensive experiments demonstrate
that PVO outperforms state-of-the-art methods in both visual odometry and video
panoptic segmentation tasks.
Comment: CVPR 2023. Project page: https://zju3dv.github.io/pvo/; code: https://github.com/zju3dv/PV
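The panoptic-aware dynamic mask can be pictured with a small sketch: pixels belonging to instances of potentially moving classes are removed from the pose-estimation residuals. The class list and hard masking rule below are illustrative assumptions, not PVO's actual implementation.

```python
import numpy as np

# Classes whose instances may move; illustrative IDs, not PVO's actual mapping.
DYNAMIC_CLASSES = {"car", "person", "bicycle"}

def panoptic_dynamic_mask(panoptic_seg, id_to_class):
    """Return a boolean HxW mask that is True on likely-static pixels.

    panoptic_seg: HxW array of segment ids from a panoptic segmentation.
    id_to_class:  dict mapping each segment id to its class name.
    """
    static = np.ones(panoptic_seg.shape, dtype=bool)
    for seg_id, cls in id_to_class.items():
        if cls in DYNAMIC_CLASSES:
            static &= panoptic_seg != seg_id  # exclude pixels on this instance
    return static

# During pose estimation, photometric or flow residuals on dynamic pixels
# would then be masked out (or down-weighted) before optimization:
# residuals = residuals * panoptic_dynamic_mask(seg, classes)
```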
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving
Precise situational awareness is required for the safe decision-making of
assisted and automated driving (AAD) functions. Panoptic segmentation is a
promising perception technique to identify and categorise objects, impending
hazards, and driveable space at a pixel level. While segmentation quality is
generally associated with the quality of the camera data, a comprehensive
understanding and modelling of this relationship are paramount for AAD system
designers. Motivated by such a need, this work proposes a unifying pipeline to
assess the robustness of panoptic segmentation models for AAD, correlating it
with traditional image quality. The first step of the proposed pipeline
involves generating degraded camera data that reflects real-world noise
factors. To this end, 19 noise factors have been identified and implemented
with 3 severity levels. Of these factors, this work proposes novel models for
unfavourable light and snow. After applying the degradation models, three
state-of-the-art CNN- and vision transformer (ViT)-based panoptic segmentation
networks are used to analyse their robustness. The variations in segmentation
performance are then correlated with eight selected image quality metrics. This
research reveals that: 1) certain noise factors, namely droplets on the lens
and Gaussian noise, have the highest impact on panoptic segmentation; 2) the
ViT-based panoptic segmentation backbones show better robustness to the
considered noise factors; 3) some image quality metrics (i.e. LPIPS and
CW-SSIM) correlate strongly with panoptic segmentation performance and can
therefore be used as predictive metrics for network performance.
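One stage of such a pipeline can be sketched as follows: degrade a clean frame with Gaussian noise at one of three severity levels, then score the degradation with LPIPS via the public lpips package. The severity values are illustrative placeholders, not the paper's calibrated noise models; correlating a network's PQ drop with these scores over many images yields the kind of relationship reported above.

```python
import numpy as np
import torch
import lpips

SEVERITY_STD = {1: 0.05, 2: 0.10, 3: 0.20}  # illustrative levels, not the paper's

def add_gaussian_noise(image, severity):
    """image: HxWx3 float array in [0, 1]; returns a degraded copy."""
    noise = np.random.normal(0.0, SEVERITY_STD[severity], image.shape)
    return np.clip(image + noise, 0.0, 1.0)

loss_fn = lpips.LPIPS(net="alex")  # perceptual image-quality metric

def lpips_distance(clean, degraded):
    """Both inputs HxWx3 in [0, 1]; LPIPS expects NCHW tensors in [-1, 1]."""
    to_tensor = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        return loss_fn(to_tensor(clean), to_tensor(degraded)).item()

# Evaluating a panoptic model on each (factor, severity) pair and regressing
# its PQ drop against such metrics produces the reported correlations.
```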
PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation
Depth-aware Video Panoptic Segmentation (DVPS) is a challenging new vision
problem that aims to predict panoptic segmentation and depth in a video
simultaneously. Previous work solves this task by extending an existing
panoptic segmentation method with an extra dense depth prediction and instance
tracking head. However, the relationship between depth and panoptic
segmentation is not well explored -- simply combining existing methods leads to
competition between the tasks and requires careful weight balancing. In this
paper, we present PolyphonicFormer, a vision transformer that unifies these
sub-tasks under the DVPS task and leads to more robust results. Our principal
insight is that depth can be harmonized with panoptic segmentation through our
proposed new paradigm of predicting instance-level depth maps with object
queries; the relationship between the two tasks is then explored via
query-based learning. From
the experiments, we demonstrate the benefits of our design from both depth
estimation and panoptic segmentation aspects. Since each thing query also
encodes the instance-wise information, it is natural to perform tracking
directly with appearance learning. Our method achieves state-of-the-art results
on two DVPS datasets (Semantic KITTI, Cityscapes), and ranks 1st on the
ICCV-2021 BMTT Challenge video + depth track. Code is available at
https://github.com/HarborYuan/PolyphonicFormer
Comment: Accepted by ECCV 2022.
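The query-based depth paradigm can be sketched as a head in which each object query produces both a mask and an instance-level depth map from shared pixel features; the dimensions, layer names, and fusion rule below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class QueryMaskDepthHead(nn.Module):
    """Illustrative head: each object query yields a mask and an instance depth map."""
    def __init__(self, dim=256):
        super().__init__()
        self.mask_embed = nn.Linear(dim, dim)   # query -> mask kernel
        self.depth_embed = nn.Linear(dim, dim)  # query -> depth kernel

    def forward(self, queries, pixel_feats):
        # queries: (N, dim) object queries; pixel_feats: (dim, H, W) decoder features
        mask_logits = torch.einsum("nd,dhw->nhw", self.mask_embed(queries), pixel_feats)
        depth_maps = torch.einsum("nd,dhw->nhw", self.depth_embed(queries), pixel_feats)
        # Each pixel's depth is read from the query whose mask claims that pixel,
        # so depth and segmentation share the same instance-level grouping.
        # (A real head would also constrain depth to be positive, e.g. via exp.)
        owner = mask_logits.argmax(dim=0)             # (H, W) winning query per pixel
        depth = depth_maps.gather(0, owner[None])[0]  # (H, W) fused depth map
        return mask_logits, depth
```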
PanDA: Panoptic Data Augmentation
The recently proposed panoptic segmentation task presents a significant image-understanding challenge for computer vision by unifying the semantic segmentation and instance segmentation tasks. In this paper we present an efficient and novel panoptic data augmentation (PanDA) method which operates exclusively in pixel space, requires no additional data or training, and is computationally cheap to implement. By retraining original state-of-the-art models on PanDA-augmented datasets generated with a single frozen set of parameters, we show robust performance gains in panoptic segmentation, instance segmentation, and detection across models, backbones, dataset domains, and scales. Finally, the effectiveness of unrealistic-looking training images synthesized by PanDA suggests that the need for image realism in efficient data augmentation should be rethought.
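In the same pixel-space spirit, the sketch below shrinks one instance and composites it back into the image, requiring no extra data or training. The scale and offset are illustrative parameters, the matching labels would need the identical transform, and this is a simplified stand-in rather than PanDA's exact procedure.

```python
import numpy as np
import cv2

def shrink_and_paste(image, mask, scale=0.5, offset=(40, 60)):
    """Cut out one instance, shrink it, and composite it at a shifted location.

    image: HxWx3 uint8 array; mask: HxW boolean instance mask.
    scale and offset are illustrative augmentation parameters.
    """
    ys, xs = np.where(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    patch = image[y0:y1, x0:x1]
    patch_mask = mask[y0:y1, x0:x1].astype(np.uint8)

    # Shrink the instance crop in pixel space (no extra data or training).
    new_w = max(1, int((x1 - x0) * scale))
    new_h = max(1, int((y1 - y0) * scale))
    patch = cv2.resize(patch, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    patch_mask = cv2.resize(patch_mask, (new_w, new_h), interpolation=cv2.INTER_NEAREST)

    # Composite only the instance pixels back at an offset position.
    out = image.copy()
    py, px = y0 + offset[0], x0 + offset[1]
    h = max(0, min(new_h, out.shape[0] - py))
    w = max(0, min(new_w, out.shape[1] - px))
    m = patch_mask[:h, :w].astype(bool)
    out[py:py + h, px:px + w][m] = patch[:h, :w][m]
    return out
```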