Exploring Transformers for Open-world Instance Segmentation
Open-world instance segmentation is a rising task, which aims to segment all
objects in the image by learning from a limited number of base-category
objects. This task is challenging, as the number of unseen categories could be
hundreds of times larger than that of seen categories. Recently, DETR-like
models have been extensively studied in the closed world but remain unexplored
in the open world. In this paper, we utilize the Transformer for open-world
instance segmentation and present SWORD. Firstly, we attach a
stop-gradient operation before the classification head and further add IoU heads
for discovering novel objects. We demonstrate that a simple stop-gradient
operation not only prevents the novel objects from being suppressed as
background, but also allows the network to enjoy the merit of heuristic label
assignment. Secondly, we propose a novel contrastive learning framework to
enlarge the gap between object and background representations. Specifically, we
maintain a universal object queue to obtain the object center, and dynamically
select positive and negative samples from the object queries for contrastive
learning. While previous works focus only on average recall and
neglect average precision, we show the strength of SWORD by
considering both criteria. Our models achieve state-of-the-art performance
in various open-world cross-category and cross-dataset generalizations.
In particular, in the VOC to non-VOC setup, our method sets new state-of-the-art
results of 40.0% on ARb100 and 34.9% on ARm100. For COCO to UVO generalization,
SWORD significantly outperforms the previous best open-world model by 5.9% on
APm and 8.1% on ARm100.
Comment: Accepted by ICCV 2023. 16 pages
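
The contrastive step described in the abstract (a universal object queue that yields an object center, with positives and negatives picked from the object queries) can be illustrated with a minimal sketch. This is not the SWORD implementation: the class `ObjectQueue`, the function `contrastive_loss`, the top-k/bottom-k selection rule, the InfoNCE-style loss, and all default values are assumptions chosen only to make the idea concrete.

```python
import math
from collections import deque


def normalize(v):
    """Return v scaled to unit L2 norm (unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class ObjectQueue:
    """Hypothetical FIFO store of embeddings from queries matched to known objects."""

    def __init__(self, maxlen=4096):
        self.items = deque(maxlen=maxlen)  # oldest entries are dropped when full

    def push(self, embedding):
        self.items.append(normalize(embedding))

    def center(self):
        """Mean of the stored embeddings, re-normalized to unit length."""
        if not self.items:
            raise ValueError("queue is empty")
        dim = len(self.items[0])
        count = len(self.items)
        mean = [sum(e[i] for e in self.items) / count for i in range(dim)]
        return normalize(mean)


def contrastive_loss(queries, center, num_pos=2, num_neg=2, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    Assumed selection rule: the num_pos queries most similar to the center
    are positives, the num_neg least similar ones are negatives.
    """
    if num_pos + num_neg > len(queries):
        raise ValueError("not enough queries for the requested split")
    sims = sorted((cosine(q, center) for q in queries), reverse=True)
    pos, neg = sims[:num_pos], sims[-num_neg:]
    neg_sum = sum(math.exp(s / temperature) for s in neg)
    losses = []
    for s in pos:
        e = math.exp(s / temperature)
        losses.append(-math.log(e / (e + neg_sum)))
    return sum(losses) / len(losses)


# Toy data: two stored object embeddings and two sets of four queries.
queue = ObjectQueue(maxlen=2)
queue.push([1.0, 0.0])
queue.push([1.0, 0.1])
obj_center = queue.center()

separated = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
mixed = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
loss_separated = contrastive_loss(separated, obj_center)
loss_mixed = contrastive_loss(mixed, obj_center)
```

In this toy setup the loss is smaller when the background-like queries lie far from the object center (`separated`) than when all queries cluster near it (`mixed`), which is the behavior a loss of this kind is meant to encourage.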