Few-shot semantic segmentation (FSS) aims to build class-agnostic models that
segment unseen classes from only a handful of annotations. Previous methods,
limited to semantic features and prototype representations, suffer from coarse
segmentation granularity and train-set overfitting. In this work, we design the
Hierarchically Decoupled Matching Network (HDMNet), which mines pixel-level
support correlation based on the transformer architecture. Self-attention
modules help establish hierarchical dense features, which in turn enable
cascade matching between query and support features. Moreover,
we propose a matching module to reduce train-set overfitting and introduce
correlation distillation, which leverages semantic correspondence from coarse
resolution to boost fine-grained segmentation. In experiments, our method
achieves 50.0% mIoU on the COCO dataset in the one-shot setting and 56.0% in
the five-shot setting. The code will be available on the
project website. We hope our work can benefit broader industrial applications
where novel classes with limited annotations are required to be decently
identified.

Comment: Accepted to CVPR 2023 VISION Workshop, Oral. The extended abstract of
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation. arXiv
admin note: substantial text overlap with arXiv:2303.1465
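
The pixel-level correlation and coarse-to-fine correlation distillation named
in the abstract can be illustrated with a minimal sketch. It is not the HDMNet
implementation: the function names, the use of cosine similarity, the softmax
temperature, and the KL-divergence loss are all assumptions made for
illustration. The coarse and fine correlation maps are also assumed to have
already been resampled to the same shape.

```python
import math


def cosine_correlation(query, support):
    """Dense correlation: cosine similarity between every query pixel
    feature and every support pixel feature (hypothetical helper).

    query   -- list of Nq feature vectors
    support -- list of Ns feature vectors
    returns -- Nq x Ns matrix as nested lists
    """
    def unit(vec):
        # Guard against all-zero vectors to avoid division by zero.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    q = [unit(v) for v in query]
    s = [unit(v) for v in support]
    return [[sum(a * b for a, b in zip(qv, sv)) for sv in s] for qv in q]


def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of a correlation map."""
    peak = max(row)
    exps = [math.exp((x - peak) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_corr, student_corr, temperature=1.0):
    """Mean KL(teacher || student) over query pixels (assumed loss form).

    Each row is normalized over the support pixels before comparison, so
    the student is pushed toward the teacher's matching distribution.
    """
    loss = 0.0
    for t_row, s_row in zip(teacher_corr, student_corr):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss / len(teacher_corr)


# Toy usage: two query pixels, two support pixels, 2-D features.
coarse = cosine_correlation([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 1.0]])
fine = cosine_correlation([[1.0, 1.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 1.0]])
print(distillation_loss(coarse, fine))
```

In this sketch the coarse-resolution map plays the teacher role and the
fine-resolution map the student role; the loss is zero only when both assign
the same matching distribution to every query pixel.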