When trained at sufficient scale, self-supervised learning has exhibited a
notable ability to solve a wide range of visual and language understanding
tasks. In this paper, we investigate simple yet effective approaches for
adapting pre-trained foundation models to the downstream task of interest,
namely, open-vocabulary semantic segmentation. To this end, we make the
following contributions: (i) we introduce Fusioner, a lightweight,
transformer-based fusion module that pairs frozen visual representations
with language concepts using only a handful of annotated image segmentation
examples (see the sketch below); as a consequence, the model gains the
capability of zero-shot transfer to segment
novel categories; (ii) without loss of generality, we experiment with a broad
range of self-supervised models that have been pre-trained with different
schemes, e.g. visual-only models (MoCo v3, DINO), language-only models (BERT),
and a visual-language model (CLIP), and show that the proposed fusion approach
is effective for any pair of visual and language models, even those pre-trained on
a corpus of uni-modal data; (iii) we conduct thorough ablation studies to
analyze the critical components of the proposed Fusioner; when evaluated on
standard benchmarks, e.g. PASCAL-5i and COCO-20i, it surpasses existing
state-of-the-art models by a large margin, despite being trained only on frozen
visual and language features; (iv) to measure the model's robustness in
learning visual-language correspondence, we further evaluate on a synthetic
dataset, named Mosaic-4, where images are constructed by mosaicking samples
from FSS-1000 (see the construction sketch below). Fusioner demonstrates
superior performance over previous models.

Comment: BMVC 2022 Oral
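To make contribution (i) concrete, below is a minimal sketch in PyTorch of a
transformer-based fusion module operating on frozen visual and language
features. All names (FusionModule, vis_proj, num_layers, etc.), the feature
dimensions, and the concatenate-then-self-attend design are illustrative
assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Hypothetical fusion of frozen patch tokens with frozen class-name embeddings."""
    def __init__(self, vis_dim=768, txt_dim=512, dim=256, num_layers=2, num_heads=8):
        super().__init__()
        # Project both frozen streams into a shared space; these projections
        # and the transformer below are the only trainable parameters.
        self.vis_proj = nn.Linear(vis_dim, dim)
        self.txt_proj = nn.Linear(txt_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, patch_tokens, class_embeddings):
        # patch_tokens:     (B, N, vis_dim)  frozen visual features, N = H*W patches
        # class_embeddings: (B, C, txt_dim)  frozen language features, C = #classes
        v = self.vis_proj(patch_tokens)
        t = self.txt_proj(class_embeddings)
        # Concatenate the two modalities and let self-attention mix them.
        fused = self.encoder(torch.cat([v, t], dim=1))
        v_out, t_out = fused[:, :v.size(1)], fused[:, v.size(1):]
        # Per-patch class logits via dot product with the fused class tokens;
        # novel categories only require new class-name embeddings at test time.
        return torch.einsum("bnd,bcd->bnc", v_out, t_out)

# Toy usage with random stand-ins for frozen features (2 images, 14x14
# patches, 5 class names); reshape the output to (B, H, W, C) for masks.
fusion = FusionModule()
logits = fusion(torch.randn(2, 196, 768), torch.randn(2, 5, 512))
print(logits.shape)  # torch.Size([2, 196, 5])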
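Similarly, a hedged sketch of the Mosaic-4-style construction described in
(iv), assuming a 2x2 grid of four FSS-1000 samples (consistent with the name
Mosaic-4, though the exact layout is an assumption); the function name
make_mosaic is hypothetical.

import torch

def make_mosaic(imgs):
    # imgs: four (C, H, W) tensors of identical shape, e.g. FSS-1000 samples.
    top = torch.cat([imgs[0], imgs[1]], dim=2)     # side by side along width
    bottom = torch.cat([imgs[2], imgs[3]], dim=2)
    return torch.cat([top, bottom], dim=1)         # stack rows along height

mosaic = make_mosaic([torch.rand(3, 224, 224) for _ in range(4)])
print(mosaic.shape)  # torch.Size([3, 448, 448])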