Large Model Based Referring Camouflaged Object Detection

Cheng, Shupeng; Fan, Deng-Ping; Ji, Ge-Peng; Qin, Pengda; Xu, Peng; Zhou, Bowen

Publication date: 28 November 2023

Abstract

Referring camouflaged object detection (Ref-COD) is a recently proposed problem that aims to segment specified camouflaged objects matching a textual or visual reference. The task involves two major challenges: COD domain-specific perception and multimodal reference-image alignment. Our motivation is to make full use of the semantic intelligence and intrinsic knowledge of recent Multimodal Large Language Models (MLLMs) to decompose this complex task in a human-like way. Because language is highly condensed and inductive, linguistic expression is the main medium through which humans learn knowledge, and knowledge is transmitted in a multi-level progression from simple to complex. In this paper, we propose a large-model-based Multi-Level Knowledge-Guided multimodal method for Ref-COD, termed MLKG, in which multi-level knowledge descriptions from an MLLM are organized to guide a large vision model for segmentation to progressively perceive the camouflaged targets and the camouflage scene, while deeply aligning the textual references with the camouflaged images. Our main contributions are as follows: (1) To our knowledge, this is the first study of MLLM knowledge for Ref-COD and COD. (2) We are the first to propose decomposing Ref-COD into two main perspectives, perceiving the target and perceiving the scene, by integrating MLLM knowledge, and we contribute a multi-level knowledge-guided method. (3) Our method achieves state-of-the-art performance on the Ref-COD benchmark, outperforming numerous strong competitors. Moreover, thanks to the injected rich knowledge, it demonstrates zero-shot generalization on uni-modal COD datasets. We will release our code soon.
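The abstract says that knowledge descriptions guide the visual model level by level, from simple to complex, but it does not specify the fusion mechanism. The snippet below is only a hypothetical illustration of one common way text can steer visual features: scaled dot-product cross-attention, applied once per knowledge level in order. It assumes each level's descriptions have already been encoded into vectors of the same dimension as the visual tokens. The names `cross_attend` and `progressive_guidance` are invented for this sketch and are not taken from MLKG, whose actual modules may differ.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(visual: List[Vector], text: List[Vector]) -> List[Vector]:
    """Hypothetical fusion step: each visual token queries the text tokens with
    scaled dot-product attention, and the attended text vector is added back
    to the visual token as a residual update."""
    dim = len(visual[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for v in visual:
        weights = softmax([scale * sum(a * b for a, b in zip(v, t)) for t in text])
        attended = [sum(w * t[i] for w, t in zip(weights, text)) for i in range(dim)]
        updated.append([a + b for a, b in zip(v, attended)])
    return updated


def progressive_guidance(visual: List[Vector],
                         knowledge_levels: List[List[Vector]]) -> List[Vector]:
    """Apply the knowledge levels in order, simplest first, so each level
    refines the features produced by the previous one (assumed ordering)."""
    for level_tokens in knowledge_levels:
        visual = cross_attend(visual, level_tokens)
    return visual


if __name__ == "__main__":
    # Toy 2-D "visual tokens" and two toy knowledge levels (made-up numbers).
    visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
    levels = [[[1.0, 0.0]], [[0.5, 0.5], [0.0, 1.0]]]
    print(progressive_guidance(visual_tokens, levels))
```

With a single text token in a level, every visual token attends to it with weight 1, so the first level simply adds that vector to each visual token. Later levels with several tokens distribute the update according to similarity.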


Available version: arXiv.org e-Print Archive (oai:arXiv.org:2311.17122)

Last updated on 10/05/2024