Vision transformers have recently shown strong global context modeling
capabilities in camouflaged object detection. However, they suffer from two
major limitations: weak locality modeling and insufficient feature
aggregation in decoders, both of which are ill-suited to camouflaged object
detection, which relies on subtle cues to separate objects from
indistinguishable backgrounds. To
address these issues, in this paper, we propose a novel transformer-based
Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode
locality-enhanced neighboring transformer features through progressive
shrinking for camouflaged object detection. Specifically, we propose a
non-local token enhancement module (NL-TEM) that employs the non-local
mechanism to enable interaction among neighboring tokens and to explore
graph-based high-order relations within tokens, thereby enhancing the
local representations of transformers. Moreover, we design a
feature shrinkage decoder (FSD) with adjacent interaction modules (AIM), which
progressively aggregates adjacent transformer features through a layer-by-layer
shrinkage pyramid to accumulate imperceptible but effective cues as much as
possible for object information decoding. Extensive quantitative and
qualitative experiments demonstrate that the proposed model significantly
outperforms 24 existing competitors on three challenging COD benchmark
datasets under six widely-used evaluation metrics. Our code is publicly
available at https://github.com/ZhouHuang23/FSPNet.

Comment: CVPR 2023. Project webpage at: https://tzxiang.github.io/project/COD-FSPNet/index.htm
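The layer-by-layer shrinkage idea described above can be sketched in simplified form: adjacent features are fused pairwise, halving the number of features at each pyramid stage until a single decoded representation remains. This is a minimal illustrative sketch only; the function names and the fusion rule (element-wise mean) are placeholders, not the paper's actual adjacent interaction module (AIM).

```python
from typing import List

def merge_adjacent(a: List[float], b: List[float]) -> List[float]:
    # Placeholder for an adjacent interaction module (AIM):
    # here simply the element-wise mean of two feature vectors.
    return [(x + y) / 2 for x, y in zip(a, b)]

def shrinkage_decode(features: List[List[float]]) -> List[float]:
    # Repeatedly fuse neighboring features (f1,f2), (f3,f4), ...,
    # shrinking the pyramid level by level until one feature remains.
    while len(features) > 1:
        fused = [merge_adjacent(features[i], features[i + 1])
                 for i in range(0, len(features) - 1, 2)]
        if len(features) % 2:           # carry an unpaired last feature forward
            fused.append(features[-1])
        features = fused
    return features[0]
```

For example, four single-channel features [1.0], [3.0], [5.0], [7.0] shrink to [2.0], [6.0] and then to [4.0], mirroring how the pyramid accumulates cues from all levels into one decoded output.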