Global Context-Aware Progressive Aggregation Network for Salient Object Detection
Deep convolutional neural networks have achieved competitive performance in
salient object detection, in which how to learn effective and comprehensive
features plays a critical role. Most previous works adopted multi-level
feature integration yet ignored the gap between different features. Besides,
high-level features are diluted as they are passed along the top-down
pathway. To remedy these issues, we propose a
novel network named GCPANet to effectively integrate low-level appearance
features, high-level semantic features, and global context features through
some progressive context-aware Feature Interweaved Aggregation (FIA) modules
and generate the saliency map in a supervised way. Moreover, a Head Attention
(HA) module is used to reduce information redundancy and enhance the top-layer
features by leveraging spatial and channel-wise attention, and the Self
Refinement (SR) module is utilized to further refine and heighten the input
features. Furthermore, we design the Global Context Flow (GCF) module to
generate the global context information at different stages, which aims to
learn the relationship among different salient regions and alleviate the
dilution effect of high-level features. Experimental results on six benchmark
datasets demonstrate that the proposed approach outperforms the
state-of-the-art methods both quantitatively and qualitatively.
Salient Object Detection Techniques in Computer Vision-A Survey.
Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the fields of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly divided into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and the deep learning-based categories are reviewed in detail. Relevant saliency modeling trends, with key issues, core techniques, and the scope for future research work, are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on some large-scale public datasets. Different metrics considered for assessing the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
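The abstract above does not name the metrics it covers. As a minimal sketch, assuming two measures that are commonly reported in the SOD literature (mean absolute error, and the F-measure with beta squared set to 0.3), the evaluation of a predicted saliency map against a binary ground-truth mask could look like the following. The function names, the fixed threshold of 0.5, and the toy 2x2 maps are illustrative assumptions, not taken from the survey.

```python
def mean_absolute_error(saliency, ground_truth):
    """Average per-pixel absolute difference between two equally sized maps."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(saliency, ground_truth):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def f_measure(saliency, ground_truth, threshold=0.5, beta_sq=0.3):
    """F-beta score after binarizing the saliency map at `threshold`.

    beta_sq=0.3 is an assumed, commonly used setting that weights
    precision more heavily than recall.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(saliency, ground_truth):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy example: values in [0, 1], one salient pixel in the mask.
prediction = [[0.9, 0.2], [0.6, 0.1]]
mask = [[1.0, 0.0], [0.0, 0.0]]
print(mean_absolute_error(prediction, mask))
print(f_measure(prediction, mask))
```

Lower MAE is better, while a higher F-measure is better. Published evaluations often sweep the threshold instead of fixing it, which this sketch omits.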
Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
Vision transformers have recently shown strong global context modeling
capabilities in camouflaged object detection. However, they suffer from two
major limitations: less effective locality modeling and insufficient feature
aggregation in decoders, which are not conducive to camouflaged object
detection that explores subtle cues from indistinguishable backgrounds. To
address these issues, in this paper, we propose a novel transformer-based
Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode
locality-enhanced neighboring transformer features through progressive
shrinking for camouflaged object detection. Specifically, we propose a non-local
token enhancement module (NL-TEM) that employs the non-local mechanism to
let neighboring tokens interact and to explore graph-based high-order relations within
tokens to enhance local representations of transformers. Moreover, we design a
feature shrinkage decoder (FSD) with adjacent interaction modules (AIM), which
progressively aggregates adjacent transformer features through a layer-by-layer
shrinkage pyramid to accumulate imperceptible but effective cues as much as
possible for object information decoding. Extensive quantitative and
qualitative experiments demonstrate that the proposed model significantly
outperforms 24 existing competitors on three challenging COD benchmark
datasets under six widely used evaluation metrics. Our code is publicly
available at https://github.com/ZhouHuang23/FSPNet.
Comment: CVPR 2023. Project webpage at:
https://tzxiang.github.io/project/COD-FSPNet/index.htm
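The "layer-by-layer shrinkage pyramid" described in the abstract can be pictured with a small sketch. This is one plausible reading, not the authors' implementation: it assumes that each layer merges non-overlapping adjacent pairs of features, and it replaces the learned adjacent interaction module with a hypothetical element-wise mean (`merge_pair`). All function names here are invented for illustration.

```python
def merge_pair(left, right):
    """Hypothetical stand-in for an adjacent interaction module: element-wise mean."""
    return [(a + b) / 2 for a, b in zip(left, right)]


def shrink_once(features):
    """Merge non-overlapping adjacent pairs; an unpaired last feature is carried over."""
    merged = [merge_pair(features[i], features[i + 1])
              for i in range(0, len(features) - 1, 2)]
    if len(features) % 2 == 1:
        merged.append(features[-1])
    return merged


def shrinkage_pyramid(features):
    """Apply shrink_once until one aggregated feature remains.

    Returns the final feature and the number of features at each layer.
    Expects a non-empty list of equally long feature vectors.
    """
    layer_sizes = [len(features)]
    while len(features) > 1:
        features = shrink_once(features)
        layer_sizes.append(len(features))
    return features[0], layer_sizes


# Four toy "transformer features", each a 2-element vector.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
decoded, sizes = shrinkage_pyramid(tokens)
print(decoded, sizes)
```

The point of the sketch is only the shape of the computation: the number of features shrinks at every layer, so information from all inputs is accumulated into a single representation before decoding.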
CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection
Focusing on how to effectively capture and utilize cross-modality
information in the RGB-D salient object detection (SOD) task, we present a
convolutional neural network (CNN) model, named CIR-Net, based on novel
cross-modality interaction and refinement. For the cross-modality
interaction, 1) a progressive attention guided integration unit is proposed to
sufficiently integrate RGB-D feature representations in the encoder stage, and
2) a convergence aggregation structure is proposed, which flows the RGB and
depth decoding features into the corresponding RGB-D decoding streams via an
importance gated fusion unit in the decoder stage. For the cross-modality
refinement, we insert a refinement middleware structure between the encoder and
the decoder, in which the RGB, depth, and RGB-D encoder features are further
refined by successively using a self-modality attention refinement unit and a
cross-modality weighting refinement unit. Finally, with the gradually refined
features, we predict the saliency map in the decoder stage. Extensive
experiments on six popular RGB-D SOD benchmarks demonstrate that our network
outperforms the state-of-the-art saliency detectors both qualitatively and
quantitatively.
Comment: Accepted by IEEE Transactions on Image Processing 2022, 16 pages, 11
figures
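The abstract mentions an "importance gated fusion unit" without giving its form. As a generic illustration of gated fusion, and not the CIR-Net unit itself, the sketch below blends an RGB feature vector and a depth feature vector with a per-element gate in (0, 1). In a trained network the gate logits would be produced by learned layers; here they are hypothetical hand-picked numbers, and `gated_fusion` is an invented name.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb, depth, gate_logits):
    """Blend two feature vectors element by element.

    A gate near 1 keeps the RGB value, a gate near 0 keeps the depth value.
    """
    fused = []
    for r, d, z in zip(rgb, depth, gate_logits):
        g = sigmoid(z)
        fused.append(g * r + (1.0 - g) * d)
    return fused


# Hypothetical features and gate logits (assumed values, for illustration only).
rgb_feat = [1.0, 2.0, 3.0]
depth_feat = [3.0, 2.0, 1.0]
logits = [0.0, 4.0, -4.0]
print(gated_fusion(rgb_feat, depth_feat, logits))
```

A logit of 0 gives an even blend of the two modalities, while a strongly negative logit makes the fused value follow the depth feature almost entirely.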
Recurrent Attentional Networks for Saliency Detection
Convolutional-deconvolution networks can be adopted to perform end-to-end
saliency detection. However, they do not work well with objects of multiple scales.
To overcome such a limitation, in this work, we propose a recurrent attentional
convolutional-deconvolution network (RACDNN). Using spatial transformer and
recurrent network units, RACDNN is able to iteratively attend to selected image
sub-regions to perform saliency refinement progressively. Besides tackling the
scale problem, RACDNN can also learn context-aware features from past
iterations to enhance saliency refinement in future iterations. Experiments on
several challenging saliency detection datasets validate the effectiveness of
RACDNN, and show that RACDNN outperforms state-of-the-art saliency detection
methods.
Comment: CVPR 201