PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
Contexts play an important role in the saliency detection task. However,
given a context region, not all contextual information is helpful for the final
task. In this paper, we propose a novel pixel-wise contextual attention
network, i.e., the PiCANet, to learn to selectively attend to informative
context locations for each pixel. Specifically, for each pixel, it can generate
an attention map in which each attention weight corresponds to the contextual
relevance at each context location. An attended contextual feature can then be
constructed by selectively aggregating the contextual information. We formulate
the proposed PiCANet in both global and local forms to attend to global and
local contexts, respectively. Both models are fully differentiable and can be
embedded into CNNs for joint training. We also incorporate the proposed models
with the U-Net architecture to detect salient objects. Extensive experiments
show that the proposed PiCANets can consistently improve saliency detection
performance. The global and local PiCANets facilitate learning global contrast
and local homogeneity, respectively. As a result, our saliency model can detect
salient objects more accurately and uniformly, thus performing favorably
against the state-of-the-art methods.
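The core idea of pixel-wise contextual attention — each pixel softmax-normalizes its own attention map over context locations, then aggregates context features as a weighted sum — can be sketched as follows. This is a minimal numpy illustration of the mechanism, not the authors' implementation; the function name and the assumption that attention logits are already computed (in the paper they come from learned sub-networks) are illustrative.

```python
import numpy as np

def pixelwise_contextual_attention(features, attention_logits):
    """For each pixel, softmax-normalize a per-pixel attention map over all
    context locations, then aggregate context features as a weighted sum.

    features:          (H, W, C) feature map
    attention_logits:  (H, W, H*W) one unnormalized attention map per pixel
    returns:           (H, W, C) attended contextual features
    """
    H, W, C = features.shape
    # Softmax over the context dimension so each pixel's weights sum to 1.
    logits = attention_logits - attention_logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)      # (H, W, H*W)
    context = features.reshape(H * W, C)                # flatten context grid
    attended = weights.reshape(H * W, H * W) @ context  # weighted aggregation
    return attended.reshape(H, W, C)
```

With uniform logits every pixel simply averages the whole context; learned logits let each pixel attend only to the locations relevant to its saliency decision. The local form of PiCANet restricts the context to a window around each pixel rather than the full H*W grid.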
Global Context-Aware Progressive Aggregation Network for Salient Object Detection
Deep convolutional neural networks have achieved competitive performance in
salient object detection, in which how to learn effective and comprehensive
features plays a critical role. Most previous works adopted multi-level
feature integration yet ignored the gaps between different features. Besides,
high-level features are also diluted as they are passed along the top-down
pathway. To remedy these issues, we propose a
novel network named GCPANet to effectively integrate low-level appearance
features, high-level semantic features, and global context features through
some progressive context-aware Feature Interweaved Aggregation (FIA) modules
and generate the saliency map in a supervised way. Moreover, a Head Attention
(HA) module is used to reduce information redundancy and enhance the top-layer
features by leveraging spatial and channel-wise attention, and the Self
Refinement (SR) module is utilized to further refine and heighten the input
features. Furthermore, we design the Global Context Flow (GCF) module to
generate the global context information at different stages, which aims to
learn the relationship among different salient regions and alleviate the
dilution effect of high-level features. Experimental results on six benchmark
datasets demonstrate that the proposed approach outperforms the
state-of-the-art methods both quantitatively and qualitatively.
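The channel-then-spatial gating used by attention modules of this kind can be sketched minimally in numpy. This is an illustrative sketch of the general pattern (squeeze-and-gate channel attention followed by a per-pixel spatial gate), not the GCPANet HA module itself; the function name and weight shapes are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(features, channel_w, spatial_w):
    """Re-weight features with channel attention (gates computed from global
    average pooling) followed by spatial attention (a per-pixel gate).

    features:   (H, W, C) feature map
    channel_w:  (C, C) projection producing the channel gates (assumed shape)
    spatial_w:  (C,)   projection producing the spatial gate (assumed shape)
    """
    # Channel attention: squeeze spatially, project, gate each channel.
    squeezed = features.mean(axis=(0, 1))            # (C,)
    channel_gate = sigmoid(squeezed @ channel_w)     # (C,)
    refined = features * channel_gate                # broadcasts over H, W
    # Spatial attention: one gate per pixel from the channel-refined features.
    spatial_gate = sigmoid(refined @ spatial_w)      # (H, W)
    return refined * spatial_gate[..., None]
```

The two gates serve different purposes: the channel gate suppresses redundant feature channels globally, while the spatial gate emphasizes locations likely to belong to salient regions.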
Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection
Salient Object Detection (SOD) aims to identify and segment the most
conspicuous objects in an image or video. As an important pre-processing step,
it has many potential applications in multimedia and vision tasks. With the
advance of imaging devices, SOD on high-resolution images has recently been in
great demand. However, traditional SOD methods are largely limited to
low-resolution images, making it difficult to adapt them to High-Resolution
SOD (HRSOD). Although some HRSOD methods have emerged, there are no
sufficiently large datasets for training and evaluation. Besides, current
HRSOD methods generally produce incomplete object regions and irregular object
boundaries. To address the above issues, in this work, we first propose a new
HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K
resolution. As far as we know, it is the largest dataset for the HRSOD task,
which will significantly help future works in training and evaluating models.
Furthermore, to improve the HRSOD performance, we propose a novel Recurrent
Multi-scale Transformer (RMFormer), which recurrently utilizes shared
Transformers and multi-scale refinement architectures. Thus, high-resolution
saliency maps can be generated with the guidance of lower-resolution
predictions. Extensive experiments on both high-resolution and low-resolution
benchmarks show the effectiveness and superiority of the proposed framework.
The source code and dataset are released at:
https://github.com/DrowsyMon/RMFormer.
Comment: This work is accepted by ACM MM 2023. More modifications may be
performed for further improvement.
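The coarse-to-fine recurrence described above — a shared predictor run at each scale, guided by the upsampled prediction from the previous, coarser scale — can be sketched as follows. This is a minimal numpy illustration of the recurrent multi-scale pattern under stated assumptions (a generic `predict_fn` standing in for the shared Transformer, nearest-neighbor upsampling, a 2x pyramid), not the RMFormer architecture itself.

```python
import numpy as np

def upsample2x(pred):
    """Nearest-neighbor 2x upsampling of a (H, W) prediction."""
    return pred.repeat(2, axis=0).repeat(2, axis=1)

def recurrent_multiscale_refine(image_pyramid, predict_fn):
    """Run the same predictor at every scale, coarsest first, feeding the
    upsampled previous prediction back in as guidance.

    image_pyramid: list of (H, W) arrays, coarsest first, each 2x the previous
    predict_fn:    shared callable (image, guidance) -> (H, W) prediction;
                   in RMFormer this role is played by shared Transformers
    """
    guidance = np.zeros_like(image_pyramid[0])  # no guidance at the coarsest scale
    for image in image_pyramid:
        pred = predict_fn(image, guidance)
        guidance = upsample2x(pred)             # guide the next, finer scale
    return pred
```

Sharing one predictor across scales keeps the parameter count independent of the target resolution, while the low-resolution guidance constrains the high-resolution prediction toward complete object regions.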