PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
Contexts play an important role in the saliency detection task. However,
given a context region, not all contextual information is helpful for the final
task. In this paper, we propose a novel pixel-wise contextual attention
network, i.e., the PiCANet, to learn to selectively attend to informative
context locations for each pixel. Specifically, for each pixel, it can generate
an attention map in which each attention weight corresponds to the contextual
relevance at each context location. An attended contextual feature can then be
constructed by selectively aggregating the contextual information. We formulate
the proposed PiCANet in both global and local forms to attend to global and
local contexts, respectively. Both models are fully differentiable and can be
embedded into CNNs for joint training. We also incorporate the proposed models
with the U-Net architecture to detect salient objects. Extensive experiments
show that the proposed PiCANets can consistently improve saliency detection
performance. The global and local PiCANets facilitate learning global contrast
and homogeneousness, respectively. As a result, our saliency model can detect
salient objects more accurately and uniformly, thus performing favorably
against the state-of-the-art methods.
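The per-pixel attention and aggregation described in this abstract can be illustrated with a minimal sketch in plain Python. It assumes softmax normalization of the attention weights and takes the attention logits as a given input; in the paper's setting they would be produced by a learned network. The names `softmax` and `attend_global`, and the flattened list layout, are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend_global(features, logits):
    """Aggregate context features with one attention map per pixel.

    features: list of N feature vectors, one per pixel (H*W flattened).
    logits:   N x N list; logits[i][j] scores context location j for pixel i.
              (Assumed to be given here; a real model would learn them.)
    Returns a list of N attended feature vectors.
    """
    n = len(features)
    channels = len(features[0])
    attended = []
    for i in range(n):
        # Attention weights of pixel i over every context location.
        weights = softmax(logits[i])
        # Weighted sum of the context features, channel by channel.
        attended.append([
            sum(weights[j] * features[j][c] for j in range(n))
            for c in range(channels)
        ])
    return attended
```

With uniform logits every pixel receives the mean of all context features; a local variant would restrict `j` to a window around pixel `i` instead of the full map.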
Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection
Salient Object Detection (SOD) aims to identify and segment the most
conspicuous objects in an image or video. As an important pre-processing step,
it has many potential applications in multimedia and vision tasks. With the
advance of imaging devices, SOD on high-resolution images has recently come
into great demand. However, traditional SOD methods are largely limited to
low-resolution images, making them difficult to adapt to the development of
High-Resolution SOD (HRSOD). Although some HRSOD methods have emerged, there
are no sufficiently large datasets for training and evaluation. Besides, current
HRSOD methods generally produce incomplete object regions and irregular object
boundaries. To address the above issues, in this work, we first propose a new
HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K
resolution. As far as we know, it is the largest dataset for the HRSOD task,
which will significantly help future works in training and evaluating models.
Furthermore, to improve the HRSOD performance, we propose a novel Recurrent
Multi-scale Transformer (RMFormer), which recurrently utilizes shared
Transformers and multi-scale refinement architectures. Thus, high-resolution
saliency maps can be generated with the guidance of lower-resolution
predictions. Extensive experiments on both high-resolution and low-resolution
benchmarks show the effectiveness and superiority of the proposed framework.
The source code and dataset are released at:
https://github.com/DrowsyMon/RMFormer
Comment: This work is accepted by ACM MM2023. More modifications may be
performed for further improvement.
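The idea of producing a high-resolution prediction under the guidance of lower-resolution ones can be sketched as a coarse-to-fine loop that reuses one shared step at every scale. This is a toy illustration only: `shared_step` is a hypothetical stand-in that simply blends its input with the guidance, whereas the paper uses shared Transformers and refinement modules. The scale factors and the helper names `downsample`, `upsample` and `coarse_to_fine` are likewise assumptions.

```python
def downsample(img, factor):
    """Average-pool a 2D list by an integer factor (sizes must divide evenly)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor))
            / (factor * factor)
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def upsample(pred, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [
        [pred[y // factor][x // factor] for x in range(len(pred[0]) * factor)]
        for y in range(len(pred) * factor)
    ]


def shared_step(img, guidance):
    """Hypothetical stand-in for the shared network: blend input and guidance."""
    return [
        [0.5 * img[y][x] + 0.5 * guidance[y][x] for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def coarse_to_fine(img, factors=(4, 2, 1)):
    """Apply the same step at each scale, coarsest first.

    Each finer scale receives the upsampled prediction of the previous,
    coarser scale as guidance.
    """
    pred = None
    prev_factor = None
    for f in factors:
        scaled = downsample(img, f)
        if pred is None:
            guidance = scaled  # no coarser prediction exists yet
        else:
            guidance = upsample(pred, prev_factor // f)
        pred = shared_step(scaled, guidance)
        prev_factor = f
    return pred
```

The returned map has the resolution of the last factor in `factors`; with the default `(4, 2, 1)` that is the full input resolution.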