Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection
Salient Object Detection (SOD) aims to identify and segment the most
conspicuous objects in an image or video. As an important pre-processing step,
it has many potential applications in multimedia and vision tasks. With the
advance of imaging devices, SOD on high-resolution images has recently come into
great demand. However, traditional SOD methods are largely limited to
low-resolution images, making them difficult to adapt to the development of
High-Resolution SOD (HRSOD). Although some HRSOD methods have emerged, no
sufficiently large datasets exist for training and evaluation. Besides, current
HRSOD methods generally produce incomplete object regions and irregular object
boundaries. To address the above issues, in this work, we first propose a new
HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K
resolution. As far as we know, it is the largest dataset for the HRSOD task,
which will significantly help future works in training and evaluating models.
Furthermore, to improve the HRSOD performance, we propose a novel Recurrent
Multi-scale Transformer (RMFormer), which recurrently utilizes shared
Transformers and multi-scale refinement architectures. Thus, high-resolution
saliency maps can be generated with the guidance of lower-resolution
predictions. Extensive experiments on both high-resolution and low-resolution
benchmarks show the effectiveness and superiority of the proposed framework.
The source code and dataset are released at:
https://github.com/DrowsyMon/RMFormer
Comment: This work is accepted by ACM MM2023. More modifications may be
performed for further improvement.
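The idea of letting lower-resolution predictions guide higher-resolution ones can be sketched in a few lines. This is only an illustrative sketch, not the RMFormer architecture: the names `upsample2x`, `refine` and `coarse_to_fine`, the nearest-neighbour upsampling, and the fixed blending weight are all assumptions standing in for the shared Transformers and refinement modules described in the abstract.

```python
def upsample2x(pred):
    """Nearest-neighbour 2x upsampling of a 2D list of floats."""
    out = []
    for row in pred:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def refine(guide, evidence, weight=0.5):
    """Hypothetical shared refinement step: blend guidance with per-scale evidence."""
    return [
        [weight * g + (1.0 - weight) * e for g, e in zip(guide_row, evidence_row)]
        for guide_row, evidence_row in zip(guide, evidence)
    ]


def coarse_to_fine(evidence_pyramid):
    """evidence_pyramid: 2D maps ordered coarsest first, each twice the previous size."""
    pred = evidence_pyramid[0]
    for evidence in evidence_pyramid[1:]:
        # The same refine function is reused at every scale (the "recurrent" part).
        pred = refine(upsample2x(pred), evidence)
    return pred


pyramid = [[[1.0]], [[1.0, 0.0], [0.0, 0.0]]]
result = coarse_to_fine(pyramid)
```

In this toy setting the coarse prediction raises the saliency of pixels that the fine-scale evidence alone would have left at zero, which is the intuition behind guidance from lower-resolution predictions.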
Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton
In this paper, we solve three low-level pixel-wise vision problems, including
salient object segmentation, edge detection, and skeleton extraction, within a
unified framework. We first show some similarities shared by these tasks and
then demonstrate how they can be leveraged for developing a unified framework
that can be trained end-to-end. In particular, we introduce a selective
integration module that allows each task to dynamically choose features at
different levels from the shared backbone based on its own characteristics.
Furthermore, we design a task-adaptive attention module, aiming at
intelligently allocating information for different tasks according to the image
content priors. To evaluate the performance of our proposed network on these
tasks, we conduct exhaustive experiments on multiple representative datasets.
We show that although these tasks are naturally quite different, our network
can work well on all of them and even perform better than current
single-purpose state-of-the-art methods. In addition, we conduct ablation
analyses that give a fuller understanding of the design principles of
the proposed framework. To facilitate future research, source code will be
released.
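A rough way to picture per-task selection of backbone features is a softmax gate over feature levels. The sketch below is an assumption-laden illustration, not the module from the paper: `softmax`, `integrate`, and the per-task score vectors are hypothetical, and the scores are fixed here, whereas the abstract describes the choice as dynamic.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def integrate(level_features, task_scores):
    """Weighted sum of per-level feature vectors using task-specific gates.

    level_features: one feature vector per backbone level (equal lengths).
    task_scores: hypothetical per-task preference score for each level.
    """
    gates = softmax(task_scores)
    dim = len(level_features[0])
    return [
        sum(g * feat[i] for g, feat in zip(gates, level_features))
        for i in range(dim)
    ]


features = [[1.0, 0.0], [0.0, 1.0]]          # two backbone levels, toy values
balanced = integrate(features, [0.0, 0.0])   # a task with no level preference
skewed = integrate(features, [5.0, 0.0])     # a task that favours level 0
```

Under these assumptions, each task shares the same backbone features but mixes them with its own gate weights.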
Video Object Segmentation using Point-based Memory Network
Recent years have witnessed the prevalence of memory-based methods for Semi-supervised Video Object Segmentation (SVOS) which utilise past frames efficiently for label propagation. When conducting feature matching, fine-grained multi-scale feature matching has typically been performed using all query points, which inevitably results in redundant computations and thus makes the fusion of multi-scale results ineffective. In this paper, we develop a new Point-based Memory Network, termed PMNet, to perform fine-grained feature matching on hard samples only, assuming that easy samples can already obtain satisfactory matching results without the need for complicated multi-scale feature matching. Our approach first generates an uncertainty map from the initial decoding outputs. Next, the fine-grained features at uncertain locations are sampled to match the memory features on the same scale. Finally, the matching results are further decoded to provide a refined output. The point-based scheme works with the coarsest feature matching in a complementary and efficient manner. Furthermore, we propose an approach to adaptively perform global or regional matching based on the motion history of memory points, making our method more robust against ambiguous backgrounds. Experimental results on several benchmark datasets demonstrate the superiority of our proposed method over state-of-the-art methods.
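The first two steps (build an uncertainty map, then pick the hard locations) might look roughly like the following. This is a minimal sketch under stated assumptions: the abstract does not define the uncertainty measure, so the formula `1 - |2p - 1|` over a foreground probability map, the top-k selection, and the function names are all hypothetical.

```python
def uncertainty_map(prob):
    """Assumed measure: 1.0 at p = 0.5 (most uncertain), 0.0 at p = 0 or p = 1."""
    return [[1.0 - abs(2.0 * p - 1.0) for p in row] for row in prob]


def select_uncertain_points(prob, k):
    """Return (row, col) coordinates of the k most uncertain pixels."""
    unc = uncertainty_map(prob)
    coords = [(r, c) for r in range(len(unc)) for c in range(len(unc[0]))]
    # Sort by descending uncertainty; ties keep their row-major order.
    coords.sort(key=lambda rc: unc[rc[0]][rc[1]], reverse=True)
    return coords[:k]


# Toy foreground probabilities from an initial (coarse) decoding pass.
prob = [[0.9, 0.5],
        [0.1, 0.6]]
hard_points = select_uncertain_points(prob, 2)
```

Only the selected points would then go through fine-grained matching, while confidently classified pixels keep their coarse result.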
Salient Object Detection via Integrity Learning
Although current salient object detection (SOD) works have achieved remarkable
progress, they fall short when it comes to the integrity of the
predicted salient regions. We define the concept of integrity at both the micro
and macro level. Specifically, at the micro level, the model should highlight
all parts that belong to a certain salient object, while at the macro level,
the model needs to discover all salient objects from the given image scene. To
facilitate integrity learning for salient object detection, we design a novel
Integrity Cognition Network (ICON), which explores three important components
to learn strong integrity features. 1) Unlike the existing models that focus
more on feature discriminability, we introduce a diverse feature aggregation
(DFA) component to aggregate features with various receptive fields (i.e.,
kernel shape and context) and increase the feature diversity. Such diversity is
the foundation for mining the integral salient objects. 2) Based on the DFA
features, we introduce the integrity channel enhancement (ICE) component with
the goal of enhancing feature channels that highlight the integral salient
objects at the macro level, while suppressing the other distracting ones. 3)
After extracting the enhanced features, the part-whole verification (PWV)
method is employed to determine whether the part and whole object features have
strong agreement. Such part-whole agreements can further improve the
micro-level integrity for each salient object. To demonstrate the effectiveness
of ICON, comprehensive experiments are conducted on seven challenging
benchmarks, where promising results are achieved.
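Enhancing some feature channels while suppressing others can be pictured as per-channel gating. The sketch below uses a squeeze-and-excitation-style gate (global average followed by a sigmoid) purely as a stand-in; it is an assumption, not the ICE component as designed in the paper, and `channel_gates` and `enhance_channels` are hypothetical names.

```python
import math


def channel_gates(feature_maps):
    """One sigmoid gate per channel, computed from its global average activation."""
    gates = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates


def enhance_channels(feature_maps):
    """Scale every channel by its gate: strong channels are kept, weak ones damped."""
    gates = channel_gates(feature_maps)
    return [
        [[g * v for v in row] for row in fmap]
        for g, fmap in zip(gates, feature_maps)
    ]


# Two toy channels of shape 1x2: one inactive, one strongly active.
channels = [[[0.0, 0.0]], [[2.0, 2.0]]]
gates = channel_gates(channels)
enhanced = enhance_channels(channels)
```

A learned version would replace the raw average with a small trainable mapping, but the gating structure stays the same.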