Res2Net: A New Multi-scale Backbone Architecture
Representing features at multiple scales is of great importance for numerous
vision tasks. Recent advances in backbone convolutional neural networks (CNNs)
continually demonstrate stronger multi-scale representation ability, leading to
consistent performance gains on a wide range of applications. However, most
existing methods represent the multi-scale features in a layer-wise manner. In
this paper, we propose a novel building block for CNNs, namely Res2Net, by
constructing hierarchical residual-like connections within one single residual
block. The Res2Net represents multi-scale features at a granular level and
increases the range of receptive fields for each network layer. The proposed
Res2Net block can be plugged into the state-of-the-art backbone CNN models,
e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these
models and demonstrate consistent performance gains over baseline models on
widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies
and experimental results on representative computer vision tasks, i.e., object
detection, class activation mapping, and salient object detection, further
verify the superiority of the Res2Net over the state-of-the-art baseline
methods. The source code and trained models are available on
https://mmcheng.net/res2net/.
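As a minimal illustration of the hierarchical residual-like connections described above, the sketch below splits a feature map into scale channel subsets and lets each subset's 3x3 convolution also see the output of the previous subset, which is what increases the per-layer range of receptive fields at a granular level. It assumes a PyTorch-style module; the names, the default scale of 4, and the omission of the surrounding 1x1 bottleneck convolutions, batch normalization, and the residual shortcut are simplifications, not the authors' released code.

```python
import torch
import torch.nn as nn

class Res2NetConnections(nn.Module):
    """Hierarchical residual-like connections inside one block (sketch)."""

    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must split evenly across scale subsets"
        self.width = channels // scale
        # One 3x3 conv per subset except the first, which passes through.
        self.convs = nn.ModuleList(
            nn.Conv2d(self.width, self.width, kernel_size=3, padding=1)
            for _ in range(scale - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input feature map into `scale` channel subsets.
        subsets = torch.split(x, self.width, dim=1)
        outputs = [subsets[0]]  # first subset: identity path
        prev = None
        for i, conv in enumerate(self.convs):
            # Each subset is summed with the previous subset's output before
            # its conv, so later subsets accumulate progressively larger
            # receptive fields within the same block.
            inp = subsets[i + 1] if prev is None else subsets[i + 1] + prev
            prev = conv(inp)
            outputs.append(prev)
        # Concatenation restores the original channel count.
        return torch.cat(outputs, dim=1)
```

In a full bottleneck these connections would sit between the usual 1x1 convolutions, so the block keeps the interface of a standard residual unit; that is what allows it to be plugged into backbones such as ResNet, ResNeXt, and DLA.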
CoSformer: Detecting Co-Salient Object with Transformers
Co-Salient Object Detection (CoSOD) aims at simulating the human visual
system to discover the common and salient objects from a group of relevant
images. Recent methods typically develop sophisticated deep-learning-based models that have greatly improved the performance of the CoSOD task, but two major drawbacks still need to be addressed: 1) sub-optimal inter-image relationship modeling; 2) a lack of consideration of inter-image
separability. In this paper, we propose the Co-Salient Object Detection
Transformer (CoSformer) network to capture both salient and common visual
patterns from multiple images. By leveraging the Transformer architecture, the proposed method addresses the influence of input order and greatly improves the stability of the CoSOD task. We also introduce a novel concept of inter-image separability: we construct a contrastive learning scheme to model the inter-image separability and learn a more discriminative embedding space that distinguishes true common objects from noisy objects. Extensive experiments on
three challenging benchmarks, i.e., CoCA, CoSOD3k, and Cosal2015, demonstrate
that our CoSformer outperforms cutting-edge models and achieves a new state-of-the-art. We hope that CoSformer can motivate future research on more visual co-analysis tasks.
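The abstract does not spell out the contrastive scheme, so the following is only a hedged sketch of what an inter-image separability objective could look like: embeddings of the true common object across the image group are treated as positives against embeddings of noisy, non-common objects, encouraging a more discriminative embedding space. The InfoNCE-style form, the temperature, and all names are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def separability_loss(common: torch.Tensor,
                      noisy: torch.Tensor,
                      temperature: float = 0.1) -> torch.Tensor:
    """common: (N, D) embeddings of the shared object across N images.
    noisy:  (M, D) embeddings of distracting objects in the group."""
    common = F.normalize(common, dim=1)
    noisy = F.normalize(noisy, dim=1)
    # Group prototype: the mean common-object embedding acts as the anchor.
    anchor = common.mean(dim=0, keepdim=True)            # (1, D)
    pos = anchor @ common.t() / temperature              # (1, N)
    neg = anchor @ noisy.t() / temperature               # (1, M)
    logits = torch.cat([pos, neg], dim=1)
    # Maximize probability mass on the positives, pushing noisy objects away.
    log_prob = F.log_softmax(logits, dim=1)
    return -log_prob[:, : common.shape[0]].mean()
```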
Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton
In this paper, we solve three low-level pixel-wise vision problems, namely salient object segmentation, edge detection, and skeleton extraction, within a
unified framework. We first show some similarities shared by these tasks and
then demonstrate how they can be leveraged for developing a unified framework
that can be trained end-to-end. In particular, we introduce a selective
integration module that allows each task to dynamically choose features at
different levels from the shared backbone based on its own characteristics.
Furthermore, we design a task-adaptive attention module, aiming at
intelligently allocating information for different tasks according to the image
content priors. To evaluate the performance of our proposed network on these
tasks, we conduct exhaustive experiments on multiple representative datasets.
We show that although these tasks are naturally quite different, our network works well on all of them and even performs better than current single-purpose state-of-the-art methods. In addition, we conduct thorough ablation analyses that provide a full understanding of the design principles of the proposed framework. To facilitate future research, the source code will be released.
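A hedged sketch of what such dynamic, per-task feature selection could look like is given below: a small gate predicts input-dependent weights over the backbone's feature levels, and each task would hold its own instance of the module. The gating network, the pooled descriptor, and the assumption that all levels are first resized to a common resolution are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SelectiveIntegration(nn.Module):
    """Dynamically weights backbone feature levels for one task (sketch)."""

    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # Content-dependent gate: the weights over levels are predicted from
        # the input rather than fixed, which makes the selection dynamic.
        self.gate = nn.Sequential(
            nn.Linear(channels * num_levels, num_levels),
            nn.Softmax(dim=1),
        )

    def forward(self, feats):
        # feats: list of (B, C, H, W) maps, resized to a shared H x W.
        stacked = torch.stack(feats, dim=1)                 # (B, L, C, H, W)
        descriptor = stacked.mean(dim=(3, 4)).flatten(1)    # (B, L * C)
        weights = self.gate(descriptor)                     # (B, L)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)
```

One such module per task would let salient object segmentation, edge detection, and skeleton extraction each pick a different mixture of shallow and deep features from the shared backbone.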
Salient Object Detection via Integrity Learning
Although current salient object detection (SOD) works have achieved substantial progress, they fall short when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both the micro
and macro level. Specifically, at the micro level, the model should highlight
all parts that belong to a certain salient object, while at the macro level,
the model needs to discover all salient objects from the given image scene. To
facilitate integrity learning for salient object detection, we design a novel
Integrity Cognition Network (ICON), which explores three important components
to learn strong integrity features. 1) Unlike the existing models that focus
more on feature discriminability, we introduce a diverse feature aggregation
(DFA) component to aggregate features with various receptive fields (i.e.,
kernel shape and context) and increase the feature diversity. Such diversity is
the foundation for mining the integral salient objects. 2) Based on the DFA
features, we introduce the integrity channel enhancement (ICE) component with
the goal of enhancing feature channels that highlight the integral salient
objects at the macro level, while suppressing the other distracting ones. 3)
After extracting the enhanced features, the part-whole verification (PWV)
method is employed to determine whether the part and whole object features have
strong agreement. Such part-whole agreements can further improve the
micro-level integrity for each salient object. To demonstrate the effectiveness
of ICON, comprehensive experiments are conducted on seven challenging
benchmarks, where promising results are achieved.
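To make the idea of aggregating features with various receptive fields concrete, here is a minimal sketch assuming parallel convolution branches with different kernel shapes and dilations, fused by a 1x1 convolution; the specific branches are illustrative assumptions rather than the exact DFA design.

```python
import torch
import torch.nn as nn

class DiverseAggregation(nn.Module):
    """Fuses branches with differing kernel shapes and context (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),              # square kernel
            nn.Conv2d(channels, channels, (1, 3), padding=(0, 1)),    # horizontal kernel
            nn.Conv2d(channels, channels, (3, 1), padding=(1, 0)),    # vertical kernel
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # wider context
        ])
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every branch sees the same input through a different receptive
        # field; concatenation preserves that diversity for the fusion conv.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```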