    Res2Net: A New Multi-scale Backbone Architecture

    Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, which constructs hierarchical residual-like connections within a single residual block. Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely used datasets, e.g., CIFAR-100 and ImageNet. Ablation studies and experiments on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of Res2Net over state-of-the-art baseline methods. The source code and trained models are available at https://mmcheng.net/res2net/.
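
    To make the block's wiring concrete, the following is a minimal PyTorch-style sketch of the hierarchical split/transform/concatenate pattern the abstract describes; the class name, the scale hyperparameter, and the layer widths are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of the Res2Net idea, assuming PyTorch. Names and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class Res2NetBlock(nn.Module):
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must divide evenly into subsets"
        self.scale = scale
        width = channels // scale
        # One 3x3 conv per subset except the first, which passes through
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, kernel_size=3, padding=1)
             for _ in range(scale - 1)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the channels into `scale` subsets
        subsets = torch.chunk(x, self.scale, dim=1)
        outputs = [subsets[0]]  # first subset: identity, preserves fine detail
        prev = None
        for i, conv in enumerate(self.convs):
            # Hierarchical residual-like connection: each subset is summed
            # with the previous subset's output before its own 3x3 conv,
            # so later subsets see progressively larger receptive fields.
            inp = subsets[i + 1] if prev is None else subsets[i + 1] + prev
            prev = conv(inp)
            outputs.append(prev)
        return torch.cat(outputs, dim=1)
```

    In a full bottleneck, this block would sit between the usual 1x1 reduction and 1x1 expansion convolutions, replacing the single 3x3 convolution of a standard residual block.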

    CoSformer: Detecting Co-Salient Object with Transformers

    Co-Salient Object Detection (CoSOD) aims at simulating the human visual system to discover the common and salient objects from a group of relevant images. Recent methods typically develop sophisticated deep-learning-based models that have greatly improved performance on the CoSOD task. However, two major drawbacks still need to be addressed: 1) sub-optimal inter-image relationship modeling; and 2) a lack of consideration for inter-image separability. In this paper, we propose the Co-Salient Object Detection Transformer (CoSformer) network to capture both salient and common visual patterns from multiple images. By leveraging the Transformer architecture, the proposed method addresses the influence of input order and greatly improves the stability of the CoSOD task. We also introduce the novel concept of inter-image separability, constructing a contrastive learning scheme to model it and learn a more discriminative embedding space that distinguishes true common objects from noisy objects. Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and Cosal2015, demonstrate that CoSformer outperforms cutting-edge models and achieves a new state of the art. We hope that CoSformer can motivate future research on more visual co-analysis tasks.
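
    The abstract does not specify the exact form of the contrastive scheme; the sketch below shows one plausible InfoNCE-style objective for inter-image separability, assuming PyTorch. The function name and the way positives (common-object embeddings from other images in the group) and negatives (noisy-object embeddings) are formed are illustrative assumptions, not the paper's design.

```python
# A hedged sketch of an inter-image separability objective, assuming
# PyTorch. The positive/negative construction is an assumption.
import torch
import torch.nn.functional as F

def inter_image_separability_loss(common: torch.Tensor,
                                  noisy: torch.Tensor,
                                  temperature: float = 0.1) -> torch.Tensor:
    """common: (N, D) one co-salient-object embedding per image (N >= 2).
    noisy:  (M, D) embeddings of distracting, non-common objects."""
    common = F.normalize(common, dim=1)
    noisy = F.normalize(noisy, dim=1)
    total = common.new_zeros(())
    for i in range(common.size(0)):
        anchor = common[i:i + 1]                             # (1, D)
        positives = torch.cat([common[:i], common[i + 1:]])  # (N-1, D)
        logits = torch.cat([anchor @ positives.t(),
                            anchor @ noisy.t()], dim=1) / temperature
        log_prob = F.log_softmax(logits, dim=1)
        # Pull the anchor toward every other image's common object,
        # push it away from the noisy-object embeddings.
        total = total - log_prob[:, :positives.size(0)].mean()
    return total / common.size(0)
```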

    Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton

    In this paper, we solve three low-level pixel-wise vision problems, salient object segmentation, edge detection, and skeleton extraction, within a unified framework. We first show some similarities shared by these tasks and then demonstrate how they can be leveraged to develop a unified framework that can be trained end-to-end. In particular, we introduce a selective integration module that allows each task to dynamically choose features at different levels from the shared backbone based on its own characteristics. Furthermore, we design a task-adaptive attention module that intelligently allocates information to different tasks according to image content priors. To evaluate the performance of the proposed network on these tasks, we conduct exhaustive experiments on multiple representative datasets. We show that although these tasks are naturally quite different, our network works well on all of them and even outperforms current single-purpose state-of-the-art methods. In addition, we conduct thorough ablation analyses that provide a full understanding of the design principles of the proposed framework. To facilitate future research, the source code will be released.
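
    As one way to picture the selective integration module, here is a minimal PyTorch sketch in which each task predicts soft weights over the backbone's feature levels from globally pooled statistics; the module name, the gating design, and the assumption that all levels are already resized to one resolution are illustrative, not the paper's exact module.

```python
# A minimal sketch of task-specific dynamic feature selection, assuming
# PyTorch. One instance would be created per task.
from typing import List

import torch
import torch.nn as nn

class SelectiveIntegration(nn.Module):
    """Softly selects among L backbone feature levels for one task."""
    def __init__(self, num_levels: int, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Predict one weight per feature level from pooled statistics
        self.gate = nn.Sequential(
            nn.Linear(num_levels * channels, num_levels),
            nn.Softmax(dim=1),
        )

    def forward(self, feats: List[torch.Tensor]) -> torch.Tensor:
        # feats: L tensors of shape (B, C, H, W), resized to one resolution
        pooled = torch.cat([self.pool(f).flatten(1) for f in feats], dim=1)
        weights = self.gate(pooled)          # (B, L) per-task level selection
        stacked = torch.stack(feats, dim=1)  # (B, L, C, H, W)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)
```

    Because each task owns its own gate, the tasks can draw on different backbone levels (e.g., shallow levels for edges, deep levels for saliency) while sharing all backbone parameters.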

    Salient Object Detection via Integrity Learning

    Although current salient object detection (SOD) methods have achieved impressive progress, they fall short when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both the micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object, while at the macro level, the model needs to discover all salient objects in the given image scene. To facilitate integrity learning for salient object detection, we design a novel Integrity Cognition Network (ICON), which explores three important components to learn strong integrity features. 1) Unlike existing models that focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e., kernel shape and context) and increase feature diversity. Such diversity is the foundation for mining integral salient objects. 2) Based on the DFA features, we introduce the integrity channel enhancement (ICE) component, whose goal is to enhance feature channels that highlight integral salient objects at the macro level while suppressing distracting ones. 3) After extracting the enhanced features, a part-whole verification (PWV) method is employed to determine whether the part and whole object features agree strongly. Such part-whole agreement can further improve the micro-level integrity of each salient object. To demonstrate the effectiveness of ICON, comprehensive experiments are conducted on seven challenging benchmarks, where promising results are achieved.
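
    The DFA component's emphasis on varied receptive fields can be sketched as a set of parallel convolution branches with different kernel shapes and dilations, assuming PyTorch; the specific branch configuration below is an illustrative assumption rather than ICON's exact layout.

```python
# A minimal sketch of diverse feature aggregation with varied receptive
# fields, assuming PyTorch. Branch choices are illustrative assumptions.
import torch
import torch.nn as nn

class DiverseFeatureAggregation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Parallel branches with different kernel shapes and contexts
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),              # square
            nn.Conv2d(channels, channels, (1, 3), padding=(0, 1)),    # horizontal
            nn.Conv2d(channels, channels, (3, 1), padding=(1, 0)),    # vertical
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # dilated
        ])
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenating branches with diverse receptive fields increases
        # feature diversity before the integrity-oriented channel
        # enhancement described in the abstract.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```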