537 research outputs found

    Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection

    Full text link
    We propose the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN) for fast and accurate single-shot object detection. Feature Pyramid (FP) is widely used in recent visual detection, however the top-down pathway of FP cannot preserve accurate localization due to pooling shifting. The advantage of FP is weaken as deeper backbones with more layers are used. To address this issue, we propose a new parallel FP structure with bi-directional (top-down and bottom-up) fusion and associated improvements to retain high-quality features for accurate localization. Our method is particularly suitable for detecting small objects. We provide the following design improvements: (1) A parallel bifusion FP structure with a Bottom-up Fusion Module (BFM) to detect both small and large objects at once with high accuracy. (2) A COncatenation and RE-organization (CORE) module provides a bottom-up pathway for feature fusion, which leads to the bi-directional fusion FP that can recover lost information from lower-layer feature maps. (3) The CORE feature is further purified to retain richer contextual information. Such purification is performed with CORE in a few iterations in both top-down and bottom-up pathways. (4) The adding of a residual design to CORE leads to a new Re-CORE module that enables easy training and integration with a wide range of (deeper or lighter) backbones. The proposed network achieves state-of-the-art performance on UAVDT17 and MS COCO datasets.Comment: accepted by IEEE transactions on Image Processin

    Deep learning for real-world object detection

    Get PDF

    Hybrid Graph Neural Networks for Crowd Counting

    Full text link
    Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn) which targets to relieve the problem by interweaving the multi-scale features for crowd density as well as its auxiliary task (localization) together and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges:(i) multi-scale relations for capturing the feature dependencies across scales and (ii) mutual beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can distill rich relations between the nodes to obtain more powerful representations, leading to robust and accurate results. Our HyGnn performs significantly well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art approaches by a large margin.Comment: To appear in AAAI 202
    • …
    corecore