Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection
We propose the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN)
for fast and accurate single-shot object detection. Feature Pyramid (FP) is
widely used in recent visual detection; however, the top-down pathway of FP
cannot preserve accurate localization due to pooling shifting. The advantage of
FP is weakened as deeper backbones with more layers are used. To address this
issue, we propose a new parallel FP structure with bi-directional (top-down and
bottom-up) fusion and associated improvements to retain high-quality features
for accurate localization. Our method is particularly suitable for detecting
small objects. We provide the following design improvements: (1) A parallel
bifusion FP structure with a Bottom-up Fusion Module (BFM) to detect both small
and large objects at once with high accuracy. (2) A COncatenation and
RE-organization (CORE) module provides a bottom-up pathway for feature fusion,
which leads to the bi-directional fusion FP that can recover lost information
from lower-layer feature maps. (3) The CORE feature is further purified to
retain richer contextual information. Such purification is performed with CORE
in a few iterations in both top-down and bottom-up pathways. (4) Adding
a residual design to CORE leads to a new Re-CORE module that enables easy
training and integration with a wide range of (deeper or lighter) backbones.
The proposed network achieves state-of-the-art performance on UAVDT17 and MS
COCO datasets.
Comment: accepted by IEEE Transactions on Image Processing
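The bi-directional (top-down and bottom-up) fusion with a residual connection described in this abstract can be illustrated with a minimal sketch. This is not the PRB-FPN, BFM, CORE, or Re-CORE implementation: the 1-D lists standing in for feature maps, the helper names `upsample2`, `downsample2`, and `bidirectional_fuse`, and the choice of element-wise addition for fusion are all assumptions made for illustration.

```python
from typing import List

Level = List[float]  # toy stand-in for one pyramid level (a 1-D "feature map")


def upsample2(x: Level) -> Level:
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def downsample2(x: Level) -> Level:
    """Average pooling with window 2 and stride 2 (length must be even)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def bidirectional_fuse(pyramid: List[Level]) -> List[Level]:
    """Fuse a pyramid ordered fine -> coarse, each level half the previous length.

    Hypothetical scheme: additive top-down pass, additive bottom-up pass,
    then a residual connection back to the unfused input.
    """
    n = len(pyramid)
    # Top-down pass: coarse context flows into finer levels.
    td = [list(level) for level in pyramid]
    for i in range(n - 2, -1, -1):
        up = upsample2(td[i + 1])
        td[i] = [a + b for a, b in zip(td[i], up)]
    # Bottom-up pass: fine detail flows back into coarser levels.
    bu = [list(level) for level in td]
    for i in range(1, n):
        down = downsample2(bu[i - 1])
        bu[i] = [a + b for a, b in zip(bu[i], down)]
    # Residual connection: output = input + fused.
    return [[a + b for a, b in zip(orig, fused)]
            for orig, fused in zip(pyramid, bu)]


pyramid = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [100.0]]
fused = bidirectional_fuse(pyramid)
print(fused)
```

In this toy setting every output level ends up carrying information from every other level, which is the property that bi-directional fusion is meant to provide; a real detector would use learned convolutions rather than plain addition.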
Hybrid Graph Neural Networks for Crowd Counting
Crowd counting is an important yet challenging task due to the large scale
and density variation. Recent investigations have shown that distilling rich
relations among multi-scale features and exploiting useful information from the
auxiliary task, i.e., localization, are vital for this task. Nevertheless, how
to comprehensively leverage these relations within a unified network
architecture is still a challenging problem. In this paper, we present a novel
network structure called Hybrid Graph Neural Network (HyGnn), which aims to
address this problem by interweaving the multi-scale features for crowd density
as well as its auxiliary task (localization) together and performing joint
reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to
jointly represent the task-specific feature maps of different scales as nodes,
and two types of relations as edges: (i) multi-scale relations for capturing the
feature dependencies across scales and (ii) mutually beneficial relations
building bridges for the cooperation between counting and localization. Thus,
through message passing, HyGnn can distill rich relations between the nodes to
obtain more powerful representations, leading to robust and accurate results.
Our HyGnn performs strongly on four challenging datasets:
ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming
the state-of-the-art approaches by a large margin.
Comment: To appear in AAAI 202
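The idea of message passing over a graph with two edge types (multi-scale relations within a task, and cross-task relations between counting and localization) can be sketched as follows. This is a generic illustration, not HyGnn's actual update rule, which the abstract does not specify: the node naming, the mean aggregation, the fixed per-edge-type weights, and the helper names `build_hybrid_edges` and `message_passing_step` are all assumptions.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (task name, scale index), e.g. ("count", 0)
Edges = Dict[str, Dict[Node, List[Node]]]


def build_hybrid_edges(tasks: List[str], num_scales: int) -> Edges:
    """Build two hypothetical edge sets over (task, scale) nodes."""
    nodes = [(t, s) for t in tasks for s in range(num_scales)]
    # Multi-scale edges: same task, different scale.
    multi_scale = {n: [m for m in nodes if m[0] == n[0] and m[1] != n[1]]
                   for n in nodes}
    # Cross-task edges: same scale, different task.
    cross_task = {n: [m for m in nodes if m[0] != n[0] and m[1] == n[1]]
                  for n in nodes}
    return {"multi_scale": multi_scale, "cross_task": cross_task}


def mean(vectors: List[List[float]], dim: int) -> List[float]:
    """Element-wise mean; returns zeros when there are no neighbours."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def message_passing_step(features: Dict[Node, List[float]],
                         edges: Edges,
                         weights: Dict[str, float]) -> Dict[Node, List[float]]:
    """One synchronous round: h_v <- h_v + sum over edge types of w * mean(neighbours)."""
    updated = {}
    for node, h in features.items():
        new_h = list(h)
        for edge_type, neighbours in edges.items():
            msg = mean([features[m] for m in neighbours[node]], len(h))
            w = weights[edge_type]
            new_h = [a + w * b for a, b in zip(new_h, msg)]
        updated[node] = new_h
    return updated


edges = build_hybrid_edges(["count", "loc"], num_scales=2)
features = {("count", 0): [1.0], ("count", 1): [2.0],
            ("loc", 0): [4.0], ("loc", 1): [8.0]}
weights = {"multi_scale": 0.5, "cross_task": 0.25}
out = message_passing_step(features, edges, weights)
print(out)
```

Keeping the two edge types separate lets each relation carry its own weight; in a learned model those scalar weights would typically be replaced by trainable transformations.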