Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection
State-of-the-art (SoTA) models have improved the accuracy of object detection
by a large margin via a feature pyramid (FP). FP is a top-down aggregation that
collects semantically strong features to improve scale invariance in both
two-stage and one-stage detectors. However, this top-down pathway cannot
preserve accurate object positions due to the shift-effect of pooling. Thus,
the advantage of FP to improve detection accuracy will disappear when more
layers are used. The original FP lacks a bottom-up pathway to offset the lost
information from lower-layer feature maps. It performs well in large-sized
object detection but poorly in small-sized object detection. A new structure,
the "residual feature pyramid", is proposed in this paper. It is bidirectional,
fusing both deep and shallow features for more effective and robust detection
of both small-sized and large-sized objects. Due to its "residual" nature, it
can be trained and integrated into different backbones (even deeper or lighter
ones) more easily than other bidirectional methods. One important property of
this residual FP is that accuracy still improves even when more layers are
adopted. Extensive experiments on the VOC and MS COCO datasets show that the
proposed method achieves SoTA results for highly accurate and efficient object
detection.
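To make the two pathways described above concrete, here is a minimal sketch in pure Python on toy single-channel feature maps. It is not the paper's architecture: the helper names (`upsample2x`, `downsample2x`, `top_down`, `bottom_up_residual`, `bi_fusion`), the use of nearest-neighbour upsampling and 2x2 average pooling, and the plain element-wise addition are all assumptions chosen for illustration. It only shows the general idea of a top-down aggregation followed by a bottom-up residual term that feeds shallow information back into coarser levels.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list (assumed operator)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(fmap):
    """2x2 average pooling of a 2D list with even height and width."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def add(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(levels):
    """Top-down aggregation: levels[0] is the finest map, levels[-1] the coarsest.

    Each level receives the upsampled, already-aggregated coarser level.
    """
    out = [None] * len(levels)
    out[-1] = levels[-1]
    for i in range(len(levels) - 2, -1, -1):
        out[i] = add(levels[i], upsample2x(out[i + 1]))
    return out


def bottom_up_residual(pyramid):
    """Hypothetical bottom-up pass: add pooled finer features as a residual."""
    out = [pyramid[0]]
    for i in range(1, len(pyramid)):
        out.append(add(pyramid[i], downsample2x(out[i - 1])))
    return out


def bi_fusion(levels):
    """Top-down pass followed by the bottom-up residual pass."""
    return bottom_up_residual(top_down(levels))
```

Calling `bi_fusion` on a list of maps whose sides halve from one level to the next returns a pyramid with the same shapes, in which every level has received information from both coarser and finer levels.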
ResFPN: Residual Skip Connections in Multi-Resolution Feature Pyramid Networks for Accurate Dense Pixel Matching
Dense pixel matching is required for many computer vision algorithms such as
disparity, optical flow or scene flow estimation. Feature Pyramid Networks
(FPN) have proven to be a suitable feature extractor for CNN-based dense
matching tasks. FPN generates well localized and semantically strong features
at multiple scales. However, the generic FPN does not reach its full
potential because its localization accuracy, while reasonable, is limited. Thus, we
present ResFPN -- a multi-resolution feature pyramid network with multiple
residual skip connections, where at any scale, we leverage the information from
higher resolution maps for stronger and better localized features. In our
ablation study, we demonstrate the effectiveness of our novel architecture with
clearly higher accuracy than FPN. In addition, we verify the superior accuracy
of ResFPN in many different pixel matching applications on established datasets
like KITTI, Sintel, and FlyingThings3D.
Comment: Accepted at ICPR 202
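The idea of leveraging higher-resolution maps at every scale can be illustrated with a small pure-Python sketch. The abstract does not specify how ResFPN wires its skip connections, so everything here is an assumption: the function names `avg_pool` and `add_skips_from_finer`, the use of average pooling to bring finer maps down to the target resolution, and the plain additive fusion are hypothetical choices, not the authors' design.

```python
def avg_pool(fmap, factor):
    """Average-pool a 2D list by an integer factor (sides divisible by factor)."""
    h, w = len(fmap), len(fmap[0])
    area = float(factor * factor)
    return [
        [sum(fmap[i + di][j + dj]
             for di in range(factor) for dj in range(factor)) / area
         for j in range(0, w, factor)]
        for i in range(0, h, factor)
    ]


def add_skips_from_finer(pyramid):
    """Add a residual skip from every finer level to each level.

    pyramid[0] is the finest map; each following level has half the side length.
    """
    out = []
    for k, level in enumerate(pyramid):
        fused = [row[:] for row in level]  # copy so the input is not modified
        for m in range(k):
            # Bring finer level m down to the resolution of level k.
            pooled = avg_pool(pyramid[m], 2 ** (k - m))
            fused = [[a + b for a, b in zip(ra, rb)]
                     for ra, rb in zip(fused, pooled)]
        out.append(fused)
    return out
```

In this sketch the finest level is returned unchanged, while each coarser level accumulates pooled copies of all finer levels, which is one simple way to pass localization detail down the pyramid.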