Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection
We propose the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN)
for fast and accurate single-shot object detection. Feature Pyramid (FP) is
widely used in recent visual detection; however, the top-down pathway of FP
cannot preserve accurate localization due to pooling shifting. The advantage of
FP is weakened as deeper backbones with more layers are used. To address this
issue, we propose a new parallel FP structure with bi-directional (top-down and
bottom-up) fusion and associated improvements to retain high-quality features
for accurate localization. Our method is particularly suitable for detecting
small objects. We provide the following design improvements: (1) A parallel
bifusion FP structure with a Bottom-up Fusion Module (BFM) to detect both small
and large objects at once with high accuracy. (2) A COncatenation and
RE-organization (CORE) module provides a bottom-up pathway for feature fusion,
which leads to the bi-directional fusion FP that can recover lost information
from lower-layer feature maps. (3) The CORE feature is further purified to
retain richer contextual information. Such purification is performed with CORE
in a few iterations in both top-down and bottom-up pathways. (4) Adding
a residual design to CORE leads to a new Re-CORE module that enables easy
training and integration with a wide range of (deeper or lighter) backbones.
The proposed network achieves state-of-the-art performance on UAVDT17 and MS
COCO datasets.
Comment: Accepted by IEEE Transactions on Image Processing
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields-of-view, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but takes a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets the new state of the art on the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU on the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online.
Comment: Accepted by TPAMI
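The claim that atrous convolution enlarges the field of view without adding parameters can be illustrated with a minimal 1D sketch. This is not the DeepLab implementation: the function names `atrous_conv1d`, `field_of_view`, and `multi_rate_responses` are hypothetical, the example uses NumPy on a 1D signal rather than 2D feature maps, and the multi-rate helper shares one kernel across rates for brevity, whereas ASPP as described in the abstract applies filters at multiple sampling rates to a feature layer.

```python
import numpy as np


def field_of_view(kernel_size, rate):
    """Number of input samples spanned by a kernel whose taps are `rate` apart."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1D atrous (dilated) correlation.

    The kernel keeps the same number of weights for every rate;
    only the spacing between the input samples it reads changes.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = field_of_view(len(kernel), rate)
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(n_out)
    for k, weight in enumerate(kernel):
        # Tap k reads the input at offset k * rate.
        out += weight * signal[k * rate : k * rate + n_out]
    return out


def multi_rate_responses(signal, kernel, rates):
    """Run the same kernel at several rates (a loose, simplified analogue of ASPP)."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    x = np.arange(10)
    w = [1.0, 1.0, 1.0]  # three weights regardless of rate
    for rate, response in multi_rate_responses(x, w, rates=(1, 2, 4)).items():
        print(rate, field_of_view(len(w), rate), response)
```

With a three-tap kernel, rates 1, 2, and 4 span 3, 5, and 9 input samples respectively, while the number of weights and the number of multiply-adds per output stay at three.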