1,752 research outputs found
Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection
We propose the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN)
for fast and accurate single-shot object detection. Feature Pyramid (FP) is
widely used in recent visual detection, however the top-down pathway of FP
cannot preserve accurate localization due to pooling shifting. The advantage of
FP is weaken as deeper backbones with more layers are used. To address this
issue, we propose a new parallel FP structure with bi-directional (top-down and
bottom-up) fusion and associated improvements to retain high-quality features
for accurate localization. Our method is particularly suitable for detecting
small objects. We provide the following design improvements: (1) A parallel
bifusion FP structure with a Bottom-up Fusion Module (BFM) to detect both small
and large objects at once with high accuracy. (2) A COncatenation and
RE-organization (CORE) module provides a bottom-up pathway for feature fusion,
which leads to the bi-directional fusion FP that can recover lost information
from lower-layer feature maps. (3) The CORE feature is further purified to
retain richer contextual information. Such purification is performed with CORE
in a few iterations in both top-down and bottom-up pathways. (4) The adding of
a residual design to CORE leads to a new Re-CORE module that enables easy
training and integration with a wide range of (deeper or lighter) backbones.
The proposed network achieves state-of-the-art performance on UAVDT17 and MS
COCO datasets.Comment: accepted by IEEE transactions on Image Processin
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields-of-views, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but has a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets the new state-of-art at the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU in the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online.Comment: Accepted by TPAM
- …