5,232 research outputs found
SFD: Single Shot Scale-invariant Face Detector
This paper presents a real-time face detector, named Single Shot
Scale-invariant Face Detector (SFD), which performs superiorly on various
scales of faces with a single deep neural network, especially for small faces.
Specifically, we try to solve the common problem that anchor-based detectors
deteriorate dramatically as the objects become smaller. We make contributions
in the following three aspects: 1) proposing a scale-equitable face detection
framework to handle different scales of faces well. We tile anchors on a wide
range of layers to ensure that all scales of faces have enough features for
detection. Besides, we design anchor scales based on the effective receptive
field and a proposed equal proportion interval principle; 2) improving the
recall rate of small faces by a scale compensation anchor matching strategy; 3)
reducing the false positive rate of small faces via a max-out background label.
As a consequence, our method achieves state-of-the-art detection performance on
all the common face detection benchmarks, including the AFW, PASCAL face, FDDB
and WIDER FACE datasets, and can run at 36 FPS on a Nvidia Titan X (Pascal) for
VGA-resolution images.Comment: Accepted by ICCV 2017 + its supplementary materials; Updated the
latest results on WIDER FAC
Multi-Path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"
Large-scale variations still pose a challenge in unconstrained face
detection. To the best of our knowledge, no current face detection algorithm
can detect a face as large as 800 x 800 pixels while simultaneously detecting
another one as small as 8 x 8 pixels within a single image with equally high
accuracy. We propose a two-stage cascaded face detection framework, Multi-Path
Region-based Convolutional Neural Network (MP-RCNN), that seamlessly combines a
deep neural network with a classic learning strategy, to tackle this challenge.
The first stage is a Multi-Path Region Proposal Network (MP-RPN) that proposes
faces at three different scales. It simultaneously utilizes three parallel
outputs of the convolutional feature maps to predict multi-scale candidate face
regions. The "atrous" convolution trick (convolution with up-sampled filters)
and a newly proposed sampling layer for "hard" examples are embedded in MP-RPN
to further boost its performance. The second stage is a Boosted Forests
classifier, which utilizes deep facial features pooled from inside the
candidate face regions as well as deep contextual features pooled from a larger
region surrounding the candidate face regions. This step is included to further
remove hard negative samples. Experiments show that this approach achieves
state-of-the-art face detection performance on the WIDER FACE dataset "hard"
partition, outperforming the former best result by 9.6% for the Average
Precision.Comment: 11 pages, 7 figures, to be presented at CRV 201
An Analysis of Scale Invariance in Object Detection - SNIP
An analysis of different techniques for recognizing and detecting objects
under extreme scale variation is presented. Scale specific and scale invariant
design of detectors are compared by training them with different configurations
of input data. By evaluating the performance of different network architectures
for classifying small objects on ImageNet, we show that CNNs are not robust to
changes in scale. Based on this analysis, we propose to train and test
detectors on the same scales of an image-pyramid. Since small and large objects
are difficult to recognize at smaller and larger scales respectively, we
present a novel training scheme called Scale Normalization for Image Pyramids
(SNIP) which selectively back-propagates the gradients of object instances of
different sizes as a function of the image scale. On the COCO dataset, our
single model performance is 45.7% and an ensemble of 3 networks obtains an mAP
of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train
with bounding box supervision. Our submission won the Best Student Entry in the
COCO 2017 challenge. Code will be made available at
\url{http://bit.ly/2yXVg4c}.Comment: CVPR 2018, camera ready versio
Speed/accuracy trade-offs for modern convolutional object detectors
The goal of this paper is to serve as a guide for selecting a detection
architecture that achieves the right speed/memory/accuracy balance for a given
application and platform. To this end, we investigate various ways to trade
accuracy for speed and memory usage in modern convolutional object detection
systems. A number of successful systems have been proposed in recent years, but
apples-to-apples comparisons are difficult due to different base feature
extractors (e.g., VGG, Residual Networks), different default image resolutions,
as well as different hardware and software platforms. We present a unified
implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016]
and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and
trace out the speed/accuracy trade-off curve created by using alternative
feature extractors and varying other critical parameters such as image size
within each of these meta-architectures. On one extreme end of this spectrum
where speed and memory are critical, we present a detector that achieves real
time speeds and can be deployed on a mobile device. On the opposite end in
which accuracy is critical, we present a detector that achieves
state-of-the-art performance measured on the COCO detection task.Comment: Accepted to CVPR 201
Incorporating Intra-Class Variance to Fine-Grained Visual Recognition
Fine-grained visual recognition aims to capture discriminative
characteristics amongst visually similar categories. The state-of-the-art
research work has significantly improved the fine-grained recognition
performance by deep metric learning using triplet network. However, the impact
of intra-category variance on the performance of recognition and robust feature
representation has not been well studied. In this paper, we propose to leverage
intra-class variance in metric learning of triplet network to improve the
performance of fine-grained recognition. Through partitioning training images
within each category into a few groups, we form the triplet samples across
different categories as well as different groups, which is called Group
Sensitive TRiplet Sampling (GS-TRS). Accordingly, the triplet loss function is
strengthened by incorporating intra-class variance with GS-TRS, which may
contribute to the optimization objective of triplet network. Extensive
experiments over benchmark datasets CompCar and VehicleID show that the
proposed GS-TRS has significantly outperformed state-of-the-art approaches in
both classification and retrieval tasks.Comment: 6 pages, 5 figure
- …