Selective Refinement Network for High Performance Face Detection
High-performance face detection remains a very challenging problem,
especially when many tiny faces are present. This paper presents a novel
single-shot face detector, named Selective Refinement Network (SRN), which
selectively introduces two-step classification and regression operations
into an anchor-based face detector to reduce false positives and improve
location accuracy simultaneously. In particular, the SRN consists of two
modules: the Selective Two-step Classification (STC) module and the Selective
Two-step Regression (STR) module. The STC aims to filter out most easy
negative anchors from low-level detection layers to reduce the search space for
the subsequent classifier, while the STR is designed to coarsely adjust the
locations and sizes of anchors from high-level detection layers to provide
better initialization for the subsequent regressor. Moreover, we design a
Receptive Field Enhancement (RFE) block to provide more diverse receptive
fields, which help to better capture faces in extreme poses. As a
consequence, the proposed SRN detector achieves state-of-the-art performance on
all the widely used face detection benchmarks, including AFW, PASCAL face,
FDDB, and WIDER FACE datasets. Code will be released to facilitate further
studies on the face detection problem.
Comment: The first two authors contributed equally. Corresponding author:
Shifeng Zhang
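The selective first-step filtering described for the STC module can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Anchor` record, the `select_anchors` function, the set of low-level layer indices, and the 0.99 background-confidence threshold are all hypothetical choices for the example, since the abstract does not specify them. Anchors on low-level layers whose first-step background probability exceeds the threshold are dropped before the second-step classifier; anchors on other layers pass through unchanged.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Anchor:
    level: int        # detection-layer index (hypothetical: 0 = lowest-level layer)
    neg_score: float  # first-step classifier's probability that the anchor is background


def select_anchors(anchors: Iterable[Anchor], low_levels: Set[int],
                   threshold: float = 0.99) -> List[Anchor]:
    """Drop easy negatives, but only on the layers where filtering is enabled.

    `threshold` is an illustrative assumption, not a value from the abstract.
    """
    kept = []
    for a in anchors:
        if a.level in low_levels and a.neg_score > threshold:
            continue  # confidently background: skip it in the second-step classifier
        kept.append(a)
    return kept


anchors = [Anchor(0, 0.999), Anchor(0, 0.5), Anchor(4, 0.999)]
kept = select_anchors(anchors, low_levels={0, 1, 2})
# The level-0 easy negative is removed; the level-4 anchor is kept
# because filtering is not enabled on that layer.
```

Restricting the filter to `low_levels` mirrors the description of STC acting only on low-level detection layers; in a trained detector the scores would come from the first-step classifier rather than hand-written values.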
Single-Shot Refinement Neural Network for Object Detection
For object detection, the two-stage approach (e.g., Faster R-CNN) has been
achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has
the advantage of high efficiency. To inherit the merits of both while
overcoming their disadvantages, in this paper we propose a novel single-shot
detector, called RefineDet, that achieves better accuracy than two-stage
methods while maintaining efficiency comparable to that of one-stage methods. RefineDet
consists of two inter-connected modules, namely, the anchor refinement module
and the object detection module. Specifically, the former aims to (1) filter
out negative anchors to reduce search space for the classifier, and (2)
coarsely adjust the locations and sizes of anchors to provide better
initialization for the subsequent regressor. The latter module takes the
refined anchors as the input from the former to further improve the regression
and predict multi-class labels. Meanwhile, we design a transfer connection block
to transfer the features in the anchor refinement module to predict locations,
sizes and class labels of objects in the object detection module. The
multi-task loss function enables us to train the whole network in an end-to-end
way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO
demonstrate that RefineDet achieves state-of-the-art detection accuracy with
high efficiency. Code is available at https://github.com/sfzhang15/RefineDet
Comment: 14 pages, 7 figures, 7 tables
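The anchor refinement idea, in which the first module coarsely adjusts anchors and the second module regresses from those refined anchors, amounts to decoding box offsets twice. The sketch below assumes the common SSD-style center/size encoding without variance scaling. The functions `decode` and `two_step_decode` and the example numbers are hypothetical, and RefineDet's exact parameterization may differ.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # center form: (cx, cy, w, h)


def decode(anchor: Box, offsets: Box) -> Box:
    """Apply regression offsets (dx, dy, dw, dh) to a center-form box.

    Assumed SSD-style encoding: shifts are relative to the box size,
    and size changes are in log space.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_step_decode(anchor: Box, arm_offsets: Box, odm_offsets: Box) -> Box:
    """Coarse step, then fine step: the second regressor starts from the refined anchor."""
    refined_anchor = decode(anchor, arm_offsets)  # step 1: anchor refinement
    return decode(refined_anchor, odm_offsets)    # step 2: final regression


anchor = (50.0, 50.0, 20.0, 20.0)
arm = (0.5, 0.0, 0.0, 0.0)               # shift the center right by half the width
odm = (0.0, 0.25, math.log(2.0), 0.0)    # shift down a quarter height, double the width
refined = decode(anchor, arm)            # (60.0, 50.0, 20.0, 20.0)
final = two_step_decode(anchor, arm, odm)
```

Because the second regressor starts from `refined` rather than from the original anchor, its offsets only have to correct the remaining error. This is the "better initialization for the subsequent regressor" described above.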
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Owing to this new way of combining CNN-based and model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real-world
data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13
pages
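To make the idea of a parametric decoder driven by a semantically meaningful code vector more concrete, the sketch below renders only the geometric part. That part consists of an assumed linear blend of identity and expression bases, a pose reduced to a yaw rotation plus translation, and a pinhole projection. Everything here is hypothetical and chosen for illustration: the toy two-vertex model, the names `blend`, `rotate_y`, `project`, and `decode_geometry`, and the code layout. Reflectance and illumination, which the MoFA decoder also models, are omitted. A real implementation would run these smooth operations in an autodiff framework so that the encoder can be trained through them.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def blend(mean: Sequence[Vec3], bases: Sequence[Sequence[Vec3]],
          coeffs: Sequence[float]) -> List[Vec3]:
    """Linear model: each vertex is mean[i] + sum_k coeffs[k] * bases[k][i]."""
    out = []
    for i, (x, y, z) in enumerate(mean):
        for c, basis in zip(coeffs, bases):
            bx, by, bz = basis[i]
            x, y, z = x + c * bx, y + c * by, z + c * bz
        out.append((x, y, z))
    return out


def rotate_y(v: Vec3, yaw: float) -> Vec3:
    """Rotate a point about the vertical (y) axis by `yaw` radians."""
    x, y, z = v
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)


def project(v: Vec3, focal: float) -> Vec2:
    """Pinhole projection onto the image plane (camera looks along +z)."""
    x, y, z = v
    return (focal * x / z, focal * y / z)


def decode_geometry(code: Dict, model: Dict, focal: float = 1.0) -> List[Vec2]:
    """Map a code vector to 2D vertex positions (hypothetical code layout)."""
    # Identity (shape) and expression are both linear offsets from the mean face.
    shape = blend(model["mean"],
                  model["id_basis"] + model["exp_basis"],
                  code["alpha"] + code["delta"])
    tx, ty, tz = code["translation"]
    points = []
    for v in shape:
        x, y, z = rotate_y(v, code["yaw"])  # rigid pose: rotation, then translation
        points.append(project((x + tx, y + ty, z + tz), focal))
    return points


# Toy model with two vertices, one identity basis, and one expression basis.
model = {
    "mean": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "id_basis": [[(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]],
    "exp_basis": [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]],
}
code = {"alpha": [0.5], "delta": [1.0], "yaw": 0.0, "translation": (0.0, 0.0, 2.0)}
pts = decode_geometry(code, model)
```

Each entry of `code` has a fixed meaning (identity, expression, pose), which is the property the abstract highlights. An encoder trained through such a decoder is pushed to output parameters that mean the same thing.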