Deep Learning Face Attributes in the Wild
Predicting face attributes in the wild is challenging due to complex face
variations. We propose a novel deep learning framework for attribute prediction
in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly
with attribute tags but pre-trained differently. LNet is pre-trained on
massive general object categories for face localization, while ANet is
pre-trained on massive face identities for attribute prediction. This framework
not only outperforms the state of the art by a large margin, but also reveals
valuable facts on learning face representation.
(1) It shows how the performance of face localization (LNet) and attribute
prediction (ANet) can be improved by different pre-training strategies.
(2) It reveals that although the filters of LNet are fine-tuned only with
image-level attribute tags, their response maps over entire images strongly
indicate face locations. This fact enables training LNet for face
localization with only image-level annotations, but without face bounding boxes
or landmarks, which are required by all attribute recognition works.
(3) It also demonstrates that the high-level hidden neurons of ANet
automatically discover semantic concepts after pre-training with massive face
identities, and such concepts are significantly enriched after fine-tuning with
attribute tags. Each attribute can be well explained with a sparse linear
combination of these concepts.
Comment: To appear in International Conference on Computer Vision (ICCV) 201
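The claim that each attribute is explained by a sparse linear combination of concepts can be illustrated with a small sketch. The abstract does not say how the sparse weights are obtained, so the L1-regularized least squares (lasso) solved by coordinate descent below is an assumption, and the names `lasso_coordinate_descent`, `concepts`, and `attribute` are hypothetical. The toy data stands in for neuron activations and is not taken from the paper.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data (assumption): 6 "concept" activations per image; the attribute
# depends on concepts 0 and 3 only.
rng = random.Random(0)
n_samples, n_concepts = 200, 6
concepts = [[rng.gauss(0.0, 1.0) for _ in range(n_concepts)] for _ in range(n_samples)]
attribute = [2.0 * row[0] - 1.0 * row[3] for row in concepts]

weights = lasso_coordinate_descent(concepts, attribute, lam=0.1)
active = [j for j, w in enumerate(weights) if abs(w) > 1e-6]
print("concepts with non-zero weight:", active)
```

The L1 penalty drives the weights of irrelevant concepts to exactly zero, which is what makes the resulting explanation sparse.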
Multi-view Face Detection Using Deep Convolutional Neural Networks
In this paper we consider the problem of multi-view face detection. While
there has been significant research on this problem, current state-of-the-art
approaches for this task require annotation of facial landmarks, e.g. TSM [25],
or annotation of face poses [28, 22]. They also require training dozens of
models to fully capture faces in all orientations, e.g. 22 models in HeadHunter
method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method
that does not require pose/landmark annotation and is able to detect faces in a
wide range of orientations using a single model based on deep convolutional
neural networks. The proposed method has minimal complexity; unlike other
recent deep learning object detection methods [9], it does not require
additional components such as segmentation, bounding-box regression, or SVM
classifiers. Furthermore, we analyzed scores of the proposed face detector for
faces in different orientations and found that 1) the proposed method is able
to detect faces from different angles and can handle occlusion to some extent,
2) there seems to be a correlation between the distribution of positive examples
in the training set and scores of the proposed face detector. The latter
suggests that the proposed method's performance can be further improved by using
better sampling strategies and more sophisticated data augmentation techniques.
Evaluations on popular face detection benchmark datasets show that our
single-model face detector algorithm has similar or better performance compared
to the previous methods, which are more complex and require annotations of
either different poses or facial landmarks.
Comment: in International Conference on Multimedia Retrieval 2015 (ICMR
Face Alignment Assisted by Head Pose Estimation
In this paper we propose a supervised initialization scheme for cascaded face
alignment based on explicit head pose estimation. We first investigate the
failure cases of most state-of-the-art face alignment approaches and observe
that these failures often share one common global property, i.e. the head pose
variation is usually large. Inspired by this, we propose a deep convolutional
network model for reliable and accurate head pose estimation. Instead of using
a mean face shape, or randomly selected shapes for cascaded face alignment
initialisation, we propose two schemes for generating initialisation: the first
one relies on projecting a mean 3D face shape (represented by 3D facial
landmarks) onto the 2D image under the estimated head pose; the second one searches
nearest neighbour shapes from the training set according to head pose distance.
By doing so, the initialisation gets closer to the actual shape, which enhances
the possibility of convergence and in turn improves the face alignment
performance. We demonstrate the proposed method on the benchmark 300W dataset
and show very competitive performance in both head pose estimation and face
alignment.
Comment: Accepted by BMVC201
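The two initialisation schemes described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euler-angle convention (R = Rz(roll) · Ry(yaw) · Rx(pitch)), the weak-perspective projection with hypothetical `scale`, `tx`, `ty` parameters, and the Euclidean distance between pose angles are all choices made here for the example.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Rotation R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def project_shape(points3d, pose, scale=1.0, tx=0.0, ty=0.0):
    """Scheme 1: rotate 3D landmarks by the estimated pose, drop depth, scale and shift."""
    r = rotation_matrix(*pose)
    shape2d = []
    for x, y, z in points3d:
        xr = r[0][0] * x + r[0][1] * y + r[0][2] * z
        yr = r[1][0] * x + r[1][1] * y + r[1][2] * z
        shape2d.append((scale * xr + tx, scale * yr + ty))
    return shape2d


def nearest_by_pose(training, pose, k=1):
    """Scheme 2: return the shapes of the k training samples closest in pose.

    `training` is a list of (pose, shape) pairs; the distance is the Euclidean
    distance between (yaw, pitch, roll) tuples, which is an assumption.
    """
    ranked = sorted(training, key=lambda item: math.dist(item[0], pose))
    return [shape for _, shape in ranked[:k]]


# Toy usage with made-up landmarks and poses.
mean_shape_3d = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -1.0, 0.5)]
estimated_pose = (math.radians(30.0), 0.0, 0.0)  # yaw, pitch, roll
init_shape = project_shape(mean_shape_3d, estimated_pose, scale=50.0, tx=100.0, ty=100.0)
print(init_shape)
```

Either output can then serve as the starting shape for a cascaded alignment method in place of a mean 2D shape.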
Studies on Imaging System and Machine Learning: 3D Halftoning and Human Facial Landmark Localization
In this dissertation, studies on digital halftoning and human facial landmark localization are discussed. 3D printing is becoming increasingly popular around the world. With 3D printing technology, customized products can be manufactured much more quickly and efficiently, and at much lower cost. However, 3D printing still suffers from low-quality surface reproduction compared with 2D printing. One way to improve it is to develop an advanced halftoning algorithm for 3D printing. In this presentation, we describe a novel approach to 3D halftoning that works with 3D printing technology to generate high-quality surface reproduction. In the second part of this report, a new method named direct element swap is proposed for creating a threshold matrix for halftoning. This method directly swaps the elements of a threshold matrix to find the best element arrangement by minimizing a designated perceived error metric. Experimental results show that the new method yields halftone quality competitive with the conventional level-by-level matrix design method. Moreover, with the direct element swap method, a threshold matrix can for the first time be designed by training on real images. In the second part of the dissertation, a novel facial landmark detection system is presented. Facial landmark detection plays a critical role in many face analysis tasks, yet it remains a very challenging problem. The challenges come from the large variations in face appearance caused by different illumination, facial expressions, head yaw, pitch, and roll angles, and image quality. To tackle this problem, a novel coarse-to-fine cascaded convolutional neural network system for robust facial landmark detection of faces in the wild is presented. Experimental results show that our method outperforms other state-of-the-art methods on public test datasets.
In addition, a frontal and profile landmark localization system is proposed and designed. Using a frontal/profile face classifier, either the frontal or the profile landmark configuration is employed for facial landmark prediction, depending on the yaw angle of the input face.
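The direct element swap idea (swap two entries of the threshold matrix and keep the swap only if a perceived error metric decreases) can be sketched as below. The abstract does not specify the metric or the search schedule, so both are assumptions here: `perceived_error` uses a 3x3 wrap-around box blur as a crude stand-in for a human visual model, and `direct_element_swap` tries random pairs greedily. All function names are hypothetical.

```python
import random


def halftone_tile(matrix, level):
    """Binary dot pattern for a constant gray level: dot on where level > threshold."""
    return [[1 if level > t else 0 for t in row] for row in matrix]


def perceived_error(matrix):
    """Assumed metric: squared deviation of a blurred pattern from its mean, over all levels."""
    n = len(matrix)
    total = 0.0
    for level in range(1, n * n):
        pattern = halftone_tile(matrix, level)
        mean = level / (n * n)  # thresholds are 0..n*n-1, so `level` dots are on
        for r in range(n):
            for c in range(n):
                blur = sum(pattern[(r + dr) % n][(c + dc) % n]
                           for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
                total += (blur - mean) ** 2
    return total


def direct_element_swap(matrix, n_trials=300, seed=0):
    """Greedy search: swap two random elements, keep the swap only if the error drops."""
    rng = random.Random(seed)
    n = len(matrix)
    best = [row[:] for row in matrix]  # work on a copy
    best_err = perceived_error(best)
    for _ in range(n_trials):
        r1, c1, r2, c2 = (rng.randrange(n) for _ in range(4))
        if (r1, c1) == (r2, c2):
            continue
        best[r1][c1], best[r2][c2] = best[r2][c2], best[r1][c1]
        err = perceived_error(best)
        if err < best_err:
            best_err = err
        else:
            # Undo the swap because it did not reduce the error.
            best[r1][c1], best[r2][c2] = best[r2][c2], best[r1][c1]
    return best, best_err


# Toy usage: start from a row-major 4x4 matrix and let swaps rearrange it.
initial = [[r * 4 + c for c in range(4)] for r in range(4)]
optimized, final_error = direct_element_swap(initial)
print(perceived_error(initial), final_error)
for row in optimized:
    print(row)
```

Because every accepted swap strictly lowers the metric, the final error can never exceed the starting error; swapping the metric for one computed on real images would correspond to the training-based design mentioned in the abstract.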
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition: one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study the performances with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performances of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 table