Research on 3D reconstruction based on 2D face images.
3D face reconstruction is a popular research area in computer vision, with a wide range of applications in fields such as animation design, virtual reality, medical guidance, and face recognition. Current commercial 3D face reconstruction generally relies on large scanning equipment that fuses multiple images captured by sensors. However, this approach requires manual modelling, is costly in both time and money, and depends on expensive equipment, making it unpopular in practical applications. Compared with multi-image 3D face reconstruction, the single-image approach reduces computational time and economic cost, is relatively simple to implement, and does not require specific hardware. We therefore focus on the single-image approach in this dissertation and contribute in terms of both research novelty and practical use. The main work is as follows. A dedicated pre-processing pipeline is designed to separate face alignment from face reconstruction. The Active Shape Model (ASM) algorithm is used for face alignment to detect facial feature points in the image. The face data is then pose-corrected so that the corrected face better matches the face pose of the UV position map, and UV coordinates are used to map the 3D information onto the 2D image, creating a UV-3D mapping. To enhance the effect, the face region is also cropped so that face data fills as much of the image as possible, and the face dataset is expanded through rotation, scaling, translation, and noise addition. The neural network model is improved using the idea of residual learning, allowing it to be trained incrementally with an emphasis on reconstructing depth information: face features are first extracted by the encoding and decoding layers and then learned by the residual learning layer.
Compared with previous algorithms, we achieve a considerable lead on the 300W-LP face dataset, with a 35% reduction in accumulated NME error over the RPN algorithm. Based on the proposed pre-processing methods and residual structures, the experimental results show good performance on 3D face reconstruction. The end-to-end deep learning approach achieves better reconstruction quality and accuracy than traditional, model-based face reconstruction methods.
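The augmentation step described above (rotation, scaling, translation, and noise addition) can be sketched as follows. This is a minimal, hypothetical illustration that operates on 2D landmark coordinates rather than on the full images and UV position maps used in the dissertation; `augment_landmarks` and its parameters are names chosen here for illustration only.

```python
import math
import random

def augment_landmarks(points, angle_deg=0.0, scale=1.0,
                      shift=(0.0, 0.0), noise_std=0.0, seed=None):
    """Apply rotation, scaling, translation ("panning") and Gaussian
    noise to a list of (x, y) landmark coordinates.

    Hypothetical helper for illustration; the dissertation's actual
    augmentation transforms whole images together with their labels.
    """
    rng = random.Random(seed)
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        # rotate about the origin, then scale
        xr = scale * (x * cos_t - y * sin_t)
        yr = scale * (x * sin_t + y * cos_t)
        # translate, then perturb with Gaussian noise
        xr += shift[0] + rng.gauss(0.0, noise_std)
        yr += shift[1] + rng.gauss(0.0, noise_std)
        out.append((xr, yr))
    return out

# a 90-degree rotation maps (1, 0) roughly onto (0, 1)
print(augment_landmarks([(1.0, 0.0)], angle_deg=90.0))
```

Applying the same transform to the image and to its ground-truth position map keeps the supervision consistent, which is why such geometric augmentations are a natural fit for this task.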
Deep face recognition in the wild
Face recognition has attracted particular interest in biometric recognition, with wide applications in security, entertainment, health, and marketing.
Recent years have witnessed the rapid development of face recognition techniques in both academic and industrial fields with the advent of (a) large amounts of available annotated training datasets, (b) Convolutional Neural Network (CNN) based deep structures, (c) affordable, powerful computation resources, and (d) advanced loss functions. Despite this significant improvement and success, challenges remain to be tackled.
This thesis contributes to in-the-wild face recognition from three perspectives: network design, model compression, and model explanation.
Firstly, although facial landmarks capture pose, expression, and shape information, they are used only as a pre-processing step in the current face recognition pipeline, without considering their potential for improving the model's representation. We therefore propose the "FAN-Face" framework, which gradually integrates features from different layers of a facial landmark localization network into different layers of the recognition network. This operation breaks the align-and-crop data pre-processing routine yet achieves a simple, orthogonal improvement to deep face recognition. We attribute this success to the coarse-to-fine shape-related information stored in the alignment network, which helps establish correspondence for face matching.
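The layer-wise integration described above can be caricatured as channel-wise fusion of features from matched layers of the two networks. The sketch below is a deliberate simplification, assuming features are modelled as flat lists of channel activations and fusion is plain concatenation; the actual FAN-Face method fuses convolutional feature maps inside the network, and `fuse_features` is a name invented here.

```python
def fuse_features(rec_feats, align_feats):
    """Fuse recognition-network features with alignment-network
    features layer by layer, in the spirit of FAN-Face.

    rec_feats / align_feats: one flat list of channel activations per
    layer (a toy stand-in for convolutional feature maps).
    """
    assert len(rec_feats) == len(align_feats), "one entry per layer"
    # concatenate the alignment channels onto the recognition channels
    return [r + a for r, a in zip(rec_feats, align_feats)]

fused = fuse_features([[0.1, 0.2]], [[0.9]])
print(fused)  # [[0.1, 0.2, 0.9]]
```

The point of the design is that shape-related information enters the recognition branch at several depths rather than only at the input, which is what makes the improvement orthogonal to the usual alignment pre-processing.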
Secondly, motivated by the success of knowledge distillation for model compression in object classification, we examine current knowledge distillation methods for training lightweight face recognition models. Taking into account the classification problem at hand, we advocate a direct feature-matching approach in which the pre-trained classifier of the teacher network validates the feature representation from the student network. In addition, since the teacher network trained on the labeled dataset alone is capable of capturing rich relational information among labels in both class space and feature space, we make a first attempt to use unlabeled data to further enhance the model's performance under the knowledge distillation framework.
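The direct feature-matching idea, letting the teacher's frozen classifier validate the student's features, can be sketched as a cross-entropy loss on the logits obtained by passing the student feature through the teacher's classifier. This is a pure-Python toy under stated assumptions: the teacher classifier is modelled as a plain weight matrix, and `distill_loss` is an illustrative name, not the thesis's actual loss formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_feat, teacher_classifier, label):
    """Pass the student's feature through the teacher's (frozen)
    linear classifier and penalise misclassification with
    cross-entropy.  teacher_classifier: one weight row per class."""
    logits = [sum(w * f for w, f in zip(row, student_feat))
              for row in teacher_classifier]
    return -math.log(softmax(logits)[label])
```

Because the classifier weights stay frozen, the gradient of this loss flows only into the student's feature extractor, pulling its features toward the regions the teacher's classifier already separates well.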
Finally, to increase the interpretability of the "black box" deep face recognition model, we develop a new structure with dynamic convolution that provides clustering of faces in terms of facial attributes. In particular, we propose to cluster the routing weights of the dynamic convolution experts to learn facial attributes in an unsupervised manner without forfeiting face recognition accuracy. We also introduce group convolution into dynamic convolution to increase expert granularity. We further confirm that the routing vector benefits feature-based face reconstruction via the deep inversion technique.
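Clustering the routing weights can be illustrated with a tiny k-means over per-image routing vectors: images whose routing vectors fall in the same cluster tend to share an attribute. This is a generic k-means sketch, not the thesis's exact procedure; the routing vectors and the `kmeans` helper below are illustrative.

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Tiny k-means used to illustrate clustering the routing weights
    of dynamic-convolution experts; each routing vector is one sample.
    Returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # distinct initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each routing vector to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, centroids[c])))
            clusters[j].append(v)
        # recompute each centroid as the mean of its members
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, clusters
```

Since no attribute labels are used, the grouping is unsupervised by construction; the claim in the abstract is that the clusters that emerge align with human-interpretable facial attributes.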
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is our new differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. Due to this new way of combining CNN-based with model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real-world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 pages
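The geometry part of such a parametric decoder is typically an affine function of the code vector, which is what makes it differentiable end-to-end. The sketch below shows only this linear piece, assuming a 3DMM-style mean shape plus shape and expression bases; the full MoFA decoder additionally models pose, reflectance, and illumination analytically, and `decode_face` is a name invented here.

```python
def decode_face(mean_shape, shape_basis, expr_basis, alpha, beta):
    """Linear part of a model-based decoder: geometry is an affine
    function of the shape code `alpha` and expression code `beta`.

    mean_shape: flat list of 3N vertex coordinates.
    shape_basis / expr_basis: one basis vector (same length) per code
    dimension.  Being linear, the map is trivially differentiable, so
    gradients flow back through it into the CNN encoder.
    """
    out = list(mean_shape)
    for a, basis_vec in zip(alpha, shape_basis):
        for i in range(len(out)):
            out[i] += a * basis_vec[i]
    for b, basis_vec in zip(beta, expr_basis):
        for i in range(len(out)):
            out[i] += b * basis_vec[i]
    return out
```

Training then renders this geometry back to an image and compares it with the input photograph, so no 3D ground truth is needed, which is the sense in which the training is unsupervised.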
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition: one is based on the 3D reconstruction of facial components, and the other is based on a deep Convolutional Neural Network (CNN). Unlike most 3D approaches that consider holistic faces, the proposed approach considers 3D facial components. It segments a 2D gallery face into components, reconstructs the 3D surface for each component, and recognizes a probe face by component features. The segmentation is based on the landmarks located by a hierarchical algorithm that combines the Faster R-CNN for face detection and the Reduced Tree Structured Model for landmark localization. The core part of the CNN-based approach is a revised VGG network. We study the performance with different settings of the training set, including synthesized data from 3D reconstruction, real-life data from an in-the-wild database, and both types of data combined. We investigate the performance of the network when it is employed as a classifier or designed as a feature extractor. The two recognition approaches and the fast landmark localization are evaluated in extensive experiments and compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 tables
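The component segmentation step, cutting a face into eye, nose, and mouth regions from located landmarks, can be sketched as taking a bounding box over each component's landmark subset. This is a hypothetical simplification for illustration; the paper's segmentation and the grouping used below (`component_boxes`, the index sets) are not taken from the source.

```python
def component_boxes(landmarks, groups, margin=0):
    """Given located landmarks, return one bounding box per facial
    component (eyes, nose, mouth, ...).

    landmarks: list of (x, y) points from the landmark localizer.
    groups: maps a component name to the landmark indices it covers.
    Returns {name: (x_min, y_min, x_max, y_max)} with an optional
    margin added on every side.
    """
    boxes = {}
    for name, idx in groups.items():
        xs = [landmarks[i][0] for i in idx]
        ys = [landmarks[i][1] for i in idx]
        boxes[name] = (min(xs) - margin, min(ys) - margin,
                       max(xs) + margin, max(ys) + margin)
    return boxes
```

Each box would then be cropped from the gallery image and fed to the per-component 3D reconstruction, so landmark quality directly bounds segmentation quality, which motivates the paper's emphasis on fast, accurate localization.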