DOMAIN ADAPTION FOR UNCONSTRAINED FACE VERIFICATION AND IDENTIFICATION
Face recognition has received consistent attention in the computer vision community for over three decades. Although recent advances in deep convolutional neural networks (DCNNs) have pushed face recognition algorithms to surpass human performance in most controlled situations, unconstrained face recognition performance is still far from satisfactory. This is mainly because the domain shift between training and test data is substantial when faces are captured under extreme pose, blur, or other covariate variations. In this dissertation, we study the effects of covariates and present approaches for mitigating the domain mismatch to improve the performance of unconstrained face verification and identification.
To study how covariates affect the performance of deep neural networks on the large-scale unconstrained face verification problem, we implement five state-of-the-art DCNNs and evaluate them on three challenging covariate datasets. In total, seven covariates are considered: pose (yaw and roll), age, facial hair, gender, indoor/outdoor, occlusion (nose and mouth visibility, and forehead visibility), and skin tone. Some of the results confirm and extend the findings of previous studies, while others are new findings that were rarely mentioned before or did not show consistent trends. In addition, we demonstrate that with the assistance of gender information, the quality of a pre-curated noisy large-scale face dataset can be further improved.
Based on the results of this study, we propose four domain adaptation methods to alleviate the effects of covariates. First, since we find that pose is a key factor in performance degradation, we propose a metric learning method to alleviate the effects of pose on face verification performance. We learn a joint model for the face and pose verification tasks and explicitly discourage information sharing between the identity and pose metrics. Specifically, we enforce an orthogonal regularization constraint on the learned projection matrices for the two tasks, which makes the identity metrics for face verification more pose-robust. Extensive experiments on three challenging unconstrained face datasets show promising results compared to state-of-the-art methods.
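As a rough illustration of the orthogonal regularization idea described above, the following sketch computes a penalty that is zero when every identity projection direction is orthogonal to every pose projection direction. The function names, the toy matrices, and the specific penalty form (the squared Frobenius norm of the product of the two projection matrices) are assumptions made for illustration; the abstract does not give the exact formulation used in the dissertation.

```python
def matmul_transposed(a, b):
    """Return a @ b.T for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def orthogonality_penalty(w_id, w_pose):
    """Squared Frobenius norm of w_id @ w_pose.T (hypothetical penalty form).

    Each row of w_id / w_pose is one projection direction in feature space.
    The penalty is zero exactly when every identity direction is orthogonal
    to every pose direction, and grows as the two subspaces overlap.
    """
    cross = matmul_transposed(w_id, w_pose)
    return sum(v * v for row in cross for v in row)


# Toy 3-D feature space: two identity directions, one pose direction.
w_id = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
w_pose_orthogonal = [[0.0, 0.0, 1.0]]   # shares nothing with w_id
w_pose_overlapping = [[1.0, 0.0, 1.0]]  # shares the first axis with w_id

penalty_orthogonal = orthogonality_penalty(w_id, w_pose_orthogonal)
penalty_overlapping = orthogonality_penalty(w_id, w_pose_overlapping)
```

In a training loop, a term like this would typically be added to the verification losses with a weighting coefficient, so that minimizing the total loss pushes the two projections toward disjoint subspaces.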
Second, to tackle the negative effects of image blur, we propose two approaches. The first approach is an incremental dictionary learning method that mitigates the distribution difference between sharp training data and blurred test data. Some blurred faces, called supportive samples, are selected; they are used to build more discriminative classification models and act as a bridge connecting the two domains. The second approach is an unsupervised face deblurring method based on disentangled representations. The disentanglement is achieved by splitting the content and blur features in a blurred image using content encoders and blur encoders. An adversarial loss is applied to the deblurred results to generate visually realistic faces. Extensive experiments on two challenging face datasets show promising results.
Finally, apart from the effects of pose and blur, face verification performance also suffers from the generic domain mismatch between source and target faces. To tackle this problem, we propose a template adaptation method for template-based face verification. A template-specific metric is trained to adaptively learn the discriminative information between test templates and the negative training set, whose subjects are disjoint from the subjects in the test templates. Extensive experiments on two challenging face verification datasets yield promising results compared to other competitive methods.
MobiFace: A Novel Dataset for Mobile Face Tracking in the Wild
Face tracking serves as the crucial initial step in mobile applications
trying to analyse target faces over time in mobile settings. However, this
problem has received little attention, mainly due to the scarcity of dedicated
face tracking benchmarks. In this work, we introduce MobiFace, the first
dataset for single face tracking in mobile situations. It consists of 80
unedited live-streaming mobile videos captured by 70 different smartphone users
in fully unconstrained environments. Over bounding boxes are manually
labelled. The videos are carefully selected to cover typical smartphone usage.
The videos are also annotated with 14 attributes, including 6 newly proposed
attributes and 8 commonly seen in object tracking. 36 state-of-the-art
trackers, including facial landmark trackers, generic object trackers and
trackers that we have fine-tuned or improved, are evaluated. The results
suggest that mobile face tracking cannot be solved through existing approaches.
In addition, we show that fine-tuning on the MobiFace training data
significantly boosts the performance of deep learning-based trackers,
suggesting that MobiFace captures the unique characteristics of mobile face
tracking. Our goal is to offer the community a diverse dataset to enable the
design and evaluation of mobile face trackers. The dataset, annotations and the
evaluation server will be on https://mobiface.github.io/.
Comment: To appear at the 14th IEEE International Conference on Automatic Face
and Gesture Recognition (FG 2019)
Polarimetric Thermal to Visible Face Verification via Self-Attention Guided Synthesis
Polarimetric thermal to visible face verification entails matching two images
that contain significant domain differences. Several recent approaches have
attempted to synthesize visible faces from thermal images for cross-modal
matching. In this paper, we take a different approach in which rather than
focusing only on synthesizing visible faces from thermal faces, we also propose
to synthesize thermal faces from visible faces. Our intuition is based on the
fact that thermal images also contain some discriminative information about the
person for verification. Deep features from a pre-trained Convolutional Neural
Network (CNN) are extracted from the original as well as the synthesized
images. These features are then fused to generate a template which is then used
for verification. The proposed synthesis network is based on the self-attention
generative adversarial network (SAGAN) which essentially allows efficient
attention-guided image synthesis. Extensive experiments on the ARL polarimetric
thermal face dataset demonstrate that the proposed method achieves
state-of-the-art performance.
Comment: This work is accepted at the 12th IAPR International Conference On
Biometrics (ICB 2019)
GhostVLAD for set-based face recognition
The objective of this paper is to learn a compact representation of image
sets for template-based face recognition. We make the following contributions:
first, we propose a network architecture which aggregates and embeds the face
descriptors produced by deep convolutional neural networks into a compact
fixed-length representation. This compact representation requires minimal
memory storage and enables efficient similarity computation. Second, we propose
a novel GhostVLAD layer that includes ghost clusters, which do not
contribute to the aggregation. We show that a quality weighting on the input
faces emerges automatically such that informative images contribute more than
those with low quality, and that the ghost clusters enhance the network's
ability to deal with poor quality images. Third, we explore how input feature
dimension, number of clusters and different training techniques affect the
recognition performance. Given this analysis, we train a network that far
exceeds the state-of-the-art on the IJB-B face recognition dataset. This is
currently one of the most challenging public benchmarks, and we surpass the
state-of-the-art on both the identification and verification protocols.
Comment: Accepted by ACCV 201
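To make the role of ghost clusters more concrete, here is a minimal sketch of VLAD-style aggregation in which descriptors are softly assigned to both real and ghost clusters, but only the real clusters contribute residuals to the output. Several details are assumptions for illustration only: the function names, the toy cluster centers, and the use of negative squared distance as the assignment score (in a trained network these scores would presumably come from learned parameters). Normalization and the final embedding layer are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ghost_vlad(descriptors, centers, num_ghost):
    """Aggregate descriptors into one residual vector per real cluster.

    `centers` lists the real cluster centers first, followed by `num_ghost`
    ghost centers. Ghost clusters take part in the soft assignment, so they
    can absorb weight from a descriptor, but their residuals are dropped.
    """
    num_real = len(centers) - num_ghost
    dim = len(centers[0])
    aggregated = [[0.0] * dim for _ in range(num_real)]
    for x in descriptors:
        # Assumed assignment score: negative squared distance to each center.
        logits = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        weights = softmax(logits)
        for k in range(num_real):  # ghost clusters are skipped here
            for d in range(dim):
                aggregated[k][d] += weights[k] * (x[d] - centers[k][d])
    return aggregated


# Two real clusters and one ghost cluster in a toy 2-D descriptor space.
centers = [[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
near_real = ghost_vlad([[1.0, 0.0]], centers, num_ghost=1)
near_ghost = ghost_vlad([[5.0, 5.0]], centers, num_ghost=1)
```

In this toy setup, a descriptor close to a real center contributes almost its full residual, while a descriptor sitting on the ghost center contributes almost nothing, which mirrors the down-weighting of low-quality faces described in the abstract.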
Multicolumn Networks for Face Recognition
The objective of this work is set-based face recognition, i.e. to decide if
two sets of images of a face are of the same person or not. Conventionally, the
set-wise feature descriptor is computed as an average of the descriptors from
individual face images within the set. In this paper, we design a neural
network architecture that learns to aggregate based on both "visual" quality
(resolution, illumination), and "content" quality (relative importance for
discriminative classification). To this end, we propose a Multicolumn Network
(MN) that takes a set of images (the number in the set can vary) as input, and
learns to compute a fixed-size feature descriptor for the entire set. To
encourage high-quality representations, each individual input image is first
weighted by its "visual" quality, determined by a self-quality assessment
module, and followed by a dynamic recalibration based on "content" qualities
relative to the other images within the set. Both of these qualities are learnt
implicitly during training for set-wise classification. Compared with the
previous state-of-the-art architectures trained with the same dataset
(VGGFace2), our Multicolumn Networks show an improvement of 2-6% on the
IARPA IJB face recognition benchmarks, and exceed the state of the art for all
methods on these benchmarks.
Comment: To appear in BMVC201
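The contrast between conventional averaging and quality-aware aggregation can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the per-image quality scores are supplied by hand rather than predicted by a network, and a softmax is assumed as the way to turn scores into weights. It does not reproduce the Multicolumn Network's two-stage weighting.

```python
import math


def mean_pool(descriptors):
    """Conventional set descriptor: the plain average of per-image descriptors."""
    count = len(descriptors)
    dim = len(descriptors[0])
    return [sum(desc[d] for desc in descriptors) / count for d in range(dim)]


def quality_weighted_pool(descriptors, quality_scores):
    """Set descriptor as a weighted average, with weights from a softmax over
    per-image quality scores (assumed here; a network would predict them)."""
    peak = max(quality_scores)
    exps = [math.exp(s - peak) for s in quality_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(descriptors[0])
    return [sum(w * desc[d] for w, desc in zip(weights, descriptors))
            for d in range(dim)]


# A set of two toy descriptors; the set size may vary, the output size does not.
face_set = [[1.0, 0.0], [0.0, 1.0]]
averaged = mean_pool(face_set)
equal_quality = quality_weighted_pool(face_set, [0.0, 0.0])
first_is_sharper = quality_weighted_pool(face_set, [2.0, 0.0])
```

With equal scores the weighted pool reduces to the plain average; raising the score of one image pulls the set descriptor toward that image while keeping the output dimension fixed.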
Performance Evaluation of Biometric Template Update
Template update makes it possible to modify a user's biometric reference while
the user is using the biometric system. With this kind of mechanism, we expect
the biometric system to always use an up-to-date representation of the user by
capturing the user's intra-class (temporary or permanent) variability. Although
several studies exist in the literature, there is no commonly adopted
evaluation scheme, which makes it difficult to compare the different systems. In
this paper, we show that using different evaluation procedures can lead to
different, and contradictory, interpretations of the results. To illustrate
this point, we use a template update system for keystroke dynamics (a modality
that suffers from rapid template ageing) on a dataset consisting of eight
different sessions. Although we do not resolve this problem, the study shows
that it is necessary to normalize template update evaluation procedures.
Comment: International Biometric Performance Testing Conference 2012,
Gaithersburg, MD, USA : United States (2012)
A Proximity-Aware Hierarchical Clustering of Faces
In this paper, we propose an unsupervised face clustering algorithm called
"Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local
structure of deep representations. In the proposed method, a similarity measure
between deep features is computed by evaluating linear SVM margins. SVMs are
trained using nearest neighbors of sample data, and thus do not require any
external training data. Clusters are then formed by thresholding the similarity
scores. We evaluate the clustering performance using three challenging
unconstrained face datasets, including Celebrity in Frontal-Profile (CFP),
IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3)
datasets. Experimental results demonstrate that the proposed approach can
achieve significant improvements over state-of-the-art methods. Moreover, we
also show that the proposed clustering algorithm can be applied to curate a
large-scale, noisy training dataset while maintaining a sufficient number
of images and their variations due to nuisance factors. The face verification
performance on JANUS CS3 improves significantly by finetuning a DCNN model with
the curated MS-Celeb-1M dataset which contains over three million face images
- …