Towards Interpretable Face Recognition
Deep CNNs have been pushing the frontier of visual recognition in recent
years. Beyond recognition accuracy, a strong demand in the research community
for understanding deep CNNs has motivated the development of tools that
dissect pre-trained models and visualize how they make predictions. Recent
work pushes interpretability further into the network learning stage to learn
more meaningful representations. In this work, focusing on a specific area of
visual recognition, we report our efforts towards interpretable face
recognition. We
propose a spatial activation diversity loss to learn more structured face
representations. By leveraging the structure, we further design a feature
activation diversity loss to push the interpretable representations to be
discriminative and robust to occlusions. We demonstrate on three face
recognition benchmarks that our proposed method is able to improve face
recognition accuracy with easily interpretable face representations.
Comment: 10 pages, 9 figures, 6 tables. To appear in ICCV 2019 as an oral paper.
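To make the idea concrete, here is one plausible form of a spatial activation diversity objective; this is a minimal sketch, not the paper's exact loss, and the shape conventions and overlap penalty are assumptions. Each channel's activation map is normalized into a spatial distribution, and pairwise overlap between channels is penalized so that different channels peak on distinct face parts:

    import torch
    import torch.nn.functional as F

    def spatial_diversity_loss(feat):
        """Hypothetical diversity penalty: treat each channel's activation
        map as a spatial distribution and penalize pairwise overlap, so
        channels learn to respond to distinct face regions.
        feat: (B, C, H, W) feature maps from a face CNN."""
        b, c, h, w = feat.shape
        maps = F.softmax(feat.reshape(b, c, h * w), dim=-1)   # per-channel spatial distribution
        overlap = torch.bmm(maps, maps.transpose(1, 2))       # (B, C, C) pairwise inner products
        off_diag = overlap * (1 - torch.eye(c, device=feat.device))
        return off_diag.sum(dim=(1, 2)).mean() / (c * (c - 1))  # mean off-diagonal overlap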
Occlusion Coherence: Detecting and Localizing Occluded Faces
The presence of occluders significantly impacts object recognition accuracy.
However, occlusion is typically treated as an unstructured source of noise and
explicit models for occluders have lagged behind those for object appearance
and shape. In this paper we describe a hierarchical deformable part model for
face detection and landmark localization that explicitly models part occlusion.
The proposed model structure makes it possible to augment positive training
data with large numbers of synthetically occluded instances. This allows us to
easily incorporate the statistics of occlusion patterns in a discriminatively
trained model. We test the model on several benchmarks for landmark
localization and detection including challenging new data sets featuring
significant occlusion. We find that the addition of an explicit occlusion model
yields a detection system that outperforms existing approaches for occluded
instances while maintaining competitive accuracy in detection and landmark
localization for unoccluded instances.
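The augmentation step can be sketched as pasting synthetic occluders onto positive training crops and deriving per-part occlusion labels from the overlap. A minimal version follows; the function name, the flat-gray occluder, and the center-in-box test are illustrative (the paper's occluders are more realistic), and rng is a numpy Generator such as np.random.default_rng(0):

    import numpy as np

    def add_synthetic_occluder(img, part_boxes, rng):
        """Paste a random rectangle over a face crop and flag each part
        whose box center it covers; the flags serve as occlusion labels
        for a part-based detector. part_boxes: list of (x1, y1, x2, y2)."""
        out = img.copy()
        h, w = out.shape[:2]
        ow, oh = rng.integers(w // 4, w // 2), rng.integers(h // 4, h // 2)
        x0, y0 = rng.integers(0, w - ow), rng.integers(0, h - oh)
        out[y0:y0 + oh, x0:x0 + ow] = 128   # flat occluder; textured patches are more realistic
        flags = [x0 <= (x1 + x2) / 2 <= x0 + ow and y0 <= (y1 + y2) / 2 <= y0 + oh
                 for (x1, y1, x2, y2) in part_boxes]
        return out, np.array(flags)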
Learning Disentangling and Fusing Networks for Face Completion Under Structured Occlusions
Face completion aims to generate semantically new pixels for missing facial
components. It is a challenging generative task due to large variations of face
appearance. This paper studies generative face completion under structured
occlusions. We treat the face completion and corruption as disentangling and
fusing processes of clean faces and occlusions, and propose a jointly
disentangling and fusing Generative Adversarial Network (DF-GAN). First, three
domains are constructed, corresponding to the distributions of occluded faces,
clean faces and structured occlusions. The disentangling and fusing processes
are formulated as the transformations between the three domains. Then the
disentangling and fusing networks are built to learn the transformations from
unpaired data, where the encoder-decoder structure is adopted and allows DF-GAN
to simulate structured occlusions by modifying the latent representations.
Finally, the disentangling and fusing processes are unified into a dual
learning framework along with an adversarial strategy. The proposed method is
evaluated on the Meshface verification problem. Experimental results on four
Meshface databases demonstrate its effectiveness for face completion under
structured occlusions.
Comment: Submitted to CVPR 201
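In spirit, the two processes can be illustrated with a toy sketch: the disentangling direction splits a latent code into a clean-face factor and an occlusion factor, and the fusing direction recombines them. This is only an illustration of the idea; the actual DF-GAN uses convolutional encoder-decoders, unpaired training, and adversarial losses, none of which appear here:

    import torch
    import torch.nn as nn

    # Toy stand-ins for the disentangling/fusing networks.
    enc = nn.Linear(256, 128)       # occluded-face domain -> latent code
    dec_face = nn.Linear(64, 256)   # decode the clean-face factor
    dec_occ = nn.Linear(64, 256)    # decode the structured-occlusion factor

    x = torch.randn(8, 256)                  # stand-in batch of occluded faces
    z_face, z_occ = enc(x).chunk(2, dim=1)   # disentangle into two factors
    clean, occ = dec_face(z_face), dec_occ(z_occ)
    refused = clean + occ                    # naive fusing back to the occluded domain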
Recognizing Partial Biometric Patterns
Biometric recognition on partially captured targets is challenging, since only
a few partial observations of an object are available for matching. In this
area, deep learning based methods are widely applied to match partially
captured objects, which arise from occlusion, posture variation, or the
subject being partially out of view, in person re-identification and partial
face recognition. However, most current methods cannot identify an individual
when some parts of the object are unavailable, and the rest are specialized to
certain constrained scenarios. To this end, we propose a robust, general
framework for arbitrary biometric matching scenarios without constraints on
alignment or input size. We introduce a feature post-processing step to handle
the feature maps from a fully convolutional network (FCN), and a dictionary
learning based Spatial Feature Reconstruction (SFR) to match feature maps of
different sizes. Moreover, the batch hard triplet loss function is applied to
optimize the model. The applicability and effectiveness of the proposed method
are demonstrated by experiments on three person re-identification datasets
(Market1501, CUHK03, DukeMTMC-reID), two partial person datasets (Partial REID
and Partial iLIDS), and two partial face datasets (CASIA-NIR-Distance and
Partial LFW), on which it achieves state-of-the-art performance in comparison
with several competing approaches. The code is released online at
https://github.com/lingxiao-he/Partial-Person-ReID.
Comment: 13 pages, 11 figures
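The batch hard triplet loss mentioned above is a standard formulation (due to Hermans et al., "In Defense of the Triplet Loss"). A compact PyTorch version, assuming each identity contributes at least two samples per batch:

    import torch

    def batch_hard_triplet_loss(emb, labels, margin=0.3):
        """For each anchor, pick the hardest (farthest) positive and the
        hardest (closest) negative within the batch, then apply a hinge.
        emb: (N, D) embeddings; labels: (N,) identity labels."""
        dist = torch.cdist(emb, emb)                              # (N, N) pairwise L2 distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        hardest_pos = dist.masked_fill(~same, 0).max(dim=1).values
        hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
        return torch.relu(hardest_pos - hardest_neg + margin).mean()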
Learning Locality-Constrained Collaborative Representation for Face Recognition
Low-dimensional manifolds and sparse representation are two well-known
concise models suggesting that each datum can be described by a few
characteristics. Manifold learning is usually investigated for dimension
reduction by preserving expected local geometric structures from the original
space in a low-dimensional one. The structures are generally determined using
pairwise distances, e.g., Euclidean distance. Alternatively, sparse
representation denotes a data point as a linear combination of points from
the same subspace. In practical applications, however, nearby points in terms
of pairwise distance may not belong to the same subspace, and vice versa.
Consequently, it is interesting and important to explore how to get a better
representation by integrating these two models. To this end, this paper
proposes a novel coding algorithm, called Locality-Constrained Collaborative
Representation (LCCR), which improves the robustness and discrimination of
data representations by introducing a kind of local consistency. The locality
term derives from the biological observation that similar inputs have similar
codes. The objective function of LCCR has an analytical solution and does not
involve local minima. Empirical studies on four public facial databases, ORL,
AR, Extended Yale B, and Multi-PIE, show that LCCR is promising for
recognizing human faces from frontal views with varying expression and
illumination, as well as various corruptions and occlusions.
Comment: 16 pages, v
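To make the "analytical solution" claim concrete, consider one plausible form of such an objective; this is an assumption for illustration, not the paper's exact formulation. Ridge-regularized collaborative coding can be pulled toward the mean code c_bar of the query's nearest neighbors, so that similar inputs receive similar codes. Minimizing ||y - D a||^2 + lam ||a||^2 + gam ||a - c_bar||^2 over a then has the closed-form solution below:

    import numpy as np

    def locality_constrained_code(D, y, c_bar, lam=0.01, gam=0.1):
        """Closed-form code for the illustrative objective above:
        a = (D^T D + (lam + gam) I)^{-1} (D^T y + gam * c_bar),
        where c_bar is the mean code of y's nearest neighbors.
        D: (pixels, atoms) dictionary of training faces; y: (pixels,)."""
        k = D.shape[1]
        A = D.T @ D + (lam + gam) * np.eye(k)
        return np.linalg.solve(A, D.T @ y + gam * c_bar)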
Structured Occlusion Coding for Robust Face Recognition
Occlusion in face recognition is a common yet challenging problem. While
sparse representation based classification (SRC) has shown promising
performance under laboratory conditions (i.e., noiseless or random pixel
corruption), it performs much worse in practical scenarios. In this paper, we
consider the practical face recognition problem, where the occlusions are
predictable and available for sampling. We propose structured occlusion
coding (SOC) to address occlusion problems. The structure here is twofold: on
one hand, we employ a structured dictionary for recognition; on the other
hand, we use structured sparsity in the formulation. Specifically, SOC
simultaneously separates the occlusion and classifies the image. In this way,
the problem of recognizing an occluded image is turned into seeking a
structured sparse solution on an occlusion-appended dictionary. To construct
a well-performing occlusion dictionary, we propose an occlusion mask
estimation technique via a locality-constrained dictionary (LCD), which
yields striking improvements on occluded samples. On a category-specific
occlusion dictionary, we replace plain norm sparsity with structured
sparsity, which is shown to be more robust, further enhancing the robustness
of our approach. Moreover, SOC achieves significant improvement in handling
large occlusions in the real world. Extensive experiments on public data sets
validate the superiority of the proposed algorithm.
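The occlusion-appended coding step can be sketched as follows, with plain l1 sparsity standing in for the structured sparsity the paper advocates; the dictionary names and the residual-based scoring are assumptions:

    import numpy as np
    from sklearn.linear_model import Lasso

    def soc_separate(D_id, D_occ, y, alpha=0.01):
        """Code the probe over identity atoms stacked with occlusion atoms,
        then strip the occlusion component before identity scoring.
        D_id: (pixels, id_atoms); D_occ: (pixels, occ_atoms); y: (pixels,)."""
        D = np.hstack([D_id, D_occ])                     # occlusion-appended dictionary
        code = Lasso(alpha=alpha, fit_intercept=False).fit(D, y).coef_
        c_id, c_occ = code[:D_id.shape[1]], code[D_id.shape[1]:]
        y_clean = y - D_occ @ c_occ                      # occlusion separated from the image
        return c_id, y_clean                             # classify by per-class residual on y_clean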
Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition
Occlusion and pose variations, which can change facial appearance
significantly, are two major obstacles for automatic Facial Expression
Recognition (FER). Though automatic FER has made substantial progress in the
past few decades, the occlusion-robust and pose-invariant aspects of FER have
received relatively little attention, especially in real-world scenarios. This
paper addresses the real-world pose and occlusion robust FER problem with
three-fold contributions. First, to stimulate the research of FER under
real-world occlusions and variant poses, we build several in-the-wild facial
expression datasets with manual annotations for the community. Second, we
propose a novel Region Attention Network (RAN), to adaptively capture the
importance of facial regions for occlusion- and pose-variant FER. The RAN
aggregates and embeds a variable number of region features produced by a
backbone convolutional neural network into a compact fixed-length
representation. Last,
inspired by the fact that facial expressions are mainly defined by facial
action units, we propose a region biased loss to encourage high attention
weights for the most important regions. We validate our RAN and region biased
loss on both our built test datasets and four popular datasets: FERPlus,
AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and
region biased loss largely improve the performance of FER under occlusion and
pose variation. Our method also achieves state-of-the-art results on FERPlus,
AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly
available.
Comment: The test set and the code of this paper will be available at
https://github.com/kaiwang960112/Challenge-condition-FER-datase
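The aggregation step can be illustrated with a minimal attention pooling module. This is a sketch under assumed shapes; RAN itself is more elaborate and couples the learned weights with the region biased loss described above:

    import torch
    import torch.nn as nn

    class RegionAttentionPool(nn.Module):
        """Score each region feature, softmax the scores over regions, and
        return the weighted sum as a fixed-length representation."""
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, regions):                          # (B, K, D), K regions
            w = torch.softmax(self.score(regions), dim=1)    # (B, K, 1) attention weights
            return (w * regions).sum(dim=1)                  # (B, D) compact representation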
A Survey of the Trends in Facial and Expression Recognition Databases and Methods
Automated facial identification and facial expression recognition have been
topics of active research over the past few decades. Facial and expression
recognition find applications in human-computer interfaces, subject tracking,
real-time security surveillance systems and social networking. Several holistic
and geometric methods have been developed to identify faces and expressions
using public and local facial image databases. In this work we present the
evolution in facial image data sets and the methodologies for facial
identification and recognition of expressions such as anger, sadness,
happiness, disgust, fear and surprise. We observe that most of the earlier
methods for facial and expression recognition aimed at improving the
recognition rates for facial feature-based methods using static images.
However, the recent methodologies have shifted focus towards robust
implementation of facial/expression recognition from large image databases that
vary with space (gathered from the internet) and time (video recordings). The
evolution trends in databases and methodologies for facial and expression
recognition can be useful for assessing the next-generation topics that may
have applications in security systems or personal identification systems that
involve "Quantitative face" assessments.Comment: 16 pages, 4 figures, 3 tables, International Journal of Computer
Science and Engineering Survey, October, 201
Deep Multi-Center Learning for Face Alignment
Facial landmarks are highly correlated with each other since a certain
landmark can be estimated by its neighboring landmarks. Most of the existing
deep learning methods only use one fully-connected layer called shape
prediction layer to estimate the locations of facial landmarks. In this paper,
we propose a novel deep learning framework named Multi-Center Learning with
multiple shape prediction layers for face alignment. In particular, each shape
prediction layer emphasizes the detection of a certain cluster of semantically
relevant landmarks. Challenging landmarks are focused on first, and each
cluster of landmarks is then optimized separately.
Moreover, to reduce the model complexity, we propose a model assembling method
to integrate multiple shape prediction layers into one shape prediction layer.
Extensive experiments demonstrate that our method is effective for handling
complex occlusions and appearance variations with real-time performance. The
code for our method is available at
https://github.com/ZhiwenShao/MCNet-Extension.
Comment: This paper has been accepted by Neurocomputing.
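The model assembling step can be sketched as merging per-cluster output rows into a single fully-connected layer; the index bookkeeping here is an assumption about how the clusters partition the outputs, not the paper's exact procedure:

    import torch
    import torch.nn as nn

    def assemble_shape_layers(layers, clusters, in_dim, out_dim):
        """Fold several shape prediction layers into one: for each landmark
        cluster, copy the output rows of the layer specialized on it into
        a single nn.Linear. clusters[i] lists the output indices that
        layers[i] owns; together they must cover all out_dim outputs."""
        merged = nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            for layer, idx in zip(layers, clusters):
                merged.weight[idx] = layer.weight[idx]
                merged.bias[idx] = layer.bias[idx]
        return merged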
Facial Landmark Detection: a Literature Survey
The locations of the fiducial facial landmark points around facial components
and facial contour capture the rigid and non-rigid facial deformations due to
head movements and facial expressions. They are hence important for various
facial analysis tasks. Many facial landmark detection algorithms have been
developed to automatically detect those key points over the years, and in this
paper, we perform an extensive review of them. We classify the facial landmark
detection algorithms into three major categories: holistic methods, Constrained
Local Model (CLM) methods, and the regression-based methods. They differ in the
ways to utilize the facial appearance and shape information. The holistic
methods explicitly build models to represent the global facial appearance and
shape information. The CLMs explicitly leverage the global shape model but
build the local appearance models. The regression-based methods implicitly
capture facial shape and appearance information. For algorithms within each
category, we discuss their underlying theories as well as their differences. We
also compare their performances on both controlled and in-the-wild benchmark
datasets, under varying facial expressions, head poses, and occlusion. Based on
the evaluations, we point out their respective strengths and weaknesses. There
is also a separate section to review the latest deep learning-based algorithms.
The survey also includes a listing of the benchmark databases and existing
software. Finally, we identify future research directions, including combining
methods in different categories to leverage their respective strengths to solve
landmark detection "in-the-wild".