Disentanglement for Discriminative Visual Recognition
Recent successes of deep learning-based recognition rely on maintaining the
content related to the main-task label. However, how to explicitly dispel the
noisy signals for better generalization in a controllable manner remains an
open issue. For instance, various factors such as identity-specific attributes,
pose, illumination and expression affect the appearance of face images.
Disentangling the identity-specific factors is potentially beneficial for
facial expression recognition (FER). This chapter systematically summarizes the
detrimental factors as task-relevant/irrelevant semantic variations and
unspecified latent variation. In this chapter, these problems are cast as
either a deep metric learning problem or an adversarial minimax game in the
latent space. For the former choice, a generalized adaptive (N+M)-tuplet
clusters loss function together with the identity-aware hard-negative mining
and online positive mining scheme can be used for identity-invariant FER.
Better FER performance can be achieved by combining the deep metric loss and
softmax loss in a unified framework with two fully connected layer branches via
joint optimization. For the latter solution, it is possible to equip an
end-to-end conditional adversarial network with the ability to decompose an
input sample into three complementary parts. The discriminative representation
inherits the desired invariance property guided by prior knowledge of the task,
which is marginally independent of the task-relevant/irrelevant semantic and
latent variations. The framework achieves top performance on a series of tasks,
including lighting-, makeup-, and disguise-tolerant face recognition and facial
attribute recognition. This chapter systematically summarizes the popular and
practical solutions for disentanglement to achieve more discriminative visual
recognition.
Comment: Manuscript for book "Recognition and perception of images" Will
A Deeper Look at Facial Expression Dataset Bias
Datasets play an important role in the progress of facial expression
recognition algorithms, but they may suffer from obvious biases caused by
different cultures and collection conditions. To look deeper into this bias, we
first conduct comprehensive experiments on dataset recognition and cross-dataset
generalization tasks, and for the first time explore the intrinsic causes of
the dataset discrepancy. The results quantitatively verify that current
datasets have a strong built-in bias, and corresponding analyses indicate that
the conditional probability distributions between source and target datasets
are different. However, previous studies are mainly based on shallow
features with limited discriminative ability under the assumption that the
conditional distribution remains unchanged across domains. To address these
issues, we further propose a novel deep Emotion-Conditional Adaption Network
(ECAN) to learn domain-invariant and discriminative feature representations,
which can match both the marginal and the conditional distributions across
domains simultaneously. In addition, the largely ignored expression class
distribution bias is also addressed by a learnable re-weighting parameter, so
that the training and testing domains can share similar class distribution.
Extensive cross-database experiments on both lab-controlled datasets (CK+,
JAFFE, MMI and Oulu-CASIA) and real-world databases (AffectNet, FER2013, RAF-DB
2.0 and SFEW 2.0) demonstrate that our ECAN can yield competitive performance
across various facial expression transfer tasks and outperform the
state-of-the-art methods.
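As a rough illustration of why matching only the marginal distribution can be insufficient, the sketch below compares a marginal and a class-conditional discrepancy between two domains. It uses a linear-kernel maximum mean discrepancy (squared distance between feature means), which is an assumption made here for illustration; the abstract does not state which discrepancy measure ECAN uses. The function names and the use of pseudo-labels for the target domain are likewise hypothetical.

```python
from collections import defaultdict


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd2(xs, xt):
    """Squared MMD with a linear kernel: squared distance between the two feature means."""
    ms, mt = mean_vector(xs), mean_vector(xt)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def conditional_mmd2(xs, ys, xt, yt_pseudo):
    """Average per-class linear MMD over the classes present in both domains.

    yt_pseudo are assumed (pseudo-)labels for the target samples; this is an
    illustrative choice, not something the abstract specifies.
    """
    by_class_s, by_class_t = defaultdict(list), defaultdict(list)
    for x, y in zip(xs, ys):
        by_class_s[y].append(x)
    for x, y in zip(xt, yt_pseudo):
        by_class_t[y].append(x)
    shared = sorted(set(by_class_s) & set(by_class_t))
    if not shared:
        return 0.0
    return sum(linear_mmd2(by_class_s[c], by_class_t[c]) for c in shared) / len(shared)


# Toy data: the two domains have the same overall mean, but the classes are swapped.
source_x, source_y = [[0.0], [2.0]], [0, 1]
target_x, target_y = [[2.0], [0.0]], [0, 1]
marginal_gap = linear_mmd2(source_x, target_x)
conditional_gap = conditional_mmd2(source_x, source_y, target_x, target_y)
```

In this toy case the marginal gap is zero while the conditional gap is not, which is the kind of mismatch the abstract argues earlier methods overlook.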
GroupFace: Learning Latent Groups and Constructing Group-based Representations for Face Recognition
In the field of face recognition, a model learns to distinguish millions of
face images with low-dimensional embedding features, and such vast
information may not be properly encoded in the conventional model with a single
branch. We propose a novel face-recognition-specialized architecture called
GroupFace that utilizes multiple group-aware representations, simultaneously,
to improve the quality of the embedding feature. The proposed method provides
self-distributed labels that balance the number of samples belonging to each
group without additional human annotations, and learns the group-aware
representations that can narrow down the search space of the target identity.
We demonstrate the effectiveness of the proposed method through extensive ablation
studies and visualizations. All the components of the proposed method can be
trained in an end-to-end manner with a marginal increase of computational
complexity. Finally, the proposed method achieves state-of-the-art results
with significant improvements in 1:1 face verification and 1:N face
identification tasks on the following public datasets: LFW, YTF, CALFW, CPLFW,
CFP, AgeDB-30, MegaFace, IJB-B and IJB-C.
Comment: Accepted to CVPR 202
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing and it always changes with people, things,
and the environment. A domain refers to the state of the world at a
certain moment. A research problem is characterized as transfer adaptation
learning (TAL) when it needs knowledge correspondence between different
moments/domains. Conventional machine learning aims to find a model with the
minimum expected risk on test data by minimizing the regularized empirical risk
on the training data, which, however, supposes that the training and test data
share a similar joint probability distribution. TAL aims to build models that can
perform tasks in the target domain by learning knowledge from a semantically
related but differently distributed source domain. It is an active research field
of increasing influence and importance, with a rapidly growing number of
publications. This paper surveys the advances of TAL methodologies in the past decade,
and the technical challenges and essential problems of TAL have been observed
and discussed with deep insights and new perspectives. Broader solutions of
transfer adaptation learning being created by researchers are identified, i.e.,
instance re-weighting adaptation, feature adaptation, classifier adaptation,
deep network adaptation and adversarial adaptation, which are beyond the early
semi-supervised and unsupervised split. The survey helps researchers rapidly
but comprehensively understand and identify the research foundation, research
status, theoretical limitations, future challenges and under-studied issues
(universality, interpretability, and credibility) to be addressed in the field
toward universal representation and safe applications in open-world scenarios.
Comment: 26 pages, 4 figure
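Of the solution families listed in the survey abstract above, instance re-weighting adaptation is the simplest to sketch. The example below is a minimal illustration, assuming features have already been discretized into bins so that the density ratio p_target(x) / p_source(x) can be estimated from counts. The function names and the binning are hypothetical and not taken from the survey.

```python
from collections import Counter


def importance_weights(source_bins, target_bins):
    """Estimate w(x) = p_target(x) / p_source(x) for each source sample from bin counts."""
    ps, pt = Counter(source_bins), Counter(target_bins)
    ns, nt = len(source_bins), len(target_bins)
    # Every source bin has a non-zero source count, so the division is safe.
    return [(pt[b] / nt) / (ps[b] / ns) for b in source_bins]


def weighted_risk(losses, weights):
    """Importance-weighted empirical risk over the source samples."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


# Toy data: bin "b" is rare in the source domain but common in the target domain,
# and the (hypothetical) model is only wrong on bin "b".
source_bins = ["a", "a", "a", "b"]
target_bins = ["a", "b", "b", "b"]
source_losses = [0.0, 0.0, 0.0, 1.0]

weights = importance_weights(source_bins, target_bins)
plain_risk = sum(source_losses) / len(source_losses)
adapted_risk = weighted_risk(source_losses, weights)
```

Here the unweighted source risk understates the error the model would make on the target domain, while the re-weighted risk recovers it, because the loss in this toy setup depends only on the bin.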
Neural Architecture Search for Deep Face Recognition
With the widespread popularity of electronic devices, the emergence of
biometric technology has brought significant convenience to user authentication
compared with the traditional password and mode unlocking. Among many
biological characteristics, the face is a universal and irreplaceable feature
that does not need too much cooperation and can significantly improve the
user's experience at the same time. Face recognition is one of the main
features advertised for electronic devices. Hence, it is well worth
researching in computer vision. Previous work in this field has focused on two
directions: modifying the loss function to improve recognition accuracy in
traditional deep convolutional neural networks (ResNet); and combining the latest
loss function with a lightweight system (MobileNet) to reduce network size at
minimal expense of accuracy. But neither of these has changed the network
structure. With the development of AutoML, neural architecture search (NAS) has
shown excellent performance in the benchmark of image classification. In this
paper, we integrate NAS technology into face recognition to customize a more
suitable network. We adopt the framework of neural architecture search, which
trains the child and controller networks alternately. At the same time, we modify
NAS by incorporating evaluation latency into the rewards of reinforcement learning
and utilize a policy gradient algorithm to search the architecture automatically
with the classical cross-entropy loss. The network architectures we
found achieve state-of-the-art accuracy on large-scale face
datasets: 98.77% top-1 on MS-Celeb-1M and 99.89% on LFW with
relatively small network size. To the best of our knowledge, this proposal is
the first attempt to use NAS to solve the problem of Deep Face Recognition and
achieve the best results in this domain.
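The idea of folding latency into the reinforcement-learning reward, mentioned in the abstract above, can be sketched as follows. The abstract does not give the reward formula, so the multiplicative form, the `target_ms` and `penalty` parameters, and the single-decision controller are all assumptions made purely for illustration.

```python
import math


def latency_aware_reward(accuracy, latency_ms, target_ms=20.0, penalty=0.07):
    """Hypothetical reward: accuracy scaled down when latency exceeds the target."""
    return accuracy * (latency_ms / target_ms) ** (-penalty)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, baseline, lr=0.1):
    """One policy-gradient (REINFORCE) update for a single categorical choice.

    The gradient of log pi(action) with respect to the logits is
    onehot(action) - softmax(logits).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        l + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]


# Two candidate operations; the controller sampled operation 0, and the
# resulting child network was accurate but slower than the target.
reward = latency_aware_reward(accuracy=0.9, latency_ms=40.0)
new_logits = reinforce_step([0.0, 0.0], action=0, reward=reward, baseline=0.5)
```

With a positive advantage, the logit of the sampled operation rises and the other falls, so the controller becomes more likely to sample that operation again.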
Probabilistic Attribute Tree in Convolutional Neural Networks for Facial Expression Recognition
In this paper, we propose a novel Probabilistic Attribute Tree-CNN (PAT-CNN)
to explicitly deal with the large intra-class variations caused by
identity-related attributes, e.g., age, race, and gender. Specifically, a novel
PAT module with an associated PAT loss is proposed to learn features in a
hierarchical tree structure organized according to attributes, where the final
features are less affected by the attributes. Then, expression-related features
are extracted from leaf nodes. Samples are probabilistically assigned to tree
nodes at different levels such that expression-related features can be learned
from all samples weighted by probabilities. We further propose a
semi-supervised strategy to learn the PAT-CNN from limited attribute-annotated
samples to make the best use of available data. Experimental results on five
facial expression datasets have demonstrated that the proposed PAT-CNN
outperforms the baseline models by explicitly modeling attributes. More
impressively, the PAT-CNN using a single model achieves the best performance
for faces in the wild on the SFEW dataset, compared with the state-of-the-art
methods using an ensemble of hundreds of CNNs.
Comment: 10 page
InclusiveFaceNet: Improving Face Attribute Detection with Race and Gender Diversity
We demonstrate an approach to face attribute detection that retains or
improves attribute detection accuracy across gender and race subgroups by
learning demographic information prior to learning the attribute detection
task. The system, which we call InclusiveFaceNet, detects face attributes by
transferring race and gender representations learned from a held-out dataset of
public race and gender identities. Leveraging learned demographic
representations while withholding demographic inference from the downstream
face attribute detection task preserves potential users' demographic privacy
while resulting in some of the best reported numbers to date on attribute
detection in the Faces of the World and CelebA datasets.
Comment: Presented as a talk at the 2018 Workshop on Fairness, Accountability,
and Transparency in Machine Learning (FAT/ML 2018
End-to-end learning potentials for structured attribute prediction
We present a structured inference approach in deep neural networks for
multiple attribute prediction. In attribute prediction, a common approach is to
learn independent classifiers on top of a good feature representation. However,
such classifiers assume conditional independence given the features and do not
explicitly consider the dependency between attributes in the inference process.
We propose to formulate attribute prediction in terms of marginal inference in
a conditional random field. We model potential functions by deep neural
networks and apply the sum-product algorithm to solve for the approximate
marginal distribution in feed-forward networks. Our message passing layer
implements sparse pairwise potentials by a softplus-linear function that is
equivalent to a higher-order classifier, and learns all the model parameters by
end-to-end backpropagation. The experimental results on the SUN Attributes and
CelebA datasets suggest that the structured inference improves the attribute
prediction performance, and possibly uncovers hidden relationships between
attributes.
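To make the marginal-inference step in the abstract above concrete, here is a minimal sum-product sketch for binary attributes arranged in a chain. This is a simplification assumed for illustration: on a chain the algorithm is exact, whereas the abstract describes approximate marginals and sparse pairwise potentials produced by a network. The potentials below are hand-set numbers and all names are hypothetical.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product on a chain of binary attributes.

    unary[i][s]       : potential of attribute i taking state s (0 or 1)
    pairwise[i][a][b] : potential of attribute i in state a and attribute i+1 in state b
    """
    n = len(unary)
    fwd = [[1.0, 1.0] for _ in range(n)]  # messages arriving from the left
    bwd = [[1.0, 1.0] for _ in range(n)]  # messages arriving from the right
    for i in range(1, n):
        for b in (0, 1):
            fwd[i][b] = sum(
                fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b] for a in (0, 1)
            )
    for i in range(n - 2, -1, -1):
        for a in (0, 1):
            bwd[i][a] = sum(
                bwd[i + 1][b] * unary[i + 1][b] * pairwise[i][a][b] for b in (0, 1)
            )
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in (0, 1)]
        z = sum(belief)
        marginals.append([v / z for v in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Reference marginals obtained by enumerating every joint assignment."""
    n = len(unary)
    totals = [[0.0, 0.0] for _ in range(n)]
    for states in product((0, 1), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            totals[i][s] += weight
    return [[v / sum(t) for v in t] for t in totals]


# Three attributes; neighbouring attributes prefer to agree.
unary = [[1.0, 3.0], [1.0, 1.0], [2.0, 1.0]]
pairwise = [[[2.0, 1.0], [1.0, 2.0]], [[2.0, 1.0], [1.0, 2.0]]]
marginals = chain_marginals(unary, pairwise)
reference = brute_force_marginals(unary, pairwise)
```

The middle attribute has a flat unary potential, so its marginal is determined entirely by its neighbours, which is the dependency that independent per-attribute classifiers would miss.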
NormFace: L2 Hypersphere Embedding for Face Verification
Thanks to the recent developments of Convolutional Neural Networks, the
performance of face verification methods has increased rapidly. In a typical
face verification method, feature normalization is a critical step for boosting
performance. This motivates us to introduce and study the effect of
normalization during training. But we find this is non-trivial, despite
normalization being differentiable. We identify and study four issues related
to normalization through mathematical analysis, which yields understanding and
helps with parameter settings. Based on this analysis we propose two strategies
for training using normalized features. The first is a modification of softmax
loss, which optimizes cosine similarity instead of inner-product. The second is
a reformulation of metric learning by introducing an agent vector for each
class. We show that both strategies, and small variants, consistently improve
performance by between 0.2% and 0.4% on the LFW dataset based on two models.
This is significant because the performance of the two models on the LFW dataset is
close to saturation at over 98%. Code and models are released at
https://github.com/happynear/NormFace
Comment: camera-ready versio
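The first strategy in the abstract above, a softmax loss that optimizes cosine similarity instead of the inner product, can be sketched as below. This is a minimal illustration rather than the released implementation: the `scale` factor and its default value are assumptions, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (assumes the vector is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_softmax_loss(feature, class_weights, label, scale=20.0):
    """Cross-entropy over scaled cosine similarities between a feature and class weights."""
    f = l2_normalize(feature)
    logits = [
        scale * sum(a * b for a, b in zip(f, l2_normalize(w))) for w in class_weights
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


class_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_small = cosine_softmax_loss([3.0, 4.0], class_weights, label=1)
loss_large = cosine_softmax_loss([30.0, 40.0], class_weights, label=1)
loss_wrong = cosine_softmax_loss([3.0, 4.0], class_weights, label=0)
```

Because both the feature and the class weights are normalized, scaling the feature by ten leaves the loss unchanged; only the angle to each class weight matters.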
Large Margin Learning in Set to Set Similarity Comparison for Person Re-identification
Person re-identification (Re-ID) aims at matching images of the same person
across disjoint camera views, which is a challenging problem in multimedia
analysis, multimedia editing and content-based media retrieval communities. The
major challenge lies in how to preserve similarity of the same person across
video footage with large appearance variations, while discriminating different
individuals. To address this problem, conventional methods usually consider the
pairwise similarity between persons by only measuring the point to point (P2P)
distance. In this paper, we propose to use a deep learning technique to model a
novel set to set (S2S) distance, in which the underlying objective focuses on
preserving the compactness of intra-class samples for each camera view, while
maximizing the margin between the intra-class set and inter-class set. The S2S
distance metric consists of three terms, namely the class-identity term,
the relative distance term and the regularization term. The class-identity term
keeps the intra-class samples within each camera view gathered together, the
relative distance term maximizes the distance between the intra-class set
and inter-class set across different camera views, and the regularization term
smooths the parameters of the deep convolutional neural network (CNN). As a
result, the final learned deep model can effectively find the target matching
the probe object among various candidates in the video gallery by
learning discriminative and stable feature representations. Using the CUHK01,
CUHK03, PRID2011 and Market1501 benchmark datasets, we extensively conducted
comparative evaluations to demonstrate the advantages of our method over the
state-of-the-art approaches.
Comment: Accepted by IEEE Transactions on Multimedi