Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition
Occlusion and pose variations, which can change facial appearance
significantly, are two major obstacles for automatic Facial Expression
Recognition (FER). Though automatic FER has made substantial progress in the
past few decades, the occlusion-robust and pose-invariant aspects of FER have
received relatively little attention, especially in real-world scenarios. This
paper addresses the real-world pose and occlusion robust FER problem with
three-fold contributions. First, to stimulate research on FER under
real-world occlusions and pose variations, we build several in-the-wild facial
expression datasets with manual annotations for the community. Second, we
propose a novel Region Attention Network (RAN), to adaptively capture the
importance of facial regions for occlusion and pose-variant FER. The RAN
aggregates and embeds a varying number of region features produced by a backbone
convolutional neural network into a compact fixed-length representation. Last,
inspired by the fact that facial expressions are mainly defined by facial
action units, we propose a region biased loss to encourage high attention
weights for the most important regions. We validate our RAN and region biased
loss on both our built test datasets and four popular datasets: FERPlus,
AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region
biased loss largely improve the performance of FER under occlusion and pose
variation. Our method also achieves state-of-the-art results on FERPlus, AffectNet,
RAF-DB, and SFEW. Code and the collected test data will be publicly available.
Comment: The test set and the code of this paper will be available at
https://github.com/kaiwang960112/Challenge-condition-FER-datase
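As a rough illustration of the aggregation idea described above, the following PyTorch sketch pools a variable number of region features with learned attention weights and adds a simplified region-biased regularizer. The module name, the sigmoid-normalized weighting, and the margin formulation are assumptions for illustration, not the paper's exact implementation (the authors' code is in the linked repository).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAttentionPool(nn.Module):
    """Illustrative attention pooling over a variable number of region features
    (e.g. face crops passed through a backbone CNN). Each region receives a
    scalar attention score; the normalized weighted sum yields a fixed-length
    representation regardless of how many regions are supplied."""
    def __init__(self, feat_dim):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)  # one attention score per region

    def forward(self, region_feats):
        # region_feats: (batch, num_regions, feat_dim)
        scores = torch.sigmoid(self.attn(region_feats))      # (batch, R, 1)
        weights = scores / scores.sum(dim=1, keepdim=True)   # normalize over regions
        pooled = (weights * region_feats).sum(dim=1)          # (batch, feat_dim)
        return pooled, scores.squeeze(-1)                     # representation, raw scores

def region_biased_loss(scores, margin=0.02):
    """Simplified region-biased regularizer (an assumption, not the paper's exact
    loss): the largest region score should exceed the mean score by a margin."""
    gap = scores.max(dim=1).values - scores.mean(dim=1)
    return F.relu(margin - gap).mean()
```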
Deep Hierarchical Machine: a Flexible Divide-and-Conquer Architecture
We propose the Deep Hierarchical Machine (DHM), a model inspired by the
divide-and-conquer strategy while emphasizing representation learning ability
and flexibility. A stochastic routing framework as used by recent deep neural
decision/regression forests is incorporated, but we remove the need to evaluate
unnecessary computation paths by utilizing a different topology and introducing
a probabilistic pruning technique. We also present a specialized version of DHM
(DSHM) for efficiency, which inherits the sparse feature extraction process of
traditional decision trees with pixel-difference features. To achieve sparse
feature extraction, we propose to utilize sparse convolution operations in DSHM
and show one way of introducing sparse convolution kernels through a
local binary convolution layer. DHM can be applied to both classification and
regression problems, and we validate it on standard image classification and
face alignment tasks to show its advantages over past architectures.
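A minimal sketch of stochastic routing with probabilistic pruning of the kind described above, assuming a perfect binary tree of soft split nodes; the node parameterization, breadth-first indexing, and pruning threshold are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SoftSplitNode(nn.Module):
    """Illustrative soft routing node: maps a feature vector to the probability
    of sending a sample to its left child."""
    def __init__(self, feat_dim):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc(x))  # (batch, 1) probability of going left

def route_with_pruning(x, nodes, prune_thresh=1e-3):
    """Path probabilities down a perfect binary tree of soft split nodes, skipping
    (pruning) sub-trees whose accumulated probability is negligible.
    `nodes` lists the internal nodes in breadth-first order; returns leaf
    probabilities of shape (batch, len(nodes) + 1)."""
    n_internal = len(nodes)
    batch = x.size(0)
    path_prob = [None] * (2 * n_internal + 1)        # internal nodes + leaves
    path_prob[0] = torch.ones(batch, 1, device=x.device)
    for i in range(n_internal):
        p = path_prob[i]
        if p.max().item() < prune_thresh:
            # probabilistic pruning: this sub-tree contributes almost nothing
            left = right = torch.zeros(batch, 1, device=x.device)
        else:
            gate = nodes[i](x)                       # probability of going left
            left, right = p * gate, p * (1.0 - gate)
        path_prob[2 * i + 1], path_prob[2 * i + 2] = left, right
    return torch.cat(path_prob[n_internal:], dim=1)  # leaf-reaching probabilities
```

With three nodes, for instance, this yields a depth-2 tree with four leaves whose reaching probabilities can weight per-leaf predictors for either classification or regression.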
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey, covering different aspects of the research, including (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state-of-the-art results. Reflecting on
what has been achieved so far, the survey discusses open challenges and
directions for future research.
Comment: Accepted by IJC
Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Autoencoders are unsupervised deep learning models used for learning
representations. In the literature, autoencoders have been shown to perform well on a
variety of tasks spread across multiple domains, thereby establishing
widespread applicability. Typically, an autoencoder is trained to generate a
model that minimizes the reconstruction error between the input and the
reconstructed output, computed in terms of the Euclidean distance. While this
can be useful for applications related to unsupervised reconstruction, it may
not be optimal for classification. In this paper, we propose a novel Supervised
COSMOS Autoencoder which utilizes a multi-objective loss function to learn
representations that simultaneously encode the (i) "similarity" between the
input and reconstructed vectors in terms of their direction, (ii)
"distribution" of pixel values of the reconstruction with respect to the input
sample, while also incorporating (iii) "discriminability" in the feature
learning pipeline. The proposed autoencoder model incorporates a Cosine
similarity and Mahalanobis distance based loss function, along with supervision
via a Mutual Information based loss. Detailed analysis of each component of the
proposed model motivates its applicability for feature learning in different
classification tasks. The efficacy of the Supervised COSMOS autoencoder is
demonstrated via extensive experimental evaluations on different image
datasets. The proposed model outperforms existing algorithms on MNIST,
CIFAR-10, and SVHN databases. It also yields state-of-the-art results on
CelebA, LFWA, Adience, and IJB-A databases for attribute prediction and face
recognition, respectively.
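A hedged sketch of what such a multi-objective loss might look like in PyTorch: the weighting coefficients, the residual-based Mahalanobis term, and the plain cross-entropy stand-in for the mutual-information supervision are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cosmos_style_loss(x, x_hat, features, labels, classifier, cov_inv,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """Illustrative multi-objective loss in the spirit of the abstract:
    (i) direction agreement via cosine similarity between input and reconstruction,
    (ii) a Mahalanobis-style distance on the reconstruction residual, and
    (iii) a supervised term for discriminability (plain cross-entropy here as a
    stand-in; the paper uses a mutual-information based loss)."""
    x_flat, x_hat_flat = x.flatten(1), x_hat.flatten(1)

    # (i) 1 - cosine similarity, averaged over the batch
    cos_term = 1.0 - F.cosine_similarity(x_flat, x_hat_flat, dim=1).mean()

    # (ii) Mahalanobis-style distance r^T Sigma^{-1} r on the residual
    r = x_flat - x_hat_flat                                   # (batch, d)
    maha_term = torch.einsum('bi,ij,bj->b', r, cov_inv, r).mean()

    # (iii) supervised discriminability term on the encoded features
    sup_term = F.cross_entropy(classifier(features), labels)

    return alpha * cos_term + beta * maha_term + gamma * sup_term
```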
A Review on Facial Micro-Expressions Analysis: Datasets, Features and Metrics
Facial micro-expressions are very brief, spontaneous facial expressions that
appear on the face of humans when they either deliberately or unconsciously
conceal an emotion. Micro-expressions have a shorter duration than
macro-expressions, which makes them more challenging for both humans and machines
to recognize. Over the past ten years, automatic micro-expression recognition has
attracted increasing attention from researchers in psychology, computer science,
security, neuroscience and other related disciplines. The aim of this paper is
to provide insights into automatic micro-expression analysis and recommendations
for future research. Many datasets have been released over the last decade,
which has facilitated rapid growth in this field. However, comparison across
different datasets is difficult due to inconsistencies in experimental
protocols, features used and evaluation methods. To address these issues, we
review the datasets, features and the performance metrics deployed in the
literature. Relevant challenges such as the spatio-temporal settings during
data collection, emotional classes versus objective classes in data labelling,
face regions in data analysis, standardisation of metrics and the requirements
for real-world implementation are discussed. We conclude by proposing some
promising future directions for advancing micro-expression research.
Comment: Preprint submitted to IEEE Transaction
Going Deeper in Facial Expression Recognition using Deep Neural Networks
Automated Facial Expression Recognition (FER) has remained a challenging and
interesting problem. Despite efforts made in developing various methods for
FER, existing approaches traditionally lack generalizability when applied to
unseen images or those captured in the wild. Most of the existing
approaches are based on engineered features (e.g. HOG, LBPH, and Gabor) where
the classifier's hyperparameters are tuned to give the best recognition accuracy
on a single database, or a small collection of similar databases.
Nevertheless, the results do not generalize well when applied to novel
data. This paper proposes a deep neural network architecture to address the FER
problem across multiple well-known standard face datasets. Specifically, our
network consists of two convolutional layers each followed by max pooling and
then four Inception layers. The network is a single component architecture that
takes registered facial images as the input and classifies them into one of
the six basic expressions or the neutral expression. We conducted comprehensive
experiments on seven publicly available facial expression databases, viz.
MultiPIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013. The results of the proposed
architecture are comparable to or better than the state-of-the-art methods, and
better than traditional convolutional neural networks, in both accuracy and
training time.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 2016 (accepted in first round submission)
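The described topology (two convolution/max-pooling stages followed by four Inception-style blocks and a seven-way classifier) could be sketched roughly as below; the channel widths, kernel sizes, and single-channel input are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Compact Inception-style block: parallel 1x1, 3x3 and 5x5 branches plus a
    pooled 1x1 branch, concatenated along the channel dimension."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        b = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, b, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, b, 1), nn.Conv2d(b, b, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, b, 1), nn.Conv2d(b, b, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1), nn.Conv2d(in_ch, b, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

class FERNet(nn.Module):
    """Sketch of the described topology: two conv + max-pool stages, four
    Inception blocks, and a 7-way classifier (six basic expressions + neutral)."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=2, padding=3), nn.ReLU(), nn.MaxPool2d(3, 2, 1),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2, 1),
        )
        self.inception = nn.Sequential(
            Inception(128, 128), Inception(128, 256), Inception(256, 256), Inception(256, 512),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, num_classes))

    def forward(self, x):
        # x: registered face images, assumed grayscale, shape (batch, 1, H, W)
        return self.head(self.inception(self.stem(x)))
```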
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging, and much research is still
needed on how they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a general
discussion about trends, important questions and future lines of research.
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help understand the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application-related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Librar
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
Residual Codean Autoencoder for Facial Attribute Analysis
Facial attributes can provide rich ancillary information which can be
utilized for different applications such as targeted marketing, human computer
interaction, and law enforcement. This research focuses on facial attribute
prediction using a novel deep learning formulation, termed the R-Codean
autoencoder. The paper first presents a Cosine similarity based loss function in
an autoencoder, which is then incorporated into the Euclidean distance based
autoencoder to formulate R-Codean. The proposed loss function thus aims to
incorporate both magnitude and direction of image vectors during feature
learning. Further, inspired by the utility of shortcut connections in deep
models for facilitating the learning of optimal parameters without incurring the
problem of vanishing gradients, the proposed formulation is extended to
incorporate shortcut connections in the architecture. The proposed R-Codean
autoencoder is utilized in a facial attribute prediction framework which
incorporates a patch-based weighting mechanism for assigning higher weights to
relevant patches for each attribute. The experimental results on publicly
available CelebA and LFWA datasets demonstrate the efficacy of the proposed
approach in addressing this challenging problem.
Comment: Accepted in Pattern Recognition Letter
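A minimal sketch of a combined direction-plus-magnitude reconstruction loss in the spirit of R-Codean; the flattening, the squared Euclidean term, and the mixing weight `lam` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def codean_style_loss(x, x_hat, lam=0.5):
    """Illustrative reconstruction loss combining a direction term (cosine
    similarity) with a magnitude term (squared Euclidean distance)."""
    x, x_hat = x.flatten(1), x_hat.flatten(1)
    cos_term = 1.0 - F.cosine_similarity(x, x_hat, dim=1)   # direction mismatch
    euc_term = torch.norm(x - x_hat, dim=1) ** 2             # squared Euclidean distance
    return (lam * cos_term + (1.0 - lam) * euc_term).mean()
```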