Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks
Facial expression recognition in videos is an active area of research in
computer vision. However, fake facial expressions are difficult to recognize,
even for humans. Facial micro-expressions, on the other hand, generally
represent the actual emotion of a person, as they are spontaneous reactions
expressed through the human face. Despite a few attempts at recognizing
micro-expressions, the problem remains far from solved, as reflected in the
poor accuracy of state-of-the-art methods. A few CNN-based approaches in the
literature recognize facial micro-expressions from still images, whereas a
spontaneous micro-expression video contains multiple frames that have to be
processed together to encode both spatial and temporal information. This paper
proposes two 3D-CNN methods: MicroExpSTCNN and MicroExpFuseNet, for spontaneous
facial micro-expression recognition by exploiting the spatiotemporal
information in a CNN framework. The MicroExpSTCNN considers the full spatial
information, whereas the MicroExpFuseNet is based on the 3D-CNN feature fusion
of the eyes and mouth regions. The experiments are performed over CAS(ME)^2 and
SMIC micro-expression databases. The proposed MicroExpSTCNN model outperforms
the state-of-the-art methods.
Comment: Accepted in 2019 International Joint Conference on Neural Networks (IJCNN).
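As a rough illustration of the core idea, the sketch below shows a minimal 3D-CNN that convolves jointly over frames and pixels (PyTorch). The layer sizes, grayscale input, and three-class head are assumptions for illustration, not the exact MicroExpSTCNN or MicroExpFuseNet configuration.

```python
# Minimal 3D-CNN sketch for clip-level micro-expression classification.
# Layer sizes and the 3-class output are illustrative assumptions, not
# the paper's exact MicroExpSTCNN architecture.
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            # input: (batch, 1, frames, height, width) grayscale clips
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),      # halves T, H, W
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        x = self.features(clip)
        return self.classifier(x.flatten(1))

# e.g. a batch of 2 clips, each 16 frames of 64x64 grayscale
logits = Tiny3DCNN()(torch.randn(2, 1, 16, 64, 64))
```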
Group Emotion Recognition Using Machine Learning
Automatic facial emotion recognition is a challenging task that has gained
significant scientific interest over the past few years, but the problem of
emotion recognition for a group of people has been less extensively studied.
However, it is slowly gaining popularity due to the massive amount of data
available on social networking sites containing images of groups of people
participating in various social events. Group emotion recognition is a
challenging problem due to obstacles such as head and body pose variations,
occlusions, variable lighting conditions, variation among actors, varied indoor
and outdoor settings, and image quality. The objective of this task is to classify a
group's perceived emotion as Positive, Neutral or Negative. In this report, we
describe our solution, which is a hybrid machine learning system that
incorporates deep neural networks and Bayesian classifiers. Deep Convolutional
Neural Networks (CNNs) work from bottom to top, analysing the facial
expressions of individual faces extracted from the image. The Bayesian network
works from top to bottom, inferring the global emotion for the image by
integrating the visual features of the contents of the image obtained through a
scene descriptor. In the final pipeline, the group emotion category predicted
by an ensemble of CNNs in the bottom-up module is passed as input to the
Bayesian Network in the top-down module and an overall prediction for the image
is obtained. Experimental results show that the proposed system achieves 65.27%
accuracy on the validation set, which is in line with state-of-the-art results.
As an outcome of this project, a Progressive Web Application and an
accompanying Android app with a simple and intuitive user interface are
presented, allowing users to test out the system with their own pictures.
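A minimal sketch of how such a bottom-up/top-down fusion could work: per-face CNN probabilities are combined with a scene-level prior via Bayes' rule under a naive independence assumption. The numbers and the independence assumption are illustrative only, not the authors' exact Bayesian network.

```python
# Hedged sketch: fuse bottom-up face-level emotion probabilities with a
# top-down scene prior via Bayes' rule, treating faces as conditionally
# independent given the group emotion. Illustrative values only.
import numpy as np

CLASSES = ["Positive", "Neutral", "Negative"]

def fuse(face_probs, scene_prior):
    """face_probs: (n_faces, 3) CNN softmax outputs; scene_prior: (3,)."""
    # log p(class | faces, scene) is proportional to
    # log prior + sum_i log p(class | face_i)
    log_post = np.log(scene_prior) + np.log(face_probs).sum(axis=0)
    post = np.exp(log_post - log_post.max())   # numerically stable softmax
    return post / post.sum()

faces = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2]])            # two detected faces
prior = np.array([0.3, 0.4, 0.3])              # from the scene descriptor
print(dict(zip(CLASSES, fuse(faces, prior))))
```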
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
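For concreteness, here is a hedged sketch of the kind of standard deep-FER pipeline such a survey describes: augmentation to combat overfitting on small datasets, then a generic CNN backbone with an expression head. The transform values, backbone choice, and 7-class head are common choices, not prescriptions from the survey.

```python
# Hedged sketch of a standard deep-FER pipeline. All specific values
# below are common-practice assumptions, not the survey's prescriptions.
import torch
import torch.nn as nn
from torchvision import models, transforms

# Stage 1: augmentation applied to aligned face crops, to counter
# overfitting from limited training data.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),             # mild pose jitter
    transforms.ColorJitter(brightness=0.2),    # illumination variation
    transforms.ToTensor(),
])

# Stage 2: a generic backbone with a 7-way expression head (anger,
# disgust, fear, happiness, sadness, surprise, neutral).
backbone = models.resnet18(weights=None)       # pretrained in practice
backbone.fc = nn.Linear(backbone.fc.in_features, 7)

x = torch.randn(4, 3, 224, 224)                # stand-in face crops
logits = backbone(x)
```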
A Deep Learning Perspective on the Origin of Facial Expressions
Facial expressions play a significant role in human communication and
behavior. Psychologists have long studied the relationship between facial
expressions and emotions. Paul Ekman et al. devised the Facial Action Coding
System (FACS) to taxonomize human facial expressions and model their behavior.
The ability to recognize facial expressions automatically enables novel
applications in fields like human-computer interaction, social gaming, and
psychological research. Research in this field has been tremendously active,
with several recent papers utilizing convolutional neural networks (CNNs)
for feature extraction and inference. In this paper, we employ CNN
understanding methods to study the relation between the features these
computational networks use, the FACS, and its Action Units (AUs). We verify our
findings on the Extended Cohn-Kanade (CK+), NovaEmotions and FER2013 datasets.
We apply these models to various tasks and tests using transfer learning,
including cross-dataset validation and cross-task performance. Finally, we
exploit the nature of the FER-based CNN models for the detection of
micro-expressions and achieve state-of-the-art accuracy using a simple
long short-term memory (LSTM) recurrent neural network (RNN).
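A hedged sketch of the CNN-plus-LSTM pattern the abstract describes: per-frame CNN features are fed to an LSTM that classifies the clip. The tiny stand-in encoder and all sizes are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: per-frame CNN features fed to an LSTM for clip-level
# micro-expression detection. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(              # stand-in frame encoder
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                   # clip: (B, T, 1, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])           # classify from last step

logits = CNNLSTM()(torch.randn(2, 16, 1, 64, 64))
```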
A Review of Modularization Techniques in Artificial Neural Networks
Artificial neural networks (ANNs) have achieved significant success in
tackling classical and modern machine learning problems. As learning problems
grow in scale and complexity, and expand into multi-disciplinary territory, a
more modular approach for scaling ANNs will be needed. Modular neural networks
(MNNs) are neural networks that embody the concepts and principles of
modularity. MNNs adopt a large number of different techniques for achieving
modularization. Previous surveys offer relatively little systematic analysis of
the modularization techniques used in MNNs, focusing mostly on empirical
comparisons and lacking an extensive taxonomical framework. In this review, we
aim to establish a solid taxonomy that captures the essential properties and
relationships of the different variants of MNNs. Based on an investigation of
the different levels at which modularization techniques act, we attempt to
provide a universal and systematic framework for theorists studying MNNs,
emphasising along the way the strengths and weaknesses of different
modularization approaches in order to highlight good practices for neural
network practitioners.
Comment: Artif Intell Rev (2019).
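As one concrete example from this space, the sketch below implements a simple mixture-of-experts, where a gating module softly routes input among independently parameterised expert modules. It is an illustration of the modular idea, not a technique singled out by the review; all sizes are assumptions.

```python
# Hedged sketch of a mixture-of-experts: independent expert modules
# combined by a learned gate. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    def __init__(self, in_dim=32, out_dim=10, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                          nn.Linear(64, out_dim))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)        # (B, E)
        outs = torch.stack([e(x) for e in self.experts], 1)  # (B, E, out)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)     # weighted mix

y = MixtureOfExperts()(torch.randn(8, 32))
```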
SensitiveNets: Learning Agnostic Representations with Application to Face Images
This work proposes a novel privacy-preserving neural network feature
representation to suppress the sensitive information of a learned space while
maintaining the utility of the data. The new international regulation for
personal data protection forces data controllers to guarantee privacy and avoid
discriminative hazards while managing sensitive data of users. In our approach,
privacy and discrimination are related to each other. Unlike existing
approaches that aim directly at improving fairness, the proposed feature
representation enforces the privacy of selected attributes. In this way,
fairness is not the objective but the result of a privacy-preserving learning
method. This approach guarantees that sensitive information cannot be exploited
by any agent who processes the output of the model, ensuring both privacy and equality
of opportunity. Our method is based on an adversarial regularizer that
introduces a sensitive information removal function in the learning objective.
The method is evaluated on three different primary tasks (identity,
attractiveness, and smiling) and three publicly available benchmarks. In
addition, we present a new face annotation dataset with balanced distribution
between genders and ethnic origins. The experiments demonstrate that it is
possible to improve the privacy and equality of opportunity while retaining
competitive performance independently of the task.
Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence.
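A hedged sketch of an adversarial sensitive-information-removal regularizer, realised here with a gradient-reversal layer. Gradient reversal is one standard construction for this kind of objective, not necessarily SensitiveNets' exact formulation; the heads, sizes, and weighting are assumptions.

```python
# Hedged sketch: an adversarial regularizer that penalises how well a
# probe can recover a sensitive attribute from the representation,
# implemented via gradient reversal (a standard construction, not
# necessarily the paper's exact one).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad          # flipped gradient: the encoder learns to
                              # *hide* the sensitive attribute

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
task_head = nn.Linear(64, 2)  # primary task (e.g. smiling)
adv_head = nn.Linear(64, 2)   # sensitive-attribute probe

x = torch.randn(16, 128)
y_task = torch.randint(0, 2, (16,))
y_sens = torch.randint(0, 2, (16,))

z = encoder(x)
loss = nn.functional.cross_entropy(task_head(z), y_task) \
     + 0.1 * nn.functional.cross_entropy(adv_head(GradReverse.apply(z)), y_sens)
loss.backward()
```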
Robust Spatial Filtering with Graph Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have recently led to incredible
breakthroughs on a variety of pattern recognition problems. Banks of finite
impulse response filters are learned on a hierarchy of layers, each
contributing more abstract information than the previous layer. The simplicity
and elegance of the convolutional filtering process make CNNs perfect for
structured problems such as images, video, or voice, where vertices are
homogeneous in the sense of number, location, and strength of neighbors. The
vast majority of classification problems, for example in the pharmaceutical,
homeland security, and financial domains, are unstructured. When these problems
are formulated as unstructured graphs, their heterogeneity, such as the number
of vertices, the number of connections per vertex, and edge strength, cannot be
tackled with standard convolutional techniques. We propose
a novel neural learning framework that is capable of handling both homogeneous
and heterogeneous data, while retaining the benefits of traditional CNN
successes.
Recently, researchers have proposed variations of CNNs that can handle graph
data. In an effort to create learnable filter banks for graphs, these methods
either impose constraints on the data or require preprocessing. As opposed to
spectral methods, our framework, which we term Graph-CNNs, defines filters as
polynomials of functions of the graph adjacency matrix. Graph-CNNs can handle
both heterogeneous and homogeneous graph data, including graphs having entirely
different vertex or edge sets. We perform experiments to validate the
applicability of Graph-CNNs to a variety of structured and unstructured
classification problems, and demonstrate state-of-the-art results on document
and molecule classification tasks.
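The filter definition quoted above, polynomials of the adjacency matrix, admits a compact sketch: y = sum_k A^k X W_k with one learnable weight matrix per power of A. The polynomial degree and feature sizes below are assumptions for illustration.

```python
# Hedged sketch of a polynomial graph filter, y = sum_k A^k X W_k,
# the core idea behind Graph-CNN filters. Degree and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class PolyGraphFilter(nn.Module):
    def __init__(self, in_feats=8, out_feats=16, degree=2):
        super().__init__()
        # one learnable weight matrix per power of the adjacency matrix
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(in_feats, out_feats) * 0.1)
            for _ in range(degree + 1)
        ])

    def forward(self, adj, x):                 # adj: (N, N), x: (N, F)
        out, prop = 0, x
        for w in self.weights:
            out = out + prop @ w               # the A^k x W_k term
            prop = adj @ prop                  # next power of A
        return out

adj = torch.rand(5, 5)
y = PolyGraphFilter()(adj, torch.randn(5, 8))
```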
Learned Features are better for Ethnicity Classification
Ethnicity is a key demographic attribute of human beings; it plays a vital
role in automatic facial recognition and has extensive real-world applications
such as human-computer interaction (HCI), demographic-based classification,
biometric recognition, and security and defense, to name a few. In this paper
we present a novel approach for extracting ethnicity from facial images.
The proposed method makes use of a pre-trained Convolutional Neural Network
(CNN) to extract features, and then a Support Vector Machine (SVM) with a
linear kernel is used as the classifier. This technique uses
translation-invariant hierarchical features learned by the network, in contrast
to previous works, which use hand-crafted features such as Local Binary
Patterns (LBP) and Gabor filters. Thorough experiments are presented on ten
different facial databases, which strongly suggest that our approach is robust
to different expression and illumination conditions. Here we consider ethnicity classification as a three
class problem including Asian, African-American and Caucasian. Average
classification accuracy over all databases is 98.28%, 99.66% and 99.05% for
Asian, African-American and Caucasian, respectively.
Comment: 15 pages, 8 figures, 2 tables, code and framework available on request.
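A hedged sketch of the two-stage recipe described above: a frozen (in practice pre-trained) CNN exposes its penultimate-layer features, and a linear-kernel SVM is trained on them. The backbone choice and the random stand-in data are illustrative assumptions.

```python
# Hedged sketch: frozen CNN as feature extractor + linear SVM.
# Backbone and stand-in data are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

backbone = models.resnet18(weights=None)   # pretrained weights in practice
backbone.fc = nn.Identity()                # expose penultimate features
backbone.eval()

with torch.no_grad():                      # frozen extractor, no training
    feats = backbone(torch.randn(12, 3, 224, 224)).numpy()
labels = [0, 1, 2] * 4                     # three ethnicity classes

clf = SVC(kernel="linear").fit(feats, labels)
print(clf.predict(feats[:3]))
```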
Attended End-to-end Architecture for Age Estimation from Facial Expression Videos
The main challenges of age estimation from facial expression videos lie not
only in the modeling of the static facial appearance, but also in the capturing
of the temporal facial dynamics. Traditional approaches to this problem focus
on constructing handcrafted features to explore the discriminative information
contained in facial appearance and dynamics separately, which relies on
sophisticated feature refinement and framework design. In this paper, we
present an end-to-end architecture for age estimation, called Spatially-Indexed
Attention Model (SIAM), which is able to simultaneously learn both the
appearance and dynamics of age from raw videos of facial expressions.
Specifically, we employ convolutional neural networks to extract effective
latent appearance representations and feed them into recurrent networks to
model the temporal dynamics. More importantly, we propose to leverage attention
models for salience detection in both the spatial domain for each single image
and the temporal domain for the whole video as well. We design a specific
spatially-indexed attention mechanism among the convolutional layers to extract
the salient facial regions in each individual image, and a temporal attention
layer to assign attention weights to each frame. This two-pronged approach not
only improves the performance by allowing the model to focus on informative
frames and facial areas, but it also offers an interpretable correspondence
between the spatial facial regions as well as temporal frames, and the task of
age estimation. We demonstrate the strong performance of our model in
experiments on a large, gender-balanced database with 400 subjects with ages
spanning from 8 to 76 years. Experiments reveal that our model exhibits
significant superiority over the state-of-the-art methods given sufficient
training data.
Comment: Accepted by Transactions on Image Processing (TIP).
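A hedged sketch of the temporal half of this attention idea: score each frame's feature vector, softmax the scores into attention weights, and pool the video as a weighted sum before regressing age. The spatially-indexed attention over convolutional layers is omitted, and all sizes are assumptions, not SIAM's exact design.

```python
# Hedged sketch of temporal attention pooling over per-frame features,
# followed by age regression. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)    # one scalar per frame
        self.regress = nn.Linear(feat_dim, 1)  # age output

    def forward(self, frame_feats):            # (B, T, feat_dim)
        w = torch.softmax(self.score(frame_feats), dim=1)  # (B, T, 1)
        video_feat = (w * frame_feats).sum(dim=1)          # weighted pool
        return self.regress(video_feat).squeeze(-1), w

age, weights = TemporalAttentionPool()(torch.randn(2, 20, 128))
```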
Towards Distortion-Predictable Embedding of Neural Networks
Current research in Computer Vision has shown that Convolutional Neural
Networks (CNN) give state-of-the-art performance in many classification tasks
and Computer Vision problems. The embedding of a CNN, which is the internal
representation produced by the last layer, can indirectly learn topological and
relational properties. Moreover, by using a suitable loss function, CNN models
can learn invariance to a wide range of non-linear distortions such as
rotation, viewpoint angle or lighting condition. In this work, new insights are
discovered about CNN embeddings and a new loss function is proposed, derived
from the contrastive loss, that creates models with more predictable mappings
and also quantifies distortions. In typical distortion-dependent methods, there
is no simple relation between the features corresponding to one image and the
features of a distorted version of that image. Therefore, these methods require
feeding inputs forward under every distortion in order to find the
corresponding feature representations. Our contribution makes a step towards
embeddings where the features of distorted inputs are related and can be
derived from each other given the intensity of the distortion.
Comment: 54 pages, 28 figures. Master project at EPFL (Switzerland) in 2015.
For source code on GitHub, see https://github.com/axel-angel/master-projec
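A hedged sketch of a contrastive-style loss in the spirit described: the embedding distance between an image and its distorted version is trained to match the distortion intensity, so that distorted features become predictable from the clean ones. This conveys the general idea only; the encoder, the stand-in distortion, and the quadratic penalty are assumptions, not the thesis' exact loss.

```python
# Hedged sketch: train embedding distance to track distortion intensity.
# Encoder, stand-in distortion, and penalty form are assumptions.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32))

def distortion_contrastive(x, x_distorted, intensity, scale=1.0):
    """intensity: (B,) distortion magnitudes, e.g. rotation in radians."""
    d = (embed(x) - embed(x_distorted)).norm(dim=1)
    return ((d - scale * intensity) ** 2).mean()  # distance ~ intensity

x = torch.randn(8, 1, 28, 28)
x_flip = torch.flip(x, dims=[-1])                 # stand-in "distortion"
loss = distortion_contrastive(x, x_flip, torch.rand(8))
loss.backward()
```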