
    Automated Recognition of Facial Affect Using Deep Neural Networks

    Automated Facial Expression Recognition (FER) has been a topic of study in the fields of computer vision and machine learning for decades. In spite of efforts made to improve the accuracy of FER systems, existing methods are still not generalizable and accurate enough for use in real-world applications. Many traditional methods use hand-crafted (a.k.a. engineered) features to represent facial images; however, these methods often require rigorous hyper-parameter tuning to achieve favorable results. Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in visual object recognition. DNNs require large amounts of data as well as powerful computing units to train generalizable and robust classification models. The problem of automated FER, especially with images captured in the wild, is even more challenging since there are only subtle differences between various facial emotions. This dissertation presents my recent efforts in 1) creating a large annotated database of facial expressions, 2) developing novel DNN-based methods for automated recognition of facial expressions described by the two main models of affect, the categorical model and the dimensional model, and 3) developing a robust face detection and emotion recognition system based on our state-of-the-art DNN and trained on our proposed database of facial expressions. Existing annotated databases of facial expressions in the wild are small and mostly cover discrete emotions (a.k.a. the categorical model). There are very few annotated facial databases for affective computing in the continuous dimensional model (e.g., valence and arousal). To address these needs, we developed the largest database of human affect, called AffectNet. For AffectNet, we collected, annotated, and prepared for public distribution a new database of facial emotions in the wild. AffectNet contains more than 1,000,000 facial images gathered from the Internet by querying three major search engines with 1250 emotion-related keywords in six different languages. About half of the retrieved images were manually annotated for the presence of seven discrete facial expressions and for the intensity of valence and arousal. AffectNet is by far the largest database of facial expression, valence, and arousal in the wild, enabling research on automated facial expression recognition in two different emotion models. This dissertation also presents three major and novel DNN-based methods for automated facial affect estimation: 1) the 3D Inception-ResNet (3DIR), 2) BReGNet, and 3) BReG-NeXt architectures. These methods modify the residual unit proposed in the original ResNets with different operations. Comprehensive experiments are conducted to evaluate the performance and efficiency of each of the proposed methods using AffectNet and a few other facial expression databases. Our final proposed method, BReG-NeXt, achieves state-of-the-art results in predicting both the dimensional and categorical models of affect with significantly fewer training parameters and fewer FLOPs. Additionally, a robust face detection network is developed based on the BReG-NeXt architecture, which leverages AffectNet's diverse training data and BReG-NeXt's efficient feature extraction.
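    As a rough illustration of what "modifying the residual unit" can look like in practice, the sketch below replaces the identity shortcut of a standard ResNet block with a learnable bounded mapping. It is a minimal PyTorch sketch of the general idea only; the module name, the scaled-tanh shortcut, and all sizes are illustrative assumptions, not the dissertation's exact BReGNet/BReG-NeXt formulation.

    # Minimal PyTorch sketch: a residual unit whose identity shortcut is replaced
    # by a learnable bounded mapping. Illustrative only; not the exact BReGNet /
    # BReG-NeXt formulation from the dissertation.
    import torch
    import torch.nn as nn


    class ModifiedResidualUnit(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)
            # Learnable scalar controlling the bounded shortcut mapping (illustrative).
            self.alpha = nn.Parameter(torch.ones(1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
            # Shortcut is a scaled tanh of the input instead of the plain identity
            # used in the original ResNets.
            shortcut = torch.tanh(self.alpha * x)
            return self.relu(shortcut + residual)


    # Illustrative usage: a 64-channel feature map passes through unchanged in shape.
    # out = ModifiedResidualUnit(64)(torch.randn(2, 64, 56, 56))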

    Exploiting Emotional Dependencies with Graph Convolutional Networks for Facial Expression Recognition

    Over the past few years, deep learning methods have shown remarkable results in many face-related tasks, including automatic facial expression recognition (FER) in-the-wild. Meanwhile, numerous models describing human emotional states have been proposed by the psychology community. However, there is no clear evidence as to which representation is more appropriate, and the majority of FER systems use either the categorical or the dimensional model of affect. Inspired by recent work in multi-label classification, this paper proposes a novel multi-task learning (MTL) framework that exploits the dependencies between these two models using a Graph Convolutional Network (GCN) to recognize facial expressions in-the-wild. Specifically, a shared feature representation is learned for both discrete and continuous recognition in an MTL setting. Moreover, the facial expression classifiers and the valence-arousal regressors are learned through a GCN that explicitly captures the dependencies between them. To evaluate the performance of our method under real-world conditions, we perform extensive experiments on the AffectNet and Aff-Wild2 datasets. The results show that our method improves performance across different datasets and backbone architectures. Finally, we also surpass the previous state-of-the-art methods on the categorical model of AffectNet.
    Comment: 9 pages, 8 figures, 5 tables; revised submission to the 16th IEEE International Conference on Automatic Face and Gesture Recognition.
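    The sketch below illustrates the core idea described above under simplifying assumptions: a shared image feature feeds one node per expression class plus one node each for valence and arousal, a single graph-convolution step refines the node embeddings over a learnable dependency adjacency, and the refined nodes act as the classifiers and regressors. The class name, the learnable adjacency, and the single GCN layer are assumptions for illustration, not the paper's exact architecture.

    # Sketch of a shared-feature multi-task head refined by one graph-convolution
    # step over a learnable dependency adjacency. Names, sizes, and the learnable
    # adjacency are illustrative assumptions, not the paper's exact design.
    import torch
    import torch.nn as nn


    class MultiTaskGCNHead(nn.Module):
        def __init__(self, feat_dim: int, num_classes: int = 7):
            super().__init__()
            self.num_nodes = num_classes + 2                     # expressions + valence + arousal
            self.node_emb = nn.Parameter(torch.randn(self.num_nodes, feat_dim))
            self.adj = nn.Parameter(torch.eye(self.num_nodes))   # dependency structure
            self.gcn_weight = nn.Linear(feat_dim, feat_dim, bias=False)

        def forward(self, shared_feat: torch.Tensor):
            # One round of message passing: A @ X, then a learned projection.
            nodes = torch.relu(self.gcn_weight(self.adj @ self.node_emb))
            # Each refined node acts as a linear predictor on the shared feature.
            scores = shared_feat @ nodes.t()                     # (batch, num_nodes)
            class_logits = scores[:, :-2]                        # categorical expressions
            valence_arousal = torch.tanh(scores[:, -2:])         # continuous affect in [-1, 1]
            return class_logits, valence_arousal


    # Illustrative usage with a 512-d backbone feature:
    # logits, va = MultiTaskGCNHead(feat_dim=512)(torch.randn(8, 512))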

    Deep Siamese Neural Networks for Facial Expression Recognition in the Wild

    The variation of facial images in the wild due to head pose, illumination, and occlusion can significantly affect Facial Expression Recognition (FER) performance. Moreover, between-subject variation introduced by age, gender, ethnic background, and identity can also influence FER performance. This Ph.D. dissertation presents a novel algorithm for end-to-end facial expression recognition, valence and arousal estimation, and visual object matching based on deep Siamese Neural Networks to handle the extreme variation that exists in facial datasets. In our main Siamese Neural Network for facial expression recognition, the first network represents the classification framework, where we aim to achieve multi-class classification. The second network represents the verification framework, where we use pairwise similarity labels to map images to a feature space in which similar inputs are close to each other and dissimilar inputs are far apart. Using a Siamese architecture enables us to obtain powerful discriminative features by taking full advantage of the training batches via our pairing strategy and by dynamically transferring the learning from a local-adaptive verification space into a classification embedding space. These steps enable the algorithm to learn state-of-the-art features by optimizing the joint identification-verification embedding space. The verification model reduces intra-class variation by minimizing the distance between features extracted from the same identity using different strategies. In contrast, the identification model increases inter-class variation by maximizing the distance between features extracted from different classes. When a network is tuned carefully, we can rely on the powerful discriminative features to generalize the network to unseen images. Further, we applied our proposed deep Siamese networks to two different challenging tasks in computer vision: valence and arousal estimation, and visual object matching. The empirical results of the valence and arousal Siamese model demonstrate that transferring the learning from the classification space to the regression space enhances the regression task, since each expression occupies a representation within a specified range of valence and arousal. On the other hand, the Siamese model for visual object matching gives better performance since the classification framework helps to increase the inter-class variation in the verification framework. We evaluated the algorithm on state-of-the-art and challenging datasets such as AffectNet (Mollahosseini et al., 2017), FER2013 (Goodfellow et al., 2013), the categorical EmotioNet (Du et al., 2014), and CIFAR-100 (Krizhevsky et al., 2009). To the best of our knowledge, this technique is the first to create a powerful recognition system by taking advantage of features learned from different objective frameworks. We achieved comparable results with other deep learning models.
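    A minimal sketch of the joint identification-verification objective described above, assuming a toy encoder for 48x48 grayscale crops and a standard contrastive pairwise term; it is not the dissertation's exact architecture or loss.

    # Sketch of the joint identification-verification objective: cross-entropy on
    # both branches (identification) plus a contrastive pairwise term (verification).
    # The toy encoder for 48x48 grayscale crops is an assumption for the sketch.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class SiameseFER(nn.Module):
        def __init__(self, embed_dim: int = 128, num_classes: int = 7):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Flatten(),
                nn.Linear(48 * 48, 256), nn.ReLU(),
                nn.Linear(256, embed_dim),
            )
            self.classifier = nn.Linear(embed_dim, num_classes)

        def forward(self, x1: torch.Tensor, x2: torch.Tensor):
            z1, z2 = self.encoder(x1), self.encoder(x2)
            return z1, z2, self.classifier(z1), self.classifier(z2)


    def joint_loss(z1, z2, logits1, logits2, y1, y2, same_pair, margin=1.0):
        # Identification: standard multi-class cross-entropy on both branches.
        ident = F.cross_entropy(logits1, y1) + F.cross_entropy(logits2, y2)
        # Verification: pull same-label pairs together, push different-label
        # pairs at least `margin` apart (same_pair is 1.0 or 0.0 per pair).
        dist = F.pairwise_distance(z1, z2)
        verif = (same_pair * dist.pow(2) + (1 - same_pair) * F.relu(margin - dist).pow(2)).mean()
        return ident + verif

    The verification term mirrors the intra-class/inter-class argument above: matched pairs are pulled together while mismatched pairs are pushed apart, and the cross-entropy terms keep the classes separated.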

    GRATIS: Deep Learning Graph Representation with Task-specific Topology and Multi-dimensional Edge Features

    Graphs are powerful for representing various types of real-world data. The topology (the presence of edges) and the edge features of a graph determine the message passing mechanism among vertices within the graph. Most existing approaches only manually define a single-value edge to describe the connectivity or strength of association between a pair of vertices; as a result, task-specific and crucial relationship cues may be disregarded by such manually defined topology and single-value edge features. In this paper, we propose the first general graph representation learning framework (called GRATIS) that can generate a strong graph representation with a task-specific topology and task-specific multi-dimensional edge features from any arbitrary input. To learn each edge's presence and multi-dimensional feature, our framework takes both the corresponding vertex pair and their global contextual information into consideration, enabling the generated graph representation to have a globally optimal message passing mechanism for different downstream tasks. The principled investigation results achieved for various graph analysis tasks on 11 graph and non-graph datasets show that GRATIS can not only largely enhance pre-defined graphs but also learn a strong graph representation for non-graph data, with clear performance improvements on all tasks. In particular, the learned topology and multi-dimensional edge features provide complementary task-related cues for graph analysis tasks. Our framework is effective, robust, and flexible, and is a plug-and-play module that can be combined with different backbones and Graph Neural Networks (GNNs) to generate a task-specific graph representation from various graph and non-graph data. Our code is publicly available at https://github.com/SSYSteve/Learning-Graph-Representation-with-Task-specific-Topology-and-Multi-dimensional-Edge-Features.
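    The sketch below shows one plausible form of the framework's central step as described above: predicting each edge's presence and a multi-dimensional edge feature from the two vertex features together with a global context vector (here simply the mean vertex feature). Module names, sizes, and the mean-pooled context are illustrative assumptions and do not reproduce the released GRATIS code.

    # Sketch of the core step: predict each edge's presence and a multi-dimensional
    # edge feature from the vertex pair plus a global context vector (the mean
    # vertex feature here). Names and sizes are assumptions, not the GRATIS code.
    import torch
    import torch.nn as nn


    class EdgePredictor(nn.Module):
        def __init__(self, node_dim: int, edge_dim: int = 16):
            super().__init__()
            in_dim = 3 * node_dim  # vertex i, vertex j, global context
            self.presence = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))
            self.edge_feat = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, edge_dim))

        def forward(self, nodes: torch.Tensor):
            # nodes: (N, node_dim) vertex features produced by any backbone.
            n = nodes.size(0)
            ctx = nodes.mean(dim=0, keepdim=True).expand(n * n, -1)
            pairs = torch.cat(
                [nodes.repeat_interleave(n, 0), nodes.repeat(n, 1), ctx], dim=-1
            )
            adj = torch.sigmoid(self.presence(pairs)).view(n, n)   # task-specific topology
            edge_features = self.edge_feat(pairs).view(n, n, -1)   # multi-dimensional edges
            return adj, edge_features


    # Illustrative usage: adj, edges = EdgePredictor(node_dim=32)(torch.randn(5, 32))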

    Conference Proceedings of the Euroregio / BNAM 2022 Joint Acoustic Conference
