6 research outputs found

    Improving Person-Independent Facial Expression Recognition Using Deep Learning

    Over the past few years, deep learning methods, e.g., Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), have shown promise in facial expression recognition. However, performance degrades dramatically, especially in close-to-real-world settings, due to high intra-class variations and high inter-class similarities introduced by subtle facial appearance changes, head pose variations, illumination changes, occlusions, and identity-related attributes, e.g., age, race, and gender. In this work, we developed two novel CNN frameworks and one novel GAN approach to learn discriminative features for facial expression recognition.

    First, a novel island loss is proposed to enhance the discriminative power of the learned deep features. Specifically, the island loss is designed to reduce the intra-class variations while simultaneously enlarging the inter-class differences. Experimental results on three posed facial expression datasets and, more importantly, two spontaneous facial expression datasets show that the proposed island loss outperforms baseline CNNs with the traditional softmax loss or the center loss, and achieves better or at least comparable performance compared with state-of-the-art methods.

    Second, we proposed a novel Probabilistic Attribute Tree-CNN (PAT-CNN) to explicitly deal with the large intra-class variations caused by identity-related attributes. Specifically, a novel PAT module with an associated PAT loss was proposed to learn features in a hierarchical tree structure organized according to identity-related attributes, such that the final features are less affected by those attributes. We further proposed a semi-supervised strategy to learn the PAT-CNN from limited attribute-annotated samples, making the best use of the available data. Experimental results on three posed facial expression datasets as well as four spontaneous facial expression datasets demonstrate that, by explicitly modeling attributes, the proposed PAT-CNN achieves the best performance compared with state-of-the-art methods. Impressively, the PAT-CNN using a single model achieves the best performance on the SFEW test dataset, compared with state-of-the-art methods using an ensemble of hundreds of CNNs.

    Last, we present a novel Identity-Free conditional Generative Adversarial Network (IF-GAN) to explicitly reduce the high inter-subject variations caused by identity-related attributes, e.g., age, race, and gender, for facial expression recognition. Specifically, for any given input facial expression image, a conditional generative model was developed to transform it into an “average” identity expressive face with the same expression as the input face image. Since the generated images share the same synthetic “average” identity, they differ from each other only in the displayed expression and thus can be used for identity-free facial expression classification. In this work, an end-to-end system was developed to perform facial expression generation and facial expression recognition within the IF-GAN framework. Experimental results on four well-known facial expression datasets, including a spontaneous facial expression dataset, demonstrate that the proposed IF-GAN outperforms the baseline CNN model and achieves the best performance compared with state-of-the-art methods for facial expression recognition.
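    The island loss described above lends itself to a compact illustration. The sketch below (in PyTorch, which the abstract does not specify) shows one plausible reading: a center-loss-style term pulls each feature toward its class center, and a pairwise cosine penalty pushes the centers apart. The class name, the weighting, and the exact formulation are illustrative assumptions, not the dissertation's implementation.

```python
# Illustrative "island"-style loss: a center-loss term reduces intra-class
# variation while a pairwise cosine penalty between class centers enlarges
# inter-class differences. Names and weights are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IslandLossSketch(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, lambda_pair: float = 10.0):
        super().__init__()
        # One learnable center per expression class.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lambda_pair = lambda_pair

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Intra-class term: squared distance of each feature to its class center.
        centers_batch = self.centers[labels]                  # (B, D)
        intra = ((features - centers_batch) ** 2).sum(dim=1).mean()

        # Inter-class term: penalize cosine similarity between distinct centers;
        # "+1" keeps each pair's contribution non-negative. Each unordered pair
        # is counted twice, which only folds a constant into the weight.
        c = F.normalize(self.centers, dim=1)                  # (K, D)
        cos = c @ c.t()                                       # (K, K)
        k = c.size(0)
        off_diag = cos[~torch.eye(k, dtype=torch.bool, device=cos.device)]
        inter = (off_diag + 1.0).sum()

        return intra + self.lambda_pair * inter
```

    In a training loop, a term like this would typically be added to the standard softmax cross-entropy on the logits, scaled by a small coefficient.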
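    The PAT-CNN description suggests soft routing of features through attribute-specific branches. The following sketch is an illustrative reading of that idea, assuming a single attribute with a handful of values and probability-weighted fusion of branch features; it is not the published PAT module or its loss.

```python
# Schematic attribute-conditioned feature learning: a soft attribute classifier
# gates per-attribute branches, whose outputs are fused by the predicted
# attribute probabilities so the fused feature depends less on the attribute.
import torch
import torch.nn as nn

class SoftAttributeTreeSketch(nn.Module):
    def __init__(self, feat_dim: int = 256, num_attr_values: int = 2,
                 num_expressions: int = 7):
        super().__init__()
        # Soft gate over attribute values (e.g., a binary gender attribute).
        self.attr_head = nn.Sequential(nn.Linear(feat_dim, num_attr_values),
                                       nn.Softmax(dim=1))
        # One expression-feature branch per attribute value.
        self.branches = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(num_attr_values))
        self.classifier = nn.Linear(feat_dim, num_expressions)

    def forward(self, shared_feat: torch.Tensor):
        gate = self.attr_head(shared_feat)                    # (B, A)
        branch_feats = torch.stack(
            [b(shared_feat) for b in self.branches], dim=1)   # (B, A, D)
        # Probability-weighted fusion marginalizes over attribute values.
        fused = (gate.unsqueeze(-1) * branch_feats).sum(dim=1)  # (B, D)
        return self.classifier(fused), gate
```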
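    The IF-GAN pipeline can be summarized in a few lines: a conditional generator maps any input face to a fixed “average” identity while preserving expression, and the classifier then operates on the generated, identity-free image. The networks below are tiny stand-ins for structure only; the actual generator, discriminators, and adversarial losses are described in the dissertation.

```python
# Structural sketch of identity-free classification: classify the generated
# average-identity face instead of the original input.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Image-to-image stand-in: input face -> average-identity expressive face."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        return self.net(x)

class TinyClassifier(nn.Module):
    """Expression classifier applied to the identity-free generated face."""
    def __init__(self, num_expressions: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, num_expressions)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def identity_free_forward(gen, clf, faces):
    # All generated faces share one synthetic identity, so the logits
    # differ only by the displayed expression.
    return clf(gen(faces))
```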

    Automated Recognition of Facial Affect Using Deep Neural Networks

    Automated Facial Expression Recognition (FER) has been a topic of study in the fields of computer vision and machine learning for decades. In spite of efforts made to improve the accuracy of FER systems, existing methods are still not generalizable and accurate enough for use in real-world applications. Many of the traditional methods use hand-crafted (a.k.a. engineered) features to represent facial images; however, these methods often require rigorous hyper-parameter tuning to achieve favorable results. Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in visual object recognition, but DNNs require large amounts of data as well as powerful computing units to train generalizable and robust classification models. The problem of automated FER, especially with images captured in the wild, is even more challenging since there are only subtle differences between various facial emotions. This dissertation presents my recent efforts in 1) creating a large annotated database of facial expressions, 2) developing novel DNN-based methods for automated recognition of facial expressions described by the two main models of affect, the categorical model and the dimensional model, and 3) developing a robust face detection and emotion recognition system based on our state-of-the-art DNN, trained on our proposed database of facial expressions.

    Existing annotated databases of facial expressions in the wild are small and mostly cover discrete emotions (a.k.a. the categorical model); very few annotated facial databases exist for affective computing in the continuous dimensional model (e.g., valence and arousal). To address these needs, we developed the largest database of human affect, called AffectNet. For AffectNet, we collected, annotated, and prepared for public distribution a new database of facial emotions in the wild. AffectNet contains more than 1,000,000 facial images collected from the Internet by querying three major search engines with 1,250 emotion-related keywords in six different languages. About half of the retrieved images were manually annotated for the presence of seven discrete facial expressions and for the intensity of valence and arousal. AffectNet is by far the largest in-the-wild database of facial expression, valence, and arousal, enabling research in automated facial expression recognition under both emotion models.

    This dissertation also presents three major novel DNN-based methods for automated facial affect estimation: the 1) 3D Inception-ResNet (3DIR), 2) BReGNet, and 3) BReG-NeXt architectures. These methods modify the residual unit proposed in the original ResNets with different operations. Comprehensive experiments were conducted to evaluate the performance and efficiency of each of the proposed methods on AffectNet and a few other facial expression databases. Our final proposed method, BReG-NeXt, achieves state-of-the-art results in predicting both the dimensional and categorical models of affect with significantly fewer training parameters and fewer FLOPs. Additionally, a robust face detection network was developed based on the BReG-NeXt architecture, leveraging AffectNet’s diverse training data and BReG-NeXt’s efficient feature extraction capabilities.
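    For the dimensional model mentioned above, affect is predicted as two continuous values (valence and arousal) rather than a discrete class. The sketch below shows a minimal regression head; the tanh output range and MSE objective are common choices in dimensional affect estimation and are assumptions here, not details taken from the dissertation.

```python
# Minimal valence/arousal regression head over backbone features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValenceArousalHead(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)       # outputs: (valence, arousal)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(features))   # squash into [-1, 1]

# Training-step sketch with dummy data standing in for backbone features
# and human annotations.
head = ValenceArousalHead()
features = torch.randn(8, 512)                 # backbone features (dummy)
targets = torch.empty(8, 2).uniform_(-1, 1)    # annotated valence/arousal
loss = F.mse_loss(head(features), targets)
loss.backward()
```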
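    The abstract notes that 3DIR, BReGNet, and BReG-NeXt all modify the residual unit of the original ResNets with different operations. The generic residual unit below factors the shortcut out into a hook to show where such a modification would slot in; the actual operations used by the three architectures are specified in the corresponding papers, and nothing here reproduces them.

```python
# Baseline ResNet-style residual unit with the shortcut factored out.
# Passing a different `shortcut_fn` is where a modified mapping would go;
# the default is the plain identity of the original ResNets.
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels: int, shortcut_fn=None):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        # Identity by default; a modified unit swaps in a different mapping.
        self.shortcut_fn = shortcut_fn if shortcut_fn is not None else (lambda x: x)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.shortcut_fn(x) + self.body(x))
```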