
    Discriminant Multi-Label Manifold Embedding for Facial Action Unit Detection

    This article describes a system for participation in the Facial Expression Recognition and Analysis (FERA2015) sub-challenge for spontaneous action unit occurrence detection. AU detection is by nature a multi-label classification problem, a fact overlooked by most existing work; the correlation between AUs has the potential to increase detection accuracy. We investigate the multi-label AU detection problem by embedding the data on low-dimensional manifolds that preserve multi-label correlation. For this, we apply the multi-label Discriminant Laplacian Embedding (DLE) method as an extension to our base system. The base system uses SIFT features around a set of facial landmarks, enhanced with additional non-salient points around transient facial features. Both the base system and the DLE extension outperform the challenge baseline on the two challenge databases, achieving an average F1-measure close to 50% on the testing partition (9.9% higher than the baseline in the best case). The DLE extension proves useful for certain AUs, but also shows that more analysis is needed to assess its benefits in general.
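
    As a rough illustration of the embedding idea, the following is a minimal NumPy/SciPy sketch of a multi-label Laplacian embedding in the spirit of DLE. The specific choices here (RBF feature affinity, cosine label affinity, the mixing coefficient `alpha`) are illustrative assumptions, not the paper's exact formulation.

    ```python
    # Minimal sketch of a multi-label Laplacian embedding in the spirit of DLE.
    # X: (n, d) SIFT feature matrix; Y: (n, k) binary multi-label AU matrix.
    # Affinity choices and `alpha` are assumptions for illustration.
    import numpy as np
    from scipy.linalg import eigh

    def multilabel_laplacian_embedding(X, Y, n_dims=10, alpha=0.5, gamma=1e-3):
        # Feature-based affinity: RBF kernel on pairwise squared distances.
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        W_feat = np.exp(-gamma * np.maximum(d2, 0.0))

        # Label-based affinity: cosine similarity of AU label vectors, so
        # samples sharing active AUs are pulled together in the embedding.
        norms = np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12
        W_lab = (Y / norms) @ (Y / norms).T

        # Mix the two graphs; alpha trades feature similarity off against
        # label correlation.
        W = (1.0 - alpha) * W_feat + alpha * W_lab
        D = np.diag(W.sum(axis=1))
        L = D - W  # unnormalized graph Laplacian

        # Generalized eigenproblem L v = lambda D v; the smallest nontrivial
        # eigenvectors give the low-dimensional manifold coordinates.
        vals, vecs = eigh(L, D)
        return vecs[:, 1:n_dims + 1]  # skip the constant eigenvector
    ```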

    Island Loss for Learning Discriminative Features in Facial Expression Recognition

    Over the past few years, Convolutional Neural Networks (CNNs) have shown promise on facial expression recognition. However, performance degrades dramatically in real-world settings due to variations introduced by subtle facial appearance changes, head pose variations, illumination changes, and occlusions. In this paper, a novel island loss (IL) is proposed to enhance the discriminative power of the deeply learned features. Specifically, the IL is designed to reduce intra-class variations while simultaneously enlarging inter-class differences. Experimental results on four benchmark expression databases demonstrate that the CNN with the proposed island loss (IL-CNN) outperforms baseline CNN models with either the traditional softmax loss or the center loss, and achieves comparable or better performance than state-of-the-art methods for facial expression recognition.
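
    A minimal PyTorch sketch of the island-loss idea as described: center-loss-style intra-class compactness plus a pairwise term that pushes class centers apart (cosine similarity shifted by +1, so each pairwise term is non-negative). The hyperparameters and the exact pairing scheme are assumptions for illustration, not the paper's reported settings.

    ```python
    import torch
    import torch.nn as nn

    class IslandLoss(nn.Module):
        def __init__(self, num_classes, feat_dim, lambda_pair=10.0):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
            self.lambda_pair = lambda_pair

        def forward(self, features, labels):
            # Center-loss term: squared distance of each feature to its
            # class center (intra-class compactness).
            centers_batch = self.centers[labels]                 # (B, d)
            intra = ((features - centers_batch) ** 2).sum(dim=1).mean()

            # Pairwise center term: cosine similarity between distinct
            # centers, shifted by +1 so minimizing it spreads centers apart.
            c = nn.functional.normalize(self.centers, dim=1)     # (K, d)
            cos = c @ c.t()                                      # (K, K)
            K = cos.size(0)
            off_diag = cos[~torch.eye(K, dtype=torch.bool, device=cos.device)]
            inter = (off_diag + 1.0).sum()

            return intra + self.lambda_pair * inter
    ```

    In training, this term would typically be added to the usual softmax cross-entropy, e.g. `total = ce_loss + lam * island_loss(features, labels)`.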

    Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition

    Recognizing facial action units (AUs) during spontaneous facial displays is a challenging problem. Recently, Convolutional Neural Networks (CNNs) have shown promise for facial AU recognition, but they employ predefined, fixed convolution filter sizes. To achieve the best performance, the optimal filter size is usually found empirically through extensive experimental validation, a training process that is expensive, especially as the network becomes deeper. This paper proposes a novel Optimized Filter Size CNN (OFS-CNN), in which the filter sizes and weights of all convolutional layers are learned simultaneously from the training data. Specifically, the filter size is defined as a continuous variable, which is optimized by minimizing the training loss. Experimental results on two AU-coded spontaneous databases show that the proposed OFS-CNN is capable of estimating the optimal filter size for varying image resolutions and outperforms traditional CNNs with the best filter size obtained by exhaustive search. The OFS-CNN also beats CNNs using multiple filter sizes and, more importantly, is much more efficient during testing thanks to the proposed forward-backward propagation algorithm.
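
    The abstract does not spell out the parameterization, so the following PyTorch sketch shows just one plausible way to make filter size a continuous, learnable variable (the paper's actual mechanism may differ): keep a kernel of maximal spatial extent and apply a differentiable soft mask whose radius is a learnable scalar, so gradients of the training loss can shrink or grow the effective filter size.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SoftSizeConv2d(nn.Module):
        """Conv layer with a continuous, learnable effective filter size.
        Illustrative assumption, not the OFS-CNN's exact parameterization."""
        def __init__(self, in_ch, out_ch, max_size=9, init_radius=2.0, sharpness=4.0):
            super().__init__()
            self.weight = nn.Parameter(
                torch.randn(out_ch, in_ch, max_size, max_size) * 0.02)
            self.radius = nn.Parameter(torch.tensor(init_radius))  # continuous "size"
            self.sharpness = sharpness
            # Distance of every kernel tap from the kernel center.
            coords = torch.arange(max_size) - (max_size - 1) / 2.0
            yy, xx = torch.meshgrid(coords, coords, indexing="ij")
            self.register_buffer("dist", torch.sqrt(yy**2 + xx**2))

        def forward(self, x):
            # Sigmoid mask ~1 inside the learned radius, ~0 outside; it is
            # differentiable in self.radius, so the loss optimizes the size.
            mask = torch.sigmoid(self.sharpness * (self.radius - self.dist))
            w = self.weight * mask  # broadcast over (out_ch, in_ch)
            return F.conv2d(x, w, padding=self.weight.shape[-1] // 2)
    ```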

    Redes neurais convolucionais para análise de expressões faciais (Convolutional Neural Networks for Facial Expression Analysis)

    Advisor: Luciano Silva. Co-advisor: Olga R. P. Bellon. Master's thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended: Curitiba, 31/08/2018. Includes references: p. 40-43. Area of concentration: Computer Science. Abstract (translated from the Portuguese resumo, which it duplicates): This work proposes a convolutional neural network (CNN) for joint Action Unit (AU) detection and intensity estimation on images of faces in arbitrary head poses. A variety of approaches exist for AU detection and intensity estimation; however, few of them handle head pose variations or take into account the correlations among AUs and their intensities. Moreover, joint inference raises the problem of class imbalance in the number of annotations per class, which makes optimization and generalization harder. These constraints must be addressed before such methods can be used in unconstrained environments. Another difficulty is the lack of labelled images under these conditions; in this case, databases of 3D models can be extended to synthesize images in arbitrary head poses, as done in the Facial Expression Recognition and Analysis Challenge (FERA) 2017. Using this database of synthetic head poses, this work proposes a multi-task CNN-based model, called AUMPNet, to detect AUs and estimate their intensities. In addition to the joint-inference model, an approach to reduce the imbalance among AU intensities during optimization is also demonstrated. The proposed model, evaluated on the FERA 2015 and FERA 2017 databases, achieves results comparable to the state of the art, and surpasses it for some individual AUs. Keywords: facial expression analysis, computer vision, convolutional neural networks.
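
    A minimal PyTorch sketch of the joint-inference idea described above: a shared backbone feeding two heads, one for per-AU occurrence and one for per-AU intensity, with per-class weights in the detection loss to counter label imbalance. The layer sizes and the weighting scheme are illustrative assumptions, not AUMPNet's actual architecture.

    ```python
    import torch
    import torch.nn as nn

    class MultiTaskAUNet(nn.Module):
        def __init__(self, num_aus, backbone_dim=512):
            super().__init__()
            self.backbone = nn.Sequential(  # stand-in for a real CNN backbone
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, backbone_dim), nn.ReLU(),
            )
            self.detect_head = nn.Linear(backbone_dim, num_aus)     # occurrence logits
            self.intensity_head = nn.Linear(backbone_dim, num_aus)  # intensity estimate

        def forward(self, x):
            h = self.backbone(x)
            return self.detect_head(h), self.intensity_head(h)

    def multitask_loss(det_logits, intens_pred, au_labels, au_intensities, pos_weight):
        # pos_weight: per-AU weight (e.g., #negatives / #positives) so rare
        # AUs are not drowned out during optimization.
        det_loss = nn.functional.binary_cross_entropy_with_logits(
            det_logits, au_labels, pos_weight=pos_weight)
        intens_loss = nn.functional.mse_loss(intens_pred, au_intensities)
        return det_loss + intens_loss
    ```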

    Improving Facial Action Unit Recognition Using Convolutional Neural Networks

    Recognizing facial action units (AUs) from spontaneous facial expressions is a challenging problem because of subtle facial appearance changes, free head movements, occlusions, and limited AU-coded training data. Recently, convolutional neural networks (CNNs) have shown promise for facial AU recognition. However, CNNs often overfit and do not generalize well to unseen subjects due to the limited AU-coded training images. To improve the performance of facial AU recognition, we developed two novel CNN frameworks that recognize AUs from static images, by substituting the traditional decision layer with an incremental boosting layer and the traditional convolutional layer with an adaptive convolutional layer, respectively.

    First, to handle the limited AU-coded training data and reduce overfitting, we proposed a novel Incremental Boosting CNN (IB-CNN) that integrates boosting into the CNN via an incremental boosting layer, which selects discriminative neurons from the lower layer and is incrementally updated on successive mini-batches. In addition, a novel loss function that accounts for errors from both the incrementally boosted classifier and the individual weak classifiers was proposed to fine-tune the IB-CNN. Experimental results on four benchmark AU databases demonstrate that the IB-CNN yields significant improvement over the traditional CNN and over a boosting CNN without incremental learning, and outperforms state-of-the-art CNN-based methods in AU recognition. The improvement is most pronounced for the AUs with the lowest frequencies in the databases.

    Second, all current CNNs use predefined, fixed convolutional filter sizes. However, AUs activated by different facial muscles cause facial appearance changes at different scales and thus favor different filter sizes. The traditional strategy is to experimentally select the best filter size for each AU in each convolutional layer, but this suffers from expensive training cost, especially as networks become deeper. We proposed a novel Optimized Filter Size CNN (OFS-CNN), where the filter sizes and weights of all convolutional layers are learned simultaneously from the training data. Specifically, the filter size is defined as a continuous variable, which is optimized by minimizing the training loss. Experimental results on four AU-coded databases and one spontaneous facial expression database show that the OFS-CNN outperforms traditional CNNs with fixed filter sizes and achieves state-of-the-art recognition performance. Furthermore, the OFS-CNN beats traditional CNNs using the best filter size obtained by exhaustive search and is capable of estimating the optimal filter size for varying image resolutions.
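
    To make the incremental-boosting idea concrete, here is a heavily simplified sketch: at each mini-batch, individual neurons of the lower layer are scored as weak classifiers, the most discriminative ones are selected, and their weighted votes are blended into an ensemble that persists across batches. The selection rule (batch correlation with the labels) and the momentum-style update are illustrative assumptions, not the paper's exact formulation.

    ```python
    import torch

    class IncrementalBoostingLayer:
        """Simplified stand-in for an incremental boosting layer (assumed form)."""
        def __init__(self, num_neurons, num_select=32, momentum=0.9):
            self.alpha = torch.zeros(num_neurons)  # ensemble weight per neuron
            self.num_select = num_select
            self.momentum = momentum  # how much of the old ensemble to keep

        def update(self, activations, labels):
            # activations: (B, N) lower-layer outputs; labels: (B,) in {0, 1}.
            # Score each neuron by |correlation| with the labels on this batch.
            a = activations - activations.mean(0)
            y = labels.float() - labels.float().mean()
            corr = (a * y[:, None]).mean(0) / (a.std(0) * y.std() + 1e-8)
            top = corr.abs().topk(self.num_select).indices

            # Weak-classifier weights proportional to batch correlation, then
            # incrementally blended into the running ensemble.
            batch_alpha = torch.zeros_like(self.alpha)
            batch_alpha[top] = corr[top]
            self.alpha = self.momentum * self.alpha + (1 - self.momentum) * batch_alpha

        def predict(self, activations):
            # Weighted vote of the selected neurons (boosted strong classifier).
            return activations @ self.alpha
    ```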