
    Spontaneous Facial Micro-Expression Recognition using 3D Spatiotemporal Convolutional Neural Networks

    Facial expression recognition in videos is an active area of research in computer vision. However, fake facial expressions are difficult to recognize even for humans. Facial micro-expressions, on the other hand, generally reveal a person's actual emotion, as they are spontaneous reactions expressed through the human face. Despite a few attempts at recognizing micro-expressions, the problem is far from solved, as reflected in the poor accuracy of state-of-the-art methods. A few CNN-based approaches in the literature recognize micro-expressions from still images, whereas a spontaneous micro-expression video contains multiple frames that must be processed together to encode both spatial and temporal information. This paper proposes two 3D-CNN methods, MicroExpSTCNN and MicroExpFuseNet, for spontaneous facial micro-expression recognition that exploit spatiotemporal information within a CNN framework. MicroExpSTCNN considers the full spatial information, whereas MicroExpFuseNet is based on 3D-CNN feature fusion of the eye and mouth regions. Experiments are performed on the CAS(ME)^2 and SMIC micro-expression databases. The proposed MicroExpSTCNN model outperforms the state-of-the-art methods.
    Comment: Accepted at the 2019 International Joint Conference on Neural Networks (IJCNN)
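
    The core design choice here is 3D convolution: kernels that span frames as well as pixels, so spatial texture and temporal motion are encoded jointly. Below is a minimal sketch of such a network in PyTorch; the channel widths, kernel sizes, clip length, and class count are illustrative assumptions, not the paper's exact MicroExpSTCNN configuration.

```python
# A minimal 3D-CNN sketch for micro-expression recognition. All architecture
# details are assumed for illustration, not taken from the paper.
import torch
import torch.nn as nn

class MicroExp3DCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # Conv3d kernels span (time, height, width), so each filter
        # encodes spatial appearance and temporal dynamics jointly.
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=3),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=3),
            nn.Dropout3d(0.5),
        )
        self.classifier = nn.LazyLinear(num_classes)  # infers flattened size

    def forward(self, clips):  # clips: (batch, 1, frames, H, W)
        x = self.features(clips)
        return self.classifier(x.flatten(1))

model = MicroExp3DCNN()
logits = model(torch.randn(2, 1, 64, 64, 64))  # two grayscale 64-frame clips
print(logits.shape)  # torch.Size([2, 4])
```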

    Group Emotion Recognition Using Machine Learning

    Automatic facial emotion recognition is a challenging task that has gained significant scientific interest over the past few years, but emotion recognition for groups of people has been studied far less extensively. However, it is slowly gaining popularity due to the massive amount of data available on social networking sites containing images of groups of people participating in various social events. Group emotion recognition is a challenging problem due to obstacles such as head and body pose variations, occlusions, variable lighting conditions, variance of actors, varied indoor and outdoor settings, and image quality. The objective of this task is to classify a group's perceived emotion as Positive, Neutral, or Negative. In this report, we describe our solution, a hybrid machine learning system that incorporates deep neural networks and Bayesian classifiers. Deep Convolutional Neural Networks (CNNs) work from the bottom up, analysing the facial expressions of individual faces extracted from the image. The Bayesian network works from the top down, inferring the global emotion of the image by integrating the visual features of the image's contents obtained through a scene descriptor. In the final pipeline, the group emotion category predicted by an ensemble of CNNs in the bottom-up module is passed as input to the Bayesian network in the top-down module, and an overall prediction for the image is obtained. Experimental results show that the system achieves 65.27% accuracy on the validation set, which is in line with state-of-the-art results. As an outcome of this project, a Progressive Web Application and an accompanying Android app with a simple and intuitive user interface are presented, allowing users to test the system with their own pictures.
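
    The fusion step can be read as a Bayes update: the bottom-up ensemble supplies a class distribution from the faces, and the top-down scene descriptor supplies a class-conditional likelihood that reweights it. The toy sketch below illustrates this with invented numbers; the paper's actual Bayesian network is richer than this single multiplication.

```python
# Toy sketch of top-down/bottom-up fusion: bottom-up CNN scores averaged over
# detected faces are reweighted by a scene-conditioned likelihood via Bayes'
# rule. All numbers are invented for illustration.
import numpy as np

CLASSES = ["Positive", "Neutral", "Negative"]

def fuse(face_probs, scene_likelihood):
    """face_probs: (num_faces, 3) softmax outputs from the CNN ensemble.
    scene_likelihood: P(scene descriptor | class), shape (3,)."""
    bottom_up = face_probs.mean(axis=0)       # aggregate individual faces
    posterior = scene_likelihood * bottom_up  # unnormalized Bayes update
    return posterior / posterior.sum()

face_probs = np.array([[0.7, 0.2, 0.1],
                       [0.5, 0.4, 0.1]])      # two detected faces
scene_likelihood = np.array([0.6, 0.3, 0.1])  # e.g. a festive scene descriptor
print(dict(zip(CLASSES, fuse(face_probs, scene_likelihood).round(3))))
```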

    Deep Facial Expression Recognition: A Survey

    With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data, and expression-unrelated variations such as illumination, head pose, and identity bias. In this paper, we provide a comprehensive survey of deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.
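
    As a concrete reference point for the pipeline the survey describes, here is a minimal sketch of one common instantiation: augmented, normalized face crops fed to a CNN backbone with a 7-way expression head. The backbone, augmentations, and label set are conventional choices assumed for illustration, not prescriptions from the survey.

```python
# A minimal deep-FER pipeline sketch: augmentation combats the overfitting the
# survey highlights; the backbone and 7-class head are common assumed choices.
import torch
import torch.nn as nn
from torchvision import models, transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),     # cheap data augmentation
    transforms.RandomRotation(10),         # mild pose jitter
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

backbone = models.resnet18(weights=None)   # pretrained weights are also typical
backbone.fc = nn.Linear(backbone.fc.in_features, 7)  # 7 basic expressions

face_crop = torch.rand(3, 224, 224)        # stand-in for an aligned face crop
logits = backbone(augment(transforms.ToPILImage()(face_crop)).unsqueeze(0))
print(logits.shape)  # torch.Size([1, 7])
```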

    A Deep Learning Perspective on the Origin of Facial Expressions

    Facial expressions play a significant role in human communication and behavior. Psychologists have long studied the relationship between facial expressions and emotions. Paul Ekman et al. devised the Facial Action Coding System (FACS) to taxonomize human facial expressions and model their behavior. The ability to recognize facial expressions automatically enables novel applications in fields like human-computer interaction, social gaming, and psychological research. Research in this field has been tremendously active, with several recent papers utilizing convolutional neural networks (CNNs) for feature extraction and inference. In this paper, we employ CNN understanding methods to study the relation between the features these computational networks use, the FACS, and Action Units (AUs). We verify our findings on the Extended Cohn-Kanade (CK+), NovaEmotions, and FER2013 datasets. We apply these models to various tasks and tests using transfer learning, including cross-dataset validation and cross-task performance. Finally, we exploit the nature of FER-based CNN models for the detection of micro-expressions and achieve state-of-the-art accuracy using a simple long short-term memory (LSTM) recurrent neural network (RNN).
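
    The micro-expression detector described at the end, per-frame features from a FER-trained CNN fed to a simple LSTM, can be sketched as below. The feature dimension, hidden size, and class count are assumptions for illustration.

```python
# Sketch of CNN-features-into-LSTM sequence classification. Layer sizes and
# the binary label set are illustrative assumptions.
import torch
import torch.nn as nn

class CNNFeatureLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats):  # (batch, frames, feat_dim) from the CNN
        _, (h_n, _) = self.lstm(frame_feats)
        return self.head(h_n[-1])    # classify from the last hidden state

model = CNNFeatureLSTM()
feats = torch.randn(4, 30, 512)      # 4 clips, 30 frames of CNN features each
print(model(feats).shape)            # torch.Size([4, 2])
```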

    A Review of Modularization Techniques in Artificial Neural Networks

    Artificial neural networks (ANNs) have achieved significant success in tackling classical and modern machine learning problems. As learning problems grow in scale and complexity, and expand into multi-disciplinary territory, a more modular approach to scaling ANNs will be needed. Modular neural networks (MNNs) are neural networks that embody the concepts and principles of modularity. MNNs adopt a large number of different techniques for achieving modularization. Previous surveys of modularization techniques offer relatively little systematic analysis of MNNs, focusing mostly on empirical comparisons and lacking an extensive taxonomical framework. In this review, we aim to establish a solid taxonomy that captures the essential properties and relationships of the different variants of MNNs. Based on an investigation of the different levels at which modularization techniques act, we attempt to provide a universal and systematic framework for theorists studying MNNs, while also emphasising the strengths and weaknesses of different modularization approaches in order to highlight good practices for neural network practitioners.
    Comment: Artif Intell Rev (2019)
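
    One representative modularization technique such a taxonomy covers is a soft mixture of expert modules, where a learned gate weights the outputs of several parallel sub-networks. The sketch below is a generic illustration of that pattern, not a construction taken from the review; all sizes are arbitrary.

```python
# A soft mixture-of-experts as a minimal example of modularization: several
# expert modules whose outputs a learned gate combines. Sizes are arbitrary.
import torch
import torch.nn as nn

class GatedModularNet(nn.Module):
    def __init__(self, in_dim=16, hidden=32, out_dim=4, num_modules=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(num_modules)
        )
        self.gate = nn.Linear(in_dim, num_modules)  # decides module weighting

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)         # (B, M)
        outs = torch.stack([m(x) for m in self.experts], 1)   # (B, M, out)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)      # weighted mix

net = GatedModularNet()
print(net(torch.randn(8, 16)).shape)  # torch.Size([8, 4])
```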

    SensitiveNets: Learning Agnostic Representations with Application to Face Images

    This work proposes a novel privacy-preserving neural network feature representation that suppresses the sensitive information of a learned space while maintaining the utility of the data. New international regulations for personal data protection force data controllers to guarantee privacy and avoid discriminatory hazards while managing sensitive data of users. In our approach, privacy and discrimination are related to each other. Instead of existing approaches aimed directly at fairness improvement, the proposed feature representation enforces the privacy of selected attributes. In this way, fairness is not the objective but the result of a privacy-preserving learning method. This approach guarantees that sensitive information cannot be exploited by any agent who processes the output of the model, ensuring both privacy and equality of opportunity. Our method is based on an adversarial regularizer that introduces a sensitive information removal function into the learning objective. The method is evaluated on three different primary tasks (identity, attractiveness, and smiling) and three publicly available benchmarks. In addition, we present a new face annotation dataset with a balanced distribution across genders and ethnic origins. The experiments demonstrate that it is possible to improve privacy and equality of opportunity while retaining competitive performance, independently of the task.
    Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence
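
    A common way to realize this kind of adversarial regularizer is gradient reversal: a probe head tries to predict the sensitive attribute from the representation, and its reversed gradients push that information out of the encoder. The sketch below shows this generic formulation under assumed sizes; the paper's exact sensitive-information removal function may differ.

```python
# Gradient-reversal sketch of an adversarial privacy regularizer: the encoder
# learns the primary task while unlearning the sensitive attribute. This is a
# generic formulation, assumed here, not the paper's exact objective.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # flip gradients into the encoder

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
task_head = nn.Linear(64, 2)        # primary task, e.g. smiling
sensitive_head = nn.Linear(64, 2)   # probe for the sensitive attribute

x = torch.randn(32, 128)            # stand-in for face-image features
y_task = torch.randint(0, 2, (32,))
y_sens = torch.randint(0, 2, (32,))
z = encoder(x)
loss = nn.functional.cross_entropy(task_head(z), y_task) \
     + nn.functional.cross_entropy(sensitive_head(GradReverse.apply(z, 1.0)),
                                   y_sens)
loss.backward()  # task gradients and reversed probe gradients reach the encoder
```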

    Robust Spatial Filtering with Graph Convolutional Neural Networks

    Convolutional Neural Networks (CNNs) have recently led to incredible breakthroughs on a variety of pattern recognition problems. Banks of finite impulse response filters are learned on a hierarchy of layers, each contributing more abstract information than the previous layer. The simplicity and elegance of the convolutional filtering process make CNNs perfect for structured problems such as images, video, or voice, where vertices are homogeneous in the sense of number, location, and strength of neighbors. The vast majority of classification problems, for example in the pharmaceutical, homeland security, and financial domains, are unstructured. When these problems are formulated as unstructured graphs, their heterogeneity, such as the number of vertices, the number of connections per vertex, and edge strength, cannot be tackled with standard convolutional techniques. We propose a novel neural learning framework that is capable of handling both homogeneous and heterogeneous data, while retaining the benefits of traditional CNN successes. Recently, researchers have proposed variations of CNNs that can handle graph data. In an effort to create learnable filter banks on graphs, these methods either impose constraints on the data or require preprocessing. As opposed to spectral methods, our framework, which we term Graph-CNNs, defines filters as polynomials of functions of the graph adjacency matrix. Graph-CNNs can handle both heterogeneous and homogeneous graph data, including graphs having entirely different vertex or edge sets. We perform experiments to validate the applicability of Graph-CNNs to a variety of structured and unstructured classification problems and demonstrate state-of-the-art results on document and molecule classification problems.
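
    The filtering idea, filters formed as polynomials in (functions of) the adjacency matrix, amounts to H' = sigma(sum_k A^k H W_k) for vertex features H. A minimal sketch, with the polynomial degree and feature sizes chosen arbitrarily:

```python
# A graph convolution built as a polynomial in the adjacency matrix A:
# H' = relu( sum_k  A^k @ H @ W_k ). Degree and sizes are illustrative.
import torch
import torch.nn as nn

class PolyGraphConv(nn.Module):
    def __init__(self, in_feats, out_feats, degree=2):
        super().__init__()
        # One weight matrix per power of A, including A^0 = I (k = 0).
        self.weights = nn.ParameterList(
            nn.Parameter(torch.randn(in_feats, out_feats) * 0.1)
            for _ in range(degree + 1)
        )

    def forward(self, A, H):           # A: (N, N) adjacency, H: (N, in_feats)
        Ak, out = torch.eye(A.shape[0]), 0.0
        for W in self.weights:
            out = out + Ak @ H @ W     # accumulate the k-th polynomial term
            Ak = Ak @ A                # advance to the next power of A
        return torch.relu(out)

layer = PolyGraphConv(in_feats=8, out_feats=16)
A = (torch.rand(5, 5) > 0.5).float()      # random 5-vertex adjacency matrix
print(layer(A, torch.randn(5, 8)).shape)  # torch.Size([5, 16])
```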

    Learned Features are better for Ethnicity Classification

    Ethnicity is a key demographic attribute of human beings; it plays a vital role in automatic facial recognition and has extensive real-world applications such as human-computer interaction (HCI), demographic-based classification, biometric recognition, and security and defense, to name a few. In this paper we present a novel approach for extracting ethnicity from facial images. The proposed method uses a pre-trained Convolutional Neural Network (CNN) to extract features, and a Support Vector Machine (SVM) with a linear kernel is then used as the classifier. This technique uses translation-invariant hierarchical features learned by the network, in contrast to previous works, which use hand-crafted features such as Local Binary Patterns (LBP) and Gabor filters. Thorough experiments on ten different facial databases strongly suggest that our approach is robust to different expressions and illumination conditions. We consider ethnicity classification as a three-class problem comprising Asian, African-American, and Caucasian. Average classification accuracy over all databases is 98.28%, 99.66%, and 99.05% for Asian, African-American, and Caucasian, respectively.
    Comment: 15 pages, 8 figures, 2 tables; code and framework available on request
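
    The recipe, frozen pre-trained CNN features plus a linear-kernel SVM, can be sketched as follows. ResNet-18 stands in for the pre-trained network (the abstract does not name a specific backbone), and random tensors stand in for real face crops.

```python
# Pre-trained CNN as a frozen feature extractor, linear SVM as the classifier.
# ResNet-18 and the placeholder inputs are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()   # drop the classifier, keep 512-d features
backbone.eval()

with torch.no_grad():         # the CNN is frozen, used only as an extractor
    train_feats = backbone(torch.rand(30, 3, 224, 224)).numpy()
train_labels = [0, 1, 2] * 10  # Asian / African-American / Caucasian

svm = LinearSVC()              # linear kernel, as in the paper
svm.fit(train_feats, train_labels)
print(svm.predict(train_feats[:3]))
```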

    Attended End-to-end Architecture for Age Estimation from Facial Expression Videos

    The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional approaches to this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately, which relies on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called the Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain, for each single image, and the temporal domain, for the whole video. We design a spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between the spatial facial regions and temporal frames and the task of age estimation. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database of 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model exhibits significant superiority over the state-of-the-art methods given sufficient training data.
    Comment: Accepted by Transactions on Image Processing (TIP)
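
    The two attention mechanisms can be sketched generically: a 1x1 convolution scores spatial locations within each frame's feature map, and a linear layer scores frames after the recurrent pass. The code below is a simplified rendering under assumed sizes, not the authors' exact SIAM architecture.

```python
# Simplified spatial + temporal attention over per-frame CNN features, with a
# GRU for dynamics and a scalar age regressor. Sizes are assumptions.
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, channels=64, hidden=128):
        super().__init__()
        self.spatial_score = nn.Conv2d(channels, 1, kernel_size=1)
        self.rnn = nn.GRU(channels, hidden, batch_first=True)
        self.temporal_score = nn.Linear(hidden, 1)
        self.regressor = nn.Linear(hidden, 1)  # age is a scalar

    def forward(self, feats):  # feats: (B, T, C, H, W) from a CNN backbone
        B, T, C, H, W = feats.shape
        maps = feats.view(B * T, C, H, W)
        # Spatial attention: softmax over the H*W locations of each frame.
        attn = torch.softmax(self.spatial_score(maps).view(B * T, -1), -1)
        frame_vecs = (maps.view(B * T, C, -1) * attn.unsqueeze(1)).sum(-1)
        h, _ = self.rnn(frame_vecs.view(B, T, C))         # temporal dynamics
        w = torch.softmax(self.temporal_score(h), dim=1)  # frame weights
        return self.regressor((w * h).sum(dim=1))         # attention-pooled age

model = SpatioTemporalAttention()
print(model(torch.randn(2, 10, 64, 7, 7)).shape)  # torch.Size([2, 1])
```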

    Towards Distortion-Predictable Embedding of Neural Networks

    Current research in computer vision has shown that Convolutional Neural Networks (CNNs) give state-of-the-art performance in many classification tasks and computer vision problems. The embedding of a CNN, the internal representation produced by its last layer, can indirectly learn topological and relational properties. Moreover, by using a suitable loss function, CNN models can learn invariance to a wide range of non-linear distortions such as rotation, viewpoint angle, or lighting conditions. In this work, new insights are discovered about CNN embeddings, and a new loss function, derived from the contrastive loss, is proposed that creates models with more predictable mappings and also quantifies distortions. In typical distortion-dependent methods, there is no simple relation between the features corresponding to one image and the features of a distorted version of that image. Therefore, these methods must feed forward inputs under every distortion in order to find the corresponding feature representations. Our contribution makes a step towards embeddings where the features of distorted inputs are related and can be derived from each other by the intensity of the distortion.
    Comment: 54 pages, 28 figures. Master project at EPFL (Switzerland) in 2015. For source code on GitHub, see https://github.com/axel-angel/master-project
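
    One way to make the embedding distance itself predict the distortion, in the spirit of the proposed loss, is to penalize the gap between the embedding distance and a term proportional to the distortion intensity. The exact form below is an assumption for illustration; the thesis derives its own variant of the contrastive loss.

```python
# Sketch of a distortion-predictable loss: the distance between an image's
# embedding and its distorted version's should match alpha * intensity.
# This specific form is an assumed illustration, not the thesis's exact loss.
import torch

def distortion_predictable_loss(emb, emb_distorted, intensity, alpha=1.0):
    """emb, emb_distorted: (B, D) embeddings; intensity: (B,) distortion
    magnitudes (e.g. rotation angle). Penalizes deviation of the embedding
    distance from alpha * intensity, so distance predicts the distortion."""
    dist = torch.norm(emb - emb_distorted, dim=1)
    return ((dist - alpha * intensity) ** 2).mean()

emb = torch.randn(8, 32, requires_grad=True)
emb_rot = emb + 0.1 * torch.randn(8, 32)  # stand-in for distorted inputs
loss = distortion_predictable_loss(emb, emb_rot, torch.rand(8))
loss.backward()
print(loss.item())
```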