
    Subspace Representations for Robust Face and Facial Expression Recognition

    Analyzing human faces and modeling their variations have always been of interest to the computer vision community. Face analysis based on 2D intensity images is a challenging problem, complicated by variations in pose, lighting, blur, and non-rigid facial deformations due to facial expressions. Among the different sources of variation, facial expressions are of interest as important channels of non-verbal communication. Facial expression analysis is also affected by changes in view-point and inter-subject variations in performing different expressions. This dissertation attempts to address some of the challenges involved in developing robust algorithms for face and facial expression recognition by exploiting the idea of proper subspace representations for data.

    Variations in the visual appearance of an object mostly arise due to changes in illumination and pose. We therefore first present a video-based sequential algorithm for estimating the face albedo as an illumination-insensitive signature for face recognition. We show that by knowing/estimating the pose of the face at each frame of a sequence, the albedo can be efficiently estimated using a Kalman filter. We then extend this to the case of unknown pose by simultaneously tracking the pose as well as updating the albedo through an efficient Bayesian inference method performed using a Rao-Blackwellized particle filter.

    Since understanding the effects of blur, especially motion blur, is an important problem in unconstrained visual analysis, we then propose a blur-robust recognition algorithm for faces with spatially varying blur. We model a blurred face as a weighted average of geometrically transformed instances of its clean face. We then build a matrix, for each gallery face, whose column space spans the space of all motion-blurred images obtained from the clean face. This matrix representation is then used to define a proper objective function and perform blur-robust face recognition.

    To develop robust and generalizable models for expression analysis, one needs to break the dependence of the models on the choice of the coordinate frame of the camera. To this end, we build models for expressions on the affine shape-space (Grassmann manifold), as an approximation to the projective shape-space, by using a Riemannian interpretation of the deformations that facial expressions cause on different parts of the face. This representation enables us to apply various expression analysis and recognition algorithms without the need for pose normalization as a preprocessing step.

    There is a large degree of inter-subject variation in performing various expressions, which poses an important challenge to developing robust facial expression recognition algorithms. To address this challenge, we propose a dictionary-based approach for facial expression analysis by decomposing expressions in terms of action units (AUs). First, we construct an AU-dictionary using domain experts' knowledge of AUs. To incorporate the high-level knowledge regarding expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping over the AU-dictionary atoms as well as over the test image matrix columns. We use the computed sparse code matrix for each expressive face to perform expression decomposition and recognition.

    Most of the existing methods for the recognition of faces and expressions consider either the expression-invariant face recognition problem or the identity-independent facial expression recognition problem. We propose joint face and facial expression recognition using a dictionary-based component separation (DCS) algorithm. In this approach, the given expressive face is viewed as a superposition of a neutral face component and a facial expression component, which is sparse with respect to the whole image. This assumption leads to a dictionary-based component separation algorithm that benefits from the ideas of sparsity and morphological diversity. The DCS algorithm uses data-driven dictionaries to decompose an expressive test face into its constituent components. The sparse codes obtained from this decomposition are then used for joint face and expression recognition.
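A minimal sketch of the component-separation idea described in this abstract, assuming pre-learned neutral and expression dictionaries; the function names and the sklearn-based sparse solver are illustrative assumptions, not the dissertation's DCS implementation:

```python
# Hedged sketch: decompose an expressive face x ~= a_n @ D_neutral + a_e @ D_expr
# with a sparse code over the concatenated dictionaries (illustrative only;
# dictionary learning itself is assumed to have been done elsewhere).
import numpy as np
from sklearn.decomposition import sparse_encode

def separate_components(x, D_neutral, D_expr, alpha=0.1):
    """x: flattened face (d,); D_neutral: (k_n, d) atom rows; D_expr: (k_e, d) atom rows."""
    D = np.vstack([D_neutral, D_expr])                      # concatenated dictionary
    code = sparse_encode(x[None, :], D, algorithm="lasso_lars", alpha=alpha)[0]
    a_n, a_e = code[:len(D_neutral)], code[len(D_neutral):]
    neutral_part = a_n @ D_neutral                          # identity-carrying component
    expr_part = a_e @ D_expr                                # expression-carrying component
    return neutral_part, expr_part, code
```

The two sub-codes (or the reconstruction energy each dictionary explains) could then feed separate classifiers for identity and expression, which is the spirit of the joint recognition step.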

    Face recognition and computer graphics for modelling expressive faces in 3D

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2007. Includes bibliographical references (leaves 47-48). This thesis addresses the problem of the lack of verisimilitude in animation. Since computer vision has been aimed at creating photo-realistic representations of environments and face recognition creates replicas of faces for recognition purposes, we research face recognition techniques to produce photo-realistic models of expressive faces that could be further developed and applied in animation. We use two methods that are commonly used in face recognition to gather information about the subject: 3D scanners and multiple 2D images. For the latter method, Maya is used for modeling. Both methods produced accurate 3D models for a neutral face, but Maya allowed us to manually build 3D models and was therefore more successful in creating exaggerated facial expressions. By Tufool Al-Nuaimi. M.Eng.

    Human Centric Facial Expression Recognition

    Facial expression recognition (FER) is an area of active research, both in computer science and in behavioural science. Across these domains there is evidence to suggest that humans and machines find it easier to recognise certain emotions, for example happiness, in comparison to others. Recent behavioural studies have explored human perceptions of emotion further, by evaluating the relative contribution of features in the face when assessing human sensitivity to emotion. It has been identified that certain facial regions have more salient features for certain expressions of emotion, especially when emotions are subtle in nature. For example, it is easier to detect fearful expressions when the eyes are expressive. Using this observation as a starting point for analysis, we similarly examine the effectiveness with which knowledge of facial feature saliency may be integrated into current approaches to automated FER. Specifically, we compare and evaluate the accuracy of ‘full-face’ versus upper and lower facial area convolutional neural network (CNN) modelling for emotion recognition in static images, and propose a human centric CNN hierarchy which uses regional image inputs to leverage current understanding of how humans recognise emotions across the face. Evaluations using the CK+ dataset demonstrate that our hierarchy can enhance classification accuracy in comparison to individual CNN architectures, achieving overall true positive classification in 93.3% of cases.
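To make the regional-input idea concrete, here is a hedged PyTorch sketch of a full-face/upper-face/lower-face ensemble; the layer sizes, the fixed half-split of the image, and the linear fusion head are placeholder assumptions rather than the paper's exact hierarchy:

```python
# Illustrative sketch (assumed architecture): three small CNNs over the full face,
# the upper half (eyes/brows), and the lower half (mouth), fused for classification.
import torch
import torch.nn as nn

def small_cnn(out_dim=64):
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        nn.Flatten(), nn.Linear(32 * 4 * 4, out_dim), nn.ReLU(),
    )

class RegionHierarchyFER(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.full, self.upper, self.lower = small_cnn(), small_cnn(), small_cnn()
        self.head = nn.Linear(3 * 64, n_classes)        # fuse regional features

    def forward(self, face):                            # face: (B, 1, H, W), H even
        h = face.shape[2] // 2
        feats = torch.cat([self.full(face),
                           self.upper(face[:, :, :h, :]),   # upper facial area
                           self.lower(face[:, :, h:, :])],  # lower facial area
                          dim=1)
        return self.head(feats)
```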

    Facial Motion Prior Networks for Facial Expression Recognition

    Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years. Most of the existing deep learning based FER methods do not consider domain knowledge well and thereby fail to extract representative features. In this work, we propose a novel FER framework, named Facial Motion Prior Networks (FMPN). In particular, we introduce an additional branch to generate a facial mask so as to focus on facial muscle moving regions. To guide the facial mask learning, we propose to incorporate prior domain knowledge by using the average differences between neutral faces and the corresponding expressive faces as the training guidance. Extensive experiments on three facial expression benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches. Comment: VCIP 2019, Oral. Code is available at https://github.com/donydchen/FMPN-FE
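A small sketch of how such training guidance could be derived from paired neutral and expressive faces; the averaging and normalization details are assumptions based on the abstract, not the released FMPN code:

```python
# Hedged sketch: build a per-expression "motion prior" as the average absolute
# difference between expressive faces and their neutral counterparts.
import numpy as np

def expression_motion_prior(expressive, neutral):
    """expressive, neutral: paired face arrays of shape (N, H, W).
    Returns an (H, W) map highlighting muscle-movement regions."""
    diff = np.abs(expressive.astype(np.float32) - neutral.astype(np.float32))
    prior = diff.mean(axis=0)
    return prior / (prior.max() + 1e-8)      # normalize to [0, 1] as a soft mask
```

During training, the mask-generating branch would regress such a prior (for example with an MSE loss), and its output would re-weight features for the recognition branch.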

    A Few Days of A Robot's Life in the Human's World: Toward Incremental Individual Recognition

    PhD thesis. This thesis presents an integrated framework and implementation for Mertz, an expressive robotic creature for exploring the task of face recognition through natural interaction in an incremental and unsupervised fashion. The goal of this thesis is to advance toward a framework which would allow robots to incrementally "get to know" a set of familiar individuals in a natural and extendable way. This thesis is motivated by the increasingly popular goal of integrating robots in the home. In order to be effective in human-centric tasks, the robots must be able not only to recognize each family member, but also to learn about the roles of various people in the household.

    In this thesis, we focus on two particular limitations of the current technology. Firstly, most face recognition research concentrates on the supervised classification problem. Currently, one of the biggest problems in face recognition is how to generalize the system to be able to recognize new test data that vary from the training data. Thus, until this problem is solved completely, the existing supervised approaches may require multiple manual introduction and labelling sessions to include training data with enough variations. Secondly, there is typically a large gap between research prototypes and commercial products, largely due to lack of robustness and scalability to different environmental settings.

    In this thesis, we propose an unsupervised approach which would allow for a more adaptive system that can incrementally update the training set with more recent data or new individuals over time. Moreover, it gives the robots a more natural social recognition mechanism to learn not only to recognize each person's appearance, but also to remember some relevant contextual information that the robot observed during previous interaction sessions. Therefore, this thesis focuses on integrating an unsupervised and incremental face recognition system within a physical robot which interfaces directly with humans through natural social interaction. The robot autonomously detects, tracks, and segments face images during these interactions and automatically generates a training set for its face recognition system. Moreover, in order to motivate robust solutions and address scalability issues, we chose to put the robot, Mertz, in unstructured public environments to interact with naive passersby, instead of with only the researchers within the laboratory environment.

    While an unsupervised and incremental face recognition system is a crucial element toward our target goal, it is only a part of the story. A face recognition system typically receives either pre-recorded face images or a streaming video from a static camera. As illustrated by an ACLU review of a commercial face recognition installation, a security application which interfaces with the latter is already very challenging. In this case, our target goal is a robot that can recognize people in a home setting. The interface between robots and humans is even more dynamic, since both the robots and the humans move around.

    We present the robot implementation and its unsupervised incremental face recognition framework. We describe an algorithm for clustering local features extracted from a large set of automatically generated face data. We demonstrate the robot's capabilities and limitations in a series of experiments in a public lobby. In a final experiment, the robot interacted with a few hundred individuals over an eight-day period and generated a training set of over a hundred thousand face images. We evaluate the clustering algorithm's performance across a range of parameters on this automatically generated training data and also on the Honda-UCSD video face database. Lastly, we present some recognition results using the self-labelled clusters.
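As a toy illustration of incremental, unsupervised grouping of face descriptors in the spirit described above (the thesis clusters local features; the distance threshold and running-mean centroid update below are illustrative assumptions, not the thesis algorithm):

```python
# Hedged sketch: assign each new face descriptor to the nearest cluster if it is
# close enough, otherwise start a new cluster for a (presumed) new individual.
import numpy as np

class IncrementalFaceClusters:
    def __init__(self, threshold=0.6):
        self.centroids, self.counts, self.threshold = [], [], threshold

    def add(self, feat):
        """feat: 1-D descriptor for one automatically detected/segmented face."""
        if self.centroids:
            dists = [np.linalg.norm(feat - c) for c in self.centroids]
            i = int(np.argmin(dists))
            if dists[i] < self.threshold:                 # assign to an existing cluster
                self.counts[i] += 1
                self.centroids[i] += (feat - self.centroids[i]) / self.counts[i]
                return i
        self.centroids.append(feat.astype(np.float64).copy())  # start a new cluster
        self.counts.append(1)
        return len(self.centroids) - 1
```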

    End-to-end 3D face reconstruction with deep neural networks

    Monocular 3D facial shape reconstruction from a single 2D facial image has been an active research area due to its wide applications. Inspired by the success of deep neural networks (DNN), we propose a DNN-based approach for End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different from recent works that reconstruct and refine the 3D face in an iterative manner using both an RGB image and an initial 3D facial shape rendering, our DNN model is end-to-end, and thus the complicated 3D rendering process can be avoided. Moreover, we integrate two components into the DNN architecture, namely a multi-task loss function and a fusion convolutional neural network (CNN), to improve facial expression reconstruction. With the multi-task loss function, 3D face reconstruction is divided into neutral 3D facial shape reconstruction and expressive 3D facial shape reconstruction. The neutral 3D facial shape is class-specific, so higher-layer features are useful; in comparison, the expressive 3D facial shape favors lower or intermediate layer features. With the fusion-CNN, features from different intermediate layers are fused and transformed for predicting the 3D expressive facial shape. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction. Comment: Accepted to CVPR1
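A hedged sketch of the two-part objective and fusion head implied by the abstract; the module names, feature dimensions, and loss weighting are placeholder assumptions, not the UH-E2FAR implementation:

```python
# Illustrative sketch: a fusion head concatenates pooled intermediate backbone
# features to regress the expressive shape, trained jointly with a neutral-shape
# prediction via a two-term multi-task loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionShapeHead(nn.Module):
    def __init__(self, dims=(256, 512), n_vertices=1000):
        super().__init__()
        self.fuse = nn.Linear(sum(dims), 3 * n_vertices)  # expressive shape from fused features

    def forward(self, feats):              # feats: list of pooled intermediate features
        return self.fuse(torch.cat(feats, dim=1))

def multitask_shape_loss(pred_neutral, pred_expr, gt_neutral, gt_expr, w_expr=1.0):
    """All tensors are (B, 3 * n_vertices) flattened vertex coordinates."""
    return F.mse_loss(pred_neutral, gt_neutral) + w_expr * F.mse_loss(pred_expr, gt_expr)
```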