800 research outputs found

    Subspace Representations for Robust Face and Facial Expression Recognition

    Get PDF
    Analyzing human faces and modeling their variations have always been of interest to the computer vision community. Face analysis based on 2D intensity images is a challenging problem, complicated by variations in pose, lighting, blur, and non-rigid facial deformations due to facial expressions. Among the different sources of variation, facial expressions are of interest as important channels of non-verbal communication. Facial expression analysis is also affected by changes in view-point and inter-subject variations in performing different expressions. This dissertation makes an attempt to address some of the challenges involved in developing robust algorithms for face and facial expression recognition by exploiting the idea of proper subspace representations for data. Variations in the visual appearance of an object mostly arise due to changes in illumination and pose. So we first present a video-based sequential algorithm for estimating the face albedo as an illumination-insensitive signature for face recognition. We show that by knowing/estimating the pose of the face at each frame of a sequence, the albedo can be efficiently estimated using a Kalman filter. Then we extend this to the case of unknown pose by simultaneously tracking the pose as well as updating the albedo through an efficient Bayesian inference method performed using a Rao-Blackwellized particle filter. Since understanding the effects of blur, especially motion blur, is an important problem in unconstrained visual analysis, we then propose a blur-robust recognition algorithm for faces with spatially varying blur. We model a blurred face as a weighted average of geometrically transformed instances of its clean face. We then build a matrix, for each gallery face, whose column space spans the space of all the motion blurred images obtained from the clean face. This matrix representation is then used to define a proper objective function and perform blur-robust face recognition. To develop robust and generalizable models for expression analysis one needs to break the dependence of the models on the choice of the coordinate frame of the camera. To this end, we build models for expressions on the affine shape-space (Grassmann manifold), as an approximation to the projective shape-space, by using a Riemannian interpretation of deformations that facial expressions cause on different parts of the face. This representation enables us to perform various expression analysis and recognition algorithms without the need for pose normalization as a preprocessing step. There is a large degree of inter-subject variations in performing various expressions. This poses an important challenge on developing robust facial expression recognition algorithms. To address this challenge, we propose a dictionary-based approach for facial expression analysis by decomposing expressions in terms of action units (AUs). First, we construct an AU-dictionary using domain experts' knowledge of AUs. To incorporate the high-level knowledge regarding expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping over AU-dictionary atoms as well as over the test image matrix columns. We use the computed sparse code matrix for each expressive face to perform expression decomposition and recognition. Most of the existing methods for the recognition of faces and expressions consider either the expression-invariant face recognition problem or the identity-independent facial expression recognition problem. We propose joint face and facial expression recognition using a dictionary-based component separation algorithm (DCS). In this approach, the given expressive face is viewed as a superposition of a neutral face component with a facial expression component, which is sparse with respect to the whole image. This assumption leads to a dictionary-based component separation algorithm, which benefits from the idea of sparsity and morphological diversity. The DCS algorithm uses the data-driven dictionaries to decompose an expressive test face into its constituent components. The sparse codes we obtain as a result of this decomposition are then used for joint face and expression recognition

    Robust subspace learning for static and dynamic affect and behaviour modelling

    Get PDF
    Machine analysis of human affect and behavior in naturalistic contexts has witnessed a growing attention in the last decade from various disciplines ranging from social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, predict as well as simulate and synthesize manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally- and socially-competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present here machine learning methods that are robust to such gross, sparse noise. First, we deal with static analysis of face images, viewing the latter as a superposition of mutually-incoherent, low-complexity components corresponding to facial attributes, such as facial identity, expressions and activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition. Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts into continuous-time modeling of dimensional affect and social behavior. Having identified a gap in the literature which is the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates annotated frame-by-frame in terms of real-valued conflict intensity and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating the inability of existing classifiers in capturing the hidden temporal structures of affective and behavioral displays. We present here a novel dynamic behavior analysis framework which models temporal dynamics in an explicit way, based on the natural assumption that continuous- time annotations of smoothly-varying affect or behavior can be viewed as outputs of a low-complexity linear dynamical system when behavioral cues (features) act as system inputs. A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on prediction of dimensional conflict and affect as well as multi-object tracking from detection validate the effectiveness of our predictive framework and demonstrate that for the first time that complex human behavior and affect can be learned and predicted based on small training sets of person(s)-specific observations.Open Acces

    Gaussian processes for modeling of facial expressions

    Get PDF
    Automated analysis of facial expressions has been gaining significant attention over the past years. This stems from the fact that it constitutes the primal step toward developing some of the next-generation computer technologies that can make an impact in many domains, ranging from medical imaging and health assessment to marketing and education. No matter the target application, the need to deploy systems under demanding, real-world conditions that can generalize well across the population is urgent. Hence, careful consideration of numerous factors has to be taken prior to designing such a system. The work presented in this thesis focuses on tackling two important problems in automated analysis of facial expressions: (i) view-invariant facial expression analysis; (ii) modeling of the structural patterns in the face, in terms of well coordinated facial muscle movements. Driven by the necessity for efficient and accurate inference mechanisms we explore machine learning techniques based on the probabilistic framework of Gaussian processes (GPs). Our ultimate goal is to design powerful models that can efficiently handle imagery with spontaneously displayed facial expressions, and explain in detail the complex configurations behind the human face in real-world situations. To effectively decouple the head pose and expression in the presence of large out-of-plane head rotations we introduce a manifold learning approach based on multi-view learning strategies. Contrary to the majority of existing methods that typically treat the numerous poses as individual problems, in this model we first learn a discriminative manifold shared by multiple views of a facial expression. Subsequently, we perform facial expression classification in the expression manifold. Hence, the pose normalization problem is solved by aligning the facial expressions from different poses in a common latent space. We demonstrate that the recovered manifold can efficiently generalize to various poses and expressions even from a small amount of training data, while also being largely robust to corrupted image features due to illumination variations. State-of-the-art performance is achieved in the task of facial expression classification of basic emotions. The methods that we propose for learning the structure in the configuration of the muscle movements represent some of the first attempts in the field of analysis and intensity estimation of facial expressions. In these models, we extend our multi-view approach to exploit relationships not only in the input features but also in the multi-output labels. The structure of the outputs is imposed into the recovered manifold either from heuristically defined hard constraints, or in an auto-encoded manner, where the structure is learned automatically from the input data. The resulting models are proven to be robust to data with imbalanced expression categories, due to our proposed Bayesian learning of the target manifold. We also propose a novel regression approach based on product of GP experts where we take into account people's individual expressiveness in order to adapt the learned models on each subject. We demonstrate the superior performance of our proposed models on the task of facial expression recognition and intensity estimation.Open Acces

    Automatic analysis of facial actions: a survey

    Get PDF
    As one of the most comprehensive and objective ways to describe facial expressions, the Facial Action Coding System (FACS) has recently received significant attention. Over the past 30 years, extensive research has been conducted by psychologists and neuroscientists on various aspects of facial expression analysis using FACS. Automating FACS coding would make this research faster and more widely applicable, opening up new avenues to understanding how we communicate through facial expressions. Such an automated process can also potentially increase the reliability, precision and temporal resolution of coding. This paper provides a comprehensive survey of research into machine analysis of facial actions. We systematically review all components of such systems: pre-processing, feature extraction and machine coding of facial actions. In addition, the existing FACS-coded facial expression databases are summarised. Finally, challenges that have to be addressed to make automatic facial action analysis applicable in real-life situations are extensively discussed. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the future of machine recognition of facial actions: what are the challenges and opportunities that researchers in the field face

    Facial Expression Analysis under Partial Occlusion: A Survey

    Full text link
    Automatic machine-based Facial Expression Analysis (FEA) has made substantial progress in the past few decades driven by its importance for applications in psychology, security, health, entertainment and human computer interaction. The vast majority of completed FEA studies are based on non-occluded faces collected in a controlled laboratory environment. Automatic expression recognition tolerant to partial occlusion remains less understood, particularly in real-world scenarios. In recent years, efforts investigating techniques to handle partial occlusion for FEA have seen an increase. The context is right for a comprehensive perspective of these developments and the state of the art from this perspective. This survey provides such a comprehensive review of recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion critical for robust performance in FEA systems. It outlines existing challenges in overcoming partial occlusion and discusses possible opportunities in advancing the technology. To the best of our knowledge, it is the first FEA survey dedicated to occlusion and aimed at promoting better informed and benchmarked future work.Comment: Authors pre-print of the article accepted for publication in ACM Computing Surveys (accepted on 02-Nov-2017

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Get PDF
    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may disperse directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance based facial attributes are studied leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where for example variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produce a relation that is neither convex, nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, human activity analysis, and with the addition of a concept called active difference signatures, delivers robust gesture recognition from Kinect or similar depth cameras
    • …
    corecore