
    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem and introduces a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of the Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect or similar depth cameras
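    To make the sparse-representation classification step concrete, below is a minimal Python sketch of the general SRC idea the framework builds on: project a sample with a learned dimensionality reduction matrix, sparse-code it against a dictionary, and assign the class whose atoms reconstruct it best. The shapes, the OMP solver, and all names are illustrative assumptions, not the thesis implementation (which learns these quantities jointly via LGE-KSVD).

```python
# Hedged sketch of sparse-representation classification (SRC) after a learned
# dimensionality reduction. P and D stand in for the projection and dictionary
# that LGE-KSVD would learn jointly; here they are random placeholders.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(x, P, D, labels, n_nonzero=10):
    """Classify x by class-wise reconstruction residual.
    x: (d,) raw feature vector; P: (k, d) reduction matrix;
    D: (k, m) dictionary; labels: (m,) class label of each atom."""
    y = P @ x                                            # project to low-dim space
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(D, y)
    a = omp.coef_                                        # sparse coefficients
    residuals = {c: np.linalg.norm(y - D @ np.where(labels == c, a, 0.0))
                 for c in np.unique(labels)}             # per-class reconstruction
    return min(residuals, key=residuals.get)             # smallest residual wins

# Toy usage with random stand-ins for the learned quantities.
rng = np.random.default_rng(0)
P, D = rng.standard_normal((20, 100)), rng.standard_normal((20, 50))
labels = np.repeat(np.arange(5), 10)                     # 10 atoms per class
print(src_classify(rng.standard_normal(100), P, D, labels))
```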

    Model based methods for locating, enhancing and recognising low resolution objects in video

    Visual perception is our most important sense, enabling us to detect and recognise objects even in low-detail video scenes. While humans are able to perform such object detection and recognition tasks reliably, most computer vision algorithms struggle with wide-angle surveillance videos, where low resolution and poor object detail make automatic processing difficult. Additional problems arise from varying pose and lighting conditions as well as non-cooperative subjects. All these constraints pose problems for automatic scene interpretation of surveillance video, including object detection, tracking and object recognition. Therefore, the aim of this thesis is to detect, enhance and recognise objects by incorporating a priori information and by using model based approaches. Motivated by the increasing demand for automatic methods for object detection, enhancement and recognition in video surveillance, different aspects of the video processing task are investigated with a focus on human faces. In particular, the challenge of fully automatic face pose and shape estimation by fitting a deformable 3D generic face model under varying pose and lighting conditions is tackled. Principal Component Analysis (PCA) is utilised to build an appearance model that is then used within a particle filter based approach to fit the 3D face mask to the image. This recovers face pose and person-specific shape information simultaneously. Experiments demonstrate its use at different resolutions and under varying pose and lighting conditions. Following that, a combined tracking and super resolution approach enhances the quality of poor-detail video objects. A 3D object mask is subdivided such that every mask triangle is smaller than a pixel when projected into the image, and it is then used for model based tracking. The mask subdivision then allows for super resolution of the object by combining several video frames. This approach achieves better results than traditional super resolution methods without the use of interpolation or deblurring. Lastly, object recognition is performed in two different ways. The first recognition method is applied to characters and used for license plate recognition. A novel character model is proposed to create different appearances which are then matched with the image of unknown characters for recognition. This allows for simultaneous character segmentation and recognition, and high recognition rates are achieved for low resolution characters down to only five pixels in size. While this approach is only feasible for objects with a limited number of different appearances, like characters, the second recognition method is applicable to any object, including human faces. Here, a generic 3D face model is automatically fitted to an image of a human face and recognition is performed on a mask level rather than image level. This approach requires neither an initial pose estimate nor the selection of feature points; face alignment is provided implicitly by the mask fitting process
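    As an illustration of how a PCA appearance model can drive a particle filter, the sketch below scores a candidate image patch by its distance to the learned face subspace. The patch size, component count, and Gaussian observation model are assumptions for illustration, not the thesis pipeline.

```python
# Hedged sketch: a PCA face subspace as the observation model of a particle
# filter. Each particle proposes a patch; reconstruction error gives its weight.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_patches = rng.random((200, 32 * 32))       # stand-in for aligned face patches
pca = PCA(n_components=20).fit(train_patches)    # appearance model

def particle_weight(patch, sigma=0.1):
    """Higher when the observed patch lies close to the PCA face subspace."""
    recon = pca.inverse_transform(pca.transform(patch[None, :]))[0]
    err = np.linalg.norm(patch - recon)          # distance-from-subspace error
    return np.exp(-err ** 2 / (2 * sigma ** 2))  # Gaussian observation likelihood

weights = np.array([particle_weight(p) for p in rng.random((50, 32 * 32))])
weights /= weights.sum()                         # normalised particle weights
```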

    Facial expression recognition in dynamic sequences: An integrated approach

    Automatic facial expression analysis aims to analyse human facial expressions and classify them into discrete categories. Existing methods rely on extracting information from video sequences and either apply some form of subjective thresholding of dynamic information or attempt to identify the particular individual frames in which the expected behaviour occurs. These methods are inefficient: they require additional subjective information or tedious manual work, or they fail to take advantage of the information contained in the dynamic signature of facial movements for the task of expression recognition. In this paper, a novel framework is proposed for automatic facial expression analysis which extracts salient information from video sequences but does not rely on any subjective preprocessing or additional user-supplied information to select frames with peak expressions. The experimental framework demonstrates that the proposed method outperforms static expression recognition systems in terms of recognition rate. The approach does not rely on action units (AUs) and therefore eliminates errors that would otherwise propagate to the final result due to incorrect initial identification of AUs. The proposed framework explores a parametric space of over 300 dimensions and is tested with six state-of-the-art machine learning techniques. Such robust and extensive experimentation provides an important foundation for assessing the performance of future work. A further contribution of the paper is a user study, conducted to investigate the correlation between human cognitive systems and the proposed framework for the understanding of human emotion classification and the reliability of public databases
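    One hedged way to realise the idea of using a dynamic signature without selecting peak frames is to pool per-frame features over the whole sequence into a fixed-length descriptor. The pooling statistics and classifier below are illustrative assumptions, not the paper's exact framework.

```python
# Hedged sketch: turn a per-frame feature sequence into a fixed-length dynamic
# descriptor without hand-picking peak frames; the pooling choice is illustrative.
import numpy as np
from sklearn.svm import SVC

def dynamic_descriptor(seq):
    """seq: (n_frames, n_features) per-frame facial features."""
    deltas = np.diff(seq, axis=0)                     # frame-to-frame motion
    return np.concatenate([seq.mean(0), seq.std(0),   # appearance statistics
                           deltas.mean(0), np.abs(deltas).max(0)])  # dynamics

# Usage: pool every sequence, then train any standard classifier on the result.
rng = np.random.default_rng(0)
X = np.stack([dynamic_descriptor(rng.random((30, 68))) for _ in range(100)])
y = rng.integers(0, 6, 100)                           # six expression categories
clf = SVC().fit(X, y)
```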

    3D Face Reconstruction And Emotion Analytics With Part-Based Morphable Models

    3D face reconstruction and facial expression analytics using 3D facial data are active research topics in computer graphics and computer vision. In this proposal, we first review the background knowledge for emotion analytics using 3D morphable face models, including geometry feature-based methods, statistical model-based methods, and more advanced deep learning-based methods. Then, we introduce a novel 3D face modeling and reconstruction solution that robustly and accurately acquires 3D face models from a couple of images captured by a single smartphone camera. Two selfie photos of a subject, taken from the front and side, are used to guide our Non-Negative Matrix Factorization (NMF) induced part-based face model to iteratively reconstruct an initial 3D face of the subject. Then, an iterative detail updating method is applied to the initially generated 3D face to reconstruct facial details by optimizing lighting parameters and local depths. Our iterative 3D face reconstruction method permits fully automatic registration of a part-based face representation to the acquired face data and uses the detailed 2D/3D features to build a high-quality 3D face model. The NMF part-based face representation learned from a 3D face database facilitates effective global and adaptive local detail data fitting in alternation. Our system is flexible and allows users to conduct the capture in any uncontrolled environment. We demonstrate the capability of our method by allowing users to capture and reconstruct their 3D faces by themselves. Based on the reconstructed 3D face model, we can analyze the facial expression and the related emotion in 3D space. We present a novel approach to analyzing facial expressions from images and a quantitative information visualization scheme for exploring this type of visual data. From the result reconstructed using the NMF part-based morphable 3D face model, basis parameters and a displacement map are extracted as features for facial emotion analysis and visualization. Based upon these features, two Support Vector Regressions (SVRs) are trained to determine the fuzzy Valence-Arousal (VA) values that quantify the emotions. The continuously changing emotion status can be intuitively analyzed by visualizing the VA values in VA-space. Our emotion analysis and visualization system, based on the 3D NMF morphable face model, detects expressions robustly across various head poses, face sizes and lighting conditions, and is fully automatic in computing the VA values from images or video sequences with various facial expressions. To evaluate our novel method, we test our system on publicly available databases and evaluate the emotion analysis and visualization results. We also apply our method to quantifying emotion changes during motivational interviews. These experiments and applications demonstrate the effectiveness and accuracy of our method. To improve the expression recognition accuracy, we present a facial expression recognition approach with a 3D Mesh Convolutional Neural Network (3DMCNN) and a visual analytics guided 3DMCNN design and optimization scheme. The geometric properties of the surface are computed using the 3D face model of a subject with facial expressions. Instead of using a regular Convolutional Neural Network (CNN) to learn intensities of the facial images, we convolve the geometric properties on the surface of the 3D model using the 3DMCNN. We design a geodesic distance-based convolution method to overcome the difficulties raised by the irregular sampling of the face surface mesh. We further present an interactive visual analytics scheme for designing and modifying the networks, analyzing the learned features, and clustering similar nodes in the 3DMCNN. By removing low-activity nodes in the network, the performance of the network is greatly improved. We compare our method with the regular CNN-based method by interactively visualizing each layer of the networks, and we analyze the effectiveness of our method by studying representative cases. Testing on public datasets, our method achieves higher recognition accuracy than traditional image-based CNNs and other 3D CNNs. The presented framework, including the 3DMCNN and the interactive visual analytics of the CNN, can be extended to other applications
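    The valence-arousal regression step lends itself to a short sketch: two independent SVRs map reconstruction-derived features to continuous VA coordinates, as the abstract describes. The feature vectors and annotations below are random stand-ins; the kernel choice and dimensions are assumptions.

```python
# Hedged sketch of the valence-arousal regression step: one SVR per VA axis.
# Feature extraction from the NMF reconstruction is mocked with random data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((300, 50))          # stand-in for basis params + displacement features
val = rng.uniform(-1, 1, 300)      # annotated valence in [-1, 1]
aro = rng.uniform(-1, 1, 300)      # annotated arousal in [-1, 1]

svr_v = SVR(kernel="rbf").fit(X, val)   # independent regressor for valence
svr_a = SVR(kernel="rbf").fit(X, aro)   # independent regressor for arousal

def emotion_va(features):
    """Map a feature vector to a point in VA-space."""
    f = features[None, :]
    return svr_v.predict(f)[0], svr_a.predict(f)[0]

print(emotion_va(rng.random(50)))
```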

    Facial Expression Recognition System

    A key requirement for developing any innovative system in a computing environment is an interface that is sufficiently friendly to the average end user. Accurate design of such a user-centered interface, however, means more than just the ergonomics of the panels and displays. It also requires that designers precisely define what information to use and how, where, and when to use it. Facial expression, as a natural, non-intrusive and efficient way of communication, has been considered one of the potential inputs to such interfaces. The work of this thesis aims at designing a robust Facial Expression Recognition (FER) system by combining various techniques from computer vision and pattern recognition. Expression recognition is closely related to face recognition, where a lot of research has been done and a vast array of algorithms has been introduced. FER can also be considered a special case of a pattern recognition problem, for which many techniques are available. In designing an FER system, we can take advantage of these resources and use existing algorithms as building blocks of our system, so a major part of this work is to determine the optimal combination of algorithms. To do this, we first divide the system into three modules, i.e. Preprocessing, Feature Extraction and Classification, then implement several candidate methods for each, and eventually find the optimal configuration by comparing the performance of different combinations. Another issue of great interest to designers of facial expression recognition systems is the classifier, which is the core of the system. Conventional classification algorithms assume the image is a single-variable function of an underlying class label. However, this does not hold in face recognition, where the appearance of the face is influenced by multiple factors: identity, expression, illumination and so on. To solve this problem, this thesis proposes two new algorithms, namely Higher Order Canonical Correlation Analysis and Simple Multifactor Analysis, which model the image as a multivariable function. The addressed issues are challenging problems and are fundamental to developing a facial expression recognition system
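    The module-combination search can be sketched directly with off-the-shelf tools: each stage offers candidate methods, and cross-validation compares the configurations. The candidates below are illustrative stand-ins, not the thesis's actual method pool.

```python
# Hedged sketch of the three-module combination search: swap candidate methods
# into each pipeline stage and let cross-validation pick the best configuration.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

pipe = Pipeline([("pre", StandardScaler()),   # Preprocessing
                 ("feat", PCA()),             # Feature Extraction
                 ("clf", SVC())])             # Classification
grid = {"feat": [PCA(n_components=30), LinearDiscriminantAnalysis()],
        "clf": [SVC(), KNeighborsClassifier(n_neighbors=5)]}

rng = np.random.default_rng(0)
X, y = rng.random((120, 100)), rng.integers(0, 6, 120)   # toy data, 6 expressions
best = GridSearchCV(pipe, grid, cv=3).fit(X, y)          # exhaustive combination search
print(best.best_params_)
```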

    Objects extraction and recognition for camera-based interaction: heuristic and statistical approaches

    In this thesis, heuristic and probabilistic methods are applied to a number of problems in camera-based interaction. The goal is to provide solutions for a vision-based system that is able to extract and analyze objects of interest in camera images and to use that information for various mobile interactions. New methods, and new combinations of existing methods, are developed for different applications, including text extraction from complex scene images, bar code reading performed by camera phones, and face/facial feature detection and facial expression manipulation. The application-driven problems of camera-based interaction cannot be captured by a single, straightforward model that heavily simplifies reality. The solutions found to be efficient apply heuristic but easily implemented approaches first, to reduce the complexity of the problems and search for possible means, and then use statistical learning approaches to deal with the remaining difficult but well-defined problems and achieve much better accuracy. This process can evolve in some or all of the stages, and the combination of approaches is problem-dependent. The contribution of this thesis resides in two aspects: first, new features and approaches are proposed, either as heuristics or as statistical means, for concrete applications; second, engineering design that combines several methods for system optimization is studied. Geometrical characteristics and the alignment of text, texture features of bar codes, and the structure of faces can all be exploited as heuristics for object extraction and further recognition. The boosting algorithm is a proper choice for probabilistic learning that achieves the desired accuracy, and new feature selection techniques are proposed for constructing the weak learner and applying the boosting output in concrete applications. Subspace methods such as manifold learning algorithms are introduced and tailored for facial expression analysis and synthesis. A modified generalized learning vector quantization method is proposed to deal with the blurring of bar code images. Efficient implementations that combine the approaches at a rational joint point are presented and the results are illustrated
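    As a sketch of boosting used for feature selection, the snippet below runs AdaBoost with depth-1 stumps, so each round effectively picks one discriminative feature. The data and the informative feature index are fabricated for illustration; this is the generic technique, not the thesis's specific construction.

```python
# Hedged sketch: boosting with decision stumps doubles as feature selection,
# since each round's stump splits on a single most-discriminative feature.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((400, 60))                 # candidate features (e.g. texture responses)
y = (X[:, 7] + 0.1 * rng.standard_normal(400) > 0.5).astype(int)  # feature 7 informative

stump = DecisionTreeClassifier(max_depth=1)            # weak learner: one feature per split
model = AdaBoostClassifier(estimator=stump, n_estimators=25).fit(X, y)
selected = np.nonzero(model.feature_importances_)[0]   # features the boosting rounds chose
print(selected)
```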

    An Analysis of Facial Expression Recognition Techniques

    In the present era of technology, we need applications that are easy to use and user-friendly, so that even people with specific disabilities can use them easily. Facial Expression Recognition plays a vital role, and poses open challenges, in the computer vision and pattern recognition communities, which have given it much attention due to its potential applications in areas such as human-machine interaction, surveillance, robotics, driver safety, non-verbal communication, entertainment, health care, and psychology. Facial expression recognition is of major importance within face recognition for image understanding and analysis applications. Many algorithms have been implemented under both static (uniform background, identical poses, similar illumination) and dynamic (position variation, partial occlusion, orientation, varying lighting) conditions. In general, facial expression recognition consists of three main steps: face detection, feature extraction, and classification. In this survey paper, we discuss different types of facial expression recognition techniques, the various methods they use, and their performance measures

    Facial expression recognition in the wild: from individual to group

    The progress in computing technology has increased the demand for smart systems capable of understanding human affect and emotional manifestations. One of the crucial factors in designing systems equipped with such intelligence is having accurate automatic Facial Expression Recognition (FER) methods. In computer vision, automatic facial expression analysis has been an active field of research for over two decades, yet many questions remain unanswered. The research presented in this thesis attempts to address some of the key issues of FER in challenging conditions, as follows: 1) creating a facial expressions database representing real-world conditions; 2) devising Head Pose Normalisation (HPN) methods which are independent of facial parts location; 3) creating automatic methods for the analysis of the mood of a group of people. The central hypothesis of the thesis is that extracting close-to-real-world data from movies and performing facial expression analysis on movies is a stepping stone towards moving the analysis of faces into real-world, unconstrained conditions. A temporal facial expressions database, Acted Facial Expressions in the Wild (AFEW), is proposed. The database is constructed and labelled using a semi-automatic process based on a closed-caption subtitle keyword search. Currently, AFEW is the largest facial expressions database representing challenging conditions available to the research community. To provide a common platform on which researchers can evaluate and extend their state-of-the-art FER methods, the first Emotion Recognition in the Wild (EmotiW) challenge, based on AFEW, is proposed. An image-only facial expressions database, Static Facial Expressions In The Wild (SFEW), extracted from AFEW, is also proposed. Furthermore, the thesis focuses on HPN for real-world images. Earlier methods were based on fiducial points; however, as fiducial point detection is an open problem for real-world images, HPN can be error-prone. A HPN method based on response maps generated from part-detectors is proposed. The proposed shape-constrained method requires neither fiducial points nor head pose information, which makes it suitable for real-world images. Data from movies and the internet, representing real-world conditions, pose another major challenge for the research community: the presence of multiple subjects. This defines another focus of this thesis, where a novel approach for modeling the perception of the mood of a group of people in an image is presented. A new database is constructed from Flickr based on keywords related to social events. Three models are proposed: an averaging based Group Expression Model (GEM), a Weighted Group Expression Model (GEM_w) and an Augmented Group Expression Model (GEM_LDA). GEM_w is based on social contextual attributes, which are used as weights on each person's contribution towards the overall group mood. GEM_LDA is based on topic modeling and feature augmentation. The proposed framework is applied to group candid shot selection and event summarisation. Finally, the Structural SIMilarity (SSIM) index metric is explored for finding similar facial expressions, and is applied to creating image albums based on facial expressions and to finding corresponding expressions for training facial performance transfer algorithms
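    The averaging and weighted group expression models reduce to a compact sketch: GEM is the plain mean of per-face expression scores, and GEM_w scales each face's contribution by contextual attributes. The weights below are illustrative stand-ins for the social attributes the thesis uses.

```python
# Hedged sketch of the group expression models described above: GEM as a plain
# mean, GEM_w as a context-weighted mean over per-face expression scores.
import numpy as np

def gem(face_scores):
    """GEM: group mood as the plain mean of per-face expression scores."""
    return float(np.mean(face_scores))

def gem_w(face_scores, weights):
    """GEM_w: each face's contribution scaled by contextual attributes
    (e.g. relative face size or distance from the group centroid)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, face_scores) / w.sum())

# Usage: three detected faces with happiness scores and context weights.
print(gem([0.9, 0.4, 0.7]), gem_w([0.9, 0.4, 0.7], [1.5, 0.6, 1.0]))
```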

    Face tracking with active models for a driver monitoring application

    Lack of attention during driving is one of the main causes of traffic accidents. Monitoring the driver to detect inattention is a complex problem that includes physiological and behavioural elements. A computer vision system for inattention detection comprises several processing stages, and this thesis focuses on tracking the driver's face. The doctoral thesis proposes a new set of driver videos, recorded in a real vehicle and in two realistic simulators, containing most of the behaviours present while driving, including gestures, head turns, interaction with the sound system and other distractions, and drowsiness. This database, RS-DMV, is used to evaluate the performance of the methods proposed in the thesis and of others from the state of the art. The thesis analyses the performance of Active Shape Models (ASM) and Constrained Local Models (CLM), both considered a priori to be of interest. Specifically, the Stacked Trimmed ASM (STASM) method, which integrates a series of improvements over the original ASM, was evaluated and showed high accuracy in all tests when the face is frontal to the camera, although it fails when the face is turned and its execution speed is very low. CLM runs faster but is much less accurate in all cases. The third method evaluated is Simultaneous Modelling and Tracking (SMAT), which characterises shape and texture incrementally from previously found samples. The texture around each point of the shape defining the face is modelled by a set of clusters of past samples. The thesis proposes three clustering methods for the texture as alternatives to the original, plus a shape model trained off-line with a robust fitting function. The proposed alternative methods deliver a large improvement in both tracking accuracy and robustness to head turns, occlusions, gestures and illumination changes. The proposed methods also have a low computational load and can run at around 100 frames per second on a desktop computer
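    The incremental texture-cluster idea behind SMAT can be sketched as online clustering of the patches seen around one landmark, with the nearest cluster centre scoring new candidates. The merge threshold, cluster budget, and running-mean update below are assumptions for illustration, not the thesis's clustering variants.

```python
# Hedged sketch of SMAT-style incremental texture clustering for one landmark:
# tracked patches either merge into the nearest cluster centre or open a new one.
import numpy as np

class TextureClusters:
    def __init__(self, max_clusters=5, merge_dist=4.0):
        self.centres, self.counts = [], []
        self.max_clusters, self.merge_dist = max_clusters, merge_dist

    def update(self, patch):
        """Add a tracked patch: merge into the nearest centre or open a new cluster."""
        if self.centres:
            d = [np.linalg.norm(patch - c) for c in self.centres]
            i = int(np.argmin(d))
            if d[i] < self.merge_dist or len(self.centres) >= self.max_clusters:
                n = self.counts[i]
                self.centres[i] = (n * self.centres[i] + patch) / (n + 1)  # running mean
                self.counts[i] = n + 1
                return
        self.centres.append(patch.astype(float))
        self.counts.append(1)

    def score(self, patch):
        """Negative distance to the best-matching cluster centre."""
        return -min(np.linalg.norm(patch - c) for c in self.centres)

# Usage: feed flattened 11x11 patches observed around one landmark over time.
tc = TextureClusters()
for p in np.random.default_rng(0).random((50, 121)):
    tc.update(p)
print(len(tc.centres), tc.score(np.zeros(121)))
```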

    Multimodal Spontaneous Emotion Corpus for Human Behavior Analysis

    Emotion is expressed in multiple modalities, yet most research has considered at most one or two. This stems in part from the lack of large, diverse, well-annotated, multimodal databases with which to develop and test algorithms. We present a well-annotated, multimodal, multidimensional spontaneous emotion corpus of 140 participants. Emotion inductions were highly varied. Data were acquired from a variety of facial sensors, including high-resolution 3D dynamic imaging, high-resolution 2D video, and thermal (infrared) sensing, and from contact physiological sensors measuring electrical conductivity of the skin, respiration, blood pressure, and heart rate. Facial expression was annotated for both the occurrence and intensity of facial action units from 2D video by experts in the Facial Action Coding System (FACS). The corpus further includes derived features from the 3D, 2D, and IR (infrared) sensors, and baseline results for facial expression and action unit detection. The entire corpus will be made available to the research community