
    Locally Discriminant Diffusion Projection and Its Application to Speech Emotion Recognition

    The existing Diffusion Maps method brings diffusion to data samples through a Markov random walk. In this paper, to provide a general solution form of Diffusion Maps, we first propose a generalized single-graph-diffusion embedding framework on the basis of the graph embedding framework. Second, by designing the embedding graph of the framework, we propose an algorithm, Locally Discriminant Diffusion Projection (LDDP), for speech emotion recognition. This algorithm is the projection form of the improved Diffusion Maps and includes both discriminant and local information. The linear or kernelized form of LDDP (i.e., LLDDP or KLDDP) is used to reduce the dimensionality of the original speech emotion features. We validate the proposed algorithm on two widely used speech emotion databases, EMO-DB and eNTERFACE'05. The experimental results show that the proposed LDDP methods, LLDDP and KLDDP, outperform other state-of-the-art dimensionality reduction methods based on graph embedding or discriminant analysis.
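
    For context, below is a minimal sketch of the classical Diffusion Maps embedding that the paper generalizes, not the proposed LDDP projection itself; the Gaussian bandwidth sigma and the diffusion time t are illustrative parameter choices.

    import numpy as np

    def diffusion_maps(X, n_components=2, sigma=1.0, t=1):
        # Pairwise squared Euclidean distances between samples in X (n x d)
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        # Gaussian affinity graph, row-normalized into a Markov transition matrix
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        P = W / W.sum(axis=1, keepdims=True)
        # Spectral decomposition of the random walk; sort eigenvalues descending
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)
        vals, vecs = vals.real[order], vecs.real[:, order]
        # Drop the trivial constant eigenvector; scale coordinates by lambda^t
        return vecs[:, 1:n_components + 1] * vals[1:n_components + 1] ** t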

    Discriminant Multi-Label Manifold Embedding for Facial Action Unit Detection

    This article describes a system for participation in the Facial Expression Recognition and Analysis (FERA2015) sub-challenge for spontaneous action unit occurrence detection. The problem of AU detection is by nature a multi-label classification problem, a fact overlooked by most existing work. The correlation information between AUs has the potential to increase detection accuracy. We investigate the multi-label AU detection problem by embedding the data on low-dimensional manifolds that preserve multi-label correlation. For this, we apply the multi-label Discriminant Laplacian Embedding (DLE) method as an extension to our base system. The system uses SIFT features around a set of facial landmarks that is enhanced with additional non-salient points around transient facial features. Both the base system and the DLE extension outperform the challenge baseline results on the two databases in the challenge, achieving close to 50% F1-measure on the testing partition on average (9.9% higher than the baseline in the best case). The DLE extension proves useful for certain AUs, but also shows the need for more analysis to assess its benefits in general.
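
    To illustrate the label-correlation ingredient such a multi-label embedding relies on, the sketch below builds an AU co-occurrence affinity from binary label vectors and its graph Laplacian. The cosine normalization is an assumption for illustration, not necessarily the authors' exact formulation.

    import numpy as np

    def label_laplacian(Y):
        # Y: (n_samples, n_AUs) binary AU occurrence matrix.
        # Cosine similarity between AU label columns captures co-occurrence.
        norms = np.linalg.norm(Y, axis=0) + 1e-12
        C = (Y.T @ Y) / np.outer(norms, norms)
        # Unnormalized graph Laplacian over the AU label graph
        D = np.diag(C.sum(axis=1))
        return D - C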

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based on dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions.

    Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models.

    The parsimonious nature of sparse representations (SR) has been successfully exploited to develop highly accurate classifiers for various applications. Despite these successes, large dictionaries and high-dimensional data can make such classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities, and it is expanded to include temporal dynamics to demonstrate its applicability to new domains.

    The joint optimization of dimensionality reduction and SRs for classification is a relatively new field. Combining both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem and introduces a new jointly optimized framework. The framework, termed LGE-KSVD, utilizes variants of the Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints that K-SVD imposes on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect and similar depth cameras.
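
    To make the baseline concrete, here is a minimal sparse-representation classifier in the spirit the thesis builds on: features are projected by a reduction matrix A (learned jointly via LGE-KSVD in the thesis; here simply an input), a test sample is sparse-coded over the projected training dictionary, and the class with the smallest reconstruction residual wins. The function name and the choice of OMP as the sparse solver are illustrative assumptions, not the thesis's exact pipeline.

    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    def src_predict(X_train, y_train, x_test, A, n_nonzero=10):
        # Project training samples (n x d) into the reduced space; columns
        # of D are the dictionary atoms, normalized to unit length.
        D = A @ X_train.T
        D = D / (np.linalg.norm(D, axis=0) + 1e-12)
        y = A @ x_test
        # Sparse-code the projected test sample over the dictionary
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                        fit_intercept=False)
        omp.fit(D, y)
        coef = omp.coef_
        # Class with the smallest class-restricted reconstruction residual
        best_class, best_res = None, np.inf
        for c in np.unique(y_train):
            coef_c = np.where(y_train == c, coef, 0.0)
            res = np.linalg.norm(y - D @ coef_c)
            if res < best_res:
                best_class, best_res = c, res
        return best_class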

    Individual and Inter-related Action Unit Detection in Videos for Affect Recognition

    The human face has evolved to become the most important source of non-verbal information conveying our affective, cognitive, and mental state to others. Beyond human-to-human communication, facial expressions have also become an indispensable component of human-machine interaction (HMI). Systems capable of understanding how users feel enable a wide variety of applications in medical, learning, entertainment, and marketing technologies, in addition to advancements in neuroscience and psychology research, among many others. The Facial Action Coding System (FACS) was built to objectively define and quantify every possible facial movement through so-called Action Units (AUs), each representing an individual facial action. In this thesis we focus on the automatic detection and exploitation of these AUs using novel appearance representation techniques as well as the incorporation of prior co-occurrence information between them. Our contributions can be grouped into three parts.

    In the first part, we propose to improve the detection accuracy of appearance features based on local binary patterns (LBP) for AU detection in videos. For this purpose, we propose two novel methodologies. The first uses three fundamental image processing tools as a pre-processing step prior to applying the LBP transform on the facial texture. These tools each enhance the descriptive ability of LBP by emphasizing different transient appearance characteristics and are shown to increase AU detection accuracy significantly in our experiments. The second uses multiple local curvature Gabor binary patterns (LCGBP) for the same problem and achieves state-of-the-art performance on a dataset of mostly posed facial expressions. The curvature information of the face, as well as the proposed multiple filter size scheme, is very effective in recognizing these individual facial actions.

    In the second part, we propose to take advantage of the co-occurrence relations between AUs, which we can learn from training examples. We use this information in a multi-label discriminant Laplacian embedding (DLE) scheme to train our system with SIFT features extracted around the salient and transient landmarks on the face. The system is first validated without the DLE on a challenging dataset containing many occlusions and head pose variations; we then show the performance of the full system on the FERA 2015 challenge on AU occurrence detection. The challenge consists of two difficult datasets containing spontaneous facial actions at different intensities. We demonstrate that our proposed system achieves the best results on these datasets for detecting AUs.

    The third and last part of the thesis presents an application of this automatic AU detection system to real-life situations, particularly detecting cognitive distraction. Our contribution in this part is two-fold. First, we present a novel visual database of people driving a simulator while visual and cognitive distraction is induced via secondary tasks. The subjects were recorded using three near-infrared camera-lighting systems, a configuration well suited to real driving conditions, i.e., with large head pose and ambient light variations. Second, we propose an original framework to automatically discriminate cognitive distraction sequences from baseline sequences by extracting features from continuous AU signals and exploiting the cross-correlations between them. We achieve very high classification accuracy in subject-based experiments and lower yet acceptable performance in subject-independent tests. Based on these results, we discuss how facial expressions related to this complex mental state are individual rather than universal, and how the proposed system could be used in a vehicle to help decrease human error in traffic accidents.
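
    For reference, here is a minimal sketch of the plain LBP histogram descriptor underlying the appearance features discussed above, using scikit-image. It omits the thesis's pre-processing tools and the LCGBP extension, and the neighborhood parameters P and R are common defaults rather than the thesis's settings.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(gray_face, P=8, R=1):
        # Uniform LBP codes over a grayscale face patch; with the "uniform"
        # method the codes take values 0..P+1, giving P + 2 histogram bins.
        codes = local_binary_pattern(gray_face, P, R, method="uniform")
        n_bins = P + 2
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                               density=True)
        return hist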