
    Efficient Recognition of authentic dynamic facial expressions on the FEEDTUM database

    In order to allow fast recognition of a user’s affective state, we discuss innovative holistic and self-organizing approaches for efficient facial expression analysis. The feature set is formed by global descriptors and MPEG-based DCT coefficients. For subsequent classification we compare modelling by pseudo-multidimensional Hidden Markov Models and Support Vector Machines; in the latter case, super-vectors are constructed using Sequential Floating Search Methods. Extensive test runs, as a proof of concept, are carried out on our publicly available FEEDTUM database, consisting of elicited spontaneous emotions of 18 subjects within the MPEG-4 emotion set plus added neutrality. Maximum recognition performance reaches the benchmark rate obtained in a human perception test with 20 test persons and manifests the effectiveness of the introduced novel concepts.
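The abstract above uses low-frequency DCT coefficients as compact global descriptors of a face image. A minimal numpy sketch of that idea, assuming a grayscale input image (the function names and the choice of an 8x8 coefficient block are illustrative, not the paper's exact configuration):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct_features(image, keep=8):
    # 2D DCT of a grayscale image; keep only the top-left
    # keep x keep block (low frequencies) as a global descriptor.
    h, w = image.shape
    coeffs = dct_matrix(h) @ image @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

# Example: a synthetic 64x64 "face" image.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
feat = dct_features(img, keep=8)
print(feat.shape)  # (64,)
```

Such per-frame descriptors could then be stacked over time and fed to an HMM, or concatenated into the super-vectors the abstract mentions for SVM classification.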

    Offline Face Recognition System Based on GaborFisher Descriptors and Hidden Markov Models

    This paper presents a new offline face recognition system built on one-dimensional left-to-right Hidden Markov Models (1D-HMMs). Facial image features are extracted using Gabor wavelets, and their dimensionality is reduced with Fisher’s Discriminant Analysis to keep only the most relevant information. Unlike existing techniques using 1D-HMMs, in the classification step the proposed system employs 1D-HMMs to model the relationship between the reduced feature components directly, without an additional segmentation step for regions of interest in the face image. Performance evaluation on the AR database showed a high recognition rate for the proposed method.
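The core of 1D-HMM classification as described above is scoring a feature sequence against one model per identity and picking the best. A toy numpy sketch under simplifying assumptions (discrete, vector-quantized observations rather than the paper's Gabor-Fisher features; topology and probabilities are invented for illustration):

```python
import numpy as np

def left_right_hmm(n_states, n_symbols, seed=0):
    # Toy left-to-right HMM: each state either stays put or
    # advances to the next one; emissions are discrete symbols.
    rng = np.random.default_rng(seed)
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s], A[s, s + 1] = 0.6, 0.4
    A[-1, -1] = 1.0
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)  # emission probabilities
    pi = np.zeros(n_states)
    pi[0] = 1.0  # left-to-right models always start in the first state
    return A, B, pi

def likelihood(obs, A, B, pi):
    # Forward algorithm: P(observation sequence | model).
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Classify a quantized feature sequence by the best-scoring model.
models = {name: left_right_hmm(4, 8, seed=i)
          for i, name in enumerate(["subjA", "subjB"])}
obs = [0, 3, 3, 5, 1]
best = max(models, key=lambda n: likelihood(obs, *models[n]))
```

A real system would use continuous (e.g. Gaussian-mixture) emissions over the Gabor-Fisher features and train the models with Baum-Welch; this sketch only shows the scoring step.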

    The application of manifold based visual speech units for visual speech recognition

    Get PDF
    This dissertation presents a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems, such as audio-visual speech recognition (AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. Including lip visual information is opportune since it can improve the overall accuracy of audio or hand recognition algorithms, especially when such systems operate in environments characterized by a high level of acoustic noise. The main components of the developed VSR system are applied to: (a) segment the mouth region of interest, (b) extract the visual features from the real-time input video and (c) identify the visual speech units. The major difficulty associated with VSR systems resides in identifying the smallest elements contained in the image sequences that represent the lip movements in the visual domain. The proposed Visual Speech Unit concept represents an extension of the standard viseme model currently applied for VSR: it augments the viseme approach by including in the new representation not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. A large section of this thesis is dedicated to analysing the performance of the new Visual Speech Unit model compared with that attained for standard (MPEG-4) viseme models. Two experimental results indicate that:
    1. The developed VSR system achieved 80-90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 62-72%.
    2. When 15 words are identified using VSUs and visemes as the visual speech elements, the word recognition accuracy based on VSUs is 7%-12% higher than the accuracy based on visemes.
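The key idea above is that a VSU covers a viseme plus the transition into its successor, rather than the viseme alone. A hypothetical sketch of how a VSU vocabulary could be derived from a viseme transcription (the pairwise mapping is a simplification for illustration, not the dissertation's exact construction):

```python
def visemes_to_vsus(viseme_seq):
    # Each VSU spans one viseme and the transition to the next,
    # so a sequence of N visemes yields N-1 transition-aware units.
    return [f"{a}->{b}" for a, b in zip(viseme_seq, viseme_seq[1:])]

units = visemes_to_vsus(["sil", "p", "a", "sil"])
print(units)  # ['sil->p', 'p->a', 'a->sil']
```

Modelling these transition-aware units enlarges the class inventory (60 VSU classes vs. the standard MPEG-4 viseme set in the experiments above) but captures coarticulation information that plain visemes discard.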

    Facial Expression Recognition Based on 3D Dynamic Range Model Sequences
