61 research outputs found

    Visual Speech and Speaker Recognition

    This thesis presents a learning-based approach to speech recognition and person recognition from image sequences. An appearance-based model of the articulators is learned from example images and is used to locate, track, and recover visual speech features. A major difficulty in model-based approaches is to develop a scheme which is general enough to account for the large appearance variability of objects without sacrificing specificity. The method described here decomposes the lip shape and the intensities in the mouth region into weighted sums of basis shapes and basis intensities, respectively, using a Karhunen-Loève expansion. The intensities deform with the shape model to provide shape-independent intensity information. This information is used in image search, which is based on a similarity measure between the model and the image. Visual speech features can be recovered from the tracking results and represent shape and intensity information. A speechreading (lip-reading) system is presented which models these features by Gaussian distributions and their temporal dependencies by hidden Markov models. The models are trained using the EM algorithm, and speech recognition is performed based on maximum posterior probability classification. It is shown that, besides speech information, the recovered model parameters also contain person-dependent information, and a novel method for person recognition based on these features is presented. Talking persons are represented by spatio-temporal models which describe the appearance of the articulators and their temporal changes during speech production. Two different topologies for speaker models are described: Gaussian mixture models and hidden Markov models. The proposed methods were evaluated for lip localisation, lip tracking, speech recognition, and speaker recognition on an isolated digit database of 12 subjects and on a continuous digit database of 37 subjects. The techniques were found to achieve good performance for all tasks listed above. For an isolated digit recognition task, the speechreading system outperformed previously reported systems and performed slightly better than untrained human speechreaders.
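    The Karhunen-Loève expansion of lip shapes described above can be sketched as follows. This is a minimal illustration on synthetic data; the array sizes, the number of retained basis shapes, and the landmark layout are assumptions for the example, not the thesis's actual values.

    ```python
    import numpy as np

    # Hypothetical training set: 50 lip shapes, each 20 landmark points (x, y) flattened.
    rng = np.random.default_rng(0)
    shapes = rng.normal(size=(50, 40))

    # Karhunen-Loeve expansion: centre the data and keep the leading
    # right singular vectors as basis shapes (principal components).
    mean_shape = shapes.mean(axis=0)
    centred = shapes - mean_shape
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    n_basis = 5
    basis = vt[:n_basis]            # basis shapes

    # Any shape is approximated as the mean plus a weighted sum of basis shapes.
    weights = centred @ basis.T     # projection onto the basis
    reconstructed = mean_shape + weights @ basis

    # With all components the reconstruction would be exact; with 5 the residual
    # is bounded by the energy in the discarded singular values.
    err = np.linalg.norm(shapes - reconstructed) / np.linalg.norm(shapes)
    print(weights.shape)
    ```

    The intensity model in the thesis is built the same way, with basis intensities replacing basis shapes.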

    Towards Speaker Independent Continuous Speechreading

    This paper describes recent speechreading experiments for a speaker-independent continuous digit recognition task. Visual feature extraction is performed by a lip tracker which recovers information about the lip shape and about the grey-level intensity around the mouth. These features are used to train visual word models using continuous density HMMs. Results show that the method generalises well to new speakers and that the recognition rate varies strongly across digits, as expected given the high visual confusability of certain words.
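    Scoring an observation sequence under a continuous density (Gaussian-emission) HMM, as the visual word models do, can be sketched with the forward algorithm. All model parameters below are illustrative toy values, not the paper's trained models.

    ```python
    import numpy as np

    def log_forward(obs, log_pi, log_A, means, variances):
        """Log-likelihood of an observation sequence under a Gaussian-emission HMM."""
        def log_b(x):  # per-state diagonal-Gaussian log density
            return -0.5 * np.sum(np.log(2 * np.pi * variances)
                                 + (x - means) ** 2 / variances, axis=1)
        alpha = log_pi + log_b(obs[0])
        for x in obs[1:]:
            alpha = log_b(x) + np.array(
                [np.logaddexp.reduce(alpha + log_A[:, j]) for j in range(len(log_pi))])
        return np.logaddexp.reduce(alpha)

    # Two toy 3-state left-to-right word models over 2-D visual features.
    log_pi = np.log(np.array([1.0, 1e-10, 1e-10]))
    A = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.6, 0.4],
                  [0.0, 0.0, 1.0]])
    log_A = np.log(A + 1e-12)
    means_a = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
    means_b = means_a + 5.0         # a competing word model
    var = np.ones((3, 2))

    rng = np.random.default_rng(1)
    obs = means_a + 0.1 * rng.normal(size=(3, 2))  # sequence drawn near model A
    score_a = log_forward(obs, log_pi, log_A, means_a, var)
    score_b = log_forward(obs, log_pi, log_A, means_b, var)
    print(score_a > score_b)  # the better-matching word model wins
    ```

    Recognition then amounts to picking the word model with the highest score.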

    Off-Line Cursive Script Recognition Based on Continuous Density HMM

    A system for off-line cursive script recognition is presented. A new normalization technique, based on statistical methods, compensates for the variability of writing style. The key problem of segmentation is avoided by applying a sliding window to the handwritten words. A feature vector is extracted from each frame isolated by the window. The feature vectors are used as observations in letter-oriented continuous density HMMs that perform the recognition. Feature extraction and modeling techniques are illustrated. To allow comparison of results, the system was trained and tested using the same data and experimental conditions as in other published works. The performance of the system is evaluated in terms of character and word (with and without lexicon) recognition rates. Results comparable to those of more complex systems have been achieved.
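    The sliding-window step can be sketched as below: a window moves left to right over a binarised word image and a small feature vector is extracted from each frame, so no explicit character segmentation is needed. The window size, step, and the two features are illustrative assumptions, not the paper's actual feature set.

    ```python
    import numpy as np

    def frame_features(word_img, win=4, step=2):
        """One feature vector per window position (frame) of a binarised word image."""
        h, w = word_img.shape
        frames = []
        for x0 in range(0, w - win + 1, step):
            frame = word_img[:, x0:x0 + win]
            col = frame.mean(axis=1)                  # vertical profile of ink density
            ink = col.sum()
            centre = (np.arange(h) * col).sum() / ink if ink > 0 else h / 2.0
            frames.append([ink, centre / h])          # two illustrative features
        return np.array(frames)

    # Synthetic 16x40 "word image" with random ink.
    word = (np.random.default_rng(2).random((16, 40)) > 0.7).astype(float)
    obs = frame_features(word)
    print(obs.shape)  # one observation vector per frame, ready for the HMMs
    ```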

    Using the Multi-Stream Approach for Continuous Audio-Visual Speech Recognition

    The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method offers several potential advantages for such a task. In particular, it allows for synchronous decoding of continuous speech while still permitting some asynchrony between the visual and acoustic information streams. First, the Multi-Stream formalism is briefly recalled. Then, building on these motivations, experiments on the M2VTS multimodal database are presented and discussed. To our knowledge, these are the first experiments on multi-speaker continuous Audio-Visual Speech Recognition (AVSR). It is shown that the Multi-Stream approach can yield improved Audio-Visual speech recognition performance when the acoustic signal is corrupted by noise, as well as for clean speech.
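    The combination rule at the heart of multi-stream fusion can be sketched as a reliability-weighted log-linear combination of per-stream scores. The scores and weights below are made-up numbers for illustration; how the weights are set and where the combination happens in decoding follow the multi-stream formalism, not this two-line sketch.

    ```python
    def combined_score(log_lik_audio, log_lik_video, w_audio):
        """Weighted log-linear fusion; w_audio is lowered when the audio is noisy."""
        return w_audio * log_lik_audio + (1.0 - w_audio) * log_lik_video

    # Illustrative per-stream log-likelihoods for one word hypothesis.
    clean = combined_score(-10.0, -12.0, w_audio=0.8)   # trust the audio stream
    noisy = combined_score(-30.0, -12.0, w_audio=0.3)   # audio degraded, weight shifted
    print(clean, noisy)
    ```

    Shifting weight to the visual stream is what lets the fused system degrade gracefully under acoustic noise.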

    A Survey of Text Detection and Recognition in Images and Videos

    A survey of text detection and recognition in images and videos, covering state-of-the-art methods and systems.

    Recognition of Asymmetric Facial Action Unit Activities and Intensities

    Most automatic facial expression analysis systems try to analyze emotion categories. However, psychologists argue that there is no straightforward way to classify emotions from facial expressions. Instead, they propose FACS (the Facial Action Coding System), a de facto standard for categorizing facial actions independently of emotional categories. We describe a system that recognizes asymmetric FACS Action Unit activities and intensities without the use of markers. Facial expression extraction is achieved by difference images that are projected into a subspace using either PCA or ICA, followed by nearest neighbor classification. Experiments show that this holistic approach achieves a recognition performance comparable to marker-based facial expression analysis systems or human FACS experts for a single-subject database recorded under controlled conditions.
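    The holistic pipeline can be sketched as: difference images (expression minus neutral face) projected into a PCA subspace, then nearest-neighbour classification. Image sizes, the 3-D subspace, and the training labels are assumptions for the example (the paper also considers ICA in place of PCA).

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    neutral = rng.random((8, 8))
    # Six synthetic difference images with (hypothetical) Action Unit labels.
    train_diffs = np.stack([rng.random((8, 8)) - neutral for _ in range(6)]).reshape(6, -1)
    labels = ["AU1", "AU1", "AU4", "AU4", "AU12", "AU12"]

    # PCA subspace from the training difference images.
    mean = train_diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(train_diffs - mean, full_matrices=False)
    proj = vt[:3]
    train_codes = (train_diffs - mean) @ proj.T

    def classify(diff_img):
        """Project a difference image into the subspace, return the nearest label."""
        code = (diff_img.reshape(-1) - mean) @ proj.T
        return labels[int(np.argmin(np.linalg.norm(train_codes - code, axis=1)))]

    result = classify(train_diffs[2].reshape(8, 8))  # a training sample maps to its own label
    print(result)
    ```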

    Optimal Parameterization of Point Distribution Models

    We address the problem of determining the optimal model complexity for shape modeling. This complexity is a compromise between model specificity and generality. We show that the error of a model can be split into two components, the model error and the fitting error, of which the first can be used to optimize the model complexity based on the specific application. This strategy improves over traditional approaches, where the model complexity is determined only by vague heuristics or trial and error. A method for determining optimal active shape models is proposed and its efficiency is validated in several experiments. Furthermore, this method gives an indication of the range of valid shape parameters and of whether additional training data would further reduce the number of shape parameters.
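    The error split can be illustrated numerically: for t retained modes, the residual on training shapes plays the role of the fitting error, while the residual on held-out shapes probes the model error (generality). The data here is synthetic and the specific error definitions are a simplification of the paper's analysis.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    train = rng.normal(size=(30, 10))   # synthetic training shape vectors
    test = rng.normal(size=(10, 10))    # held-out shapes

    mean = train.mean(axis=0)
    _, s, vt = np.linalg.svd(train - mean, full_matrices=False)

    def residual(data, t):
        """Relative reconstruction error when only the first t modes are kept."""
        c = data - mean
        return np.linalg.norm(c - (c @ vt[:t].T) @ vt[:t]) / np.linalg.norm(c)

    fit_err = [residual(train, t) for t in range(1, 10)]
    model_err = [residual(test, t) for t in range(1, 10)]
    # The fitting error always decreases as modes are added; the held-out error
    # flattens once extra modes only encode training noise, which is the signal
    # for the optimal model complexity.
    print(fit_err[0] > fit_err[-1])
    ```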

    Illumination-robust Pattern Matching Using Distorted Color Histograms

    It is argued that global illumination should be modeled separately from other factors that change the appearance of objects. The effects of intensity variations of the global illumination are discussed, and constraints are deduced that restrict the shape of a function mapping the histogram of a template to the histogram of an image location. This approach is illustrated for simple pattern matching and for a combination with a PCA (Eigenface) model of the grey-level appearance.
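    The core idea can be sketched with a simple histogram-matching step: a global illumination change acts approximately as a monotonic mapping of grey levels, so mapping the template's histogram onto the candidate region's histogram before scoring removes the illumination difference. The rank-based matching below is one standard realisation, not necessarily the paper's exact distortion function.

    ```python
    import numpy as np

    def match_histogram(template, target):
        """Monotonically remap template grey levels onto the target's distribution."""
        ranks = np.argsort(np.argsort(template.ravel()))
        return np.sort(target.ravel())[ranks].reshape(template.shape)

    rng = np.random.default_rng(5)
    template = rng.random((6, 6))
    lit = np.clip(0.6 * template + 0.2, 0.0, 1.0)  # simulated global intensity change

    matched = match_histogram(template, lit)
    # A monotone illumination change preserves pixel rank order, so after
    # matching, the template agrees with the re-lit region.
    print(float(np.abs(matched - lit).max()))
    ```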

    Automatic Facial Expression Analysis: A Survey

    Over the last decade, automatic facial expression analysis has become an active research area with potential applications in areas such as more engaging human-computer interfaces, talking heads, image retrieval, and human emotion analysis. Facial expressions reflect not only emotions but also other mental activities, social interaction, and physiological signals. In this survey we introduce the most prominent automatic facial expression analysis methods and systems presented in the literature. Facial motion and deformation extraction approaches, as well as classification methods, are discussed with respect to issues such as face normalization, facial expression dynamics, and facial expression intensity, but also with regard to their robustness towards environmental changes.