
    A Unified Approach to Multi-Pose Audio-Visual ASR

    The vast majority of studies in the field of audio-visual automatic speech recognition (AVASR) assume frontal images of a speaker's face, but this cannot always be guaranteed in practice. Hence our recent research efforts have concentrated on extracting visual speech information from non-frontal faces, in particular the profile view. The introduction of additional views to an AVASR system increases the complexity of the system, as it has to deal with the different visual features associated with the various views. In this paper, we propose the use of linear regression to find a transformation matrix based on synchronous frontal and profile visual speech data, which is used to normalize the visual speech in each viewpoint into a single uniform view. In our experiments for the task of multi-speaker lipreading, we show that this "pose-invariant" technique reduces train/test mismatch between visual speech features of different views, and is of particular benefit when there is more training data for one viewpoint over another (e.g. frontal over profile).
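The view-normalization step described above can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the authors' implementation: the function names and the assumption that frontal and profile features share the same dimensionality and are frame-synchronous are illustrative.

```python
import numpy as np

def fit_view_transform(profile_feats, frontal_feats):
    """Least-squares solution of frontal ≈ profile @ W.

    profile_feats: (T, D) visual speech features from the profile view
    frontal_feats: (T, D) synchronous features from the frontal view
    Returns the (D, D) transformation matrix W.
    """
    W, *_ = np.linalg.lstsq(profile_feats, frontal_feats, rcond=None)
    return W

def normalize_to_frontal(profile_feats, W):
    # Map profile-view features into the single "uniform" (frontal) space,
    # so one recognizer can be trained on pooled data from both views.
    return profile_feats @ W
```

With enough synchronous frames, the recovered W lets profile-view test data be scored by a model trained predominantly on frontal data, which is where the paper reports the largest benefit.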

    Combining Multiple Views for Visual Speech Recognition

    Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The thorough analysis covers fusion of all possible view angle combinations at both the feature level and the decision level. The visual speech recognition system employed in this study extracts features through a PCA-based convolutional neural network, followed by an LSTM network. Finally, these features are processed in a tandem system, being fed into a GMM-HMM scheme. The decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, the sentence correctness on the test set is increased from 76% for the highest-performing single view (30°) to up to 83% when combining this view with the frontal and 60° view angles.
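The decision-level fusion described above can be sketched as follows: each view's recognizer scores every candidate hypothesis with a Viterbi path log-likelihood, and the fused decision is the hypothesis maximizing their (optionally weighted) sum. The per-view weights are an assumption for illustration; the paper's exact combination rule may differ.

```python
import numpy as np

def fuse_decisions(loglik_per_view, weights=None):
    """Combine Viterbi path log-likelihoods across views.

    loglik_per_view: (V, H) array of log-likelihoods,
                     V views x H candidate hypotheses.
    weights: optional (V,) per-view weights; defaults to equal weighting.
    Returns (index of the winning hypothesis, fused scores).
    """
    L = np.asarray(loglik_per_view, dtype=float)
    if weights is None:
        weights = np.ones(L.shape[0])
    # Summing log-likelihoods corresponds to a product of likelihoods
    # under an independence assumption between views.
    fused = np.asarray(weights) @ L
    return int(np.argmax(fused)), fused
```

Because the views are complementary, a hypothesis weakly supported by one camera angle can be rescued by stronger evidence from another, which is the mechanism behind the reported 76% to 83% improvement.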

    Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip-reading system that extracts the lip region from images using conventional techniques, but the contour itself is extracted using a novel application of a combination of border-following and convex-hull approaches. Classification is carried out using an enhanced dynamic time warping technique that has the ability to operate in multiple dimensions, and a template probability technique that is able to compensate for differences in the way words are uttered in the training set. The performance of the new system has been assessed in recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compared favorably with those of existing lip-reading approaches, achieving a word recognition accuracy of up to 71% with the visual information being obtained from estimates of lip height, width and their ratio.
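The core of the classifier above is dynamic time warping over multi-dimensional feature sequences (per-frame lip height, width and their ratio). The minimal sketch below uses a Euclidean local cost and the standard DTW recursion as assumptions; the paper's enhanced variant additionally layers a template probability model on top, which is not shown here.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping between two multi-dimensional sequences.

    seq_a: (Ta, D) feature sequence, e.g. [lip_height, lip_width, ratio]
    seq_b: (Tb, D) feature sequence to compare against
    Returns the accumulated alignment cost (0.0 for identical sequences).
    """
    Ta, Tb = len(seq_a), len(seq_b)
    # D[i, j] = minimal cost of aligning seq_a[:i] with seq_b[:j]
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            # Euclidean distance between frames as the local cost
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[Ta, Tb]
```

A digit utterance would then be classified by computing this distance against stored templates for each digit and picking the nearest one; the warping absorbs differences in speaking rate between utterances.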