The Role of Multiple Articulatory Channels of Sign-Supported Speech Revealed by Visual Processing
Purpose
The use of sign-supported speech (SSS) in the education of deaf students has recently been discussed in relation to its usefulness with deaf children using cochlear implants. To clarify the benefits of SSS for comprehension, 2 eye-tracking experiments aimed to determine the extent to which signs are actively processed in this mode of communication.
Method
Participants were 36 deaf adolescents, including cochlear implant users and native deaf signers. Experiment 1 attempted to shift observers' foveal attention to the linguistic source in SSS from which most information is extracted, lip movements or signs, by magnifying the face area, thus modifying the perceptual accessibility of lip movements (magnified condition), and by constraining the visual field to either the face or the sign through a moving-window paradigm (gaze-contingent condition). Experiment 2 aimed to explore the reliance on signs in SSS by occasionally producing a mismatch between sign and speech. Participants were required to concentrate on the orally transmitted message.
Results
In Experiment 1, analyses revealed a greater number of fixations toward the signs and a reduction in accuracy in the gaze-contingent condition across all participants. Fixations toward signs also increased in the magnified condition. In Experiment 2, results indicated lower accuracy in the mismatching condition across all participants. Participants looked more at the sign when it was inconsistent with speech.
Conclusions
All participants, even those with residual hearing, rely on signs when attending to SSS, either peripherally or through overt attention, depending on the perceptual conditions.
Funding: Unión Europea, Grant Agreement 31674
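As a rough illustration of the gaze-contingent moving-window technique used in Experiment 1, the sketch below blanks each video frame outside a circular window centred on the current gaze sample. It is a generic Python/NumPy reconstruction, not the study's actual stimulus software; the function name, the circular window shape, and the parameters are assumptions.

    import numpy as np

    def gaze_contingent_window(frame, gaze_xy, radius):
        # Keep only a circular region around the gaze position reported by
        # the eye tracker (assumed to arrive as pixel coordinates per frame);
        # everything outside the window is blacked out.
        h, w = frame.shape[:2]
        ys, xs = np.ogrid[:h, :w]
        inside = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 <= radius ** 2
        masked = np.zeros_like(frame)
        masked[inside] = frame[inside]
        return masked

Restricting the visible window to the face region or to the signing space in this way forces foveal processing of one channel while leaving the other available only peripherally.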
A note on the robust stability of uncertain stochastic fuzzy systems with time-delays
Takagi-Sugeno (T-S) fuzzy models are now often used to describe complex nonlinear systems in terms of fuzzy sets and fuzzy reasoning applied to a set of linear submodels. In this note, the T-S fuzzy model approach is exploited to establish stability criteria for a class of nonlinear stochastic systems with time delay. Sufficient conditions are derived in the format of linear matrix inequalities (LMIs) such that, for all admissible parameter uncertainties, the overall fuzzy system is stochastically exponentially stable in the mean square, independent of the time delay. Therefore, with the numerically attractive Matlab LMI toolbox, the robust stability of the uncertain stochastic fuzzy systems with time delays can be easily checked.
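The kind of LMI feasibility test the note refers to can be reproduced in miniature. The sketch below checks the classical delay-free T-S stability condition, namely a common Lyapunov matrix P with A_i^T P + P A_i < 0 for every linear submodel, using Python's cvxpy instead of the Matlab LMI toolbox. The submodel matrices are made up, and the paper's actual criterion additionally accounts for the time delay, the stochastic terms, and the parameter uncertainties.

    import numpy as np
    import cvxpy as cp

    # Two illustrative T-S submodel matrices (not from the paper).
    A1 = np.array([[-2.0, 0.5], [0.3, -1.5]])
    A2 = np.array([[-1.8, 0.2], [0.6, -2.2]])

    n = A1.shape[0]
    P = cp.Variable((n, n), symmetric=True)
    eps = 1e-6

    # Common quadratic Lyapunov conditions: P > 0 and Ai' P + P Ai < 0.
    constraints = [P >> eps * np.eye(n)]
    for A in (A1, A2):
        constraints.append(A.T @ P + P @ A << -eps * np.eye(n))

    problem = cp.Problem(cp.Minimize(0), constraints)
    problem.solve()
    print("LMIs feasible:", problem.status == cp.OPTIMAL)

If the solver reports feasibility, every fuzzy blend of the submodels shares the quadratic Lyapunov function V(x) = x^T P x; this is the delay-free skeleton of the delay-independent criterion derived in the note.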
Lip Reading Sentences in the Wild
The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem: unconstrained natural language sentences, and in-the-wild videos.

Our key contributions are: (1) a 'Watch, Listen, Attend and Spell' (WLAS) network that learns to transcribe videos of mouth motion to characters; (2) a curriculum learning strategy to accelerate training and to reduce overfitting; (3) a 'Lip Reading Sentences' (LRS) dataset for visual speech recognition, consisting of over 100,000 natural sentences from British television.

The WLAS model trained on the LRS dataset surpasses the performance of all previous work on standard lip reading benchmark datasets, often by a significant margin. This lip reading performance beats a professional lip reader on videos from BBC television, and we also demonstrate that visual information helps to improve speech recognition performance even when the audio is available.
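A minimal sketch may help make the "transcribe mouth motion to characters" loop concrete. The skeleton below assumes pre-extracted visual features and uses teacher forcing with dot-product attention; the sizes, the vocabulary, the missing convolutional 'Watch' front-end, and the audio 'Listen' stream are all placeholders, so this is a generic attention-based seq2seq in PyTorch rather than the WLAS architecture itself.

    import torch
    import torch.nn as nn

    class AttendAndSpell(nn.Module):
        # Encodes a sequence of visual features and decodes characters one
        # step at a time, attending over the encoder states at each step.
        def __init__(self, feat_dim=256, hidden=256, vocab=40):
            super().__init__()
            self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.embed = nn.Embedding(vocab, hidden)
            self.decoder = nn.LSTMCell(hidden * 2, hidden)
            self.attn = nn.Linear(hidden, hidden)   # attention projection
            self.out = nn.Linear(hidden, vocab)

        def forward(self, visual_feats, targets):
            enc, _ = self.encoder(visual_feats)      # (B, T, H)
            h = enc.new_zeros(enc.size(0), enc.size(2))
            c = torch.zeros_like(h)
            context = enc.mean(dim=1)                # crude initial context
            logits = []
            for t in range(targets.size(1)):         # teacher forcing
                x = torch.cat([self.embed(targets[:, t]), context], dim=-1)
                h, c = self.decoder(x, (h, c))
                scores = torch.bmm(enc, self.attn(h).unsqueeze(-1)).squeeze(-1)
                weights = scores.softmax(dim=-1)     # attention over time steps
                context = torch.bmm(weights.unsqueeze(1), enc).squeeze(1)
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)        # (B, L, vocab) char logits

Training would minimise per-character cross-entropy over these logits; a curriculum strategy like the one in the abstract would start from short sequences and gradually lengthen them.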
Lipreading with Long Short-Term Memory
Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward and recurrent neural network layers (namely Long Short-Term Memory; LSTM) are stacked to form a single structure which is trained by back-propagating error gradients through all the layers. The performance of such a stacked network was experimentally evaluated and compared to a standard Support Vector Machine classifier using conventional computer vision features (Eigenlips and Histograms of Oriented Gradients). The evaluation was performed on data from 19 speakers of the publicly available GRID corpus. With 51 different words to classify, we report a best word accuracy on held-out evaluation speakers of 79.6% using the end-to-end neural network-based solution (an 11.6% improvement over the best feature-based solution evaluated).

Comment: Accepted for publication at ICASSP 201
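The stacked pipeline described above, per-frame feed-forward layers feeding an LSTM trained end to end, can be sketched as follows. The layer sizes and the flattened mouth-region input are illustrative assumptions; only the 51-way word classification and the end-to-end training idea come from the abstract.

    import torch
    import torch.nn as nn

    class LipreadingNet(nn.Module):
        # Per-frame feed-forward layers stacked under an LSTM; gradients
        # back-propagate through all layers, so the whole stack is trained
        # as a single structure.
        def __init__(self, frame_dim=32 * 32, hidden=128, n_words=51):
            super().__init__()
            self.frontend = nn.Sequential(
                nn.Linear(frame_dim, 256), nn.ReLU(),
                nn.Linear(256, 128), nn.ReLU(),
            )
            self.lstm = nn.LSTM(128, hidden, batch_first=True)
            self.classifier = nn.Linear(hidden, n_words)

        def forward(self, frames):                  # frames: (B, T, frame_dim)
            b, t, d = frames.shape
            feats = self.frontend(frames.reshape(b * t, d)).reshape(b, t, -1)
            _, (h, _) = self.lstm(feats)            # final hidden state per clip
            return self.classifier(h[-1])           # logits over the 51 words

A cross-entropy loss over these word logits, evaluated on held-out speakers, would correspond to the word-accuracy protocol the abstract reports.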