
    Speechreading using shape and intensity information

    We describe a speechreading system that uses both shape information from the lip contours and intensity information from the mouth area. Shape information is obtained by tracking and parameterising the inner and outer lip boundary in an image sequence. Intensity information is extracted from a grey-level model based on principal component analysis. In contrast to other approaches, the intensity area deforms with the shape model to ensure that similar object features are represented after non-rigid deformation of the lips. We describe speaker-independent recognition experiments based on these features and hidden Markov models. Preliminary results suggest that similar performance can be achieved by using either shape or intensity information, and slightly higher performance by their combined use.
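    As an illustration of the kind of feature extraction described above, the following is a minimal sketch, not the authors' implementation, of combining PCA-projected lip-shape coordinates with PCA-projected mouth-region grey levels into one per-frame feature vector; the class name, dimensionalities, and input arrays are assumptions.

    import numpy as np

    class ShapeIntensityFeatures:
        # Minimal PCA-based stand-in for the paper's shape model and grey-level model.
        def __init__(self, n_shape=8, n_intensity=16):
            self.n_shape = n_shape
            self.n_intensity = n_intensity

        def _pca(self, X, n):
            # Return the mean and the top-n principal directions of X (rows = frames).
            mean = X.mean(axis=0)
            _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
            return mean, Vt[:n]

        def fit(self, shapes, intensities):
            # shapes: (N, 2K) flattened lip-contour points;
            # intensities: (N, P) grey levels sampled in a patch that deforms with the shape.
            self.shape_mean, self.shape_basis = self._pca(shapes, self.n_shape)
            self.int_mean, self.int_basis = self._pca(intensities, self.n_intensity)
            return self

        def transform(self, shapes, intensities):
            # Project each frame onto the learned bases and concatenate the coefficients.
            b_shape = (shapes - self.shape_mean) @ self.shape_basis.T
            b_int = (intensities - self.int_mean) @ self.int_basis.T
            return np.hstack([b_shape, b_int])  # per-frame feature vectors for the HMMs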

    Towards Speaker Independent Continuous Speechreading

    This paper describes recent speechreading experiments for a speaker-independent continuous digit recognition task. Visual feature extraction is performed by a lip tracker which recovers information about the lip shape and about the grey-level intensity around the mouth. These features are used to train visual word models using continuous-density HMMs. Results show that the method generalises well to new speakers and that the recognition rate is highly variable across digits, as expected given the high visual confusability of certain words.
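    A rough sketch of how such continuous-density HMM word models could be trained and used for isolated-word recognition is given below; it relies on the third-party hmmlearn library and assumed data structures (features_by_word, n_states) rather than the paper's actual setup.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # continuous-density (Gaussian) HMMs

    def train_word_models(features_by_word, n_states=5):
        # features_by_word: dict mapping a digit label to a list of (T_i, D)
        # arrays of per-frame visual features from different speakers.
        models = {}
        for word, seqs in features_by_word.items():
            X = np.vstack(seqs)
            lengths = [len(s) for s in seqs]
            m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
            m.fit(X, lengths)  # Baum-Welch (EM) training over all example sequences
            models[word] = m
        return models

    def recognise(models, seq):
        # Classify one test sequence by the word model with the highest log-likelihood.
        return max(models, key=lambda w: models[w].score(seq))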

    Contributions of temporal encodings of voicing, voicelessness, fundamental frequency, and amplitude variation to audiovisual and auditory speech perception

    Auditory and audio-visual speech perception was investigated using auditory signals of invariant spectral envelope that temporally encoded the presence of voiced and voiceless excitation, variations in amplitude envelope, and F0. In experiment 1, the contribution of the timing of voicing to consonant identification was compared with the additional effects of variations in F0 and the amplitude of voiced speech. In audio-visual conditions only, amplitude variation slightly increased accuracy globally and for manner features. F0 variation slightly increased overall accuracy and manner perception in both auditory and audio-visual conditions. Experiment 2 examined consonant information derived from the presence and amplitude variation of voiceless speech, in addition to that from voicing, F0, and voiced speech amplitude. Binary indication of voiceless excitation improved accuracy overall and for voicing and manner. The amplitude variation of voiceless speech produced only a small increment in place-of-articulation scores. A final experiment examined audio-visual sentence perception using encodings of voiceless excitation and amplitude variation added to a signal representing voicing and F0. There was a contribution of amplitude variation to sentence perception, but not of voiceless excitation. The timing of voiced and voiceless excitation appears to provide the major temporal cues to consonant identity. (C) 1999 Acoustical Society of America. [S0001-4966(99)01410-1]
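    For illustration only, a minimal sketch of how amplitude-envelope and F0 timing cues might be imposed on a carrier of fixed spectral shape is shown below; it is not the stimulus-generation procedure used in the study, and the per-sample F0 track is assumed to be computed elsewhere.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def amplitude_envelope(x, fs, cutoff_hz=30.0):
        # Smoothed amplitude envelope: magnitude of the analytic signal, low-pass filtered.
        b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
        return filtfilt(b, a, np.abs(hilbert(x)))

    def f0_pulse_train(f0_track, fs):
        # Square-wave carrier whose instantaneous rate follows the F0 track
        # (f0_track is 0 where the speech is voiceless or silent).
        phase = np.cumsum(2.0 * np.pi * f0_track / fs)
        return np.where(f0_track > 0, np.sign(np.sin(phase)), 0.0)

    def temporal_encoding(x, f0_track, fs):
        # Fixed-spectrum carrier modulated by the speech amplitude envelope,
        # so only the timing of voicing, F0, and amplitude variation is conveyed.
        y = amplitude_envelope(x, fs) * f0_pulse_train(f0_track, fs)
        return y / (np.max(np.abs(y)) + 1e-12)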

    Sensory Communication

    Contains a table of contents for Section 2, an introduction, reports on eleven research projects, and a list of publications.
    National Institutes of Health Grant 5 R01 DC00117
    National Institutes of Health Grant 5 R01 DC00270
    National Institutes of Health Contract 2 P01 DC00361
    National Institutes of Health Grant 5 R01 DC00100
    National Institutes of Health Contract 7 R29 DC00428
    National Institutes of Health Grant 2 R01 DC00126
    U.S. Air Force - Office of Scientific Research Grant AFOSR 90-0200
    U.S. Navy - Office of Naval Research Grant N00014-90-J-1935
    National Institutes of Health Grant 5 R29 DC00625
    U.S. Navy - Office of Naval Research Grant N00014-91-J-1454
    U.S. Navy - Office of Naval Research Grant N00014-92-J-181

    Visual Speech and Speaker Recognition

    This thesis presents a learning-based approach to speech recognition and person recognition from image sequences. An appearance-based model of the articulators is learned from example images and is used to locate, track, and recover visual speech features. A major difficulty in model-based approaches is to develop a scheme which is general enough to account for the large appearance variability of objects but which does not lack specificity. The method described here decomposes the lip shape and the intensities in the mouth region into weighted sums of basis shapes and basis intensities, respectively, using a Karhunen-Loève expansion. The intensities deform with the shape model to provide shape-independent intensity information. This information is used in image search, which is based on a similarity measure between the model and the image. Visual speech features can be recovered from the tracking results and represent shape and intensity information. A speechreading (lip-reading) system is presented which models these features by Gaussian distributions and their temporal dependencies by hidden Markov models. The models are trained using the EM algorithm, and speech recognition is performed based on maximum posterior probability classification. It is shown that, besides speech information, the recovered model parameters also contain person-dependent information, and a novel method for person recognition is presented which is based on these features. Talking persons are represented by spatio-temporal models which describe the appearance of the articulators and their temporal changes during speech production. Two different topologies for speaker models are described: Gaussian mixture models and hidden Markov models. The proposed methods were evaluated for lip localisation, lip tracking, speech recognition, and speaker recognition on an isolated-digit database of 12 subjects and on a continuous-digit database of 37 subjects. The techniques were found to achieve good performance for all tasks listed above. For an isolated digit recognition task, the speechreading system outperformed previously reported systems and performed slightly better than untrained human speechreaders.
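    As an example of the Gaussian-mixture speaker-model topology mentioned above, the following is a minimal sketch, using scikit-learn and assumed data structures rather than the thesis's implementation, of training one GMM per speaker on shape/intensity features and identifying a speaker by maximum likelihood.

    from sklearn.mixture import GaussianMixture

    def train_speaker_gmms(features_by_speaker, n_components=8):
        # features_by_speaker: dict mapping a speaker ID to an (N, D) array of
        # per-frame shape/intensity features pooled over that speaker's utterances.
        return {spk: GaussianMixture(n_components=n_components,
                                     covariance_type="diag").fit(X)
                for spk, X in features_by_speaker.items()}

    def identify_speaker(gmms, X):
        # Score the test frames X under every speaker model and return the speaker
        # with the highest average per-frame log-likelihood.
        return max(gmms, key=lambda spk: gmms[spk].score(X))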

    Sensory Communication

    Contains a table of contents for Section 2 and reports on five research projects.
    National Institutes of Health Contract 2 R01 DC00117
    National Institutes of Health Contract 1 R01 DC02032
    National Institutes of Health Contract 2 P01 DC00361
    National Institutes of Health Contract N01 DC22402
    National Institutes of Health Grant R01-DC001001
    National Institutes of Health Grant R01-DC00270
    National Institutes of Health Grant 5 R01 DC00126
    National Institutes of Health Grant R29-DC00625
    U.S. Navy - Office of Naval Research Grant N00014-88-K-0604
    U.S. Navy - Office of Naval Research Grant N00014-91-J-1454
    U.S. Navy - Office of Naval Research Grant N00014-92-J-1814
    U.S. Navy - Naval Air Warfare Center Training Systems Division Contract N61339-94-C-0087
    U.S. Navy - Naval Air Warfare Center Training Systems Division Contract N61339-93-C-0055
    U.S. Navy - Office of Naval Research Grant N00014-93-1-1198
    National Aeronautics and Space Administration/Ames Research Center Grant NCC 2-77

    Differential activity in Heschl's gyrus between deaf and hearing individuals is due to auditory deprivation rather than language modality

    Sensory cortices undergo crossmodal reorganisation as a consequence of sensory deprivation. Congenital deafness in humans represents a particular case with respect to other types of sensory deprivation, because cortical reorganisation is not only a consequence of auditory deprivation but also of language-driven mechanisms. Visual crossmodal plasticity has been found in secondary auditory cortices of deaf individuals, but it is still unclear whether reorganisation also takes place in primary auditory areas, and how this relates to language modality and auditory deprivation. Here, we dissociated the effects of language modality and auditory deprivation on crossmodal plasticity in Heschl's gyrus as a whole, and in cytoarchitectonic region Te1.0 (likely to contain the core auditory cortex). Using fMRI, we measured the BOLD response to viewing sign language in congenitally or early deaf individuals with and without sign language knowledge, and in hearing controls. Results show that differences between hearing and deaf individuals are due to a reduction in activation caused by visual stimulation in the hearing group, which is more significant in Te1.0 than in Heschl's gyrus as a whole. Furthermore, differences between deaf and hearing groups are due to auditory deprivation, and there is no evidence that the modality of language used by deaf individuals contributes to crossmodal plasticity in Heschl's gyrus.