
    Speechreading using shape and intensity information

    We describe a speechreading system that uses both shape information from the lip contours and intensity information from the mouth area. Shape information is obtained by tracking and parameterising the inner and outer lip boundary in an image sequence. Intensity information is extracted from a grey-level model based on principal component analysis. In contrast to other approaches, the intensity area deforms with the shape model to ensure that similar object features are represented after non-rigid deformation of the lips. We describe speaker-independent recognition experiments based on these features and hidden Markov models. Preliminary results suggest that similar performance can be achieved by using either shape or intensity information, and slightly higher performance by their combined use.
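    As an illustration of the kind of feature extraction described above, the following is a minimal sketch, not the authors' implementation, of combining PCA-projected lip-shape coordinates with PCA-projected mouth-region grey levels into one per-frame feature vector; the class name, dimensionalities, and input arrays are assumptions.

    import numpy as np

    class ShapeIntensityFeatures:
        # Minimal PCA-based stand-in for the paper's shape model and grey-level model.
        def __init__(self, n_shape=8, n_intensity=16):
            self.n_shape = n_shape
            self.n_intensity = n_intensity

        def _pca(self, X, n):
            # Return the mean and the top-n principal directions of X (rows = frames).
            mean = X.mean(axis=0)
            _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
            return mean, Vt[:n]

        def fit(self, shapes, intensities):
            # shapes: (N, 2K) flattened lip-contour points;
            # intensities: (N, P) grey levels sampled in a patch that deforms with the shape.
            self.shape_mean, self.shape_basis = self._pca(shapes, self.n_shape)
            self.int_mean, self.int_basis = self._pca(intensities, self.n_intensity)
            return self

        def transform(self, shapes, intensities):
            # Project each frame onto the learned bases and concatenate the coefficients.
            b_shape = (shapes - self.shape_mean) @ self.shape_basis.T
            b_int = (intensities - self.int_mean) @ self.int_basis.T
            return np.hstack([b_shape, b_int])  # per-frame feature vectors for the HMMs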

    Towards Speaker Independent Continuous Speechreading

    This paper describes recent speechreading experiments for a speaker-independent continuous digit recognition task. Visual feature extraction is performed by a lip tracker which recovers information about the lip shape and about the grey-level intensity around the mouth. These features are used to train visual word models using continuous-density HMMs. Results show that the method generalises well to new speakers and that the recognition rate is highly variable across digits, as expected given the high visual confusability of certain words.
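    A rough sketch of how such continuous-density HMM word models could be trained and used for isolated-word recognition is given below; it relies on the third-party hmmlearn library and assumed data structures (features_by_word, n_states) rather than the paper's actual setup.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # continuous-density (Gaussian) HMMs

    def train_word_models(features_by_word, n_states=5):
        # features_by_word: dict mapping a digit label to a list of (T_i, D)
        # arrays of per-frame visual features from different speakers.
        models = {}
        for word, seqs in features_by_word.items():
            X = np.vstack(seqs)
            lengths = [len(s) for s in seqs]
            m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
            m.fit(X, lengths)  # Baum-Welch (EM) training over all example sequences
            models[word] = m
        return models

    def recognise(models, seq):
        # Classify one test sequence by the word model with the highest log-likelihood.
        return max(models, key=lambda w: models[w].score(seq))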

    Contributions of temporal encodings of voicing, voicelessness, fundamental frequency, and amplitude variation to audiovisual and auditory speech perception

    Auditory and audio-visual speech perception was investigated using auditory signals of invariant spectral envelope that temporally encoded the presence of voiced and voiceless excitation, variations in amplitude envelope, and F0. In experiment 1, the contribution of the timing of voicing to consonant identification was compared with the additional effects of variations in F0 and the amplitude of voiced speech. In audio-visual conditions only, amplitude variation slightly increased accuracy globally and for manner features. F0 variation slightly increased overall accuracy and manner perception in both auditory and audio-visual conditions. Experiment 2 examined consonant information derived from the presence and amplitude variation of voiceless speech, in addition to that from voicing, F0, and voiced speech amplitude. Binary indication of voiceless excitation improved accuracy overall and for voicing and manner. The amplitude variation of voiceless speech produced only a small increment in place-of-articulation scores. A final experiment examined audio-visual sentence perception using encodings of voiceless excitation and amplitude variation added to a signal representing voicing and F0. There was a contribution of amplitude variation to sentence perception, but not of voiceless excitation. The timing of voiced and voiceless excitation appears to provide the major temporal cues to consonant identity. (C) 1999 Acoustical Society of America. [S0001-4966(99)01410-1]
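    For illustration only, a minimal sketch of how amplitude-envelope and F0 timing cues might be imposed on a carrier of fixed spectral shape is shown below; it is not the stimulus-generation procedure used in the study, and the per-sample F0 track is assumed to be computed elsewhere.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def amplitude_envelope(x, fs, cutoff_hz=30.0):
        # Smoothed amplitude envelope: magnitude of the analytic signal, low-pass filtered.
        b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
        return filtfilt(b, a, np.abs(hilbert(x)))

    def f0_pulse_train(f0_track, fs):
        # Square-wave carrier whose instantaneous rate follows the F0 track
        # (f0_track is 0 where the speech is voiceless or silent).
        phase = np.cumsum(2.0 * np.pi * f0_track / fs)
        return np.where(f0_track > 0, np.sign(np.sin(phase)), 0.0)

    def temporal_encoding(x, f0_track, fs):
        # Fixed-spectrum carrier modulated by the speech amplitude envelope,
        # so only the timing of voicing, F0, and amplitude variation is conveyed.
        y = amplitude_envelope(x, fs) * f0_pulse_train(f0_track, fs)
        return y / (np.max(np.abs(y)) + 1e-12)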

    Sensory Communication

    Contains a table of contents for Section 2, an introduction, reports on eleven research projects, and a list of publications.
    National Institutes of Health Grant 5 R01 DC00117
    National Institutes of Health Grant 5 R01 DC00270
    National Institutes of Health Contract 2 P01 DC00361
    National Institutes of Health Grant 5 R01 DC00100
    National Institutes of Health Contract 7 R29 DC00428
    National Institutes of Health Grant 2 R01 DC00126
    U.S. Air Force - Office of Scientific Research Grant AFOSR 90-0200
    U.S. Navy - Office of Naval Research Grant N00014-90-J-1935
    National Institutes of Health Grant 5 R29 DC00625
    U.S. Navy - Office of Naval Research Grant N00014-91-J-1454
    U.S. Navy - Office of Naval Research Grant N00014-92-J-181

    Visual Speech and Speaker Recognition

    This thesis presents a learning-based approach to speech recognition and person recognition from image sequences. An appearance-based model of the articulators is learned from example images and is used to locate, track, and recover visual speech features. A major difficulty in model-based approaches is to develop a scheme which is general enough to account for the large appearance variability of objects but which does not lack specificity. The method described here decomposes the lip shape and the intensities in the mouth region into weighted sums of basis shapes and basis intensities, respectively, using a Karhunen-Loève expansion. The intensities deform with the shape model to provide shape-independent intensity information. This information is used in image search, which is based on a similarity measure between the model and the image. Visual speech features can be recovered from the tracking results and represent shape and intensity information. A speechreading (lip-reading) system is presented which models these features by Gaussian distributions and their temporal dependencies by hidden Markov models. The models are trained using the EM algorithm, and speech recognition is performed based on maximum posterior probability classification. It is shown that, besides speech information, the recovered model parameters also contain person-dependent information, and a novel method for person recognition is presented which is based on these features. Talking persons are represented by spatio-temporal models which describe the appearance of the articulators and their temporal changes during speech production. Two different topologies for speaker models are described: Gaussian mixture models and hidden Markov models. The proposed methods were evaluated for lip localisation, lip tracking, speech recognition, and speaker recognition on an isolated-digit database of 12 subjects and on a continuous-digit database of 37 subjects. The techniques were found to achieve good performance for all tasks listed above. For an isolated digit recognition task, the speechreading system outperformed previously reported systems and performed slightly better than untrained human speechreaders.
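    As an example of the Gaussian-mixture speaker-model topology mentioned above, the following is a minimal sketch, using scikit-learn and assumed data structures rather than the thesis's implementation, of training one GMM per speaker on shape/intensity features and identifying a speaker by maximum likelihood.

    from sklearn.mixture import GaussianMixture

    def train_speaker_gmms(features_by_speaker, n_components=8):
        # features_by_speaker: dict mapping a speaker ID to an (N, D) array of
        # per-frame shape/intensity features pooled over that speaker's utterances.
        return {spk: GaussianMixture(n_components=n_components,
                                     covariance_type="diag").fit(X)
                for spk, X in features_by_speaker.items()}

    def identify_speaker(gmms, X):
        # Score the test frames X under every speaker model and return the speaker
        # with the highest average per-frame log-likelihood.
        return max(gmms, key=lambda spk: gmms[spk].score(X))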

    Sensory Communication

    Contains a table of contents for Section 2 and reports on five research projects.
    National Institutes of Health Contract 2 R01 DC00117
    National Institutes of Health Contract 1 R01 DC02032
    National Institutes of Health Contract 2 P01 DC00361
    National Institutes of Health Contract N01 DC22402
    National Institutes of Health Grant R01-DC001001
    National Institutes of Health Grant R01-DC00270
    National Institutes of Health Grant 5 R01 DC00126
    National Institutes of Health Grant R29-DC00625
    U.S. Navy - Office of Naval Research Grant N00014-88-K-0604
    U.S. Navy - Office of Naval Research Grant N00014-91-J-1454
    U.S. Navy - Office of Naval Research Grant N00014-92-J-1814
    U.S. Navy - Naval Air Warfare Center Training Systems Division Contract N61339-94-C-0087
    U.S. Navy - Naval Air Warfare Center Training Systems Division Contract N61339-93-C-0055
    U.S. Navy - Office of Naval Research Grant N00014-93-1-1198
    National Aeronautics and Space Administration/Ames Research Center Grant NCC 2-77

    Differential activity in Heschl's gyrus between deaf and hearing individuals is due to auditory deprivation rather than language modality

    Sensory cortices undergo crossmodal reorganisation as a consequence of sensory deprivation. Congenital deafness in humans represents a particular case with respect to other types of sensory deprivation, because cortical reorganisation is not only a consequence of auditory deprivation but also of language-driven mechanisms. Visual crossmodal plasticity has been found in secondary auditory cortices of deaf individuals, but it is still unclear whether reorganisation also takes place in primary auditory areas, and how this relates to language modality and auditory deprivation. Here, we dissociated the effects of language modality and auditory deprivation on crossmodal plasticity in Heschl's gyrus as a whole, and in cytoarchitectonic region Te1.0 (likely to contain the core auditory cortex). Using fMRI, we measured the BOLD response to viewing sign language in congenitally or early deaf individuals with and without sign language knowledge, and in hearing controls. Results show that differences between hearing and deaf individuals are due to a reduction in activation caused by visual stimulation in the hearing group, which is more significant in Te1.0 than in Heschl's gyrus as a whole. Furthermore, differences between deaf and hearing groups are due to auditory deprivation, and there is no evidence that the modality of language used by deaf individuals contributes to crossmodal plasticity in Heschl's gyrus.