
    Can you "read tongue movements"?

    Lip reading relies on visible articulators to ease audiovisual speech understanding. However, the lips and face alone provide only incomplete phonetic information: the tongue, which is generally not visible, carries an important part of the articulatory information that is not accessible through lip reading. The question was thus whether direct and full vision of the tongue allows tongue reading. We therefore generated a set of audiovisual VCV stimuli by controlling an audiovisual talking head that can display all speech articulators, including the tongue, in an augmented speech mode, from articulator movements tracked on a speaker. These stimuli were played to subjects in a series of audiovisual perception tests under various presentation conditions (audio signal alone, audiovisual signal with a profile cutaway display with or without the tongue, complete face) and at various signal-to-noise ratios. The results show some implicit learning of tongue reading, a preference for the more ecological rendering of the complete face over the cutaway presentation, a predominance of lip reading over tongue reading, but also the capability of tongue reading to take over when the audio signal is strongly degraded or absent. We conclude that these tongue reading capabilities could be used for applications in speech therapy for speech-delayed children, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second language learners.
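    The abstract does not specify how the audio was degraded; the sketch below is a minimal, generic illustration (in Python with NumPy) of how a noise signal can be scaled and mixed with a speech signal to reach a prescribed signal-to-noise ratio. The signals, sample rate, and SNR values are placeholders, not the study's actual material.

    ```python
    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then mix."""
        # Crop or repeat the noise to match the speech duration.
        noise = np.resize(noise, speech.shape)
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise))  =>  solve for gain.
        gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
        return speech + gain * noise

    # Hypothetical usage: degrade one VCV utterance at several SNR conditions.
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)   # stand-in for a 1 s recording at 16 kHz
    noise = rng.standard_normal(16000)
    stimuli = {snr: mix_at_snr(speech, noise, snr) for snr in (6, 0, -6, -12)}
    ```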

    Tracking Talking Faces With Shape and Appearance Models

    This paper presents a system that can recover and track the 3D speech movements of a speaker's face for each image of a monocular sequence. To handle both the individual specificities of the speaker's articulation and the complexity of facial deformations during speech, speaker-specific articulated models of the face geometry and appearance are first built from real data. These face models are then used for tracking: articulatory parameters are extracted for each image by an analysis-by-synthesis loop. The geometric model is linearly controlled by only seven articulatory parameters. Appearance is modeled either as a classical texture map or through the local appearance of a relevant subset of 3D points. We compare several appearance models: they are either constant or depend linearly on the articulatory parameters. We compare tracking results obtained with these different appearance models against ground truth data, not only in terms of recovery errors of the 3D geometry but also in terms of the intelligibility enhancement provided by the recovered movements.
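    As a rough illustration of the kind of model the abstract describes, the sketch below assumes a linear shape model (geometry = mean shape + basis × parameters, with seven articulatory parameters) and recovers the parameters that best explain an observed geometry in the least-squares sense. The actual system tracks monocular images through an appearance-based analysis-by-synthesis loop; here, direct 3D geometry observations stand in for the image comparison, and all dimensions, names, and data are hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical dimensions: N mesh vertices, 7 articulatory parameters.
    N_VERTICES, N_PARAMS = 500, 7

    # Placeholder speaker-specific linear shape model (in the paper this is
    # learned from real data): geometry = mean_shape + basis @ params.
    mean_shape = rng.standard_normal(3 * N_VERTICES)
    basis = rng.standard_normal((3 * N_VERTICES, N_PARAMS))

    def synthesize(params):
        """Predict the 3D face geometry for an articulatory parameter vector."""
        return mean_shape + basis @ params

    def track_frame(observed_geometry):
        """Analysis-by-synthesis step for one frame: find the parameters whose
        synthesized geometry best matches the observation. Because the model
        is linear, this reduces to a linear least-squares solve."""
        params, *_ = np.linalg.lstsq(basis, observed_geometry - mean_shape, rcond=None)
        return params

    # Usage: simulate one noisy observed frame and recover its parameters.
    true_params = rng.standard_normal(N_PARAMS)
    observed = synthesize(true_params) + 0.01 * rng.standard_normal(3 * N_VERTICES)
    estimated = track_frame(observed)
    print(np.round(estimated - true_params, 3))  # small recovery error
    ```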