
    Can you "read tongue movements"?

    Lip reading relies on visible articulators to ease audiovisual speech understanding. However, lips and face alone provide very incomplete phonetic information: the tongue, which is generally not visible, carries an important part of the articulatory information that is not accessible through lip reading. The question was thus whether direct and full vision of the tongue allows tongue reading. We therefore generated a set of audiovisual VCV stimuli by controlling an audiovisual talking head that can display all speech articulators, including the tongue, in an augmented speech mode, from articulator movements tracked on a speaker. These stimuli were played to subjects in a series of audiovisual perception tests in various presentation conditions (audio signal alone, audiovisual signal with a profile cutaway display with or without the tongue, complete face), at various Signal-to-Noise Ratios. The results show a certain implicit learning effect for tongue reading, a preference for the more ecological rendering of the complete face over the cutaway presentation, a predominance of lip reading over tongue reading, but also the capability of tongue reading to take over when the audio signal is strongly degraded or absent. We conclude that these tongue reading capabilities could be used for applications in speech therapy for children with delayed speech, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second language learners.
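    The abstract does not describe how the audio track was degraded; as a rough illustration of what presenting stimuli at various Signal-to-Noise Ratios usually involves, the sketch below mixes a clean recording with noise rescaled to a target SNR. The NumPy-based setup and the function name are assumptions for illustration, not details taken from the paper.

        import numpy as np

        def mix_at_snr(speech, noise, snr_db):
            # Mix clean speech with noise at a target SNR in dB (illustrative only).
            # Both inputs are 1-D float arrays sampled at the same rate.
            if len(noise) < len(speech):
                noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
            noise = noise[:len(speech)]
            speech_power = np.mean(speech ** 2)
            noise_power = np.mean(noise ** 2)
            # Scale the noise so that 10*log10(speech_power / scaled_noise_power) equals snr_db.
            scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
            return speech + scale * noise

    For example, mix_at_snr(vcv_audio, babble_noise, -6) would give a strongly degraded condition, while a large positive SNR value approaches the clean-audio condition.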

    Exploring face perception in disorders of development: evidence from Williams syndrome and autism

    Individuals with Williams syndrome (WS) and autism are characterized by different social phenotypes but have been said to show similar atypicalities of face-processing style. Although the structural encoding of faces may be similarly atypical in these two developmental disorders, there are clear differences in overall face skills. Including both populations in the same study can address how the profile of face skills varies across disorders. The current paper explored the processing of identity, eye gaze, lip-reading, and expressions of emotion using the same participants across face domains. The tasks had previously been used to support claims of a modular structure of face perception in typical development. Participants with WS (N=15) and autism (N=20) could be dissociated from each other, and from individuals with general developmental delay, in the domains of eye-gaze and expression processing: individuals with WS were stronger at these skills than individuals with autism. Even if the structural encoding of faces appears similarly atypical in these groups, the overall profile of face skills, as well as the underlying architecture of face perception, varies greatly. The research provides insights into typical and atypical models of face perception in WS and autism.

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues such as the movements of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, for example in surveillance and Internet telephony, and as an aid to people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker may be available, these multiple video feeds have not been used to handle the different poses. To this end, this paper presents the world's first multi-view speech reading and reconstruction system. The work extends the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system, and further indicate the camera placement that leads to the maximum intelligibility of speech. Finally, the paper lays out various innovative applications for the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems.
    Comment: 2018 ACM Multimedia Conference (MM '18), October 22–26, 2018, Seoul, Republic of Korea
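    The abstract does not specify the model, so the following is only a minimal sketch of one plausible way to fuse several silent-video views into an acoustic target: each camera view gets its own small 3-D convolutional encoder, the per-view features are concatenated frame by frame, and a recurrent layer maps them to mel-spectrogram frames. All class and parameter names are hypothetical, and a PyTorch setup is assumed purely for illustration.

        import torch
        import torch.nn as nn

        class MultiViewSpeechReconstructor(nn.Module):
            # Hypothetical sketch: encode each camera view, fuse the per-frame
            # features, and decode them into mel-spectrogram frames.
            def __init__(self, num_views=3, feat_dim=256, mel_bins=80):
                super().__init__()
                # One lightweight spatio-temporal encoder per camera view.
                self.view_encoders = nn.ModuleList([
                    nn.Sequential(
                        nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                        nn.ReLU(),
                        nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space
                        nn.Flatten(start_dim=2),             # -> (batch, 32, time)
                    )
                    for _ in range(num_views)
                ])
                self.fuse = nn.GRU(32 * num_views, feat_dim, batch_first=True)
                self.to_mel = nn.Linear(feat_dim, mel_bins)

            def forward(self, views):
                # views: list of num_views tensors, each (batch, 3, time, height, width),
                # all sharing the same number of frames.
                feats = [enc(v).transpose(1, 2) for enc, v in zip(self.view_encoders, views)]
                fused, _ = self.fuse(torch.cat(feats, dim=-1))   # (batch, time, feat_dim)
                return self.to_mel(fused)                        # (batch, time, mel_bins)

    A vocoder (or even Griffin-Lim) could then turn the predicted mel-spectrogram back into a waveform; how the actual system handles view selection, pose variation, and speech synthesis is described in the paper itself.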