
    Can you "read tongue movements"?

    Lip reading relies on visible articulators to ease audiovisual speech understanding. However, lips and face alone provide very incomplete phonetic information: the tongue, which is generally not visible, carries an important part of the articulatory information that is not accessible through lip reading. The question was thus whether direct and full vision of the tongue allows tongue reading. We therefore generated a set of audiovisual VCV stimuli by controlling an audiovisual talking head that can display all speech articulators, including the tongue, in an augmented speech mode, from articulator movements tracked on a speaker. These stimuli were played to subjects in a series of audiovisual perception tests in various presentation conditions (audio signal alone, audiovisual signal with a profile cutaway display with or without the tongue, complete face), at various Signal-to-Noise Ratios. The results show a certain implicit learning effect for tongue reading, a preference for the more ecological rendering of the complete face over the cutaway presentation, a predominance of lip reading over tongue reading, but also the capability of tongue reading to take over when the audio signal is strongly degraded or absent. We conclude that these tongue reading capabilities could be used for applications in speech therapy for children with delayed speech, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second language learners.
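    The abstract does not describe how the audio track was degraded; as a rough illustration of what presenting stimuli at various Signal-to-Noise Ratios usually involves, the sketch below mixes a clean recording with noise rescaled to a target SNR. The NumPy-based setup and the function name are assumptions for illustration, not details taken from the paper.

        import numpy as np

        def mix_at_snr(speech, noise, snr_db):
            # Mix clean speech with noise at a target SNR in dB (illustrative only).
            # Both inputs are 1-D float arrays sampled at the same rate.
            if len(noise) < len(speech):
                noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
            noise = noise[:len(speech)]
            speech_power = np.mean(speech ** 2)
            noise_power = np.mean(noise ** 2)
            # Scale the noise so that 10*log10(speech_power / scaled_noise_power) equals snr_db.
            scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
            return speech + scale * noise

    For example, mix_at_snr(vcv_audio, babble_noise, -6) would give a strongly degraded condition, while a large positive SNR value approaches the clean-audio condition.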

    Exploring face perception in disorders of development: evidence from Williams syndrome and autism

    Individuals with Williams syndrome (WS) and autism are characterized by different social phenotypes but have been said to show similar atypicalities of face-processing style. Although the structural encoding of faces may be similarly atypical in these two developmental disorders, there are clear differences in overall face skills. Including both populations in the same study can address how the profile of face skills varies across disorders. The current paper explored the processing of identity, eye gaze, lip-reading, and expressions of emotion using the same participants across face domains. The tasks had previously been used to support claims of a modular structure of face perception in typical development. Participants with WS (N=15) and autism (N=20) could be dissociated from each other, and from individuals with general developmental delay, in the domains of eye-gaze and expression processing: individuals with WS were stronger at these skills than individuals with autism. Even if the structural encoding of faces appears similarly atypical in these groups, the overall profile of face skills, as well as the underlying architecture of face perception, varies greatly. The research provides insights into typical and atypical models of face perception in WS and autism.

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues such as the movements of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, for example in surveillance and Internet telephony, and as an aid to people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker may be available, these multiple video feeds have not been used to handle the different poses. To this end, this paper presents the world's first multi-view speech reading and reconstruction system. The work extends the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system, and further indicate the camera placement that leads to the maximum intelligibility of speech. Finally, the paper lays out various innovative applications for the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems.
    Comment: 2018 ACM Multimedia Conference (MM '18), October 22–26, 2018, Seoul, Republic of Korea
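    The abstract does not specify the model, so the following is only a minimal sketch of one plausible way to fuse several silent-video views into an acoustic target: each camera view gets its own small 3-D convolutional encoder, the per-view features are concatenated frame by frame, and a recurrent layer maps them to mel-spectrogram frames. All class and parameter names are hypothetical, and a PyTorch setup is assumed purely for illustration.

        import torch
        import torch.nn as nn

        class MultiViewSpeechReconstructor(nn.Module):
            # Hypothetical sketch: encode each camera view, fuse the per-frame
            # features, and decode them into mel-spectrogram frames.
            def __init__(self, num_views=3, feat_dim=256, mel_bins=80):
                super().__init__()
                # One lightweight spatio-temporal encoder per camera view.
                self.view_encoders = nn.ModuleList([
                    nn.Sequential(
                        nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                        nn.ReLU(),
                        nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space
                        nn.Flatten(start_dim=2),             # -> (batch, 32, time)
                    )
                    for _ in range(num_views)
                ])
                self.fuse = nn.GRU(32 * num_views, feat_dim, batch_first=True)
                self.to_mel = nn.Linear(feat_dim, mel_bins)

            def forward(self, views):
                # views: list of num_views tensors, each (batch, 3, time, height, width),
                # all sharing the same number of frames.
                feats = [enc(v).transpose(1, 2) for enc, v in zip(self.view_encoders, views)]
                fused, _ = self.fuse(torch.cat(feats, dim=-1))   # (batch, time, feat_dim)
                return self.to_mel(fused)                        # (batch, time, mel_bins)

    A vocoder (or even Griffin-Lim) could then turn the predicted mel-spectrogram back into a waveform; how the actual system handles view selection, pose variation, and speech synthesis is described in the paper itself.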