Evaluation of auditory-visual speech perception in individuals diagnosed with dementia of the Alzheimer's type
Auditory-visual speech perception testing was completed using word- and consonant-level stimuli in individuals with known degrees of dementia of the Alzheimer's type. The correlations between the cognitive measures and the speech perception measures (A-only, V-only, AV, VE or AE) did not reveal significant relationships.
The Neurobiology of Audiovisual Integration: A Voxel-Based Lesion Symptom Mapping Study
Abstract: Audiovisual (AV) integration is a fundamental component of face-to-face communication. Visual cues generally aid auditory comprehension of communicative intent through our innate ability to "fuse" auditory and visual information. However, our ability for multisensory integration can be affected by damage to the brain. Previous neuroimaging studies have indicated the superior temporal sulcus (STS) as the center for AV integration, while others suggest inferior frontal and motor regions. However, few studies have analyzed the effect of stroke or other brain damage on multisensory integration in humans. The present study examines the effect of lesion location on auditory and AV speech perception through behavioral and structural imaging methodologies in 41 left-hemisphere participants with chronic focal cerebral damage. Participants completed two behavioral tasks of speech perception: an auditory speech perception task and a classic McGurk paradigm measuring congruent (auditory and visual stimuli match) and incongruent (auditory and visual stimuli do not match, creating a "fused" percept of a novel stimulus) AV speech perception. Overall, participants performed well above chance on both tasks. Voxel-based lesion symptom mapping (VLSM) across all 41 participants identified several regions as critical for speech perception depending on trial type. Heschl's gyrus and the supramarginal gyrus were identified as critical for auditory speech perception, the basal ganglia were critical for speech perception in AV congruent trials, and the middle temporal gyrus/STS were critical in AV incongruent trials. VLSM analyses of the AV incongruent trials were used to further clarify the origin of "errors", i.e. lack of fusion. Auditory capture (auditory stimulus) responses were attributed to visual processing deficits caused by lesions in the posterior temporal lobe, whereas visual capture (visual stimulus) responses were attributed to lesions in the anterior temporal cortex, including the temporal pole, which is widely considered to be an amodal semantic hub. The implication of anterior temporal regions in AV integration is novel and warrants further study. The behavioral and VLSM results are discussed in relation to previous neuroimaging and case-study evidence; broadly, our findings coincide with previous work indicating that multisensory superior temporal cortex, not frontal motor circuits, is critical for AV integration.
Dissertation/Thesis. Masters Thesis, Communication Disorders, 201
The impact of automatic exaggeration of the visual articulatory features of a talker on the intelligibility of spectrally distorted speech
Visual speech information plays a key role in supporting speech perception, especially when acoustic features are distorted or inaccessible. Recent research suggests that for spectrally distorted speech, the use of visual speech in auditory training improves not only subjects' audiovisual speech recognition, but also their subsequent auditory-only speech recognition. Visual speech cues, however, can be affected by a number of facial visual signals that vary across talkers, such as lip emphasis and speaking style. In a previous study, we enhanced the visual speech videos used in perception training by automatically tracking and colouring a talker's lips. This improved the subjects' audiovisual and subsequent auditory speech recognition compared with those who were trained via unmodified videos or audio-only methods. In this paper, we report on two issues related to automatic exaggeration of the movement of the lips/mouth area. First, we investigate subjects' ability to adapt to the conflict between the articulation energy in the visual signals and the vocal effort in the acoustic signals (since the acoustic signals remained unexaggerated). Second, we examine whether or not this visual exaggeration can improve subjects' auditory and audiovisual speech recognition when used in perception training. To test this concept, we used spectrally distorted speech to train groups of listeners under four different training regimes: (1) audio only, (2) audiovisual, (3) audiovisual visually exaggerated, and (4) audiovisual visually exaggerated and lip-coloured. We used spectrally distorted speech (cochlear-implant-simulated speech) because the longer-term aim of our work is to employ these concepts in a training system for cochlear-implant (CI) users.
The results suggest that after exposure to visually exaggerated speech, listeners were able to adapt to the conflicting audiovisual signals. In addition, subjects trained with enhanced visual cues (regimes 3 and 4) achieved better audiovisual recognition for a number of phoneme classes than those who were trained with unmodified visual speech (regime 2). There was, however, no evidence of an improvement in subsequent audio-only listening skills. The subjects' adaptation to the conflicting audiovisual signals may have slowed auditory perceptual learning and impeded the ability of the visual speech to improve the training gains.
The influence of visual information on the perception of auditory speech in quiet and noise
Audio-visual (AV) integration involves the combining of auditory and visual information, which is often required for everyday face-to-face communication. Speech perception becomes difficult in situations where it is harder to hear the voice of the speaker. When the ability to identify speech in noise is reduced, people with normal hearing improve with the addition of visual information, that is, when they can see the talker's face (Sumby & Pollack, 1954). Exactly how visual information is used in background noise is not well understood. The goal of the thesis was to understand the influence of visual information on auditory speech perception using a well-known measure of AV integration, the McGurk effect. Four experiments are reported which aimed to a) explore the use of the McGurk effect as a measure of AV integration, b) understand the influence of visual information in quiet and noise, and how auditory and visual information interact when one or both of the modalities is degraded, and c) provide insight into theories of AV integration through behavioural measures. The main findings were that 1) instances of the McGurk effect are influenced by the type of task used, and vary according to different stimuli and participants, 2) the McGurk effect can still be perceived even when the visual stimulus is highly degraded, although the illusion decreases as visual blur increases, 3) fixating the mouth is not necessary for perceiving the McGurk effect, and 4) visual benefit increases as the clarity of the visual stimulus increases. Overall, the findings suggest that visual information is of most benefit when it is clear; looking at the mouth is not necessary for AV integration in quiet but increases the likelihood of successful integration when speech is presented in auditory noise.
Models of Speech Processing
One of the fundamental questions about language is how listeners map the acoustic signal onto
syllables, words, and sentences, resulting in understanding of speech. For normal listeners, this
mapping is so effortless that one rarely stops to consider just how it takes place. However, studies
of speech have shown that this acoustic signal contains a great deal of underlying complexity.
A number of competing models seek to explain how these intricate processes work. Such models
have often narrowed the problem to mapping the speech signal onto isolated words, setting aside
the complexity of segmenting continuous speech. Continuous speech has presented a significant
challenge for many models because of the high variability of the signal and the difficulties involved
in resolving the signal into individual words.
The importance of understanding speech becomes particularly apparent when neurological
disease affects this seemingly basic ability. Lesion studies have explored impairments of speech
sound processing to determine whether deficits occur in perceptual analysis of acoustic-phonetic
information or in stored abstract phonological representations (e.g., Basso, Casati, & Vignolo, 1977;
Blumstein, Cooper, Zurif, & Caramazza, 1977). Furthermore, researchers have attempted to determine
in what ways underlying phonological/phonetic impairments may contribute to auditory
comprehension deficits (Blumstein, Baker, & Goodglass, 1977).
In this chapter, we discuss several psycholinguistic models of word recognition (the process of
mapping the speech signal onto the lexicon), and outline how components of such models might
correspond to the functional anatomy of the brain. We will also relate evidence from brain lesion
and brain activation studies to components of such models. We then present some approaches that
deal with speech perception more generally, and touch on a few current topics of debate.
Supported by the National Institutes of Health under grant NIH DC R01-3378 to the senior author (SLS).
Where on the face do we look during phonemic restoration: An eye-tracking study
Face-to-face communication typically involves audio and visual components of the speech signal. To examine the effect of task demands on gaze patterns in response to a speaking face, adults participated in two eye-tracking experiments with an audiovisual condition (articulatory information from the mouth was visible) and a pixelated condition (articulatory information was not visible). Further, task demands were manipulated by having listeners respond in a passive (no response) or an active (button press response) context. The active experiment required participants to discriminate between speech stimuli and was designed to mimic environmental situations which require one to use visual information to disambiguate the speaker's message, simulating different listening conditions in real-world settings. Stimuli included a clear exemplar of the syllable /ba/ and a second exemplar in which the formants of the initial consonant were reduced, creating an /a/-like consonant. Consistent with our hypothesis, results revealed that the greatest fixations to the mouth occurred in the audiovisual active experiment, and visual articulatory information led to a phonemic restoration effect for the /a/ speech token. In the pixelated condition, participants fixated on the eyes, and discrimination of the deviant token within the active experiment was significantly greater than in the audiovisual condition. These results suggest that when required to disambiguate changes in speech, adults may look to the mouth for additional cues to support processing when it is available.
- …