
    Cortical tracking of formant modulations derived from silently presented lip movements and its decline with age

    The integration of visual and auditory cues is crucial for successful speech processing, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the visual cortex tracks phonological information about the acoustic speech envelope, which is associated with but independent from the speakers’ lip movements. However, the speech signal also carries richer acoustic details, for example about the fundamental frequency and the resonant frequencies, whose visuo-phonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation of these more fine-grained acoustic details and assessed how it changes as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to the speaker’s lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age, in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a merely visual to a phonological representation. Aging especially affects the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.
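    The tracking analysis described above is typically quantified by relating a source-localised MEG time course to a speech-derived formant or pitch time series. As a rough, hedged illustration of the idea (not the study's actual pipeline), the Python sketch below computes spectral coherence between a simulated visual-cortex signal and a simulated formant envelope; the sampling rate, delay, and frequency band are assumptions for the demo.

    # Minimal sketch of "cortical tracking" as spectral coherence between a
    # simulated visual-cortex MEG time course and a formant-frequency envelope.
    # Parameters and signals are illustrative assumptions, not the study's data.
    import numpy as np
    from scipy.signal import coherence

    fs = 150.0                      # sampling rate in Hz (assumed)
    t = np.arange(0, 60, 1 / fs)    # one minute of data

    rng = np.random.default_rng(0)
    # Simulated formant (e.g. F2) modulation: slow random fluctuations
    formant_env = np.cumsum(rng.standard_normal(t.size))
    formant_env -= formant_env.mean()

    # Simulated cortical signal: a noisy, delayed copy of the formant envelope,
    # standing in for a source-localised visual-cortex time course
    delay = int(0.1 * fs)           # 100 ms visuo-phonological lag (assumed)
    cortex = np.roll(formant_env, delay) + 5.0 * rng.standard_normal(t.size)

    # Coherence spectrum; tracking studies typically inspect roughly 1-7 Hz,
    # where syllable- and formant-rate modulations live
    freqs, coh = coherence(cortex, formant_env, fs=fs, nperseg=int(4 * fs))
    band = (freqs >= 1) & (freqs <= 7)
    print(f"Mean 1-7 Hz coherence: {coh[band].mean():.3f}")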

    Cross-modal functional connectivity supports speech understanding in cochlear implant users

    Sensory deprivation can lead to cross-modal cortical changes, whereby sensory brain regions deprived of input may be recruited to perform atypical functions. Enhanced cross-modal responses to visual stimuli observed in the auditory cortex of postlingually deaf cochlear implant (CI) users are hypothesized to reflect increased activation of cortical language regions, but it is unclear whether this cross-modal activity is adaptive or maladaptive for speech understanding. To determine whether increased activation of language regions is correlated with better speech understanding in CI users, we used functional near-infrared spectroscopy to measure hemodynamic responses and assessed task-related activation and functional connectivity of auditory and visual cortices in response to auditory and visual speech and non-speech stimuli in CI users (n = 14) and normal-hearing listeners (n = 17). We used visually presented speech and non-speech to investigate neural processes related to linguistic content and observed that CI users show beneficial cross-modal effects. Specifically, an increase in connectivity between the left auditory and visual cortices, presumed primary sites of cortical language processing, was positively correlated with CI users' ability to understand speech in background noise. Cross-modal activity in the auditory cortex of postlingually deaf CI users may reflect adaptive activity of a distributed, multimodal speech network recruited to enhance speech understanding.
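    The brain-behaviour logic of this study can be illustrated with a small, hedged sketch: compute per-subject functional connectivity as the correlation between an auditory- and a visual-cortex fNIRS channel, then correlate those connectivity values with speech-in-noise scores across subjects. The data, channel choices, and group size below are simulated placeholders, not the study's measurements.

    # Per-subject connectivity between two simulated fNIRS channel time courses,
    # then an across-subject correlation with behavioural scores.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n_subjects, n_samples = 14, 3000                   # e.g. 5 min of HbO at 10 Hz (assumed)

    connectivity = np.empty(n_subjects)
    speech_in_noise = rng.normal(60, 15, n_subjects)   # placeholder scores (% correct)

    for s in range(n_subjects):
        aud = rng.standard_normal(n_samples)                 # "left auditory" channel
        vis = 0.3 * aud + rng.standard_normal(n_samples)     # "left visual" channel
        # Functional connectivity as the Pearson correlation of the time courses
        connectivity[s] = pearsonr(aud, vis)[0]

    # Across-subject brain-behaviour correlation (the key test in the abstract)
    r, p = pearsonr(connectivity, speech_in_noise)
    print(f"connectivity vs. speech-in-noise: r = {r:.2f}, p = {p:.3f}")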

    Shared and modality-specific brain regions that mediate auditory and visual word comprehension

    Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife. 2020;9:e56972. Visual speech carried by lip movements is an integral part of communication. Yet, it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from those representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than often assumed. © 2020, Keitel et al.
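    The multivariate classification approach mentioned here can be sketched, under stated assumptions, as cross-validated decoding of word identity from trial-wise MEG feature vectors. The sketch below uses simulated data and scikit-learn; the trial count, feature dimension, and four-word label set are illustrative, not the study's design.

    # Cross-validated decoding of word identity from simulated MEG-like data.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(2)
    n_trials, n_features = 200, 306          # e.g. trials x (sensors * time points)
    words = rng.integers(0, 4, n_trials)     # four word identities (assumed)

    # Simulated data: noise plus a weak word-specific signal on one feature
    X = rng.standard_normal((n_trials, n_features))
    X[:, 0] += 0.8 * words

    clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
    scores = cross_val_score(clf, X, words, cv=5)
    print(f"Decoding accuracy: {scores.mean():.2f} (chance = 0.25)")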

    Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information

    Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One means by which the brain is thought to afford this perceptual enhancement is the integration of concordant information from multiple sensory channels at common sites of convergence: multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by visual information signaling the onsets and offsets of the acoustic speech signal, or for activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place-of-articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place-of-articulation information, while controlling for the factors mentioned above. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place-of-articulation information, a midfrequency (MF) filtered condition containing place-of-articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place-of-articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved in improving speech intelligibility.
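    The LF/MF manipulation described above separates gross visual motion (low spatial frequencies) from place-of-articulation detail (mid spatial frequencies). The sketch below illustrates the general idea on a single frame using a simple Fourier-domain spatial-frequency bandpass rather than the wavelet decomposition used in the study; the cutoff frequencies are made up for the demo.

    # Separate low vs. mid spatial-frequency content of a (placeholder) frame.
    import numpy as np

    def spatial_bandpass(frame, low_cpi, high_cpi):
        """Keep spatial frequencies between low_cpi and high_cpi (cycles/image)."""
        f = np.fft.fftshift(np.fft.fft2(frame))
        h, w = frame.shape
        yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
        radius = np.sqrt(yy ** 2 + xx ** 2)
        mask = (radius >= low_cpi) & (radius < high_cpi)
        return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

    frame = np.random.default_rng(3).random((256, 256))  # placeholder grey-scale frame
    lf_frame = spatial_bandpass(frame, 0, 8)    # gross-motion band (assumed cutoff)
    mf_frame = spatial_bandpass(frame, 8, 32)   # place-of-articulation band (assumed)
    print(lf_frame.shape, mf_frame.shape)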

    Event Related Potential Evidence of Enhanced Visual Processing in Auditory Associated Cortex in Adults with Hearing Loss

    Objective: The present study investigated the characteristics of visual processing in the auditory-associated cortex of adults with hearing loss using event-related potentials. Methods: Ten subjects with bilateral postlingual hearing loss were recruited, and ten age- and sex-matched normal-hearing subjects were included as controls. Visual evoked potentials to “sound” and “non-sound” photographs were recorded. The P170 response in the occipital area and the N1 and N2 responses at FC3 and FC4 were analyzed. Results: Adults with hearing loss had higher P170 amplitudes, significantly higher N2 amplitudes, and shorter N2 latencies in response to “sound” and “non-sound” photo stimuli at both FC3 and FC4, with the exception of the N2 amplitude in response to “sound” photo stimuli at FC3. Topographic mapping analysis further revealed that patients showed a large difference between responses to “sound” and “non-sound” photos in the right frontotemporal area from approximately 200 to 400 ms. Source localization placed this difference in the middle frontal gyrus region (BA10) at around 266 ms. Conclusions: The significantly stronger responses to visual stimuli indicate enhanced visual processing in the auditory-associated cortex of adults with hearing loss, which may be attributed to cortical visual reorganization involving the right frontotemporal cortex.
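    The component measures reported here (amplitudes and latencies of P170, N1 and N2) are conventionally extracted as peak values within a search window of the trial-averaged waveform. The sketch below shows that step for a simulated N2-like deflection; the sampling rate, trial count, and 200-350 ms window are assumptions, not the study's parameters.

    # Peak amplitude and latency of a simulated N2-like ERP component.
    import numpy as np

    fs = 500.0                                    # sampling rate in Hz (assumed)
    times = np.arange(-0.2, 0.8, 1 / fs)          # epoch from -200 to 800 ms

    rng = np.random.default_rng(4)
    n_trials = 60
    # Simulated single-trial EEG: noise plus a negative deflection near 250 ms
    erp_shape = -4e-6 * np.exp(-((times - 0.25) ** 2) / (2 * 0.03 ** 2))
    epochs = erp_shape + 2e-6 * rng.standard_normal((n_trials, times.size))
    evoked = epochs.mean(axis=0)                  # average across trials

    # N2: most negative point in a 200-350 ms search window (window assumed)
    win = (times >= 0.20) & (times <= 0.35)
    idx = np.argmin(evoked[win])
    print(f"N2: {evoked[win][idx] * 1e6:.2f} uV at {times[win][idx] * 1e3:.0f} ms")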

    Phonetic recalibration in audiovisual speech


    Investigating the Neural Basis of Audiovisual Speech Perception with Intracranial Recordings in Humans

    Speech is inherently multisensory, containing auditory information from the voice and visual information from the mouth movements of the talker. Hearing the voice is usually sufficient to understand speech; however, in noisy environments or when audition is impaired due to aging or disability, seeing mouth movements greatly improves speech perception. Although behavioral studies have well established this perceptual benefit, it is still not clear how the brain processes visual information from mouth movements to improve speech perception. To clarify this issue, I studied neural activity recorded from the brain surfaces of human subjects using intracranial electrodes, a technique known as electrocorticography (ECoG). First, I studied responses to noisy speech in the auditory cortex, specifically in the superior temporal gyrus (STG). Previous studies identified the anterior parts of the STG as unisensory, responding only to auditory stimuli. On the other hand, the posterior parts of the STG are known to be multisensory, responding to both auditory and visual stimuli, which makes this region key for audiovisual speech perception. I examined how these different parts of the STG respond to clear versus noisy speech. I found that noisy speech decreased the amplitude and increased the across-trial variability of the response in the anterior STG. However, possibly due to its multisensory composition, the posterior STG was not as sensitive to auditory noise as the anterior STG and responded similarly to clear and noisy speech. I also found that these two response patterns in the STG were separated by a sharp boundary demarcated by the posterior-most portion of Heschl’s gyrus. Second, I studied responses to silent speech in the visual cortex. Previous studies demonstrated that the visual cortex shows response enhancement when the auditory component of speech is noisy or absent; however, it was not clear which regions of the visual cortex specifically show this enhancement and whether it results from top-down modulation by a higher region. To test this, I first mapped the receptive fields of different regions in the visual cortex and then measured their responses to visual (silent) and audiovisual speech stimuli. I found that visual regions with central receptive fields show greater response enhancement to visual speech, possibly because these regions receive more visual information from mouth movements. I found similar response enhancement to visual speech in the frontal cortex, specifically in the inferior frontal gyrus and the premotor and dorsolateral prefrontal cortices, which have been implicated in speechreading in previous studies. I showed that these frontal regions display strong functional connectivity with visual regions that have central receptive fields during speech perception.
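    The two anterior-STG effects described above (lower response amplitude and higher across-trial variability for noisy speech) are typically computed on the high-gamma envelope of each electrode. The sketch below shows one hedged way to derive those two measures from simulated single-trial signals; the 70-150 Hz band and filter settings are conventional choices, not necessarily those used in this work.

    # High-gamma envelope per trial, then mean amplitude and across-trial variability.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 1000.0
    t = np.arange(0, 1.0, 1 / fs)                      # 1-s trials
    rng = np.random.default_rng(5)
    trials = rng.standard_normal((40, t.size))         # placeholder raw ECoG, 40 trials

    # Band-pass 70-150 Hz, then take the analytic amplitude (high-gamma envelope)
    b, a = butter(4, [70, 150], btype="bandpass", fs=fs)
    envelopes = np.abs(hilbert(filtfilt(b, a, trials, axis=1), axis=1))

    mean_amplitude = envelopes.mean()                  # response amplitude
    trial_variability = envelopes.mean(axis=1).std()   # across-trial variability
    print(f"mean high-gamma amplitude: {mean_amplitude:.3f}")
    print(f"across-trial variability:  {trial_variability:.3f}")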

    Speech-evoked activation in adult temporal cortex measured using functional near-infrared spectroscopy (fNIRS): Are the measurements reliable?

    Functional near-infrared spectroscopy (fNIRS) is a silent, non-invasive neuroimaging technique that is potentially well suited to auditory research. However, the reliability of auditory-evoked activation measured using fNIRS is largely unknown. The present study investigated the test-retest reliability of speech-evoked fNIRS responses in normal-hearing adults. Seventeen participants underwent fNIRS imaging in two sessions separated by three months. In a block design, participants were presented with auditory speech, visual speech (silent speechreading), and audiovisual speech conditions. Optode arrays were placed bilaterally over the temporal lobes, targeting auditory brain regions. A range of established metrics was used to quantify the reproducibility of cortical activation patterns, as well as the amplitude and time course of the haemodynamic response within predefined regions of interest. The use of a signal processing algorithm designed to reduce the influence of systemic physiological signals was found to be crucial to achieving reliable detection of significant activation at the group level. For auditory speech (with or without visual cues), reliability was good to excellent at the group level, but highly variable among individuals. Temporal-lobe activation in response to visual speech was less reliable, especially in the right hemisphere. Consistent with previous reports, fNIRS reliability was improved by averaging across a small number of channels overlying a cortical region of interest. Overall, the present results confirm that fNIRS can measure speech-evoked auditory responses in adults that are highly reliable at the group level, and indicate that signal processing to reduce physiological noise may substantially improve the reliability of fNIRS measurements.
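    One of the "established metrics" commonly used for test-retest reliability is the intraclass correlation coefficient. As a hedged illustration of that step only (the data below are simulated placeholders, and the study may have used a different ICC form), the sketch computes a two-way random-effects, absolute-agreement ICC(2,1) on per-participant ROI amplitudes from two sessions.

    # ICC(2,1) on simulated per-participant ROI amplitudes from two sessions.
    import numpy as np

    rng = np.random.default_rng(6)
    n_subjects, n_sessions = 17, 2
    subject_effect = rng.normal(1.0, 0.4, n_subjects)    # stable per-subject amplitude
    data = subject_effect[:, None] + rng.normal(0, 0.2, (n_subjects, n_sessions))

    def icc_2_1(x):
        """Two-way random effects, absolute agreement, single measurement."""
        n, k = x.shape
        grand = x.mean()
        row_means = x.mean(axis=1)                       # subjects
        col_means = x.mean(axis=0)                       # sessions
        msr = k * np.sum((row_means - grand) ** 2) / (n - 1)
        msc = n * np.sum((col_means - grand) ** 2) / (k - 1)
        sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
        mse = sse / ((n - 1) * (k - 1))
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

    print(f"ICC(2,1) across sessions: {icc_2_1(data):.2f}")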