
    Multi-Level Audio-Visual Interactions in Speech and Language Perception

    That we perceive our environment as a unified scene rather than as individual streams of auditory, visual, and other sensory information has recently provided motivation to move past the long-held tradition of studying these systems separately. Although the senses are each unique in their transduction organs, neural pathways, and primary cortical areas, they are ultimately merged in a meaningful way that allows us to navigate the multisensory world. Investigating how the senses are merged has become an increasingly broad field of research in recent decades, with the introduction and increased availability of neuroimaging techniques. Areas of study range from multisensory object perception to cross-modal attention, multisensory interactions, and integration. This thesis focuses on audio-visual speech perception, with special focus on the facilitatory effects of visual information on auditory processing. When visual information is concordant with auditory information, it provides an advantage that is measurable in behavioral response times and evoked auditory fields (Chapter 3) and in increased entrainment to multisensory periodic stimuli, as reflected by steady-state responses (Chapter 4). When the audio-visual information is incongruent, the two streams can often, but not always, combine to form a third, not physically present percept (known as the McGurk effect). This effect is investigated (Chapter 5) using real-word stimuli. McGurk percepts were not robustly elicited for a majority of stimulus types, but patterns of responses suggest that the physical and lexical properties of the auditory and visual stimuli may affect the likelihood of obtaining the illusion. Together, these experiments add to the growing body of knowledge suggesting that audio-visual interactions occur at multiple stages of processing.
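    The steady-state measure mentioned above (Chapter 4) lends itself to a simple illustration: entrainment to a periodic stimulus shows up as spectral power at the stimulation frequency in the trial-averaged response. The sketch below is a minimal NumPy version of that idea; the sampling rate, stimulation frequency, and placeholder data are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def steady_state_power(trials: np.ndarray, sfreq: float, stim_freq: float) -> float:
    """Power at the stimulation frequency in the trial-averaged response.

    trials    : (n_trials, n_samples) single-channel epochs
    sfreq     : sampling rate in Hz (assumed value in the demo below)
    stim_freq : rate of the periodic stimulation in Hz (assumed)
    """
    # Averaging across trials suppresses activity that is not phase-locked
    # to the stimulus, which is exactly what a steady-state response is.
    evoked = trials.mean(axis=0)
    power = np.abs(np.fft.rfft(evoked)) ** 2
    freqs = np.fft.rfftfreq(evoked.size, d=1.0 / sfreq)
    # Power in the frequency bin closest to the stimulation rate.
    return float(power[np.argmin(np.abs(freqs - stim_freq))])

# Placeholder epochs: stronger entrainment would show up as larger power
# in the audiovisual condition than in the auditory-only condition.
rng = np.random.default_rng(0)
av_epochs = rng.standard_normal((60, 2000))  # hypothetical audiovisual trials
a_epochs = rng.standard_normal((60, 2000))   # hypothetical auditory-only trials
print(steady_state_power(av_epochs, sfreq=500.0, stim_freq=1.0))
print(steady_state_power(a_epochs, sfreq=500.0, stim_freq=1.0))
```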

    Audio-visual speech perception: a developmental ERP investigation

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude for incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable across the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development.
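    As a rough sketch of how the N1/P2 amplitude and latency measures discussed above are typically quantified, the following function finds a component peak in a search window; the window bounds and polarity conventions are common defaults, assumed here rather than taken from the study.

```python
import numpy as np

def component_peak(evoked: np.ndarray, times: np.ndarray,
                   tmin: float, tmax: float, polarity: int) -> tuple:
    """Peak amplitude and latency of one ERP component.

    evoked     : trial-averaged waveform of one channel, shape (n_samples,)
    times      : sample times in seconds, same length as evoked
    tmin, tmax : search window in seconds (e.g. ~0.08-0.16 s for N1,
                 ~0.15-0.28 s for P2; assumed defaults, not study values)
    polarity   : -1 for negative-going components (N1), +1 for positive (P2)
    """
    mask = (times >= tmin) & (times <= tmax)
    flipped = polarity * evoked[mask]    # make the peak a maximum either way
    idx = int(np.argmax(flipped))
    amplitude = polarity * flipped[idx]  # restore the original sign
    latency = times[mask][idx]
    return amplitude, latency
```

    Comparing the returned latencies between audiovisual and auditory-only conditions would expose the congruence-dependent latency shortening described above; comparing amplitudes would expose the attenuation effect.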

    Dynamic Facial Expressions Prime the Processing of Emotional Prosody

    Evidence suggests that emotion is represented supramodally in the human brain. Emotional facial expressions, which often precede vocally expressed emotion in real life, can modulate event-related potentials (N100 and P200) during emotional prosody processing. To investigate these cross-modal emotional interactions, two lines of research have been put forward: cross-modal integration and cross-modal priming. In cross-modal integration studies, visual and auditory channels are temporally aligned, while in priming studies they are presented consecutively. Here we used cross-modal emotional priming to study the interaction of dynamic visual and auditory emotional information. Specifically, we presented dynamic facial expressions (angry, happy, neutral) as primes and emotionally intoned pseudo-speech sentences (angry, happy) as targets. We were interested in how prime-target congruency would affect early auditory event-related potentials, i.e., N100 and P200, in order to shed more light on how dynamic facial information is used in cross-modal emotional prediction. Results showed enhanced N100 amplitudes for incongruently primed compared to congruently and neutrally primed emotional prosody, while the latter two conditions did not significantly differ. However, N100 peak latency was significantly delayed in the neutral condition compared to the other two conditions. Source reconstruction revealed that the right parahippocampal gyrus was activated in incongruent compared to congruent trials in the N100 time window. No significant ERP effects were observed in the P200 range. Our results indicate that dynamic facial expressions influence vocal emotion processing at an early point in time, and that an emotional mismatch between a facial expression and its ensuing vocal emotional signal induces additional processing costs in the brain, potentially because the cross-modal emotional prediction mechanism is violated in the case of emotional prime-target incongruency.
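    For illustration, the congruency contrast reported above (enhanced N100 amplitudes for incongruent versus congruent priming) might be tested as a paired comparison of per-participant mean amplitudes; the values below are simulated placeholders, not data from the study.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant mean N100 amplitudes in microvolts
# (more negative = larger N100); purely simulated, not study data.
rng = np.random.default_rng(1)
incongruent = rng.normal(-4.5, 1.0, size=24)
congruent = rng.normal(-3.8, 1.0, size=24)

# Paired test across participants for the congruency contrast.
result = stats.ttest_rel(incongruent, congruent)
print(f"t({incongruent.size - 1}) = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```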

    Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex

    Speech perception is a central component of social communication. While principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Previous research has shown that visual speech modulates activity in cortical areas subserving auditory speech perception, including the superior temporal gyrus (STG), potentially through feedback connections from the multisensory posterior superior temporal sulcus (pSTS). However, it is unknown whether visual modulation of auditory processing in the STG is a unitary phenomenon or, rather, consists of multiple temporally, spatially, or functionally distinct processes. To explore these questions, we examined neural responses to audiovisual speech measured from intracranially implanted electrodes within the temporal cortex of 21 patients undergoing clinical monitoring for epilepsy. We found that visual speech modulates auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differ across theta, beta, and high-gamma frequency bands. Before speech onset, visual information increased high-gamma power in the posterior STG and suppressed beta power in mid-STG regions, suggesting crossmodal prediction of speech signals in these areas. After sound onset, visual speech decreased theta power in the middle and posterior STG, potentially reflecting a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception and provide a crucial map for subsequent studies to identify the types of visual features that are encoded by these separate mechanisms.
    This study was supported by NIH Grant R00 DC013828. A. Beltz was supported by the Jacobs Foundation.
    Preprint of the article "Multiple auditory responses to visual speech": http://deepblue.lib.umich.edu/bitstream/2027.42/167729/1/OriginalManuscript.pdf
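    The band-specific power measures at the heart of this study are commonly extracted by band-pass filtering followed by a Hilbert envelope. The sketch below uses that standard approach with conventional band edges; the exact bands and filter settings the authors used are not given in the abstract, so treat these values as assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

# Conventional band edges in Hz; assumptions, not the study's definitions.
BANDS = {"theta": (4.0, 8.0), "beta": (13.0, 30.0), "high_gamma": (70.0, 150.0)}

def band_power(signal: np.ndarray, sfreq: float, lo: float, hi: float) -> np.ndarray:
    """Instantaneous power in one band: band-pass filter + Hilbert envelope.

    Resolving high gamma up to 150 Hz requires sfreq > 300 Hz, which
    intracranial recordings comfortably exceed.
    """
    nyq = sfreq / 2.0
    b, a = butter(4, [lo / nyq, hi / nyq], btype="band")
    envelope = np.abs(hilbert(filtfilt(b, a, signal)))
    return envelope ** 2

# Placeholder single-electrode trace (5 s at 1 kHz); real input would be an
# STG channel time-locked to speech onset.
rng = np.random.default_rng(2)
trace = rng.standard_normal(5000)
theta = band_power(trace, 1000.0, *BANDS["theta"])
```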

    Implicit agency in observed actions: evidence for N1 suppression of tones caused by self-made and observed actions

    Every day we make attributions about how our actions and the actions of others cause consequences in the world around us. It is unknown whether we use the same implicit process in attributing causality when observing others' actions as we do when performing our own. The aim of this research was to investigate the neural processes involved in the implicit sense of agency we form between actions and effects, for both our own actions and when watching others' actions. Using an interval estimation paradigm to elicit intentional binding in self-made and observed actions, we measured the EEG responses indicative of anticipatory processes before an action and the ERPs in response to the sensory consequence. We replicated our previous findings that we form a sense of implicit agency over our own and others' actions. Crucially, EEG results showed that tones caused by either self-made or observed actions both resulted in suppression of the N1 component of the sensory ERP, with no difference in suppression between consequences of observed and self-made actions. Furthermore, this N1 suppression was greatest for tones caused by observed goal-directed actions rather than non-action or non-goal-related visual events. This suggests that top–down processes act upon the neural responses to sensory events caused by goal-directed actions in the same way for events caused by the self or those made by other agents.
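    Two quantities anchor this design: the intentional-binding score from the interval estimation task, and the N1 suppression contrast from the ERPs. Below is a minimal sketch of both; the 250 ms interval and the microvolt figures are illustrative assumptions, not values from the study.

```python
import numpy as np

def binding_score(estimated_ms: np.ndarray, actual_ms: float) -> float:
    """Intentional binding: how much shorter the action-tone interval was
    judged than it actually was. Positive values mean perceived compression,
    the behavioural signature of implicit agency."""
    return float(actual_ms - np.mean(estimated_ms))

def n1_suppression(n1_external_uv: float, n1_action_uv: float) -> float:
    """Reduction of the (negative-going) N1 amplitude for tones caused by an
    action, relative to externally triggered tones. Positive = suppression."""
    return abs(n1_external_uv) - abs(n1_action_uv)

# Hypothetical values: a 250 ms action-tone interval judged near 210 ms, and
# N1 amplitudes of -6.0 uV (external tone) vs -4.2 uV (observed-action tone).
print(binding_score(np.array([205.0, 215.0, 210.0]), actual_ms=250.0))  # 40.0
print(n1_suppression(-6.0, -4.2))                                       # 1.8
```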

    The timecourse of multisensory speech processing in unilaterally stimulated cochlear implant users revealed by ERPs.

    A cochlear implant (CI) is an auditory prosthesis which can partially restore auditory function in patients with severe to profound hearing loss. However, this bionic device provides only limited auditory information, and CI patients may compensate for this limitation by means of a stronger interaction between the auditory and visual systems. To better understand the electrophysiological correlates of audiovisual speech perception, the present study used electroencephalography (EEG) and a redundant target paradigm. Postlingually deafened CI users and normal-hearing (NH) listeners were compared in auditory, visual, and audiovisual speech conditions. The behavioural results revealed multisensory integration for both groups, as indicated by shortened response times for the audiovisual as compared to the two unisensory conditions. The analysis of the N1 and P2 event-related potentials (ERPs), including topographic and source analyses, confirmed a multisensory effect for both groups and showed a cortical auditory response which was modulated by the simultaneous processing of the visual stimulus. Nevertheless, the CI users in particular revealed a distinct pattern of N1 topography, pointing to a strong visual impact on auditory speech processing. Apart from these condition effects, the results revealed ERP differences between CI users and NH listeners, not only in N1/P2 ERP topographies, but also in the cortical source configuration. When compared to the NH listeners, the CI users showed an additional activation in the visual cortex at N1 latency, which was positively correlated with CI experience, and a delayed auditory-cortex activation with a reversed, rightward functional lateralisation. In sum, our behavioural and ERP findings demonstrate a clear audiovisual benefit for both groups, and a CI-specific alteration in cortical activation at N1 latency when auditory and visual input is combined. These cortical alterations may reflect a compensatory strategy to overcome the limited CI input, allowing CI users to improve their lip-reading skills and to approximate the behavioural performance of NH listeners in audiovisual speech conditions. Our results are clinically relevant, as they highlight the importance of assessing the CI outcome not only in auditory-only but also in audiovisual speech conditions.
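    The redundant target paradigm used here has a standard accompanying analysis: shortened audiovisual response times count as integration only if they beat what a race between independent unisensory processes could produce (Miller's race-model inequality). Whether the authors applied this particular test is not stated in the abstract; the sketch below illustrates it with simulated response times standing in for real data.

```python
import numpy as np

def race_model_violation(rt_av, rt_a, rt_v, percentiles=np.arange(5, 100, 5)):
    """Miller's race-model inequality on redundant-target response times.

    Under a race (no integration), the audiovisual CDF cannot exceed the sum
    of the unisensory CDFs: F_AV(t) <= F_A(t) + F_V(t). Positive returned
    values are violations, i.e. evidence of genuine multisensory integration.
    """
    t = np.percentile(np.concatenate([rt_av, rt_a, rt_v]), percentiles)
    cdf = lambda rts: np.searchsorted(np.sort(rts), t, side="right") / rts.size
    bound = np.minimum(cdf(rt_a) + cdf(rt_v), 1.0)
    return cdf(rt_av) - bound

# Simulated response times (ms); the faster audiovisual distribution mimics
# the redundancy gain reported for both CI users and NH listeners.
rng = np.random.default_rng(3)
rt_a = rng.normal(520.0, 70.0, 200)
rt_v = rng.normal(560.0, 70.0, 200)
rt_av = rng.normal(470.0, 60.0, 200)
print(race_model_violation(rt_av, rt_a, rt_v).max())
```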