
    Gaze-Direction-Based MEG Averaging During Audiovisual Speech Perception

    To take a step towards real-life-like experimental setups, we simultaneously recorded magnetoencephalographic (MEG) signals and subjects' gaze direction during audiovisual speech perception. The stimuli were utterances of /apa/ dubbed onto two side-by-side female faces articulating /apa/ (congruent) and /aka/ (incongruent) in synchrony, repeated once every 3 s. Subjects (N = 10) were free to decide which face they viewed, and responses were averaged into two categories according to gaze direction. The right-hemisphere 100-ms response to the onset of the second vowel (N100m’) was a fifth smaller for incongruent than for congruent stimuli. The results demonstrate the feasibility of realistic viewing conditions with gaze-based averaging of MEG signals.
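    Because the viewed face, not the stimulus sequence, decides which condition a trial belongs to, the averaging step amounts to grouping trials by gaze label before averaging. A minimal sketch, assuming each trial is stored as an equal-length list of samples from a single MEG channel together with a label for the face viewed; the function name `average_by_gaze` and the label strings are hypothetical illustrations, not the authors' pipeline, and artifact rejection, baseline correction, and multi-channel data are omitted.

```python
from typing import Dict, List, Sequence


def average_by_gaze(epochs: Sequence[Sequence[float]],
                    gaze_labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average single-trial epochs separately for each gaze category.

    Hypothetical helper: epochs[i] holds one trial's samples for a single
    channel, and gaze_labels[i] names the face viewed on that trial.
    """
    if len(epochs) != len(gaze_labels):
        raise ValueError("need exactly one gaze label per epoch")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for epoch, label in zip(epochs, gaze_labels):
        if label not in sums:
            sums[label] = [0.0] * len(epoch)
            counts[label] = 0
        elif len(epoch) != len(sums[label]):
            raise ValueError("epochs within one category must have the same length")
        # Accumulate sample-by-sample sums for this gaze category.
        for i, value in enumerate(epoch):
            sums[label][i] += value
        counts[label] += 1
    # Divide each running sum by its trial count to get the category mean.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


# Toy data: two trials spent viewing the congruent face, one the incongruent face.
averages = average_by_gaze(
    [[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]],
    ["congruent", "congruent", "incongruent"],
)
print(averages)  # {'congruent': [0.0, 2.0], 'incongruent': [0.0, 2.0]}
```

    With real data, each category average would then be inspected for the N100m’ peak, and the trial counts per category would differ from subject to subject, since viewing was self-selected.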

    The Conversation: Deep Audio-Visual Speech Enhancement

    Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos. Existing works in this area have focussed on trying to separate utterances from known speakers in controlled environments. In this paper, we propose a deep audio-visual speech enhancement network that is able to separate a speaker's voice given lip regions in the corresponding video, by predicting both the magnitude and the phase of the target signal. The method is applicable to speakers unheard and unseen during training, and for unconstrained environments. We demonstrate strong quantitative and qualitative results, isolating extremely challenging real-world examples.
    Comment: To appear in Interspeech 2018. We provide supplementary material with interactive demonstrations on http://www.robots.ox.ac.uk/~vgg/demo/theconversatio

    Reading-Related Brain Changes in Audiovisual Processing: Cross-Sectional and Longitudinal MEG Evidence

    Published July 7, 2021
    The ability to establish associations between visual objects and speech sounds is essential for human reading. Understanding the neural adjustments required for acquisition of these arbitrary audiovisual associations can shed light on fundamental reading mechanisms and help reveal how literacy builds on pre-existing brain circuits. To address these questions, the present longitudinal and cross-sectional MEG studies characterize the temporal and spatial neural correlates of audiovisual syllable congruency in children (age range, 4–9 years; 22 males and 20 females) learning to read. Both studies showed that during the first years of reading instruction children gradually set up audiovisual correspondences between letters and speech sounds, which can be detected within the first 400 ms of a bimodal presentation and recruit the superior portions of the left temporal cortex. These findings suggest that children progressively change the way they treat audiovisual syllables as a function of their reading experience. This reading-specific brain plasticity implies (partial) recruitment of pre-existing brain circuits for audiovisual analysis.
    This project received funding from the European Union’s Horizon 2020 research and innovation program under Marie Sklodowska-Curie Grant Agreement No. 837228 (H2020-MSCA-IF-2018-837228-ENGRAVING). The project was also funded by the Spanish Ministry of Economy, Industry and Competitiveness (Grant PSI2017-82941-P), the Basque Government through the BERC 2018-2021 Program, and the Agencia Estatal de Investigación through BCBL (Basque Center on Cognition, Brain and Language) Severo Ochoa excellence accreditation SEV-2015-0490.

    MEG, PSYCHOPHYSICAL AND COMPUTATIONAL STUDIES OF LOUDNESS, TIMBRE, AND AUDIOVISUAL INTEGRATION

    Natural scenes and ecological signals are inherently complex, and our understanding of how they are perceived and processed is incomplete. For example, a speech signal not only carries information at many frequencies but is also dynamic: it is modulated over time. In addition, an auditory signal may be paired with information from another sense, as in the case of audiovisual speech. To make sense of the signal, a human observer must process the information provided by low-level sensory systems and integrate it across sensory modalities and with cognitive information (e.g., object identification information, phonetic information). The observer must then establish functional relationships among the signals encountered to form a coherent percept. The neuronal and cognitive mechanisms underlying this integration can be quantified in several ways: by taking physiological measurements, by assessing behavioral output for a given task, and by modeling signal relationships. While ecological tokens are more complex than current understanding can fully account for, progress can be made by using synthetic signals that capture specific essential features of ecological signals. The experiments presented here cover five aspects of complex signal processing using approximations of ecological signals: (i) auditory integration of complex tones composed of different frequencies and component power levels; (ii) audiovisual integration approximating that of human speech; (iii) behavioral measurement of signal discrimination; (iv) signal classification via simple computational analyses; and (v) neuronal processing of synthesized auditory signals approximating speech tokens. To investigate neuronal processing, magnetoencephalography (MEG) is employed to assess cortical processing non-invasively. Behavioral measures are employed to evaluate observer acuity in signal discrimination and to test the limits of perceptual resolution.
    Computational methods are used to examine the relationships, both in perceptual space and in physiological processing, between synthetic auditory signals, using features of the signals themselves as well as biologically motivated models of auditory representation. Together, these methodologies and experimental paradigms advance the understanding of ecological signal analysis, particularly the complex interactions within ecological signal structure.
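    As an illustration of the first stimulus class, a complex tone can be built by summing sinusoidal components, each scaled from a level given in dB. This is a minimal sketch under assumed parameters (a 16 kHz sample rate, levels relative to an amplitude of 1.0, zero starting phase for every component); the function name `complex_tone` is hypothetical, and the thesis's actual stimulus parameters are not stated in this abstract.

```python
import math
from typing import List, Sequence


def complex_tone(freqs_hz: Sequence[float], levels_db: Sequence[float],
                 duration_s: float, sample_rate: int = 16000) -> List[float]:
    """Sum of sine components; each level in dB is relative to amplitude 1.0."""
    if len(freqs_hz) != len(levels_db):
        raise ValueError("need one level per frequency component")
    n_samples = int(round(duration_s * sample_rate))
    # Convert each component level from dB to a linear amplitude.
    amplitudes = [10.0 ** (level / 20.0) for level in levels_db]
    # Each output sample is the sum of all components at that time point.
    return [
        sum(a * math.sin(2.0 * math.pi * f * n / sample_rate)
            for f, a in zip(freqs_hz, amplitudes))
        for n in range(n_samples)
    ]


# A three-component tone whose upper partials are 6 dB and 12 dB weaker.
tone = complex_tone([200.0, 400.0, 600.0], [0.0, -6.0, -12.0], duration_s=0.5)
print(len(tone))  # 8000
```

    Varying `levels_db` while holding `freqs_hz` fixed is one simple way to manipulate component power levels independently of frequency content.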

    Audiovisual speech perception in cochlear implant patients

    Hearing with a cochlear implant (CI) is very different from normal hearing (NH), as the CI can provide only limited auditory input. Nevertheless, the central auditory system can learn to interpret such limited input well enough to extract meaningful information within a few months after implant switch-on. The capacity of the auditory cortex to adapt to new auditory stimuli is an example of intra-modal plasticity: changes within a sensory cortical region as a result of altered statistics of the respective sensory input. However, hearing deprivation before implantation and restoration of hearing after implantation can also induce cross-modal plasticity: changes within a sensory cortical region as a result of altered statistics of a different sensory input. In this way, a preserved cortical region can support a deprived one; for example, CI users have been shown to exhibit cross-modal visual-cortex activation for purely auditory stimuli. During the period of hearing deprivation before implantation, CI users typically rely on additional visual cues such as lip movements to understand speech. It has therefore been suggested that CI users show a pronounced binding of the auditory and visual systems, which may allow them to integrate auditory and visual speech information more efficiently. The projects included in this thesis investigate auditory, and particularly audiovisual, speech processing in CI users. Four event-related potential (ERP) studies approach the matter from different perspectives, each with a distinct focus. The first project investigates how audiovisually presented syllables are processed by CI users with bilateral hearing loss compared with NH controls. Previous ERP studies employing non-linguistic stimuli, as well as studies using other neuroimaging techniques, found distinct audiovisual interactions in CI users.
    However, the precise time course of cross-modal visual-cortex recruitment and enhanced audiovisual interaction for speech-related stimuli is unknown. Our ERP study fills this gap, showing differences between CI users and NH controls in the time course of audiovisual interactions as well as in cortical source configurations. The second study focuses on auditory processing in single-sided deaf (SSD) CI users. SSD CI patients experience a maximally asymmetric hearing condition, as they have a CI in one ear and an NH ear on the contralateral side. Although these patients retain one intact ear, several behavioural studies have demonstrated a variety of benefits from restoring binaural hearing, yet only a few ERP studies have investigated auditory processing in SSD CI users. Our study investigates whether the side of implantation affects auditory processing and whether auditory processing via the NH ear of SSD CI users works similarly to that of NH controls. Given the distinct hearing conditions of SSD CI users, the question arises whether there are quantifiable differences between CI users with unilateral and bilateral hearing loss. In general, ERP studies on SSD CI users are scarce, and none has examined audiovisual processing in particular. Furthermore, there are no reports on the lip-reading abilities of SSD CI users. To address this, the third project extends the first study by including SSD CI users as a third experimental group. The study discusses both differences and similarities between CI users with bilateral hearing loss, CI users with unilateral hearing loss, and NH controls, and provides, for the first time, insights into audiovisual interactions in SSD CI users. The fourth project investigates the influence of background noise on audiovisual interactions in CI users and whether a noise-reduction algorithm can modulate these interactions.
    It is known that in environments with competing background noise, listeners generally rely more strongly on visual cues to understand speech, and that such situations are particularly difficult for CI users. As shown in previous auditory behavioural studies, the recently introduced noise-reduction algorithm "ForwardFocus" can be a useful aid in such cases. However, whether the algorithm is also beneficial in audiovisual conditions, and whether using it has a measurable effect on cortical processing, has not yet been investigated. In this ERP study, we address these questions with an auditory and audiovisual syllable discrimination task. Taken together, the projects included in this thesis contribute to a better understanding of auditory and especially audiovisual speech processing in CI users, revealing distinct processing strategies employed to overcome the limited input provided by a CI. The results have clinical implications, as they suggest that clinical hearing assessments, which are currently purely auditory, should be extended to include audiovisual assessments. Furthermore, they imply that rehabilitation including audiovisual training methods may help all CI user groups achieve the most effective implantation outcome quickly.

    Video-aided model-based source separation in real reverberant rooms

    Source separation algorithms that use only audio data can perform poorly when multiple sources or reverberation are present. In this paper, we therefore propose a video-aided model-based source separation algorithm for two-channel reverberant recordings in which the sources are assumed to be static. Using cues from video, we first localize the individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues (the interaural phase difference and the interaural level difference), as well as the mixing vectors, are modeled probabilistically. The models make use of the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that, by utilizing the visual modality, the proposed algorithm can produce better time-frequency masks and thereby improved source estimates. We provide experimental results testing the proposed algorithm in different scenarios and comparing it with other audio-only and audio-visual algorithms; it achieves improved performance on both synthetic and real data. We also include dereverberation-based pre-processing in our algorithm to suppress the late reverberant components of the observed stereo mixture and further enhance the overall output. This makes our algorithm a suitable candidate for use in underdetermined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
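    To make the interaural cues concrete, here is a minimal sketch that computes the IPD and ILD at one time-frequency point from the two channels' complex STFT coefficients, then builds hard binary masks by assigning each point to the source whose expected IPD (given its direction, e.g. one estimated from video) is closest. This deliberately replaces the paper's probabilistic IPD/ILD/mixing-vector models and EM refinement with a simple nearest-direction rule. It assumes a free-field, far-field two-microphone model with a known microphone spacing and a speed of sound of 343 m/s, ignores spatial aliasing at high frequencies, and all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air
EPS = 1e-12  # guards against division by zero and log of zero


def wrap_phase(phase: float) -> float:
    """Wrap a phase value into the interval (-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


def ipd_ild(left: complex, right: complex) -> Tuple[float, float]:
    """Interaural phase difference (rad) and level difference (dB) at one TF point."""
    ipd = cmath.phase(left * right.conjugate())
    ild = 20.0 * math.log10((abs(left) + EPS) / (abs(right) + EPS))
    return ipd, ild


def expected_ipd(freq_hz: float, azimuth_rad: float, mic_spacing_m: float) -> float:
    """Far-field IPD for a source at azimuth_rad (positive: nearer the left mic)."""
    delay_s = mic_spacing_m * math.sin(azimuth_rad) / SPEED_OF_SOUND_M_S
    return wrap_phase(2.0 * math.pi * freq_hz * delay_s)


def binary_masks(left_tf: Sequence[Sequence[complex]],
                 right_tf: Sequence[Sequence[complex]],
                 freqs_hz: Sequence[float],
                 azimuths_rad: Sequence[float],
                 mic_spacing_m: float) -> List[List[List[int]]]:
    """masks[s][t][k] is 1 if TF point (frame t, bin k) is assigned to source s.

    left_tf[t][k] and right_tf[t][k] are complex STFT coefficients.
    """
    masks = [[[0] * len(freqs_hz) for _ in left_tf] for _ in azimuths_rad]
    for t, (left_row, right_row) in enumerate(zip(left_tf, right_tf)):
        for k, (left, right) in enumerate(zip(left_row, right_row)):
            ipd, _ild = ipd_ild(left, right)  # ILD is unused by this simple rule
            # Wrapped phase error between observed and expected IPD per source.
            errors = [abs(wrap_phase(ipd - expected_ipd(freqs_hz[k], az, mic_spacing_m)))
                      for az in azimuths_rad]
            masks[errors.index(min(errors))][t][k] = 1
    return masks


# One TF point whose right channel lags as for a source at +0.3 rad.
f, d = 500.0, 0.2
right = cmath.exp(-1j * expected_ipd(f, 0.3, d))
masks = binary_masks([[1 + 0j]], [[right]], [f], [0.3, -0.3], d)
print(masks)  # [[[1]], [[0]]]
```

    Multiplying each channel's STFT by `masks[s]` and inverting the transform would give a crude estimate of source s; a probabilistic model such as the one described above would instead produce soft masks with values between 0 and 1.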

    Electrophysiological differences and similarities in audiovisual speech processing in CI users with unilateral and bilateral hearing loss.

    Hearing with a cochlear implant (CI) is limited compared to natural hearing. Although CI users may develop compensatory strategies, it is currently unknown whether these extend from auditory to visual functions, and whether compensatory strategies vary between different CI user groups. To better understand the experience-dependent contributions to multisensory plasticity in audiovisual speech perception, the current event-related potential (ERP) study presented syllables in auditory, visual, and audiovisual conditions to CI users with unilateral or bilateral hearing loss, as well as to normal-hearing (NH) controls. Behavioural results revealed shorter response times in the audiovisual condition than in the unisensory conditions for all groups. Multisensory integration was confirmed by electrical neuroimaging, including topographic and ERP source analysis, showing a visual modulation of the auditory-cortex response at N1 and P2 latency. However, CI users with bilateral hearing loss showed a distinct pattern of N1 topography, indicating a stronger visual impact on auditory speech processing compared to CI users with unilateral hearing loss and NH listeners. Furthermore, both CI user groups showed delayed auditory-cortex activation, additional recruitment of the visual cortex, and better lip-reading ability compared to NH listeners. In sum, these results extend previous findings by showing distinct multisensory processes not only between NH listeners and CI users in general, but even between CI users with unilateral and bilateral hearing loss. However, the comparably enhanced lip-reading ability and visual-cortex activation in both CI user groups suggest that these visual improvements are evident regardless of the hearing status of the contralateral ear.