
    Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception

    Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of “neuro-steered” hearing devices, in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have optimized the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS), in which the information about the attended speech, as decoded from the subject’s brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract, from a multi-talker speech mixture, the clean audio signal that a listener is attending to. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system makes it a promising candidate for neuro-steered hearing-assistive devices.
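A minimal sketch of how such envelope-informed separation could be wired up, assuming a learned convolutional front-end and a brain-decoded envelope of the attended speech as the conditioning input; the module and parameter names here are hypothetical, not the published BISS architecture:

```python
# Hypothetical sketch, in the spirit of BISS: a separation front-end whose
# mask estimation is conditioned on a brain-decoded attended-speech envelope.
import torch
import torch.nn as nn

class EnvelopeInformedSeparator(nn.Module):
    """Estimate a mask for the attended talker, steered by a decoded envelope."""
    def __init__(self, n_filters=256, kernel_size=16, hidden=256):
        super().__init__()
        # Learned 1-D conv encoder/decoder in place of an STFT front-end.
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=kernel_size // 2)
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=kernel_size // 2)
        # The decoded envelope is resampled to the encoder frame rate,
        # projected, and concatenated with the mixture representation.
        self.env_proj = nn.Conv1d(1, n_filters, 1)
        self.mask_net = nn.Sequential(
            nn.Conv1d(2 * n_filters, hidden, 1), nn.ReLU(),
            nn.Conv1d(hidden, n_filters, 1), nn.Sigmoid(),
        )

    def forward(self, mixture, envelope):
        # mixture: (batch, 1, samples); envelope: (batch, 1, samples)
        feats = torch.relu(self.encoder(mixture))            # (B, F, T)
        env = self.env_proj(nn.functional.interpolate(
            envelope, size=feats.shape[-1]))                 # match frame rate
        mask = self.mask_net(torch.cat([feats, env], dim=1))
        return self.decoder(feats * mask)                    # attended-speech estimate

# Toy usage: a 1-second mixture at 8 kHz with a decoded envelope.
model = EnvelopeInformedSeparator()
estimate = model(torch.randn(1, 1, 8000), torch.rand(1, 1, 8000))
```

The design choice this illustrates, following the abstract, is that the decoded envelope steers separation directly in the front-end, rather than being used only to select among independently pre-separated streams.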

    Computational modelling of neural mechanisms underlying natural speech perception

    Humans are highly skilled at the analysis of complex auditory scenes. In particular, the human auditory system is characterized by remarkable robustness to noise and can nearly effortlessly isolate the voice of a specific talker from even the busiest of mixtures. However, the neural mechanisms underlying these remarkable properties remain poorly understood, mainly because of the inherent complexity of speech signals and the intricate, multi-stage processing performed in the human auditory system. Understanding these neural mechanisms of speech perception is of interest for clinical practice, brain-computer interfacing and automatic speech processing systems. In this thesis, we developed computational models characterizing neural speech processing across different stages of the human auditory pathways. In particular, we studied the active role of slow cortical oscillations in speech-in-noise comprehension through a spiking neural network model for encoding spoken sentences. The neural dynamics of the model during noisy speech encoding reflected the speech comprehension of young, normal-hearing adults. The proposed theoretical model was validated by predicting the effects of non-invasive brain stimulation on speech comprehension in an experimental study involving a cohort of volunteers. Moreover, we developed a modelling framework for detecting the early, high-frequency neural response to uninterrupted speech in non-invasive neural recordings. We applied the method to investigate top-down modulation of this response by the listener's selective attention and by linguistic properties of different words from a spoken narrative. We found that in both cases the detected responses, of predominantly subcortical origin, were significantly modulated, which supports the functional role of feedback between higher- and lower-level stages of the auditory pathways in speech perception. The proposed computational models shed light on some of the poorly understood neural mechanisms underlying speech perception. The developed methods can be readily employed in future studies involving a range of experimental paradigms beyond those considered in this thesis.
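As an illustration of the general family of analyses used to measure neural responses to uninterrupted speech, the sketch below fits a linear temporal response function (TRF) from a continuous speech feature to a neural signal via ridge regression over time-lagged copies of the stimulus; this is a standard technique, not the specific high-frequency detection framework developed in the thesis:

```python
# Illustrative TRF estimation: w = (X'X + lambda*I)^-1 X'y, where X holds
# time-lagged copies of a continuous speech feature and y is a neural signal.
import numpy as np

def lagged_design(stimulus, max_lag):
    """Stack time-lagged copies of the stimulus into a design matrix."""
    n = len(stimulus)
    X = np.zeros((n, max_lag))
    for lag in range(max_lag):
        X[lag:, lag] = stimulus[:n - lag]
    return X

def fit_trf(stimulus, response, max_lag=64, ridge=1e2):
    """Ridge-regression solution for the temporal response function."""
    X = lagged_design(stimulus, max_lag)
    XtX = X.T @ X + ridge * np.eye(max_lag)
    return np.linalg.solve(XtX, X.T @ response)

# Toy usage: a synthetic "neural" signal that tracks the envelope at a delay.
rng = np.random.default_rng(0)
envelope = rng.random(10_000)
neural = 0.5 * np.roll(envelope, 12) + rng.normal(0, 1, 10_000)
trf = fit_trf(envelope, neural)
print("peak lag (samples):", trf.argmax())  # expect a peak near lag 12
```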

    Noise processing in the auditory system with applications in speech enhancement

    The auditory system is extremely efficient at extracting auditory information in the presence of background noise. However, speech enhancement algorithms, which aim to remove background noise from a degraded speech signal, do not achieve results that approach the efficacy of the auditory system. The purpose of this study is thus to first investigate how noise affects the spiking activity of neurons in the auditory system, and then to use brain activity recorded in the presence of noise to design better speech enhancement algorithms. To investigate how noise affects the spiking activity of neurons, we first design a generalized linear model that relates the spiking activity of neurons to intrinsic and extrinsic covariates that can affect their activity, such as noise. From this model, we extract two metrics: one that shows the effect of noise on the spiking activity, and another that shows the relative effect of vocalization compared to noise. We use these metrics to analyze neural data, recorded from a structure of the auditory system named the inferior colliculus (IC), while presenting noisy vocalizations. We studied the effect of different kinds of noise (non-stationary, white and natural stationary), different vocalizations, different input sound levels and signal-to-noise ratios (SNR). We found that the presence of non-stationary noise increases the spiking activity of neurons, regardless of the SNR, input level or vocalization type. The presence of white or natural stationary noise, however, causes a great diversity of responses, in which the activity of recording sites could increase, decrease or remain unchanged. This shows that the noise invariance previously reported in the IC depends on the noise conditions, which had not been observed before. We then address the problem of speech enhancement using information from the brain's processing in the presence of noise. It has been shown that the brain waves of a listener strongly correlate with the speaker to whom the listener attends. Given this, we design two speech enhancement algorithms with a denoising autoencoder structure, namely the Brain Enhanced Speech Denoiser (BESD) and the U-shaped Brain Enhanced Speech Denoiser (U-BESD). These algorithms take advantage of the attended auditory information present in the brain activity of the listener to denoise multi-talker speech. The U-BESD is built upon the BESD with the addition of skip connections and dilated convolutions. Compared to previously proposed approaches, BESD and U-BESD are each trained as a single neural architecture, lowering the complexity of the algorithm. We investigate two experimental settings. In the first, the attended speaker is known (the speaker-specific setting); in the second, no prior information is available about the attended speaker (the speaker-independent setting). In the speaker-specific setting, we show that both the BESD and U-BESD algorithms surpass a similar denoising autoencoder. Moreover, we show that in the speaker-independent setting, U-BESD surpasses the performance of the only known approach that also uses the brain's activity.
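A minimal sketch of the kind of generalized linear model described above, assuming Poisson-distributed spike counts with a log link and binary covariates for noise and vocalization; the covariates, data, and the two derived metrics are illustrative stand-ins for the ones defined in the thesis:

```python
# Poisson GLM sketch: spike counts regressed on binary noise/vocalization
# covariates; the exponentiated coefficients act as multiplicative rate metrics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_bins = 5_000
noise_on = rng.integers(0, 2, n_bins)   # extrinsic covariate: noise present
vocal_on = rng.integers(0, 2, n_bins)   # extrinsic covariate: vocalization present
# Synthetic ground truth: noise raises the rate, vocalization raises it more.
rate = np.exp(0.2 + 0.4 * noise_on + 0.9 * vocal_on)
spikes = rng.poisson(rate)

X = sm.add_constant(np.column_stack([noise_on, vocal_on]))
fit = sm.GLM(spikes, X, family=sm.families.Poisson()).fit()
beta0, beta_noise, beta_vocal = fit.params
# Metric 1: multiplicative effect of noise on the firing rate.
print("noise effect:", np.exp(beta_noise))
# Metric 2: effect of vocalization relative to noise.
print("vocalization vs. noise:", np.exp(beta_vocal - beta_noise))
```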
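The sketch below shows a U-shaped denoiser with skip connections and dilated convolutions, in the spirit of U-BESD; the channel counts, depth, and the way the listener's EEG is fused with the noisy speech are assumptions rather than the published architecture:

```python
# Rough U-BESD-style sketch: speech and EEG are embedded and fused, passed
# through dilated conv encoder/decoder stacks joined by skip connections.
import torch
import torch.nn as nn

class UShapedDenoiser(nn.Module):
    def __init__(self, ch=64, depth=3):
        super().__init__()
        self.speech_in = nn.Conv1d(1, ch, 3, padding=1)
        self.eeg_in = nn.Conv1d(64, ch, 3, padding=1)  # assume 64 EEG channels
        self.fuse = nn.Conv1d(2 * ch, ch, 1)
        # Encoder/decoder with exponentially growing (then shrinking) dilation;
        # padding is chosen so every layer preserves the sequence length.
        self.enc = nn.ModuleList(
            nn.Conv1d(ch, ch, 3, dilation=2 ** d, padding=2 ** d)
            for d in range(depth))
        self.dec = nn.ModuleList(
            nn.Conv1d(ch, ch, 3, dilation=2 ** d, padding=2 ** d)
            for d in reversed(range(depth)))
        self.out = nn.Conv1d(ch, 1, 3, padding=1)

    def forward(self, speech, eeg):
        # speech: (B, 1, T); eeg: (B, 64, T), resampled to the audio rate
        x = self.fuse(torch.cat([self.speech_in(speech), self.eeg_in(eeg)], dim=1))
        skips = []
        for layer in self.enc:
            x = torch.relu(layer(x))
            skips.append(x)
        for layer in self.dec:
            x = torch.relu(layer(x)) + skips.pop()  # skip connection
        return self.out(x)

model = UShapedDenoiser()
denoised = model(torch.randn(2, 1, 4000), torch.randn(2, 64, 4000))
```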
    • 
