
    TEMPORAL CODING OF SPEECH IN HUMAN AUDITORY CORTEX

    Human listeners can reliably recognize speech in complex listening environments. The underlying neural mechanisms, however, remain unclear and cannot yet be emulated by any artificial system. In this dissertation, we study how speech is represented in the human auditory cortex and how the neural representation contributes to reliable speech recognition. Cortical activity from normal-hearing human subjects is noninvasively recorded using magnetoencephalography during natural speech listening. It is first demonstrated that neural activity from auditory cortex is precisely synchronized to the slow temporal modulations of speech when the speech signal is presented in a quiet listening environment. How this neural representation is affected by acoustic interference is then investigated. Acoustic interference degrades speech perception via two mechanisms, informational masking and energetic masking, which are addressed respectively by using a competing speech stream and stationary noise as the interfering sound. When two speech streams are presented simultaneously, cortical activity is predominantly synchronized to the speech stream the listener attends to, even if the unattended, competing speech stream is 8 dB more intense. When speech is presented together with spectrally matched stationary noise, cortical activity remains precisely synchronized to the temporal modulations of speech until the noise is 9 dB more intense. Critically, the accuracy of neural synchronization to speech predicts how well individual listeners can understand speech in noise. Further analysis reveals that two neural sources contribute to speech-synchronized cortical activity, one with a shorter response latency of about 50 ms and the other with a longer response latency of about 100 ms. The longer-latency component, but not the shorter-latency component, shows selectivity to the attended speech and invariance to background noise, indicating a transition from encoding the acoustic scene to encoding the behaviorally important auditory object in auditory cortex. Taken together, we have demonstrated that during natural speech comprehension, neural activity in the human auditory cortex is precisely synchronized to the slow temporal modulations of speech. This neural synchronization is robust to acoustic interference, whether speech or noise, and therefore provides a strong candidate for the neural basis of acoustic-background-invariant speech recognition.
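
    The central measure here is the synchronization between cortical activity and the slow temporal modulations of speech. A minimal sketch of one common way to quantify this, as envelope-MEG coherence in the delta-theta band, is given below; the function names, the Hilbert-envelope extraction, and the 1-8 Hz band are illustrative assumptions, not the dissertation's exact pipeline.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, coherence

def speech_envelope(audio, fs_audio, fs_out, cutoff=10.0):
    """Slow temporal modulations: magnitude of the analytic signal, low-passed and crudely downsampled."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff / (fs_audio / 2), btype="low")
    env = filtfilt(b, a, env)
    step = int(round(fs_audio / fs_out))
    return env[::step]

def envelope_meg_coherence(envelope, meg, fs, fmin=1.0, fmax=8.0):
    """Mean envelope-MEG coherence in the delta-theta band, per channel.

    envelope: (n_times,) speech envelope resampled to the MEG sampling rate
    meg:      (n_channels, n_times) cortical recordings
    """
    scores = []
    for ch in meg:
        f, cxy = coherence(envelope, ch, fs=fs, nperseg=int(2 * fs))
        band = (f >= fmin) & (f <= fmax)
        scores.append(cxy[band].mean())
    return np.array(scores)
```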

    Neural and computational approaches to auditory scene analysis

    Our perception of the world is highly dependent on the complex processing of sensory inputs by the brain. Hearing is one of those seemingly effortless sensory tasks that enables us to perceive the auditory world and integrate acoustic information from the environment into cognitive experiences. The main purpose of studying the auditory system is to shed light on the neural mechanisms underlying our hearing ability. Understanding the systematic approach of the brain in performing such complicated tasks is an ultimate goal with numerous clinical and intellectual applications. In this thesis, we take advantage of various experimental and computational approaches to understand how the brain analyzes complex auditory scenes. We first focus on investigating the behavioral and neural mechanisms underlying auditory sound segregation, also known as auditory streaming. Employing an informational masking paradigm, we explore the interaction between stimulus-driven and task-driven attentional processes in the auditory cortex using magnetoencephalography (MEG) recordings from the human brain. The results demonstrate close links between the perceptual and neural consequences of auditory stream segregation, suggesting that the neural activity can be viewed as an indicator of the auditory streaming percept. We then examine more realistic auditory scenarios consisting of two speakers simultaneously present in an auditory scene and introduce a novel computational approach for decoding the attentional state of listeners in such environments. The proposed model focuses on an efficient implementation of a decoder for tracking the cognitive state of the brain, inspired by the neural representation of auditory objects in the auditory cortex. The structure is based on a state-space model with the recorded MEG signal and individual speech envelopes as the input and the probability of attending to the target speaker as the output. The proposed approach benefits from accurate and highly time-resolved estimation of the attentional state, as well as the inherent model-based dynamic denoising of the underlying state-space model, which makes it possible to reliably decode the attentional state under very low SNR conditions. As part of this research, we also investigate the neural representation of ambiguous auditory stimuli at the level of the auditory cortex. In perceiving a typical auditory scene, we may receive incomplete or ambiguous auditory information from the environment. This can lead to multiple interpretations of the same acoustic scene and the formation of an ambiguous perceptual state in the brain. Here, in a series of experimental studies, we focus on a particular example of an ambiguous stimulus (the ambiguous Shepard tone pair) and investigate the neural correlates of contextual effects and perceptual biasing using MEG. The results from psychoacoustic and neural recordings suggest a set of hypotheses about the underlying neural mechanisms of short-term memory and expectation modulation in the nervous system.
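
    The attention decoder described above takes the recorded MEG signal and the individual speech envelopes as input and outputs the probability of attending to the target speaker. Below is a deliberately simplified sketch of that idea, assuming window-level correlation differences as the noisy evidence and a two-state forward filter as the state-space smoother; the actual model in the thesis is more elaborate, and all names and parameter values here are hypothetical.

```python
import numpy as np

def window_evidence(meg_proxy, env_a, env_b, fs, win_s=2.0):
    """Per-window correlation difference between a neural envelope proxy and the two speakers."""
    w = int(win_s * fs)
    n = len(meg_proxy) // w
    ev = np.empty(n)
    for k in range(n):
        s = slice(k * w, (k + 1) * w)
        ca = np.corrcoef(meg_proxy[s], env_a[s])[0, 1]
        cb = np.corrcoef(meg_proxy[s], env_b[s])[0, 1]
        ev[k] = ca - cb
    return ev

def attention_forward_filter(ev, p_stay=0.95, mu=0.1, sigma=0.2):
    """P(attending speaker A) over time from noisy evidence, via a two-state forward filter."""
    def lik(x, m):  # Gaussian emission density (unnormalized)
        return np.exp(-0.5 * ((x - m) / sigma) ** 2)
    T = np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]])  # attention-switch dynamics
    alpha = np.array([0.5, 0.5])
    out = np.empty(len(ev))
    for t, x in enumerate(ev):
        alpha = T.T @ alpha                                      # predict
        alpha *= np.array([lik(x, +mu), lik(x, -mu)])            # update with window evidence
        alpha /= alpha.sum()
        out[t] = alpha[0]
    return out
```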

    Efficient Solutions to High-Dimensional and Nonlinear Neural Inverse Problems

    Development of various data acquisition techniques has enabled researchers to study the brain as a complex system and gain insight into the high-level functions performed by different regions of the brain. These data are typically high-dimensional, as they pertain to hundreds of sensors and span hours of recording. In many experiments involving sensory or cognitive tasks, the underlying cortical activity admits sparse and structured representations in the temporal, spatial, or spectral domains, or combinations thereof. However, current neural data analysis approaches do not take sparsity into account in order to harness this high dimensionality. Also, many existing approaches suffer from high bias due to the heavy usage of linear models and estimation techniques, given that cortical activity is known to exhibit various degrees of non-linearity. Finally, the majority of current methods in computational neuroscience are tailored for static estimation in batch-mode and offline settings; with the advancement of brain-computer interface technologies, these methods need to be extended to capture neural dynamics in real time. The objective of this dissertation is to devise novel algorithms for real-time estimation settings and to incorporate the sparsity and non-linear properties of brain activity, providing efficient solutions to neural inverse problems involving high-dimensional data. Along the same lines, our goal is to provide efficient representations of these high-dimensional data that are easy to interpret and assess statistically. First, we consider the problem of spectral estimation from binary neuronal spiking data. Due to the non-linearities involved in spiking dynamics, classical spectral representation methods fail to capture the spectral properties of these data. To address this challenge, we integrate point process theory, sparse estimation, and non-linear signal processing methods to propose a spectral representation modeling and estimation framework for spiking data. Our model takes into account the sparse spectral structure of spiking data, which is crucial in the analysis of electrophysiology data in conditions such as sleep and anesthesia. We validate the performance of our spectral estimation framework using simulated spiking data as well as multi-unit spike recordings from human subjects under general anesthesia. Next, we tackle the problem of real-time auditory attention decoding from electroencephalography (EEG) or magnetoencephalography (MEG) data in a competing-speaker environment. Most existing algorithms for this purpose operate offline and require access to multiple trials for reliable performance; hence, they are not suitable for real-time applications. To address these shortcomings, we integrate techniques from state-space modeling, Bayesian filtering, and sparse estimation to propose a real-time algorithm for attention decoding that provides robust, statistically interpretable, and dynamic measures of the attentional state of the listener. We validate the performance of our proposed algorithm using simulated and experimentally recorded M/EEG data. Our analysis reveals that our algorithms perform comparably to state-of-the-art offline attention decoding techniques, while providing significant computational savings. Finally, we study the problem of dynamic estimation of temporal response functions (TRFs) for analyzing neural responses to auditory stimuli. A TRF can be viewed as the impulse response of the brain in a linear stimulus-response model. Over the past few years, TRF analysis has provided researchers with great insight into auditory processing, especially in competing-speaker environments. However, most existing results correspond to static TRF estimates and do not examine TRF dynamics, especially in multi-speaker environments with attentional modulation. Using state-space models, we provide a framework for a robust and comprehensive dynamic analysis of TRFs using single-trial data. TRF components at specific lags may exhibit peaks that arise, persist, and disappear over time according to the attentional state of the listener. To account for this behavior in our model, we consider a state-space model with a Gaussian mixture process noise and devise an algorithm to efficiently estimate the process noise parameters from the recorded M/EEG data. Application to simulated and recorded MEG data shows that the proposed state-space modeling and inference framework can reliably capture the dynamic changes in the TRF, which can in turn improve our access to the attentional state in competing-speaker environments.
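
    As a rough illustration of dynamic TRF estimation in a state-space setting, the sketch below tracks a time-varying TRF with a plain Kalman filter under random-walk dynamics. It replaces the Gaussian mixture process noise used in the dissertation with a single Gaussian, and the noise variances q and r are arbitrary placeholders.

```python
import numpy as np

def dynamic_trf_kalman(stimulus, response, n_lags, q=1e-4, r=1.0):
    """Track a time-varying TRF with a Kalman filter.

    State:       h_t (TRF over n_lags), random-walk dynamics h_t = h_{t-1} + w_t
    Observation: y_t = s_t' h_t + v_t, where s_t holds the last n_lags stimulus samples.
    """
    n = len(response)
    h = np.zeros(n_lags)
    P = np.eye(n_lags)
    Q = q * np.eye(n_lags)
    trfs = np.zeros((n, n_lags))
    for t in range(n_lags, n):
        s = stimulus[t - n_lags + 1:t + 1][::-1]   # lagged stimulus vector
        P = P + Q                                  # predict
        k = P @ s / (s @ P @ s + r)                # Kalman gain (scalar observation)
        h = h + k * (response[t] - s @ h)          # update the TRF estimate
        P = P - np.outer(k, s) @ P
        trfs[t] = h
    return trfs
```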

    Computational modelling of neural mechanisms underlying natural speech perception

    Humans are highly skilled at the analysis of complex auditory scenes. In particular, the human auditory system is characterized by incredible robustness to noise and can nearly effortlessly isolate the voice of a specific talker from even the busiest of mixtures. However, the neural mechanisms underlying these remarkable properties remain poorly understood. This is mainly due to the inherent complexity of speech signals and the multi-stage, intricate processing performed in the human auditory system. Understanding the neural mechanisms underlying speech perception is of interest for clinical practice, brain-computer interfacing, and automatic speech processing systems. In this thesis, we developed computational models characterizing neural speech processing across different stages of the human auditory pathways. In particular, we studied the active role of slow cortical oscillations in speech-in-noise comprehension through a spiking neural network model for encoding spoken sentences. The neural dynamics of the model during noisy speech encoding reflected the speech comprehension of young, normal-hearing adults. The proposed theoretical model was validated by predicting the effects of non-invasive brain stimulation on speech comprehension in an experimental study involving a cohort of volunteers. Moreover, we developed a modelling framework for detecting the early, high-frequency neural response to uninterrupted speech in non-invasive neural recordings. We applied the method to investigate top-down modulation of this response by the listener's selective attention and by linguistic properties of different words from a spoken narrative. We found that in both cases the detected responses, of predominantly subcortical origin, were significantly modulated, which supports the functional role of feedback between higher- and lower-level stages of the auditory pathways in speech perception. The proposed computational models shed light on some of the poorly understood neural mechanisms underlying speech perception. The developed methods can be readily employed in future studies involving a range of experimental paradigms beyond those considered in this thesis.
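
    For detecting a stimulus-driven response in continuous recordings, a common baseline approach is to fit a linear forward model and test whether its cross-validated predictive correlation exceeds chance. The sketch below illustrates this with a simple ridge-regression TRF and a split-half evaluation; it is a generic stand-in under assumed names and parameters, not the specific modelling framework developed in the thesis.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Design matrix of lagged stimulus samples (time x lags)."""
    X = np.zeros((len(stim), n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:len(stim) - k]
    return X

def detect_response(stim_feature, eeg, n_lags=30, lam=1e2):
    """Fit a forward model on the first half of the data, report predictive correlation on the second.

    A correlation well above chance (assessed e.g. with surrogate stimuli)
    indicates a detectable response to the stimulus feature.
    """
    X = lagged_design(stim_feature, n_lags)
    half = len(eeg) // 2
    Xtr, Xte, ytr, yte = X[:half], X[half:], eeg[:half], eeg[half:]
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(n_lags), Xtr.T @ ytr)  # ridge TRF
    pred = Xte @ w
    return np.corrcoef(pred, yte)[0, 1]
```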

    Estimation of the Temporal Response Function and Tracking Selective Auditory Attention using Deep Kalman Filter

    The cocktail party effect refers to the phenomenon that people can focus on a single sound source in a noisy environment with multiple speakers talking at the same time. This effect reflects the human brain's ability of selective auditory attention, whose decoding from non-invasive electroencephalography (EEG) or magnetoencephalography (MEG) has recently been a topic of active research. The mapping between auditory stimuli and their neural responses can be measured by auditory temporal response functions (TRFs). It has been shown that TRF estimates derived from the envelopes of speech streams and auditory neural responses can be used to make predictions that discriminate between attended and unattended speakers. l_1-regularized least squares estimation has been adopted in previous research for the estimation of the linear TRF model. However, most real-world systems exhibit a degree of non-linearity, so new models are needed for complex, realistic auditory environments. In this thesis, we propose to estimate TRFs with the deep Kalman filter model, for cases where the observations are a noisy, non-linear function of the latent states. The deep Kalman filter (DKF) algorithm is developed using techniques from variational inference. Replacing all the linear transformations in the classic Kalman filter model with non-linear transformations makes the posterior distribution intractable to compute. Thus, a recognition network is introduced to approximate the intractable posterior and to optimize the variational lower bound of the objective function. We implemented the deep Kalman filter model with a two-layer bidirectional LSTM and an MLP. The performance is first evaluated by applying our algorithm to simulated MEG data. In addition, we combined the new model for TRF estimation with a previously proposed framework by replacing the dynamic encoding/decoding module in that framework with a deep Kalman filter to conduct real-time tracking of selective auditory attention. This performance is validated by applying the general framework to simulated EEG data.
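
    A compact sketch of the deep Kalman filter idea described above is given below, assuming PyTorch: MLP transition and emission models, a two-layer bidirectional LSTM recognition network that approximates the intractable posterior, and an evidence lower bound (ELBO) combining a Gaussian reconstruction term with a KL penalty. The layer sizes and unit-variance likelihood are assumptions for illustration, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class DeepKalmanFilter(nn.Module):
    """Minimal deep Kalman filter: nonlinear transition/emission MLPs plus a
    BiLSTM recognition network approximating the posterior q(z_t | x_{1:T})."""

    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(x_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.q_net = nn.Linear(2 * hidden, 2 * z_dim)              # posterior mean / log-variance
        self.trans = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * z_dim))   # p(z_t | z_{t-1})
        self.emit = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, x_dim))        # mean of p(x_t | z_t)

    def elbo(self, x):
        """x: (batch, time, x_dim). Returns the evidence lower bound to be maximized."""
        h, _ = self.rnn(x)
        q_mu, q_logvar = self.q_net(h).chunk(2, dim=-1)
        z = q_mu + torch.randn_like(q_mu) * torch.exp(0.5 * q_logvar)  # reparameterized sample

        # Prior over z_t from the previous latent sample (transition of a zero vector at t = 0)
        z_prev = torch.cat([torch.zeros_like(z[:, :1]), z[:, :-1]], dim=1)
        p_mu, p_logvar = self.trans(z_prev).chunk(2, dim=-1)

        recon = -0.5 * ((x - self.emit(z)) ** 2).sum()              # Gaussian log-likelihood (unit variance)
        kl = 0.5 * (p_logvar - q_logvar
                    + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                    - 1.0).sum()
        return recon - kl
```

    Maximizing this ELBO with a stochastic optimizer such as Adam trains the recognition, transition, and emission networks jointly; the posterior means can then be read out as the estimated latent trajectory.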

    Decoding auditory attention and neural language processing in adverse conditions and different listener groups

    This thesis investigated subjective, behavioural, and neurophysiological (EEG) measures of speech processing in various adverse conditions and with different listener groups. In particular, it focused on different neural processing stages and their relationship with auditory attention, effort, and measures of speech intelligibility. Study 1 set the groundwork by establishing a toolbox of neural measures to investigate online speech processing, from the frequency following response (FFR) and cortical measures of speech processing to the N400, a measure of lexico-semantic processing. Results showed that peripheral processing is heavily influenced by stimulus characteristics such as degradation, whereas central processing stages are more closely linked to higher-order phenomena such as speech intelligibility. In Study 2, a similar experimental paradigm was used to investigate differences in neural processing between a hearing-impaired and a normal-hearing group. Subjects were presented with short stories in different levels of multi-talker babble noise and with different settings on their hearing aids. Findings indicate that, particularly at lower noise levels, the hearing-impaired group showed much higher cortical entrainment than the normal-hearing group, despite similar levels of speech recognition. Intersubject correlation, another global neural measure of auditory attention, was however similarly affected by noise levels in both the hearing-impaired and the normal-hearing group. This finding indicates extra processing in the hearing-impaired group only at the level of the auditory cortex. Study 3, in contrast to Studies 1 and 2 (which both investigated the effects of bottom-up factors on neural processing), examined the links between entrainment and top-down factors, specifically motivation, as well as reasons for the higher entrainment found in hearing-impaired subjects in Study 2. Results indicated that, while behaviourally there was no difference between incentive and non-incentive conditions, neurophysiological measures of attention such as intersubject correlation were affected by the presence of an incentive to perform better. Moreover, using a specific degradation type resulted in subjects' increased cortical entrainment under degraded conditions. These findings support the hypothesis that top-down factors such as motivation influence neurophysiological measures, and that higher entrainment to degraded speech might be triggered specifically by the reduced availability of spectral detail contained in speech.
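
    Intersubject correlation, used above as a global neural measure of auditory attention, can be computed in several ways; a minimal leave-one-out, channel-wise version is sketched below. The study may have used a component-based variant, so this is only an illustration under assumed array shapes.

```python
import numpy as np

def intersubject_correlation(data):
    """Leave-one-out intersubject correlation.

    data: (n_subjects, n_channels, n_times) EEG aligned to the same stimulus.
    Returns an (n_subjects, n_channels) array: each subject's channel-wise
    correlation with the average response of all other subjects.
    """
    n_subj, n_ch, _ = data.shape
    isc = np.zeros((n_subj, n_ch))
    for s in range(n_subj):
        others = data[np.arange(n_subj) != s].mean(axis=0)   # grand average without subject s
        for c in range(n_ch):
            isc[s, c] = np.corrcoef(data[s, c], others[c])[0, 1]
    return isc
```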

    Neuromorphic model for sound source segregation

    While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixture. Other methods build models of the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker remains a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features that belong to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, the method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments, we collect neural evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses about the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and about its psychophysical manifestations in navigating complex sensory environments. Results from the EEG experiments provide further insights into the assumptions behind the model and provide motivation for future single-unit studies that can provide more direct evidence for the principle of temporal coherence.
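
    The temporal coherence principle groups spectro-temporal channels whose slow envelopes co-vary over time into a single source. A crude sketch of that grouping step is shown below, assuming a magnitude spectrogram as input and an anchor channel (for example, one dominated by the target's pitch) selected by attention; the smoothing length and threshold are illustrative, not the model's actual parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def coherence_mask(spectrogram, anchor_channel, smooth=8, thresh=0.0):
    """Group channels with the anchor channel by the temporal coherence of their slow envelopes.

    spectrogram: (n_channels, n_frames) magnitude time-frequency representation.
    Returns a boolean mask over channels; applying it to the spectrogram keeps
    the channels that co-vary with the anchor, i.e. a crude estimate of one source.
    """
    env = uniform_filter1d(spectrogram, size=smooth, axis=1)   # slow per-channel envelopes
    env = env - env.mean(axis=1, keepdims=True)
    anchor = env[anchor_channel]
    corr = env @ anchor / (np.linalg.norm(env, axis=1) * np.linalg.norm(anchor) + 1e-12)
    return corr > thresh
```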

    CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS

    Being able to attend to and recognize speech or a particular sound in complex listening environments is a feat performed by humans effortlessly. The underlying neural mechanisms, however, remain unclear and cannot yet be emulated by artificial systems. Understanding the internal (cortical) representation of the external acoustic world is a key step in deciphering the mechanisms of human auditory processing. Further, understanding the neural representation of sound has numerous applications in clinical research on psychiatric disorders with auditory processing deficits, such as schizophrenia. In the first part of this dissertation, cortical activity from normal-hearing human subjects is recorded non-invasively using magnetoencephalography in two different real-life listening scenarios: first, when natural speech is distorted by reverberation as well as stationary additive noise, and second, when the attended speech is degraded by the presence of multiple additional talkers in the background, simulating a cocktail party. Using natural speech affected by reverberation and noise, it was demonstrated that the auditory cortex maintains both distorted as well as distortion-free representations of speech. Additionally, we show that, while the neural representation of speech remained robust to additive noise in the absence of reverberation, noise had a detrimental effect in the presence of reverberation, suggesting differential mechanisms of speech processing for additive and reverberation distortions. In the cocktail party paradigm, we demonstrated that primary-like areas represent the external auditory world in terms of acoustics, whereas higher-order areas maintain an object-based representation. Further, it was demonstrated that background speech streams are represented as an unsegregated auditory object. The results suggest that object-based representations of the auditory scene emerge in higher-order auditory cortices. In the second part of this dissertation, using electroencephalographic recordings from normal human subjects and patients suffering from schizophrenia, it was demonstrated, for the first time, that delta band steady-state responses are more affected in schizophrenia patients compared with healthy individuals, contrary to the prevailing dominance of gamma band studies in the literature. Further, the results from this study suggest that the inadequate ability to sustain neural responses in this low frequency range may play a vital role in the mechanisms of auditory perceptual and cognitive deficits in schizophrenia. Overall, this dissertation furthers the current understanding of the cortical representation of speech in complex listening environments and of how the auditory representation of sounds is affected in psychiatric disorders involving aberrant auditory processing.
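
    One way to probe whether cortex maintains a distorted or a distortion-free representation is to reconstruct the speech envelope from the recordings with a backward model and compare reconstruction accuracy against the clean versus the reverberant envelope. The sketch below shows such a linear reconstruction with ridge regression and a split-half evaluation; it is a generic illustration with arbitrary lag range and regularization, not the analysis used in the dissertation.

```python
import numpy as np

def reconstruct_envelope(meg, envelope, lags=range(0, 25), lam=1e3):
    """Backward model: linear reconstruction of a speech envelope from MEG channels.

    meg:      (n_channels, n_times)
    envelope: (n_times,) clean or distorted (e.g. reverberant) speech envelope
    Returns the correlation between reconstructed and actual envelope on held-out data.
    """
    n_ch, n_t = meg.shape
    X = np.zeros((n_t, n_ch * len(lags)))
    for i, lag in enumerate(lags):                       # stack lagged copies of every channel
        X[lag:, i * n_ch:(i + 1) * n_ch] = meg[:, :n_t - lag].T
    half = n_t // 2
    w = np.linalg.solve(X[:half].T @ X[:half] + lam * np.eye(X.shape[1]),
                        X[:half].T @ envelope[:half])    # ridge-regularized decoder
    pred = X[half:] @ w
    return np.corrcoef(pred, envelope[half:])[0, 1]
```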

    A new unifying account of the roles of neuronal entrainment

    Rhythms are a fundamental and defining feature of neuronal activity in animals, including humans. This rhythmic brain activity interacts in complex ways with rhythms in the internal and external environment through the phenomenon of ‘neuronal entrainment’, which is attracting increasing attention due to its suggested role in a multitude of sensory and cognitive processes. Some senses, such as touch and vision, sample the environment rhythmically, while others, like audition, are faced with mostly rhythmic inputs. Entrainment couples rhythmic brain activity to external and internal rhythmic events, serving fine-grained routing and modulation of external and internal signals across multiple spatial and temporal hierarchies. This interaction between a brain and its environment can be experimentally investigated and even modified by rhythmic sensory stimuli or by invasive and non-invasive neuromodulation techniques. We provide a comprehensive overview of the topic and propose a theoretical framework of how neuronal entrainment dynamically structures information from incoming neuronal, bodily and environmental sources. We discuss the different types of neuronal entrainment, the conceptual advances in the field, and converging evidence for general principles.

    Sensorimotor Modulations by Cognitive Processes During Accurate Speech Discrimination: An EEG Investigation of Dorsal Stream Processing

    Internal models mediate the transmission of information between anterior and posterior regions of the dorsal stream in support of speech perception, though it remains unclear how this mechanism responds to cognitive processes in service of task demands. The purpose of the current study was to identify the influences of attention and working memory on sensorimotor activity across the dorsal stream during speech discrimination, with set size and signal clarity employed to modulate stimulus predictability and the time course of increased task demands, respectively. Independent Component Analysis of 64-channel EEG data identified bilateral sensorimotor mu and auditory alpha components from a cohort of 42 participants, indexing activity from anterior (mu) and posterior (auditory) aspects of the dorsal stream. Time-frequency (ERSP) analysis evaluated task-related changes in focal activation patterns, with phase coherence measures employed to track patterns of information flow across the dorsal stream. ERSP decomposition of mu clusters revealed event-related desynchronization (ERD) in the beta and alpha bands, which was interpreted as evidence of forward (beta) and inverse (alpha) internal modeling across the time course of perception events. Stronger pre-stimulus mu alpha ERD in small-set discrimination tasks was interpreted as more efficient attentional allocation due to the reduced sensory search space enabled by predictable stimuli. Mu-alpha and mu-beta ERD in peri- and post-stimulus periods were interpreted within the framework of Analysis by Synthesis as evidence of working memory activity for stimulus processing and maintenance, with weaker activity in degraded conditions suggesting that covert rehearsal mechanisms are sensitive to the quality of the stimulus being retained in working memory. Similar ERSP patterns across conditions, despite the differences in stimulus predictability and clarity, suggest that subjects may have adapted to the tasks. In light of this, future studies of sensorimotor processing should consider the ecological validity of the tasks employed, as well as the larger cognitive environment in which tasks are performed. The absence of interpretable patterns of mu-auditory coherence modulation across the time course of speech discrimination highlights the need for more sensitive analyses to probe dorsal stream connectivity.
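
    The ERSP measure referred to above is trial-averaged time-frequency power expressed relative to a pre-stimulus baseline, so that negative values index event-related desynchronization (ERD). A minimal sketch using a short-time Fourier decomposition is given below; the study used ICA-derived component activations and possibly a different time-frequency method, so the window length and baseline interval here are assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

def ersp(trials, fs, baseline=(0.0, 0.2), nperseg=64):
    """Event-related spectral perturbation: trial-averaged time-frequency power
    in dB relative to a baseline window.

    trials:   (n_trials, n_times) single-channel (or single-component) epochs
              that begin before stimulus onset.
    baseline: pre-stimulus interval, in seconds from the start of the epoch.
    """
    powers = []
    for tr in trials:
        f, t, S = spectrogram(tr, fs=fs, nperseg=nperseg, noverlap=nperseg - 1)
        powers.append(S)
    P = np.mean(powers, axis=0)                                   # (n_freqs, n_frames)
    base = (t >= baseline[0]) & (t < baseline[1])
    P_db = 10 * np.log10(P / P[:, base].mean(axis=1, keepdims=True))
    return f, t, P_db                                             # negative values indicate ERD
```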