80 research outputs found

    Selective attention and speech processing in the cortex

    In noisy and complex environments, human listeners must segregate the mixture of sound sources arriving at their ears and selectively attend to a single source, thereby solving a computationally difficult problem called the cocktail party problem. However, the neural mechanisms underlying these computations are still largely a mystery. Oscillatory synchronization of neuronal activity between cortical areas is thought to play a crucial role in facilitating information transmission between spatially separated populations of neurons, enabling the formation of functional networks. In this thesis, we seek to analyze and model the functional neuronal networks underlying attention to speech stimuli and find that the Frontal Eye Fields play a central 'hub' role in the auditory spatial attention network in a cocktail party experiment. We use magnetoencephalography (MEG) to measure neural signals with high temporal precision, while sampling from the whole cortex. However, several methodological issues arise when undertaking functional connectivity analysis with MEG data. Specifically, volume conduction of electrical and magnetic fields in the brain complicates interpretation of results. We compare several approaches through simulations, and analyze the trade-offs among various measures of neural phase-locking in the presence of volume conduction. We use these insights to study functional networks in a cocktail party experiment. We then construct a linear dynamical system model of neural responses to ongoing speech. Using this model, we are able to correctly predict which of two speakers is being attended by a listener. We then apply this model to data from a task where people were attending to stories with synchronous and scrambled videos of the speakers' faces to explore how the presence of visual information modifies the underlying neuronal mechanisms of speech perception.
This model allows us to probe neural processes as subjects listen to long stimuli, without the need for a trial-based experimental design. We model the neural activity with latent states, and model the neural noise spectrum and functional connectivity with multivariate autoregressive dynamics, along with impulse responses for external stimulus processing. We also develop a new regularized Expectation-Maximization (EM) algorithm to fit this model to electroencephalography (EEG) data.
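The volume-conduction issue raised in this abstract can be illustrated with a small simulation. The sketch below is not taken from the thesis: the abstract does not name the phase-locking measures it compares, so the phase-locking value (PLV) and the imaginary part of coherency are assumed here as one illustrative pair. The sources are idealized unit phasors with independent random phases, and `leakage` is a hypothetical parameter standing in for instantaneous field spread between two sensors.

```python
import cmath
import math
import random


def plv(x, y):
    """Phase-locking value: magnitude of the mean unit phasor of the phase difference."""
    phasors = [cmath.exp(1j * (cmath.phase(a) - cmath.phase(b))) for a, b in zip(x, y)]
    return abs(sum(phasors) / len(phasors))


def imaginary_coherency(x, y):
    """Imaginary part of the normalized cross-spectrum of two complex signals."""
    n = len(x)
    cross = sum(a * b.conjugate() for a, b in zip(x, y)) / n
    power_x = sum(abs(a) ** 2 for a in x) / n
    power_y = sum(abs(b) ** 2 for b in y) / n
    return cross.imag / math.sqrt(power_x * power_y)


def simulate(leakage, n_samples=5000, seed=0):
    """Two independent sources seen by two sensors with zero-lag mixing (assumed model)."""
    rng = random.Random(seed)
    s1 = [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(n_samples)]
    s2 = [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(n_samples)]
    # Each sensor records its own source plus a real-valued, instantaneous
    # fraction of the other source, which mimics volume conduction.
    x = [a + leakage * b for a, b in zip(s1, s2)]
    y = [b + leakage * a for a, b in zip(s1, s2)]
    return x, y


x, y = simulate(leakage=0.5)
print("PLV with leakage:", plv(x, y))
print("Imaginary coherency with leakage:", imaginary_coherency(x, y))
```

In this toy setting the sources are independent, yet the PLV between the two sensors is clearly nonzero because of the zero-lag mixing, while the imaginary part of coherency stays near zero. The trade-off is that a measure which discards zero-lag coupling would also miss any genuine zero-lag interaction.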

    Characterization and Decoding of Speech Representations From the Electrocorticogram

    Millions of people worldwide suffer from various neuromuscular disorders such as amyotrophic lateral sclerosis (ALS), brainstem stroke, muscular dystrophy, cerebral palsy, and others, which adversely affect the neural control of muscles or the muscles themselves. The patients who are the most severely affected lose all voluntary muscle control and are completely locked-in, i.e., they are unable to communicate with the outside world in any manner. To develop neuro-rehabilitation techniques for these patients, several studies have used brain signals related to mental imagery and attention to control an external device, a technology known as a brain-computer interface (BCI). Some recent studies have also attempted to decode various aspects of spoken language, imagined language, or perceived speech directly from brain signals. To extend research in this direction, this dissertation aims to characterize and decode various speech representations popularly used in speech recognition systems directly from brain activity, specifically the electrocorticogram (ECoG). The speech representations studied in this dissertation range from simple features such as the speech power and the fundamental frequency (pitch), to complex representations such as the linear prediction coding and mel-frequency cepstral coefficients. These decoded speech representations may eventually be used to enhance existing speech recognition systems or to reconstruct intended or imagined speech directly from brain activity. This research will ultimately pave the way for an ECoG-based neural speech prosthesis, which will offer a more natural communication channel for individuals who have lost the ability to speak normally.

    Statistical models for natural sounds

    It is important to understand the rich structure of natural sounds in order to solve important tasks, like automatic speech recognition, and to understand auditory processing in the brain. This thesis takes a step in this direction by characterising the statistics of simple natural sounds. We focus on the statistics because perception often appears to depend on them, rather than on the raw waveform. For example, the perception of auditory textures, like running water, wind, fire and rain, depends on summary statistics, like the rate of falling rain droplets, rather than on the exact details of the physical source. In order to analyse the statistics of sounds accurately, it is necessary to improve a number of traditional signal processing methods, including those for amplitude demodulation, time-frequency analysis, and sub-band demodulation. These estimation tasks are ill-posed and therefore it is natural to treat them as Bayesian inference problems. The new probabilistic versions of these methods have several advantages. For example, they perform more accurately on natural signals and are more robust to noise; they can also fill in missing sections of data and provide error bars. Furthermore, free parameters can be learned from the signal. Using these new algorithms, we demonstrate that the energy, sparsity, modulation depth and modulation time-scale in each sub-band of a signal are critical statistics, together with the dependencies between the sub-band modulators. In order to validate this claim, a model containing co-modulated coloured noise carriers is shown to be capable of generating a range of realistic-sounding auditory textures. Finally, we explore the connection between the statistics of natural sounds and perception. We demonstrate that inference in the model for auditory textures qualitatively replicates the primitive grouping rules that listeners use to understand simple acoustic scenes.
This suggests that the auditory system is optimised for the statistics of natural sounds.
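For orientation, the kind of traditional amplitude demodulation that this abstract sets out to improve can be sketched as a rectify-and-smooth envelope estimator. This is a conventional baseline chosen for illustration, not the probabilistic method the thesis develops; the sampling rate, carrier frequency, modulator and window length are all assumed values.

```python
import math


def rectify_and_smooth(signal, window):
    """Estimate the envelope by full-wave rectification and a centred moving average."""
    half = window // 2
    rectified = [abs(v) for v in signal]
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i - half + window)
        segment = rectified[lo:hi]
        # A rectified sinusoid averages to roughly 2/pi of its amplitude,
        # so scaling by pi/2 approximately restores the modulator's scale.
        envelope.append((math.pi / 2) * sum(segment) / len(segment))
    return envelope


fs = 1000         # sampling rate in Hz (assumed)
carrier_hz = 100  # carrier frequency in Hz (assumed)
mod_hz = 2        # modulation frequency in Hz (assumed)
t = [n / fs for n in range(fs)]

# Slowly varying positive modulator multiplied by a fast sinusoidal carrier.
modulator = [1 + 0.8 * math.sin(2 * math.pi * mod_hz * s) for s in t]
am_signal = [m * math.sin(2 * math.pi * carrier_hz * s) for m, s in zip(modulator, t)]

# Smooth over exactly one carrier period so the carrier ripple averages out.
envelope = rectify_and_smooth(am_signal, window=fs // carrier_hz)
```

The estimator needs the smoothing window chosen by hand, gives no uncertainty estimate, and cannot fill in missing samples, which are the kinds of limitations the abstract says a Bayesian treatment addresses.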

    Automatic Detectors for Underwater Soundscape Measurements

    Environmental impact regulations require that marine industrial operators quantify their contribution to underwater noise scenes. Automation of such assessments becomes feasible with the successful categorisation of sounds into broader classes based on source types: biological, anthropogenic and physical. Previous approaches to passive acoustic monitoring have mostly been limited to a few specific sources of interest. In this study, source-independent signal detectors are developed and a framework is presented for the automatic categorisation of underwater sounds into the aforementioned classes.

    Neural dynamics of selective attention to speech in noise

    This thesis investigates how the neural system instantiates selective attention to speech in challenging acoustic conditions, such as spectral degradation and the presence of background noise. Four studies using behavioural measures and magneto- and electroencephalography (M/EEG) recordings were conducted in younger (20–30 years) and older participants (60–80 years). The overall results can be summarized as follows. An EEG experiment demonstrated that slow negative potentials reflect participants’ enhanced allocation of attention when they are faced with more degraded acoustics. This basic mechanism of attention allocation was preserved at an older age. A follow-up experiment in younger listeners indicated that attention allocation can be further enhanced in a context of increased task-relevance through monetary incentives. A subsequent study focused on brain oscillatory dynamics in a demanding speech comprehension task. The power of neural alpha oscillations (~10 Hz) reflected a decrease in demands on attention with increasing acoustic detail and, critically, also with increasing predictiveness of the upcoming speech content. Older listeners’ behavioural responses and alpha power dynamics were more strongly affected by acoustic detail than those of younger listeners, indicating that selective attention at an older age is particularly dependent on the sensory input signal. An additional analysis of listeners’ neural phase-locking to the temporal envelopes of attended speech and unattended background speech revealed that younger and older listeners show a similar segregation of attended and unattended speech on a neural level. A dichotic listening experiment in the MEG aimed at investigating how neural alpha oscillations support selective attention to speech. Lateralized alpha power modulations in parietal and auditory cortex regions predicted listeners’ focus of attention (i.e., left vs right).
This suggests that alpha oscillations implement an attentional filter mechanism to enhance the signal and to suppress noise. A final behavioural study asked whether acoustic and semantic aspects of task-irrelevant speech determine how much it interferes with attention to task-relevant speech. Results demonstrated that younger and older adults were more distracted when acoustic detail of irrelevant speech was enhanced, whereas predictiveness of irrelevant speech had no effect. All findings of this thesis are integrated in an initial framework for the role of attention in speech comprehension under demanding acoustic conditions.

    Estimation and Modeling Problems in Parametric Audio Coding
