PLP2: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns
The temporal trajectories of the spectral energy in auditory critical bands over 250 ms segments are approximated by an all-pole model, the time-domain dual of conventional linear prediction. This quarter-second auditory spectro-temporal pattern is further smoothed by iterative alternation of spectral and temporal all-pole modeling. Just as Perceptual Linear Prediction (PLP) uses an autoregressive model in the frequency domain to estimate peaks in an auditory-like short-term spectral slice, PLP2 uses all-pole modeling in both time and frequency domains to estimate peaks of a two-dimensional spectro-temporal pattern, motivated by considerations of the auditory system.
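The all-pole estimation at the heart of both PLP and PLP2 can be sketched with the autocorrelation method solved by the Levinson-Durbin recursion. This is a generic 1-D illustration (the iterative 2-D time-frequency alternation described above is not reproduced), and the function names are illustrative rather than from the paper:

```python
import numpy as np

def lpc(x, order):
    """Fit an all-pole (autoregressive) model to the 1-D sequence x by the
    autocorrelation method, solved with the Levinson-Durbin recursion.
    Returns prediction coefficients a (with a[0] = 1) and residual energy."""
    # autocorrelation at lags 0..order
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction error
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err
        a[1:i + 1] += k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a, err
```

In the PLP2 scheme, the same kind of fit is applied alternately along the temporal and spectral axes of the critical-band energy pattern.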
Selective attention and speech processing in the cortex
In noisy and complex environments, human listeners must segregate the mixture of sound sources arriving at their ears and selectively attend to a single source, thereby solving a computationally difficult problem known as the cocktail party problem. However, the neural mechanisms underlying these computations are still largely a mystery. Oscillatory synchronization of neuronal activity between cortical areas is thought to play a crucial role in facilitating information transmission between spatially separated populations of neurons, enabling the formation of functional networks.
In this thesis, we seek to analyze and model the functional neuronal networks underlying attention to speech stimuli and find that the Frontal Eye Fields play a central 'hub' role in the auditory spatial attention network in a cocktail party experiment. We use magnetoencephalography (MEG) to measure neural signals with high temporal precision, while sampling from the whole cortex. However, several methodological issues arise when undertaking functional connectivity analysis with MEG data. Specifically, volume conduction of electrical and magnetic fields in the brain complicates interpretation of results. We compare several approaches through simulations, and analyze the trade-offs among various measures of neural phase-locking in the presence of volume conduction. We use these insights to study functional networks in a cocktail party experiment.
We then construct a linear dynamical system model of neural responses to ongoing speech. Using this model, we are able to correctly predict which of two speakers is being attended by a listener. We then apply this model to data from a task where people were attending to stories with synchronous and scrambled videos of the speakers' faces, to explore how the presence of visual information modifies the underlying neuronal mechanisms of speech perception. This model allows us to probe neural processes as subjects listen to long stimuli, without the need for a trial-based experimental design. We model the neural activity with latent states, and model the neural noise spectrum and functional connectivity with multivariate autoregressive dynamics, along with impulse responses for external stimulus processing. We also develop a new regularized Expectation-Maximization (EM) algorithm to fit this model to electroencephalography (EEG) data.
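As a rough sketch of the state-space machinery involved, the forward (filtering) pass of such a linear dynamical system can be written in a few lines. This is a generic textbook Kalman filter, not the thesis's regularized EM fit; all names are illustrative:

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, x0, P0):
    """Forward (filtering) pass for the linear dynamical system
        x_t = A x_{t-1} + w_t,   w_t ~ N(0, Q)   (latent state)
        y_t = C x_t + v_t,       v_t ~ N(0, R)   (observation)
    Returns the filtered state means. In an EM fit, this pass (together
    with a backward smoother) forms the core of the E-step."""
    T = y.shape[0]
    d = A.shape[0]
    x, P = x0.copy(), P0.copy()
    means = np.zeros((T, d))
    for t in range(T):
        # predict one step ahead under the autoregressive dynamics
        x = A @ x
        P = A @ P @ A.T + Q
        # update with the new observation
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (y[t] - C @ x)
        P = (np.eye(d) - K @ C) @ P
        means[t] = x
    return means
```

The filtered latent trajectory is what a decoder would compare across attended-speaker hypotheses.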
Characterization and Decoding of Speech Representations From the Electrocorticogram
Millions of people worldwide suffer from various neuromuscular disorders such as amyotrophic lateral sclerosis (ALS), brainstem stroke, muscular dystrophy, cerebral palsy, and others, which adversely affect the neural control of muscles or the muscles themselves. The patients who are the most severely affected lose all voluntary muscle control and are completely locked-in, i.e., they are unable to communicate with the outside world in any manner. In the direction of developing neuro-rehabilitation techniques for these patients, several studies have used brain signals related to mental imagery and attention in order to control an external device, a technology known as a brain-computer interface (BCI). Some recent studies have also attempted to decode various aspects of spoken language, imagined language, or perceived speech directly from brain signals. In order to extend research in this direction, this dissertation aims to characterize and decode various speech representations popularly used in speech recognition systems directly from brain activity, specifically the electrocorticogram (ECoG). The speech representations studied in this dissertation range from simple features, such as speech power and fundamental frequency (pitch), to complex representations, such as linear predictive coding (LPC) and mel-frequency cepstral coefficients (MFCCs). These decoded speech representations may eventually be used to enhance existing speech recognition systems or to reconstruct intended or imagined speech directly from brain activity. This research will ultimately pave the way for an ECoG-based neural speech prosthesis, which will offer a more natural communication channel for individuals who have lost the ability to speak normally.
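As a sketch of one of the representations mentioned, mel-frequency cepstral coefficients for a single pre-windowed frame can be computed with a triangular mel filterbank followed by a log and a DCT. This is a minimal illustrative front end (filter counts, liftering, and deltas omitted), not the dissertation's feature pipeline:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_mels=20, n_ceps=12):
    """MFCCs for one pre-windowed frame: power spectrum -> triangular
    mel filterbank -> log -> DCT-II. Parameter choices are illustrative."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fbank = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, c, hi = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
        up = (freqs - lo) / (c - lo)        # rising slope of triangle i
        down = (hi - freqs) / (hi - c)      # falling slope of triangle i
        fbank[i] = np.maximum(0.0, np.minimum(up, down))
    logmel = np.log(fbank @ spec + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return dct @ logmel
```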
Pushing the Envelope—Aside
Despite successes, there are still significant limitations to speech recognition performance, particularly for conversational speech and/or for speech with significant acoustic degradations from noise or reverberation. For this reason, authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. We note in passing that we and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization, whether by cepstral mean subtraction, as described by B. Atal (1974), or by relative spectral analysis (RASTA), as in H. Hermansky and N. Morgan (1994). They have also proposed features based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.
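Cepstral mean subtraction, one of the long-time-range normalizations mentioned, is itself a one-line operation: a fixed channel filter multiplies the spectrum, so it adds a constant in the log/cepstral domain, and subtracting each utterance's mean cepstrum cancels it. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """ceps: (T, n_ceps) array of cepstral features over time.
    A stationary convolutional channel becomes an additive constant in
    the cepstral domain; removing the per-utterance mean cancels it."""
    return ceps - ceps.mean(axis=0, keepdims=True)
```

Two recordings of the same speech through different fixed channels differ by a constant cepstral offset, so they become identical after this normalization.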
Statistical models for natural sounds
It is important to understand the rich structure of natural sounds in order to solve important tasks, like automatic speech recognition, and to understand auditory processing in the brain. This thesis takes a step in this direction by characterising the statistics of simple natural sounds. We focus on the statistics because perception often appears to depend on them, rather than on the raw waveform. For example, the perception of auditory textures, like running water, wind, fire and rain, depends on summary statistics, like the rate of falling rain droplets, rather than on the exact details of the physical source.
In order to analyse the statistics of sounds accurately it is necessary to improve a number of traditional signal processing methods, including those for amplitude demodulation, time-frequency analysis, and sub-band demodulation. These estimation tasks are ill-posed and it is therefore natural to treat them as Bayesian inference problems. The new probabilistic versions of these methods have several advantages: they perform more accurately on natural signals, are more robust to noise, can fill in missing sections of data, and provide error bars. Furthermore, free parameters can be learned from the signal. Using these new algorithms we demonstrate that the energy, sparsity, modulation depth and modulation time-scale in each sub-band of a signal are critical statistics, together with the dependencies between the sub-band modulators. In order to validate this claim, a model containing co-modulated coloured noise carriers is shown to be capable of generating a range of realistic-sounding auditory textures.
Finally, we explore the connection between the statistics of natural sounds and perception. We demonstrate that inference in the model for auditory textures qualitatively replicates the primitive grouping rules that listeners use to understand simple acoustic scenes. This suggests that the auditory system is optimised for the statistics of natural sounds.
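For context on the demodulation task the thesis reformulates, the classical non-probabilistic baseline extracts the envelope from the analytic signal (the Hilbert-transform method). The sketch below shows only that baseline, not the thesis's Bayesian estimators, and the function names are illustrative:

```python
import numpy as np

def amplitude_demodulate(x):
    """Classical amplitude demodulation via the analytic signal:
    envelope = |x + i*H{x}|, carrier = x / envelope. The analytic signal
    is built in the frequency domain by zeroing negative frequencies.
    The thesis instead treats demodulation as ill-posed Bayesian
    inference; this is only the standard point of comparison."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(X * h)
    env = np.abs(analytic)
    carrier = x / np.maximum(env, 1e-12)
    return env, carrier
```

For a slow positive modulator on a fast carrier this recovers the modulator well, but it gives a single point estimate with no error bars and cannot handle missing data, which is the gap the probabilistic treatment addresses.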
Automatic Detectors for Underwater Soundscape Measurements
Environmental impact regulations require that marine industrial operators quantify their contribution to the underwater soundscape. Automation of such assessments becomes feasible with the successful categorisation of sounds into broader classes based on source type – biological, anthropogenic and physical. Previous approaches to passive acoustic monitoring have mostly been limited to a few specific sources of interest. In this study, source-independent signal detectors are developed and a framework is presented for the automatic categorisation of underwater sounds into the aforementioned classes.
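One simple source-independent detector of the kind this setting calls for is a frame-energy test against a robust estimate of the ambient noise floor: it assumes only that events stand out from the background, not what produced them. This is a generic illustration, not the study's detector; the frame size and threshold factor are arbitrary choices:

```python
import numpy as np

def energy_detector(x, frame_len=1024, hop=512, k=3.0):
    """Flag frames whose energy exceeds k times a robust noise-floor
    estimate (the median frame energy). Source-independent: no model of
    the signal class is assumed, only that events exceed the ambient
    background. Returns a boolean mask, one value per frame."""
    frames = [x[i:i + frame_len]
              for i in range(0, len(x) - frame_len + 1, hop)]
    e = np.array([np.sum(f ** 2) for f in frames])
    floor = np.median(e)  # robust against a minority of loud frames
    return e > k * floor
```

The detections would then be passed downstream for categorisation into the biological, anthropogenic, or physical classes.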
Neural dynamics of selective attention to speech in noise
This thesis investigates how the neural system instantiates selective attention to speech in challenging acoustic conditions, such as spectral degradation and the presence of background noise. Four studies using behavioural measures and magneto-/electroencephalography (M/EEG) recordings were conducted in younger (20–30 years) and older participants (60–80 years). The overall results can be summarized as follows. An EEG experiment demonstrated that slow negative potentials reflect participants’ enhanced allocation of attention when they are faced with more degraded acoustics. This basic mechanism of attention allocation was preserved at an older age. A follow-up experiment in younger listeners indicated that attention allocation can be further enhanced in a context of increased task-relevance through monetary incentives. A subsequent study focused on brain oscillatory dynamics in a demanding speech comprehension task. The power of neural alpha oscillations (~10 Hz) reflected a decrease in demands on attention with increasing acoustic detail and, critically, also with increasing predictiveness of the upcoming speech content. Older listeners’ behavioural responses and alpha power dynamics were more strongly affected by acoustic detail than those of younger listeners, indicating that selective attention at an older age is particularly dependent on the sensory input signal. An additional analysis of listeners’ neural phase-locking to the temporal envelopes of attended speech and unattended background speech revealed that younger and older listeners show a similar segregation of attended and unattended speech on a neural level. A dichotic listening experiment in the MEG investigated how neural alpha oscillations support selective attention to speech. Lateralized alpha power modulations in parietal and auditory cortex regions predicted listeners’ focus of attention (i.e., left vs. right).
This suggests that alpha oscillations implement an attentional filter mechanism to enhance the signal and to suppress noise. A final behavioural study asked whether acoustic and semantic aspects of task-irrelevant speech determine how much it interferes with attention to task-relevant speech. Results demonstrated that younger and older adults were more distracted when the acoustic detail of irrelevant speech was enhanced, whereas the predictiveness of irrelevant speech had no effect. All findings of this thesis are integrated into an initial framework for the role of attention in speech comprehension under demanding acoustic conditions.
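A common way to quantify such lateralized alpha power modulations is a normalized hemispheric difference computed per trial. Conventions (sensor selections, ipsi/contra sign) vary across studies, so the following is a generic illustration rather than the thesis's exact measure:

```python
import numpy as np

def alpha_lateralization_index(left_power, right_power):
    """Normalized hemispheric difference in alpha-band power, one value
    per trial: positive when left-hemisphere power exceeds the right.
    Bounded in [-1, 1]; the sign can then be compared against the cued
    attention side (left vs. right). Naming and sign convention are
    illustrative, not taken from the thesis."""
    return (left_power - right_power) / (left_power + right_power)
```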