Restoration and Efficiency of the Neural Processing of Continuous Speech Are Promoted by Prior Knowledge
Sufficiently noisy listening conditions can completely mask the acoustic signal of significant parts of a sentence, and yet listeners may still report hearing the masked speech. This occurs even when the speech signal is removed entirely and the gap is filled with stationary noise, a phenomenon known as perceptual restoration. At the neural level, however, the extent to which the neural representation of missing extended speech sequences resembles the dynamic neural representation of ordinary continuous speech is unclear. Using magnetoencephalography (MEG), we show that stimulus reconstruction, a technique developed for neural representations of ordinary speech, also works for missing speech segments replaced by noise, even when they span several phonemes and words. The reconstruction fidelity of the missing speech, up to 25% of what would be attained were the speech present, depends however on listeners' familiarity with the missing segment. This same familiarity also speeds up the most prominent stage of the cortical processing of ordinary speech by approximately 5 ms. Both effects disappear when listeners have little or no prior experience with the speech segment. The results are consistent with adaptive expectation mechanisms that consolidate detailed representations of speech sounds, assisting automatic restoration over ecologically relevant timescales.
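The stimulus-reconstruction technique referred to above is commonly implemented as a linear "backward" model: a regularised regression from time-lagged neural responses back onto the stimulus envelope, scored by the correlation between reconstructed and actual envelopes. The sketch below illustrates the idea on simulated data; the signal model, lag count, and ridge parameter are illustrative assumptions, not the paper's actual analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_lags = 4000, 16

# Toy speech envelope: low-pass filtered noise.
env = np.convolve(rng.standard_normal(T + 49), np.ones(50) / 50, mode="valid")

# Simulated cortical response: a filtered, delayed copy of the envelope plus noise.
kernel = np.exp(-np.arange(n_lags) / 4.0)
resp = np.convolve(env, kernel)[:T] + 0.5 * rng.standard_normal(T)

# Time-lagged design matrix of the response (the "backward" / decoding model).
X = np.zeros((T, n_lags))
for k in range(n_lags):
    X[k:, k] = resp[: T - k]

# Ridge-regularised decoder fit on the first half of the data.
half = T // 2
lam = 1e2
w = np.linalg.solve(X[:half].T @ X[:half] + lam * np.eye(n_lags),
                    X[:half].T @ env[:half])

# Reconstruction fidelity = correlation with the true envelope on held-out data.
recon = X[half:] @ w
r = np.corrcoef(recon, env[half:])[0, 1]
print(f"held-out reconstruction r = {r:.2f}")
```

In the study's framing, the interesting comparison is this correlation computed over segments where the stimulus was physically absent, relative to the fidelity attainable when it is present.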
Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries
The principles underlying functional asymmetries in cortex remain debated. For example, it is accepted that speech is processed bilaterally in auditory cortex, but a left hemisphere dominance emerges when the input is interpreted linguistically. Which mechanisms are responsible, however, is contested, including which sound features or processing principles underlie laterality. Recent findings across species (humans, canines and bats) provide converging evidence that spectrotemporal sound features drive asymmetrical responses. Typically, accounts invoke models wherein the hemispheres differ in time-frequency resolution or integration window size. We develop a framework that builds on and unifies prevailing models, using spectrotemporal modulation space. Using signal processing techniques motivated by neural responses, we test this approach, employing behavioural and neurophysiological measures. We show how psychophysical judgements align with spectrotemporal modulations and then characterize the neural sensitivities to temporal and spectral modulations. We demonstrate differential contributions from both hemispheres, with a left lateralization for temporal modulations and a weaker right lateralization for spectral modulations. We argue that representations in the modulation domain provide a more mechanistic basis to account for lateralization in auditory cortex.
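The spectrotemporal modulation space invoked here is standardly obtained by a second, 2-D Fourier analysis of a spectrogram: its axes are temporal modulation rate (Hz) and spectral modulation scale (cycles per octave or per channel). The sketch below, with arbitrary toy parameters (a 500 Hz tone amplitude-modulated at 8 Hz, 128-sample frames), shows the temporal-rate axis recovering the imposed modulation; it is a minimal illustration, not the authors' analysis code.

```python
import numpy as np

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Toy signal: a 500 Hz tone amplitude-modulated at 8 Hz.
sig = (1 + np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 500 * t)

# Short-time magnitude spectrogram (128-sample Hann-windowed frames, no overlap).
frame = 128
n_frames = len(sig) // frame
win = np.hanning(frame)
frames = sig[: n_frames * frame].reshape(n_frames, frame) * win
spec = np.abs(np.fft.rfft(frames, axis=1))          # (time, frequency)

# Modulation spectrum: 2-D FFT of the mean-removed spectrogram.
mod = np.abs(np.fft.fft2(spec - spec.mean()))

# Temporal-modulation (rate) axis in Hz; frame rate = fs / frame = 125 Hz.
rates = np.fft.fftfreq(n_frames, d=frame / fs)
# Column 0 of `mod` is spectral-modulation zero, i.e. the summed-over-frequency envelope.
peak_rate = abs(rates[np.argmax(mod[:, 0])])
print(f"dominant temporal modulation = {peak_rate:.1f} Hz")
```

Hemispheric accounts of the kind described then amount to asymmetric weighting of regions of this (rate, scale) plane.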
Representation of speech in the primary auditory cortex and its implications for robust speech processing
Speech has evolved as a primary form of communication between humans. This most-used means of communication has been the subject of intense study for years, but there is still much that we do not know about it. It is an oft-repeated fact that the performance of even the best speech processing algorithms still lags far behind that of the average human. It seems inescapable that unless we know more about the way the brain performs this task, our machines cannot go much further. This thesis focuses on the question of speech representation in the brain, from both a physiological and a technological perspective. We explore the representation of speech through the encoding of its smallest elements - phonemic features - in the primary auditory cortex. We report on how populations of neurons with diverse tuning properties respond discriminatively to phonemes, resulting in explicit encoding of their parameters. Next, we show that this sparse encoding of phonemic features is a simple consequence of the linear spectro-temporal properties of auditory cortical neurons, and that a spectro-temporal receptive field (STRF) model can predict similar patterns of activation. This is an important step toward the realization of systems that operate on the same principles as the cortex. Using an inverse method of reconstruction, we also explore the extent to which phonemic features are preserved in the cortical representation of noisy speech. The results suggest that cortical responses are robust to noise and that the important features of phonemes are preserved in the cortical representation even in noise. Finally, we explain how a model of this cortical representation can be used in speech processing and enhancement applications to improve their robustness and performance.
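The spectro-temporal receptive field model mentioned in the abstract treats a neuron's firing rate as a linear filtering of the stimulus spectrogram: r(t) = Σ_f Σ_τ STRF[f, τ] · S[f, t − τ]. A minimal sketch of that prediction step is below; the spectrogram is random and the STRF shape (an excitatory ridge with an inhibitory sideband) is an illustrative assumption, not a fitted receptive field.

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_time, n_lags = 32, 1000, 10

# Toy spectrogram (frequency bins x time bins) standing in for a speech stimulus.
S = np.abs(rng.standard_normal((n_freq, n_time)))

# Hypothetical STRF: excitatory ridge at mid frequencies with short latency,
# flanked by inhibition (illustrative values only).
strf = np.zeros((n_freq, n_lags))
strf[14:18, 2:5] = 1.0    # excitatory subfield
strf[10:14, 2:5] = -0.5   # inhibitory sideband

# Predicted response: r(t) = sum over frequency f and lag tau of STRF[f, tau] * S[f, t - tau].
r = np.zeros(n_time)
for t in range(n_lags, n_time):
    for tau in range(n_lags):
        r[t] += strf[:, tau] @ S[:, t - tau]
```

Because the model is linear, the same machinery runs in reverse for the inverse reconstruction the thesis describes: given many such filtered responses, one can regress back onto the spectrogram.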
SENSORY AND PERCEPTUAL CODES IN CORTICAL AUDITORY PROCESSING
A key aspect of human auditory cognition is establishing efficient and reliable representations of the acoustic environment, especially at the level of auditory cortex. Since the inception of encoding models that relate sound to neural response, three longstanding questions have remained open. First, cortical responses change fundamentally depending on the category of sound (e.g. simple tones versus environmental sound), a problem that has appeared insurmountable. Second, it is unclear how to integrate inner, subjective perceptual experiences into sound encoding models, given that such models presuppose direct physical stimulation, which is sometimes missing. And third, it is unknown how context and learning fine-tune these encoding rules, as adaptive changes that compensate for impoverished conditions, something particularly important for communication sounds.
In this series, each question is addressed by analyzing mappings from sound stimuli, delivered to and/or perceived by a listener, to large-scale cortically sourced response time series from magnetoencephalography. It is first shown that the divergent, categorical modes of sensory coding may be unified by exploring acoustic representations other than the traditional spectrogram, such as temporal transient maps. Encoding models of artificial random tones, music, and speech stimulus classes were substantially matched in their structure when the stimuli were represented by their acoustic energy increases, consistent with the existence of a domain-general common baseline processing stage.
Separately, the matter of accessing the perceptual experience of sound via cortical responses is addressed using stereotyped rhythmic patterns that normally entrain cortical responses at the same periodicity. Here, it is shown that under conditions of perceptual restoration, namely cases where a listener nonetheless reports hearing a specific sound pattern in the midst of noise, one may access such endogenous representations in the form of evoked cortical oscillations at the same rhythmic rate.
Finally, with regard to natural speech, it is shown that extensive prior experience from repeated listening to the same sentence materials may facilitate reconstruction of the original stimulus even where noise replaces it, and may also expedite normal cortical processing times in listeners. Overall, the findings demonstrate ways in which sensory and perceptual coding approaches can jointly expand the enquiry into listeners' personal experience of the communication-rich soundscape.
Extracting Spatiotemporal Word and Semantic Representations from Multiscale Neurophysiological Recordings in Humans
With the recent advent of neuroimaging techniques, the majority of the research studying the neural basis of language processing has focused on the localization of various lexical and semantic functions. Unfortunately, the limited time resolution of functional neuroimaging prevents a detailed analysis of the dynamics involved in word recognition, and the hemodynamic basis of these techniques prevents the study of the underlying neurophysiology. Compounding this problem, current techniques for the analysis of high-dimensional neural data are mainly sensitive to large effects in a small area, preventing a thorough study of the distributed processing involved in representing semantic knowledge. This thesis demonstrates the use of multivariate machine-learning techniques for the study of the neural representation of semantic and speech information in electro/magneto-physiological recordings with high temporal resolution. Support vector machines (SVMs) allow for the decoding of semantic category and word-specific information from non-invasive electroencephalography (EEG) and magnetoencephalography (MEG) and demonstrate the consistent but spatially and temporally distributed nature of such information. Moreover, the anteroventral temporal lobe (avTL) may be important for coordinating these distributed representations, as supported by the presence of supramodal category-specific information in intracranial recordings from the avTL as early as 150 ms after auditory or visual word presentation. Finally, to study the inputs to this lexico-semantic system, recordings from a high-density microelectrode array in anterior superior temporal gyrus (aSTG) are obtained, and the recorded spiking activity demonstrates the presence of single neurons that respond specifically to speech sounds.
The successful decoding of word identity from this firing rate information suggests that the aSTG may be involved in the population coding of acousto-phonetic speech information that is likely on the pathway for mapping speech sounds to meaning in the avTL. The feasibility of extracting semantic and phonological information from multichannel neural recordings using machine learning techniques provides a powerful method for studying language using large datasets and has potential implications for the development of fast and intuitive communication prostheses.
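The decoding analyses described above reduce, in their simplest form, to training a linear classifier on trial-by-sensor response patterns and testing it on held-out trials. The sketch below simulates two-category "MEG" data and scores a linear decoder; a plain least-squares discriminant stands in for the SVM, and the trial counts, sensor count, and signal strength are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_sensors = 200, 50

# Simulated sensor patterns for two semantic categories: a weak
# category-specific spatial pattern buried in sensor noise.
pattern = rng.standard_normal(n_sensors)
labels = np.repeat([0, 1], n_trials // 2)
X = rng.standard_normal((n_trials, n_sensors)) + 0.3 * np.outer(2 * labels - 1, pattern)

# Shuffle trials and split into train / test halves.
order = rng.permutation(n_trials)
X, labels = X[order], labels[order]
half = n_trials // 2

# Least-squares linear decoder (stand-in for a linear SVM).
y = 2 * labels[:half] - 1.0
w, *_ = np.linalg.lstsq(X[:half], y, rcond=None)
pred = (X[half:] @ w > 0).astype(int)
acc = (pred == labels[half:]).mean()
print(f"held-out decoding accuracy = {acc:.2f}")
```

The same cross-validated scheme extends to sliding time windows, which is what yields the temporal profiles of category information the thesis reports.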
Mechanisms of auditory signal decoding in the progressive aphasias
The primary progressive aphasias (PPA) are a diverse group of neurodegenerative disorders that selectively target brain networks mediating language. The pathophysiology of PPA remains poorly understood, but emerging evidence suggests that deficits in auditory processing accompany and may precede language symptoms in these patients. In four studies, I have probed the pathophysiology of auditory signal decoding in patient cohorts representing all major PPA syndromes – nonfluent variant PPA (nfvPPA), semantic variant PPA (svPPA), and logopenic variant PPA (lvPPA) – in relation to healthy age-matched controls. In my first experiment, I presented sequences of spoken syllables manipulated for temporal regularity, spectrotemporal structure and entropy. I used voxel-based morphometry to define critical brain substrates for the processing of these attributes, identifying correlates of behavioural performance within a cortico-subcortical network extending beyond canonical language areas. In my second experiment, I used activation functional magnetic resonance imaging (fMRI) with the same stimuli. I identified network signatures of particular signal attributes: nfvPPA was associated with reduced activity in anterior cingulate for processing temporal irregularity; lvPPA with reduced activation of posterior superior temporal cortex for processing spectrotemporal structure; and svPPA with reduced activation of caudate and anterior cingulate for processing signal entropy. In my third experiment, I manipulated the auditory feedback via which participants heard their own voices during speech production. Healthy control participants spoke significantly less fluently under delayed auditory feedback, but patients with nfvPPA and lvPPA were affected significantly less. In my final experiment, I probed residual capacity for dynamic auditory signal processing and perceptual learning in PPA, using sinewave speech. 
Patients with nfvPPA and lvPPA showed severely attenuated learning of the degraded stimuli, while patients with svPPA showed intact early perceptual processing but deficient integration of semantic knowledge. Together, these experiments represent the most concerted and comprehensive attempt to date to define the pathophysiology of auditory signal decoding in PPA.
Neural Basis and Computational Strategies for Auditory Processing
Our senses are our window to the world, and hearing is the window through which we perceive the world of sound. While seemingly effortless, the process of hearing involves complex transformations by which the auditory system consolidates acoustic information from the environment into perceptual and cognitive experiences. Studies of auditory processing try to elucidate the mechanisms underlying the function of the auditory system, and infer computational strategies that are valuable both clinically and intellectually, hence contributing to our understanding of the function of the brain.
In this thesis, we adopt both an experimental and computational approach in tackling various aspects of auditory processing. We first investigate the neural basis underlying the function of the auditory cortex, and explore the dynamics and computational mechanisms of cortical processing. Our findings offer physiological evidence for a role of primary cortical neurons in the integration of sound features at different time constants, and possibly in the formation of auditory objects.
Based on physiological principles of sound processing, we explore computational implementations that tackle specific perceptual questions. We exploit our knowledge of the neural mechanisms of cortical auditory processing to formulate models addressing the problems of speech intelligibility and auditory scene analysis. The intelligibility model focuses on a computational approach for evaluating loss of intelligibility, inspired by mammalian physiology and human perception. It is based on a multi-resolution filter-bank implementation of cortical response patterns, which extends into a robust metric for assessing loss of intelligibility in communication channels and speech recordings.
This same cortical representation is extended further to develop a computational scheme for auditory scene analysis. The model maps perceptual principles of auditory grouping and stream formation onto a computational system that combines aspects of bottom-up, primitive sound processing with an internal representation of the world. It is based on a framework of unsupervised adaptive learning with Kalman estimation. The model is extremely valuable in exploring various aspects of sound organization in the brain, allowing us to gain insight into the neural basis of auditory scene analysis, as well as practical implementations for sound separation in "cocktail-party" situations.
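The multi-resolution filter-bank idea behind the intelligibility metric can be caricatured in a few lines: decompose a signal's slow amplitude envelope into modulation-rate bands, then compare the band-energy profile of a degraded signal against the clean reference. Everything below (band edges, the 4 Hz "syllable-rate" toy envelope, the cosine-similarity index) is an illustrative assumption, not the thesis's actual metric.

```python
import numpy as np

def modulation_energy(env, fs, bands=((1, 4), (4, 8), (8, 16))):
    """Summed power of an envelope's modulation spectrum in coarse rate bands (Hz)."""
    spec = np.abs(np.fft.rfft(env - env.mean())) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1 / fs)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])

rng = np.random.default_rng(3)
fs, T = 100, 1000  # 100 Hz envelope sampling rate, 10 s of "speech"
t = np.arange(T) / fs

# Clean envelope with strong syllable-rate (4 Hz) modulation, plus a noisy copy.
clean = 1 + 0.8 * np.sin(2 * np.pi * 4 * t)
noisy = clean + 1.5 * rng.standard_normal(T)

e_clean = modulation_energy(clean, fs)
e_noisy = modulation_energy(noisy, fs)

# Crude intelligibility-style index: similarity of the band-energy profiles.
sim = e_clean @ e_noisy / (np.linalg.norm(e_clean) * np.linalg.norm(e_noisy))
print(f"modulation similarity index: {sim:.2f}")
```

A full cortical model would add a spectral-modulation axis and many more rate channels, but the principle of scoring degradation by its effect on modulation-band energies is the same.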
Assessing the relationship between talker normalization and spectral contrast effects in speech perception.
Speech perception is influenced by context. This influence can help to alleviate issues that arise from the extreme acoustic variability of speech. Two examples of contextual influences are talker normalization and spectral contrast effects (SCEs). Talker normalization occurs when hearing different talkers causes speech perception to be slower and less accurate. SCEs occur when spectral characteristics change from context sentences to target vowels, biasing speech perception by that change. It has been demonstrated that SCEs are restrained when contexts are spoken by different talkers (Assgari & Stilp, 2015). However, it was not entirely clear what aspect of hearing different talkers restrains these effects. In addition, while both are considered contextual influences on speech perception, they have never been formally related to each other. The series of studies reported here served two purposes. First, they sought to establish why hearing different talkers restrains SCEs. Results indicate that variability in pitch (as measured by fundamental frequency, f0), a primary acoustic cue to talker changes, restricts the influence of spectral changes on speech perception. Second, they attempted to relate talker normalization and SCEs by measuring them concurrently. Talker normalization (as measured by response times) and SCEs were evident in the same task, suggesting that they act on speech perception at the same time. Further, these measures of talker normalization were shown to be influenced by f0 variability, suggesting that SCEs and talker normalization are both related to f0 variability. However, no relationship between individuals' SCEs and response times was found. Possible reasons why f0 variability may restrain context effects are discussed.