
    Restoration and Efficiency of the Neural Processing of Continuous Speech Are Promoted by Prior Knowledge

    Sufficiently noisy listening conditions can completely mask the acoustic signal of significant parts of a sentence, and yet listeners may still report hearing the masked speech. This occurs even when the speech signal is removed entirely, provided the gap is filled with stationary noise, a phenomenon known as perceptual restoration. At the neural level, however, it is unclear to what extent the neural representation of missing extended speech sequences resembles the dynamic neural representation of ordinary continuous speech. Using auditory magnetoencephalography (MEG), we show that stimulus reconstruction, a technique developed for neural representations of ordinary speech, also works for missing speech segments replaced by noise, even when they span several phonemes and words. The reconstruction fidelity of the missing speech, up to 25% of what would be attained were the speech present, depends, however, on listeners’ familiarity with the missing segment. This same familiarity also speeds up the most prominent stage of the cortical processing of ordinary speech by approximately 5 ms. Both effects disappear when listeners have little or no prior experience with the speech segment. The results are consistent with adaptive expectation mechanisms that consolidate detailed representations of speech sounds as identifiable factors assisting automatic restoration over ecologically relevant timescales.
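    The stimulus reconstruction technique referred to above is, in its general form, a linear backward model that maps multichannel neural recordings (with time lags) back to the speech envelope. The sketch below is a minimal illustration of that general idea in Python/NumPy, not the study's actual pipeline; the lag handling, ridge parameter, and correlation-based scoring are assumptions for illustration.

```python
# Minimal sketch of envelope reconstruction via a ridge-regularized backward
# model (illustrative; not the study's pipeline).
import numpy as np

def lag_matrix(neural, lags):
    """Stack time-lagged copies of the neural data (samples x channels)."""
    n = neural.shape[0]
    cols = []
    for lag in lags:
        shifted = np.zeros_like(neural)
        if lag >= 0:
            shifted[lag:] = neural[:n - lag]
        else:
            shifted[:lag] = neural[-lag:]
        cols.append(shifted)
    return np.hstack(cols)

def fit_decoder(neural, envelope, lags, ridge=1e3):
    """Ridge-regularized least squares: decoder weights mapping MEG -> envelope."""
    X = lag_matrix(neural, lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(neural, weights, lags):
    """Apply a fitted decoder to held-out neural data."""
    return lag_matrix(neural, lags) @ weights

# Reconstruction fidelity is typically scored as the Pearson correlation
# between reconstructed and actual envelopes on held-out data.
```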

    Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries

    The principles underlying functional asymmetries in cortex remain debated. For example, it is accepted that speech is processed bilaterally in auditory cortex, but a left-hemisphere dominance emerges when the input is interpreted linguistically. The underlying mechanisms are contested, however: it remains unclear which sound features or processing principles drive laterality. Recent findings across species (humans, canines and bats) provide converging evidence that spectrotemporal sound features drive asymmetrical responses. Typical accounts invoke models in which the hemispheres differ in time-frequency resolution or integration window size. We develop a framework that builds on and unifies the prevailing models, using the spectrotemporal modulation space. Using signal processing techniques motivated by neural responses, we test this approach with behavioural and neurophysiological measures. We show how psychophysical judgements align with spectrotemporal modulations and then characterize the neural sensitivities to temporal and spectral modulations. We demonstrate differential contributions from both hemispheres, with a left lateralization for temporal modulations and a weaker right lateralization for spectral modulations. We argue that representations in the modulation domain provide a more mechanistic basis to account for lateralization in auditory cortex.
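    One concrete way to work in the spectrotemporal modulation space described above is to take a two-dimensional Fourier transform of a (log-)spectrogram, giving energy as a function of temporal modulation rate (Hz) and spectral modulation scale (cycles/octave). The sketch below illustrates that generic analysis; the axis conventions and parameters are assumptions, not the paper's implementation.

```python
# Illustrative modulation power spectrum of a log-spectrogram (generic
# analysis; parameters are assumptions, not the paper's settings).
import numpy as np

def modulation_spectrum(log_spec, frame_rate_hz, channels_per_octave):
    """log_spec: (freq_channels x time_frames) log-magnitude spectrogram."""
    spec = log_spec - log_spec.mean()                  # remove the DC offset
    mps = np.abs(np.fft.fftshift(np.fft.fft2(spec)))   # 2D modulation power
    # Temporal modulations in Hz, spectral modulations in cycles/octave
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=1.0 / frame_rate_hz))
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[0], d=1.0 / channels_per_octave))
    return mps, spectral_mod, temporal_mod
```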

    A telemetric approach for characterizing behavioral dynamics and neurophysiology of vocal interactions in Zebra Finches

    Representation of speech in the primary auditory cortex and its implications for robust speech processing

    Speech has evolved as a primary form of communication between humans. This most-used means of communication has been the subject of intense study for years, but there is still a lot that we do not know about it. It is an oft-repeated fact that the performance of even the best speech processing algorithms still lags far behind that of the average human. It seems inescapable that unless we know more about the way the brain performs this task, our machines cannot go much further. This thesis focuses on the question of speech representation in the brain, from both a physiological and a technological perspective. We explore the representation of speech through the encoding of its smallest elements - phonemic features - in the primary auditory cortex. We report on how populations of neurons with diverse tuning properties respond discriminatively to phonemes, resulting in explicit encoding of their parameters. Next, we show that this sparse encoding of phonemic features is a simple consequence of the linear spectro-temporal properties of auditory cortical neurons, and that a spectro-temporal receptive field model can predict similar patterns of activation. This is an important step toward the realization of systems that operate on the same principles as the cortex. Using an inverse method of reconstruction, we also explore the extent to which phonemic features are preserved in the cortical representation of noisy speech. The results suggest that the cortical responses are more robust to noise and that the important features of phonemes are preserved in the cortical representation even in noise. Finally, we explain how a model of this cortical representation can be used in speech processing and enhancement applications to improve their robustness and performance.
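    As a rough illustration of the linear spectro-temporal account mentioned above, a neuron's time-varying response can be predicted by filtering the stimulus spectrogram through its spectro-temporal receptive field (STRF). The sketch below shows the generic linear prediction step; the shapes and lag convention are assumptions for illustration, not the thesis's code.

```python
# Generic linear STRF prediction: response(t) = sum over frequency and lag of
# STRF weights times the recent spectrogram history (illustrative sketch).
import numpy as np

def predict_response(spectrogram, strf):
    """spectrogram: (freq x time); strf: (freq x n_lags), where the last STRF
    column weights the current frame and earlier columns weight older frames."""
    n_freq, n_time = spectrogram.shape
    _, n_lags = strf.shape
    response = np.zeros(n_time)
    for t in range(n_time):
        lo = max(0, t - n_lags + 1)
        window = spectrogram[:, lo:t + 1]       # recent stimulus history
        response[t] = np.sum(window * strf[:, -(t - lo + 1):])
    return response
```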

    Sensory and Perceptual Codes in Cortical Auditory Processing

    A key aspect of human auditory cognition is establishing efficient and reliable representations of the acoustic environment, especially at the level of auditory cortex. Since the inception of encoding models that relate sound to neural response, three longstanding questions have remained open. First, how to deal with the apparently insurmountable problem that cortical responses change fundamentally depending on the category of sound (e.g. simple tones versus environmental sounds). Second, how to integrate inner or subjective perceptual experiences into sound encoding models, given that such models presuppose direct physical stimulation, which is sometimes absent. And third, how context and learning fine-tune these encoding rules through adaptive changes that improve processing under impoverished conditions, which is particularly important for communication sounds. In this series of studies, each question is addressed by analyzing mappings from sound stimuli delivered to and/or perceived by a listener onto large-scale, cortically sourced response time series from magnetoencephalography. It is first shown that the divergent, categorical modes of sensory coding may be unified by exploring acoustic representations other than the traditional spectrogram, such as temporal transient maps. Encoding models for artificial random tone, music, and speech stimulus classes were substantially matched in structure when the stimuli were represented by their acoustic energy increases, consistent with the existence of a domain-general common baseline processing stage. Separately, the perceptual experience of sound is addressed via stereotyped rhythmic patterns that normally entrain cortical responses at the same periodicity. It is shown that under conditions of perceptual restoration, namely cases where a listener reports hearing a specific sound pattern in the midst of noise even though it is absent, such endogenous representations remain accessible in the form of evoked cortical oscillations at the same rhythmic rate. Finally, with regard to natural speech, it is shown that extensive prior experience from repeated listening to the same sentence materials facilitates reconstruction of the original stimulus even where noise replaces it, and also expedites normal cortical processing times in listeners. Overall, the findings demonstrate how sensory and perceptual coding approaches jointly continue to expand the enquiry into listeners’ personal experience of the communication-rich soundscape.
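    The temporal transient maps mentioned above can be read, in the simplest case, as the half-wave rectified frame-to-frame change of a (log-)spectrogram, so that only acoustic energy increases survive. The sketch below illustrates that simple reading; it is an assumption-laden toy, not the dissertation's implementation.

```python
# Toy temporal transient (onset) map: keep only frame-to-frame energy
# increases in each spectrogram channel (illustrative sketch).
import numpy as np

def transient_map(log_spec):
    """log_spec: (freq_channels x time_frames) -> onset energy, same shape."""
    diff = np.diff(log_spec, axis=1)           # change per channel per frame
    onsets = np.maximum(diff, 0.0)             # half-wave rectify: keep rises
    return np.pad(onsets, ((0, 0), (1, 0)))    # pad first frame with zeros
```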

    Mechanisms of auditory signal decoding in the progressive aphasias

    The primary progressive aphasias (PPA) are a diverse group of neurodegenerative disorders that selectively target brain networks mediating language. The pathophysiology of PPA remains poorly understood, but emerging evidence suggests that deficits in auditory processing accompany, and may precede, language symptoms in these patients. In four studies, I have probed the pathophysiology of auditory signal decoding in patient cohorts representing all major PPA syndromes – nonfluent variant PPA (nfvPPA), semantic variant PPA (svPPA), and logopenic variant PPA (lvPPA) – in relation to healthy age-matched controls. In my first experiment, I presented sequences of spoken syllables manipulated for temporal regularity, spectrotemporal structure and entropy. I used voxel-based morphometry to define critical brain substrates for the processing of these attributes, identifying correlates of behavioural performance within a cortico-subcortical network extending beyond canonical language areas. In my second experiment, I used activation functional magnetic resonance imaging (fMRI) with the same stimuli. I identified network signatures of particular signal attributes: nfvPPA was associated with reduced activity in anterior cingulate for processing temporal irregularity; lvPPA with reduced activation of posterior superior temporal cortex for processing spectrotemporal structure; and svPPA with reduced activation of caudate and anterior cingulate for processing signal entropy. In my third experiment, I manipulated the auditory feedback via which participants heard their own voices during speech production. Healthy control participants spoke significantly less fluently under delayed auditory feedback, but patients with nfvPPA and lvPPA were affected significantly less. In my final experiment, I probed residual capacity for dynamic auditory signal processing and perceptual learning in PPA, using sinewave speech. Patients with nfvPPA and lvPPA showed severely attenuated learning of the degraded stimuli, while patients with svPPA showed intact early perceptual processing but deficient integration of semantic knowledge. Together, these experiments represent the most concerted and comprehensive attempt to date to define the pathophysiology of auditory signal decoding in PPA.
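    For reference, the delayed auditory feedback manipulation used in the third experiment amounts to playing a talker's own voice back to them after a short, fixed delay. The sketch below simulates such a delay offline on a recorded signal; the ~200 ms value and offline framing are illustrative assumptions, not the experiment's setup.

```python
# Offline simulation of delayed auditory feedback: the talker hears their own
# voice shifted later in time by a fixed delay (illustrative sketch).
import numpy as np

def delayed_feedback(voice, sample_rate, delay_s=0.2):
    """Return the feedback signal: the voice delayed by delay_s seconds."""
    shift = int(delay_s * sample_rate)
    delayed = np.zeros_like(voice)
    delayed[shift:] = voice[:len(voice) - shift]
    return delayed
```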

    Neural Basis and Computational Strategies for Auditory Processing

    Our senses are our window to the world, and hearing is the window through which we perceive the world of sound. While seemingly effortless, the process of hearing involves complex transformations by which the auditory system consolidates acoustic information from the environment into perceptual and cognitive experiences. Studies of auditory processing try to elucidate the mechanisms underlying the function of the auditory system, and infer computational strategies that are valuable both clinically and intellectually, hence contributing to our understanding of the function of the brain. In this thesis, we adopt both an experimental and a computational approach in tackling various aspects of auditory processing. We first investigate the neural basis underlying the function of the auditory cortex, and explore the dynamics and computational mechanisms of cortical processing. Our findings offer physiological evidence for a role of primary cortical neurons in the integration of sound features at different time constants, and possibly in the formation of auditory objects. Based on physiological principles of sound processing, we explore computational implementations that tackle specific perceptual questions. We exploit our knowledge of the neural mechanisms of cortical auditory processing to formulate models addressing the problems of speech intelligibility and auditory scene analysis. The intelligibility model focuses on a computational approach for evaluating loss of intelligibility, inspired by mammalian physiology and human perception. It is based on a multi-resolution filter-bank implementation of cortical response patterns, which extends into a robust metric for assessing loss of intelligibility in communication channels and speech recordings. This same cortical representation is extended further to develop a computational scheme for auditory scene analysis. The model maps perceptual principles of auditory grouping and stream formation into a computational system that combines aspects of bottom-up, primitive sound processing with an internal representation of the world. It is based on a framework of unsupervised adaptive learning with Kalman estimation. The model is extremely valuable in exploring various aspects of sound organization in the brain, allowing us to gain interesting insight into the neural basis of auditory scene analysis, as well as practical implementations for sound separation in "cocktail-party" situations.
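    The intelligibility model described above rests on comparing cortical-like, multi-resolution modulation representations of clean and degraded speech. The sketch below conveys that general idea with a small bank of temporal modulation filters and a correlation score; the filter design, rates, and scoring rule are assumptions for illustration, not the thesis's metric.

```python
# Toy modulation-domain intelligibility score: band-pass clean and degraded
# spectrograms at several temporal modulation rates, then correlate the
# resulting patterns (illustrative sketch).
import numpy as np
from scipy.signal import butter, filtfilt

RATES_HZ = (2, 4, 8, 16, 32)       # assumed temporal modulation rates

def modulation_bank(spec, frame_rate_hz):
    """Band-pass each spectrogram channel around each modulation rate."""
    outputs = []
    for rate in RATES_HZ:
        lo, hi = rate / np.sqrt(2), rate * np.sqrt(2)     # one-octave band
        b, a = butter(2, [lo, hi], btype="band", fs=frame_rate_hz)
        outputs.append(filtfilt(b, a, spec, axis=1))
    return np.stack(outputs)        # (rates x freq x time)

def intelligibility_index(clean_spec, degraded_spec, frame_rate_hz):
    """1.0 means the degraded modulation pattern matches the clean one."""
    c = modulation_bank(clean_spec, frame_rate_hz).ravel()
    d = modulation_bank(degraded_spec, frame_rate_hz).ravel()
    return float(np.corrcoef(c, d)[0, 1])
```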

    Assessing the relationship between talker normalization and spectral contrast effects in speech perception.

    Speech perception is influenced by context. This influence can help to alleviate issues that arise from the extreme acoustic variability of speech. Two examples of contextual influences are talker normalization and spectral contrast effects (SCEs). Talker normalization occurs when listeners hear different talkers, causing speech perception to become slower and less accurate. SCEs occur when spectral characteristics change from context sentences to target vowels, biasing speech perception by that change. It has been demonstrated that SCEs are restrained when contexts are spoken by different talkers (Assgari & Stilp, 2015). However, it was not entirely clear what it is about hearing different talkers that restrains these effects. In addition, while both are considered contextual influences on speech perception, they have never been formally related to each other. The series of studies reported here served two purposes. First, these studies sought to establish why hearing different talkers restrains SCEs. Results indicate that variability in pitch (as measured by fundamental frequency), a primary acoustic cue to talker changes, restricts the influence of spectral changes on speech perception. Second, these studies attempted to relate talker normalization and SCEs by measuring them concurrently. Talker normalization (as measured by response times) and SCEs were evident in the same task, suggesting that they act on speech perception at the same time. Further, these measures of talker normalization were shown to be influenced by f0 variability, suggesting that SCEs and talker normalization are both related to f0 variability. However, no relationship between individuals’ SCEs and response times was found. Possible reasons why f0 variability may restrain context effects are discussed.
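    The f0 variability measure discussed above can be approximated, in its simplest form, by estimating each context sentence's mean fundamental frequency and taking the spread of those means. The sketch below uses a crude autocorrelation pitch estimate purely for illustration; the frame length, f0 search range, and estimator are assumptions, not the study's procedure.

```python
# Toy measure of f0 variability across context sentences, using a crude
# autocorrelation pitch estimate (illustrative sketch only).
import numpy as np

def frame_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Autocorrelation-based f0 estimate (Hz) for one short frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    return sr / (lo + np.argmax(ac[lo:hi]))

def mean_f0(signal, sr, frame_len_s=0.04):
    """Average frame-wise f0 over a whole sentence."""
    step = int(frame_len_s * sr)
    frames = [signal[i:i + step] for i in range(0, len(signal) - step, step)]
    return float(np.mean([frame_f0(f, sr) for f in frames]))

def f0_variability(sentences, sr):
    """Standard deviation of mean f0 across a set of context sentences."""
    return float(np.std([mean_f0(s, sr) for s in sentences]))
```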