
    Context-Dependent Encoding in the Human Auditory Brainstem Relates to Hearing Speech in Noise: Implications for Developmental Dyslexia

    We examined context-dependent encoding of speech in children with and without developmental dyslexia by measuring auditory brainstem responses to a speech syllable presented in a repetitive or variable context. Typically developing children showed enhanced brainstem representation of features related to voice pitch in the repetitive context, relative to the variable context. In contrast, children with developmental dyslexia exhibited impairment in their ability to modify representation in predictable contexts. From a functional perspective, we found that the extent of context-dependent encoding in the auditory brainstem correlated positively with behavioral indices of speech perception in noise. The ability to sharpen representation of repeating elements is crucial to speech perception in noise, since it allows superior “tagging” of voice pitch, an important cue for segregating sound streams in background noise. The disruption of this mechanism contributes to a critical deficit in noise-exclusion, a hallmark symptom in developmental dyslexia.
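
    The contrast between the repetitive and variable contexts can be quantified in the simplest possible way as the spectral magnitude of an averaged brainstem response at the voice fundamental frequency (F0). The sketch below assumes a hypothetical F0 of 100 Hz, an assumed sampling rate, and random placeholder waveforms; it only illustrates the kind of measure such studies rely on, not the authors' actual analysis.

```python
# Minimal sketch: quantify F0-related energy in an averaged brainstem response
# and compare repetitive vs. variable presentation contexts. The response
# arrays and parameters below are hypothetical placeholders.
import numpy as np

def f0_magnitude(response, fs, f0=100.0, bandwidth=5.0):
    """Spectral magnitude of the response within +/- bandwidth Hz of f0."""
    spectrum = np.abs(np.fft.rfft(response))
    freqs = np.fft.rfftfreq(len(response), d=1.0 / fs)
    band = (freqs >= f0 - bandwidth) & (freqs <= f0 + bandwidth)
    return spectrum[band].mean()

fs = 16000  # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
repetitive = rng.standard_normal(fs // 10)   # placeholder 100-ms averaged response
variable = rng.standard_normal(fs // 10)

# A positive difference would indicate enhanced pitch encoding in the repetitive
# context, the pattern reported for typically developing children.
context_benefit = f0_magnitude(repetitive, fs) - f0_magnitude(variable, fs)
print(f"context-dependent F0 enhancement: {context_benefit:.3f}")
```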

    Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension

    Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-noise ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via natural language processing methods as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms preceding speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
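
    A TRF maps a continuous stimulus feature onto the EEG via a set of time-lagged weights. The sketch below shows one common way to estimate such a function, ridge regression on a lagged design matrix, using synthetic data; the lag range, regularization strength, and sampling rate are illustrative assumptions rather than the study's settings.

```python
# Minimal sketch of a temporal response function (TRF) estimate via ridge
# regression: map a stimulus feature (e.g., the speech amplitude envelope)
# to a single EEG channel over a range of time lags.
import numpy as np

def estimate_trf(feature, eeg, fs, tmin=-0.2, tmax=0.8, alpha=1e2):
    """Ridge-regression TRF from one stimulus feature to one EEG channel."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Lagged design matrix: each column is the feature shifted by one lag.
    # np.roll uses circular shifts, a rough approximation at the edges.
    X = np.column_stack([np.roll(feature, lag) for lag in lags])
    # Ridge solution: w = (X'X + alpha*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ eeg)
    return lags / fs, w  # lag times (s) and TRF weights

fs = 128  # Hz (assumed downsampled rate)
rng = np.random.default_rng(1)
envelope = rng.random(60 * fs)                  # placeholder 60-s speech envelope
eeg = (np.convolve(envelope, rng.standard_normal(32), mode="same")
       + rng.standard_normal(60 * fs))          # synthetic EEG channel
times, trf = estimate_trf(envelope, eeg, fs)
print(f"peak TRF lag: {times[np.argmax(np.abs(trf))] * 1000:.0f} ms")
```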

    Contributions of local speech encoding and functional connectivity to audio-visual speech perception

    Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.
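
    One common way to quantify speech encoding by temporally entrained brain activity is the coherence between the speech amplitude envelope and a neural time series. The sketch below computes that coherence on synthetic signals with SciPy; the sampling rate, frequency band, and data are assumptions, and the study's actual MEG source analysis is not reproduced here.

```python
# Minimal sketch of "speech entrainment" quantified as coherence between the
# speech amplitude envelope and a sensor/source time series. Signals are
# synthetic placeholders.
import numpy as np
from scipy.signal import coherence, hilbert

fs = 200  # Hz (assumed)
rng = np.random.default_rng(2)
speech = rng.standard_normal(120 * fs)
envelope = np.abs(hilbert(speech))                            # amplitude envelope
brain = 0.3 * envelope + rng.standard_normal(len(envelope))   # entrained signal + noise

freqs, coh = coherence(envelope, brain, fs=fs, nperseg=4 * fs)
delta_theta = (freqs >= 1) & (freqs <= 8)   # bands typically linked to entrainment
print(f"mean 1-8 Hz coherence: {coh[delta_theta].mean():.2f}")
```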

    Investigating the Neural Basis of Audiovisual Speech Perception with Intracranial Recordings in Humans

    Speech is inherently multisensory, containing auditory information from the voice and visual information from the mouth movements of the talker. Hearing the voice is usually sufficient to understand speech; however, in noisy environments or when audition is impaired due to aging or disability, seeing mouth movements greatly improves speech perception. Although behavioral studies have firmly established this perceptual benefit, it is still not clear how the brain processes visual information from mouth movements to improve speech perception. To clarify this issue, I studied the neural activity recorded from the brain surfaces of human subjects using intracranial electrodes, a technique known as electrocorticography (ECoG). First, I studied responses to noisy speech in the auditory cortex, specifically in the superior temporal gyrus (STG). Previous studies identified the anterior parts of the STG as unisensory, responding only to auditory stimuli. On the other hand, posterior parts of the STG are known to be multisensory, responding to both auditory and visual stimuli, making this region key for audiovisual speech perception. I examined how these different parts of the STG respond to clear versus noisy speech. I found that noisy speech decreased the amplitude and increased the across-trial variability of the response in the anterior STG. However, possibly due to its multisensory composition, the posterior STG was not as sensitive to auditory noise as the anterior STG and responded similarly to clear and noisy speech. I also found that these two response patterns in the STG were separated by a sharp boundary demarcated by the posterior-most portion of Heschl’s gyrus. Second, I studied responses to silent speech in the visual cortex. Previous studies demonstrated that visual cortex shows response enhancement when the auditory component of speech is noisy or absent; however, it was not clear which regions of the visual cortex specifically show this response enhancement and whether it results from top-down modulation from a higher region. To test this, I first mapped the receptive fields of different regions in the visual cortex and then measured their responses to visual (silent) and audiovisual speech stimuli. I found that visual regions with central receptive fields show greater response enhancement to visual speech, possibly because these regions receive more visual information from mouth movements. I found similar response enhancement to visual speech in frontal cortex, specifically in the inferior frontal gyrus, premotor and dorsolateral prefrontal cortices, which have been implicated in speech reading in previous studies. I showed that these frontal regions display strong functional connectivity with visual regions that have central receptive fields during speech perception.
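
    The anterior-versus-posterior STG contrast above rests on two simple per-electrode statistics: mean response amplitude and across-trial variability. The sketch below computes both for hypothetical clear and noisy trial matrices; the values and array shapes are placeholders, not ECoG data.

```python
# Minimal sketch of the two response properties contrasted above: mean evoked
# amplitude and across-trial variability of (e.g., high-gamma) responses to
# clear vs. noisy speech. Trial matrices have shape (n_trials, n_timepoints).
import numpy as np

def response_stats(trials):
    """Mean evoked amplitude and across-trial variability of the trial means."""
    per_trial = trials.mean(axis=1)          # one amplitude per trial
    return per_trial.mean(), per_trial.std()

rng = np.random.default_rng(3)
clear_trials = 1.0 + 0.2 * rng.standard_normal((40, 500))   # placeholder data
noisy_trials = 0.6 + 0.5 * rng.standard_normal((40, 500))

for label, trials in [("clear", clear_trials), ("noisy", noisy_trials)]:
    amp, sd = response_stats(trials)
    print(f"{label}: mean amplitude {amp:.2f}, across-trial SD {sd:.2f}")
```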

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
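
    As a concrete example of a front-end technique from this family, the sketch below implements a tiny mask-estimation network: a recurrent layer predicts a time-frequency mask that is applied to the noisy magnitude spectrogram before features are passed to the recognizer. The architecture and dimensions are illustrative assumptions, not any specific system covered in the overview.

```python
# Minimal sketch of a mask-based speech enhancement front-end for noise-robust
# ASR. A small GRU predicts a [0, 1] mask over time-frequency bins.
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag):            # (batch, time, freq) magnitude spectrogram
        h, _ = self.rnn(noisy_mag)
        mask = torch.sigmoid(self.out(h))    # values in [0, 1]
        return mask * noisy_mag              # enhanced magnitude spectrogram

model = MaskEstimator()
noisy = torch.rand(2, 100, 257)              # dummy batch: 2 utterances, 100 frames
enhanced = model(noisy)
print(enhanced.shape)                        # torch.Size([2, 100, 257])
```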

    Dissociable Mechanisms of Concurrent Speech Identification in Noise at Cortical and Subcortical Levels

    When two vowels with different fundamental frequencies (F0s) are presented concurrently, listeners often hear two voices producing different vowels on different pitches. Parsing of this simultaneous speech can also be affected by the signal-to-noise ratio (SNR) in the auditory scene. The extraction and interaction of F0 and SNR cues may occur at multiple levels of the auditory system. The major aims of this dissertation are to elucidate the neural mechanisms and time course of concurrent speech perception in clean and degraded listening conditions and its behavioral correlates. In two complementary experiments, electrical brain activity (EEG) was recorded at cortical (EEG Study #1) and subcortical (FFR Study #2) levels while participants heard double-vowel stimuli whose F0s differed by zero or four semitones (STs), presented in either clean or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in identifying both vowels for larger F0 separations (i.e., 4 ST, with pitch cues), and this F0 benefit was more pronounced at more favorable SNRs. Time-frequency analysis of cortical EEG oscillations (i.e., brain rhythms) revealed a dynamic time course for concurrent speech processing that depended on both extrinsic (SNR) and intrinsic (pitch) acoustic factors. Early high-frequency activity reflected pre-perceptual encoding of acoustic features (~200 ms) and the quality (i.e., SNR) of the speech signal (~250-350 ms), whereas later-evolving low-frequency rhythms (~400-500 ms) reflected post-perceptual, cognitive operations that covaried with listening effort and task demands. Analysis of subcortical responses indicated that, while FFRs provided a high-fidelity representation of the double-vowel stimuli and the spectro-temporal nonlinear properties of the peripheral auditory system, FFR activity largely reflected the neural encoding of stimulus features (exogenous coding) rather than perceptual outcomes, although timbre (F1) encoding could predict response speed in the noise conditions. Taken together, the results of this dissertation suggest that subcortical auditory processing mostly reflects exogenous (acoustic) feature encoding, in stark contrast to cortical activity, which reflects perceptual and cognitive aspects of concurrent speech perception. By studying multiple brain indices underlying an identical task, these studies provide a more comprehensive window into the hierarchy of brain mechanisms and the time course of concurrent speech processing.
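
    The cortical analysis described above is a time-frequency decomposition of the EEG into band-limited power over time. The sketch below shows a minimal version using a short-time Fourier transform on a synthetic channel; the bands, window length, and data are assumptions made for illustration, not the dissertation's pipeline.

```python
# Minimal sketch of a time-frequency analysis: band-limited power over time for
# a single EEG channel via a short-time Fourier transform.
import numpy as np
from scipy.signal import spectrogram

fs = 500  # Hz (assumed)
rng = np.random.default_rng(4)
eeg = rng.standard_normal(10 * fs)          # 10 s of placeholder EEG

freqs, times, power = spectrogram(eeg, fs=fs, nperseg=fs // 2, noverlap=fs // 4)

def band_power(lo, hi):
    """Average power time course within a frequency band (Hz)."""
    sel = (freqs >= lo) & (freqs <= hi)
    return power[sel].mean(axis=0)

theta = band_power(4, 8)                    # early vs. late windows could then be
alpha_beta = band_power(8, 30)              # compared across SNR and F0 conditions
print(theta.shape, alpha_beta.shape)
```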

    Noise Exposure, Self-Reported Speech-in-Noise Perception, and the Auditory Brainstem Response in Normal-Hearing Human Ears

    Difficulty understanding speech-in-noise (SIN) is a common complaint among many listeners. There is emerging evidence that noise exposure is associated with difficulties in speech discrimination and temporal processing despite normal audiometric thresholds. At present, evidence linking temporary noise-induced hearing loss and selective loss of low-spontaneous-rate fibers in human ears is limited and inconsistent. Likewise, results of SIN measures in relation to noise-induced cochlear synaptopathy have varied across studies. The goals of this study are to further our understanding of the effects of noise exposure on the auditory system and to investigate novel approaches for detecting early noise-induced auditory damage. Data were collected from 30 normal-hearing subjects (18-35 years old) with varying amounts of noise exposure. Auditory brainstem responses (ABR) were recorded to both a click stimulus (a measure of auditory nerve function) and a speech stimulus (/da/; a measure of temporal processing). The speech hearing subscale of the Speech, Spatial and Qualities of Hearing Scale (SSQ) was also administered to quantify individual self-reported SIN abilities. The data yielded mixed findings. Overall, click-ABR wave I results provided no evidence for noise-induced synaptopathy in this cohort. However, differences in wave I amplitude between males and females were observed, suggesting that noise effects may vary between sexes. Transient components of the speech-ABR showed no evidence of neural slowing but revealed enhanced neural responses in individuals with greater amounts of noise exposure. This latter finding may be a manifestation of either musical training or increased central neural gain as a result of pathology. Lastly, individuals with greater amounts of noise exposure reported experiencing more difficulties hearing SIN (as per the SSQ), but the ABR data did not show the predicted physiologic evidence to explain the self-perceived SIN deficit.
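
    Click-ABR wave I amplitude is typically read off the averaged waveform as a peak-to-trough difference inside a short latency window. The sketch below shows that computation on a placeholder waveform, with an assumed window and sampling rate rather than the study's measurement protocol.

```python
# Minimal sketch: click-ABR wave I amplitude as the peak-to-trough difference
# inside an assumed latency window (~1-2 ms after click onset).
import numpy as np

def wave_i_amplitude(abr, fs, window=(0.001, 0.002)):
    """Peak-to-trough amplitude within a latency window (seconds)."""
    start, stop = (int(t * fs) for t in window)
    segment = abr[start:stop]
    return segment.max() - segment.min()

fs = 20000  # Hz (assumed high sampling rate for ABR)
rng = np.random.default_rng(5)
abr = rng.standard_normal(int(0.01 * fs)) * 0.05   # placeholder 10-ms average (uV)
print(f"wave I amplitude: {wave_i_amplitude(abr, fs):.3f} uV")
```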

    The role of sound offsets in auditory temporal processing and perception

    According to recent neurobiological studies, sound-offset responses are distinct from sound-onset responses in their underlying neural mechanisms, temporal processing pathways, and roles in auditory perception. In this work, I investigate the role of sound offsets and the effect of reduced sensitivity to offsets on auditory perception in humans. The implications of a 'sound-offset deficit' for speech-in-noise perception are investigated, based on a biologically plausible mathematical model with independent channels for onset and offset detection. Sound offsets are important in recognising, distinguishing and grouping sounds. They are also likely to play a role in perceiving consonants that lie in the troughs of amplitude fluctuations in speech. The offset influence on the discriminability of model outputs for 48 nonsense vowel-consonant-vowel (VCV) speech stimuli in varying levels of multi-talker babble noise (-12, -6, 0, 6, 12 dB SNR) was assessed, and led to predictions that correspond to known phonetic categories. This work therefore suggests that variability in offset salience alone can explain the rank order of consonants most affected in noisy situations. A novel psychophysical test battery for offset sensitivity was devised and assessed, followed by a study to find an electrophysiological correlate. The findings suggest that individual differences in sound-offset sensitivity may be a factor contributing to inter-subject variation in speech-in-noise discrimination ability. The most promising of these measures can be used to test between-population differences in offset sensitivity, with more support for the objective than the psychophysical measures. In the electrophysiological study, offset responses in a duration-discrimination paradigm were found to be more strongly modulated by attention than onset responses. Overall, this thesis shows for the first time that the onset-offset dichotomy in the auditory system, previously explored in physiological studies, is also evident in human studies for both simple and complex speech sounds.
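
    The notion of independent onset and offset channels can be illustrated by splitting the envelope derivative into its rising and falling parts. The sketch below does this for a toy amplitude-modulated tone; it is a schematic illustration of the idea, not the biologically detailed model used in the thesis.

```python
# Minimal sketch of independent onset and offset channels: half-wave rectify the
# positive and negative parts of the amplitude-envelope derivative.
import numpy as np
from scipy.signal import hilbert

fs = 16000  # Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)
# Placeholder "speech-like" signal: a tone whose amplitude dips periodically,
# creating the kind of envelope troughs where consonants often sit.
signal = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.cos(2 * np.pi * 3 * t))

envelope = np.abs(hilbert(signal))
d_env = np.diff(envelope) * fs               # envelope derivative
onset_channel = np.maximum(d_env, 0.0)       # responds to rising envelope
offset_channel = np.maximum(-d_env, 0.0)     # responds to falling envelope
print(f"onset energy {onset_channel.sum():.1f}, offset energy {offset_channel.sum():.1f}")
```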

    Cortical and subcortical speech-evoked responses in young and older adults: Effects of background noise, arousal states, and neural excitability

    This thesis investigated how the brain processes speech signals within the auditory sensory system of human adults across a wide age range, using electroencephalography (EEG). Two types of speech-evoked phase-locked responses were focused on: (i) cortical responses (theta-band phase-locked responses) that reflect processing of the low-frequency, slowly varying envelopes of speech; and (ii) subcortical/peripheral responses (frequency-following responses; FFRs) that reflect encoding of speech periodicity and temporal fine structure. The aim was to elucidate how these neural activities are affected by internal (aging, hearing loss, level of arousal, and neural excitability) and external (background noise) factors in daily life, through three studies. Study 1 investigated theta-band phase-locking and FFRs in noisy environments in young and older adults. It examined how aging and hearing loss affect these activities under quiet and noisy conditions, and how these activities are associated with speech-in-noise perception. The results showed that aging and hearing loss affect speech-evoked phase-locked responses through different mechanisms, and that the effects of aging on cortical and subcortical activities play different roles in speech-in-noise perception. Study 2 investigated how the level of arousal, or consciousness, affects phase-locked responses in young and older adults. The results showed that both theta-band phase-locking and FFRs decrease as the level of arousal decreases. It was further found that the neuro-regulatory role of sleep spindles on theta-band phase-locking differs between young and older adults, indicating that the mechanisms of neuro-regulation for phase-locked responses across arousal states are age-dependent. Study 3 established a causal relationship between auditory cortical excitability and FFRs using combined transcranial direct current stimulation (tDCS) and EEG. FFRs were measured before and after tDCS was applied over the auditory cortices. The results showed that changes in neural excitability of the right auditory cortex can alter FFR magnitudes along the contralateral pathway. This finding has important theoretical and clinical implications, causally linking auditory cortical function with the neural encoding of speech periodicity. Taken together, the findings of this thesis advance our understanding of how speech signals are processed via neural phase-locking in everyday life across the lifespan.
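
    Theta-band phase-locking of the kind analyzed in Studies 1 and 2 is often summarized as an inter-trial phase-locking value (PLV). The sketch below computes a theta-band PLV from synthetic single trials; the filter settings, sampling rate, and data are assumptions, not the thesis's actual pipeline.

```python
# Minimal sketch of an inter-trial phase-locking value (PLV) for theta-band EEG,
# computed from band-passed, Hilbert-transformed single trials.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_plv(trials, fs, band=(4.0, 8.0)):
    """Inter-trial phase-locking value over time for theta-band activity."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=1)
    phase = np.angle(hilbert(filtered, axis=1))
    return np.abs(np.exp(1j * phase).mean(axis=0))   # PLV per time point

fs = 250  # Hz (assumed)
rng = np.random.default_rng(6)
trials = rng.standard_normal((60, 2 * fs))           # 60 placeholder trials x 2 s
plv = theta_plv(trials, fs)
print(f"mean theta PLV: {plv.mean():.2f}")
```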