
    Temporal Coding of Speech in Human Auditory Cortex

    Human listeners can reliably recognize speech in complex listening environments. The underlying neural mechanisms, however, remain unclear and cannot yet be emulated by any artificial system. In this dissertation, we study how speech is represented in the human auditory cortex and how the neural representation contributes to reliable speech recognition. Cortical activity from normal-hearing human subjects is noninvasively recorded using magnetoencephalography during natural speech listening. It is first demonstrated that neural activity from auditory cortex is precisely synchronized to the slow temporal modulations of speech when the speech signal is presented in a quiet listening environment. How this neural representation is affected by acoustic interference is then investigated. Acoustic interference degrades speech perception via two mechanisms, informational masking and energetic masking, which are addressed respectively by using a competing speech stream and stationary noise as the interfering sound. When two speech streams are presented simultaneously, cortical activity is predominantly synchronized to the speech stream the listener attends to, even if the unattended, competing speech stream is 8 dB more intense. When speech is presented together with spectrally matched stationary noise, cortical activity remains precisely synchronized to the temporal modulations of speech until the noise is 9 dB more intense. Critically, the accuracy of neural synchronization to speech predicts how well individual listeners can understand speech in noise. Further analysis reveals that two neural sources contribute to speech-synchronized cortical activity, one with a shorter response latency of about 50 ms and the other with a longer response latency of about 100 ms. The longer-latency component, but not the shorter-latency component, shows selectivity to the attended speech and invariance to background noise, indicating a transition in auditory cortex from encoding the acoustic scene to encoding the behaviorally important auditory object. Taken together, we have demonstrated that during natural speech comprehension, neural activity in the human auditory cortex is precisely synchronized to the slow temporal modulations of speech. This neural synchronization is robust to acoustic interference, whether speech or noise, and therefore provides a strong candidate for the neural basis of acoustic-background-invariant speech recognition.
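
    To make the synchronization measure concrete, the sketch below estimates how strongly a neural signal tracks the slow speech envelope. It is an illustrative reconstruction, not the dissertation's analysis code: the input arrays `speech` and `meg`, their shared sampling rate `fs`, the 1-8 Hz band, and the latency range are all assumptions.

```python
# Illustrative sketch (assumptions noted above), not the dissertation's code.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def slow_envelope(x, fs, lo=1.0, hi=8.0):
    """Broadband envelope via the Hilbert transform, band-passed to the
    slow (roughly 1-8 Hz) modulations that cortex is reported to track."""
    env = np.abs(hilbert(x))
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, env)

def envelope_tracking(speech, meg, fs, max_lag_s=0.25):
    """Peak normalized cross-correlation between the speech envelope and one
    neural channel over plausible cortical latencies (0-250 ms)."""
    env = slow_envelope(speech, fs)
    env = (env - env.mean()) / env.std()
    meg = (meg - meg.mean()) / meg.std()
    n = min(len(env), len(meg))
    lags = range(int(max_lag_s * fs) + 1)
    r = [np.corrcoef(env[: n - lag], meg[lag:n])[0, 1] for lag in lags]
    best = int(np.argmax(r))
    return r[best], best / fs  # tracking strength and best latency (s)
```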

    Investigating the neural code for dynamic speech and the effect of signal degradation

    It is common practice in psychophysical studies to investigate speech processing by manipulating or reducing spectral and temporal information in the input signal. Such investigations, along with the often surprising performance of modern cochlear implants, have highlighted the robustness of the auditory system to severe degradations and suggest that the ability to discriminate speech sounds is fundamentally limited by the complexity of the input signal. It is not clear, however, how and to what extent this is underpinned by neural processing mechanisms. This thesis examines the effect on the neural representation of reducing spectral and temporal information in the signal. A stimulus set from an existing psychophysical study was emulated, comprising 16 vowel-consonant-vowel phoneme sequences (VCVs), each produced by multiple talkers, which were parametrically degraded using a noise vocoder. Neuronal representations were simulated using a published computational model of the auditory nerve. Representations were also recorded in the inferior colliculus (IC) and auditory cortex (AC) of anaesthetised guinea pigs. Their discriminability was quantified using a novel neural classifier. Commensurate with investigations using simple stimuli, high-rate envelope modulations in complex signals are represented in the auditory nerve and midbrain. It is demonstrated here that representations of these features are efficacious in a closed-set speech recognition task where appropriate decoding mechanisms are available, yet do not appear to be accessible perceptually. Optimal encoding windows for speech discrimination increase from the order of 1 millisecond in the auditory nerve to tens of milliseconds in the IC and the AC. Recent publications suggest that millisecond-precise neuronal activity is important for speech recognition. It is demonstrated here that the relevance of millisecond-precise responses in this context depends strongly on the brain region, the nature of the speech recognition task and the complexity of the stimulus set.
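
    For readers unfamiliar with noise vocoding, the following is a minimal sketch of the degradation technique the thesis describes: the signal is split into bands, each band's envelope is extracted, and the envelopes modulate band-limited noise. The band count, filter orders, and cutoff choices here are illustrative assumptions, not the parameters of the emulated study.

```python
# Minimal noise-vocoder sketch; parameters are illustrative assumptions.
# Requires a mono signal `x` sampled at `fs` well above 2 * f_hi.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=8000.0, env_cut=30.0):
    """Split `x` into log-spaced bands, extract each band's envelope, and use
    it to modulate band-limited noise. Fewer bands give coarser spectral
    detail; a lower envelope cutoff gives coarser temporal detail."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    env_sos = butter(2, env_cut / (fs / 2), btype="low", output="sos")
    out = np.zeros(len(x))
    rng = np.random.default_rng(0)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band",
                     output="sos")
        band = sosfiltfilt(sos, x)                        # analysis band
        env = sosfiltfilt(env_sos, np.abs(hilbert(band))) # smoothed envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += np.clip(env, 0, None) * noise              # modulated carrier
    return out
```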

    Coding Strategies for Cochlear Implants Under Adverse Environments

    Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, limitations remain on speech perception under adverse environments such as background noise, reverberation and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberated speech and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information; this study therefore supports the design of algorithms that extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing-impaired listeners. Reverberated sound consists of the direct sound, early reflections and late reflections; late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction (SS) to suppress the reverberant energy from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the SS strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work on the development of algorithms to regenerate the harmonics of voiced segments in the presence of noise.
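
    The abstract only summarizes the SS strategy, so the sketch below shows a generic spectral-subtraction approach to late-reflection suppression: estimate late-reverberant power as a delayed, decayed copy of the spectrogram and subtract it from each frame. The exponential decay model, the 50 ms early/late boundary, and the spectral floor are assumptions for illustration, not the study's actual algorithm.

```python
# Hedged sketch of spectral-subtraction dereverberation; the late-reflection
# power model and all constants below are illustrative assumptions.
import numpy as np
from scipy.signal import istft, stft

def suppress_late_reverb(x, fs, rt60=1.0, delay_s=0.05, floor=0.1):
    """Subtract an estimate of late-reverberant power from the spectrogram,
    keeping the noisy phase and applying a spectral floor."""
    nperseg = 512
    f, t, X = stft(x, fs, nperseg=nperseg)
    power = np.abs(X) ** 2
    hop_s = (nperseg // 2) / fs                 # default 50% overlap
    d = max(1, int(round(delay_s / hop_s)))     # frames until "late" energy
    # 60 dB power decay over rt60 seconds -> exp(-13.8 * t / rt60).
    atten = np.exp(-13.8 * delay_s / rt60)
    late = np.zeros_like(power)
    late[:, d:] = atten * power[:, :-d]         # delayed, decayed estimate
    clean_pow = np.maximum(power - late, floor * power)
    Y = np.sqrt(clean_pow) * np.exp(1j * np.angle(X))
    _, y = istft(Y, fs, nperseg=nperseg)
    return y
```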

    Place-based mapping with electric-acoustic stimulation

    The goals of this dissertation were to understand the influence of electric frequency-to-place mismatches on the speech recognition of listeners of electric-acoustic stimulation (EAS) and whether listeners would experience better speech recognition with maps derived from a strict place-based mapping procedure as compared to alternative mapping procedures. Current default EAS mapping procedures do not account for individual variation in electrode array placement relative to cochlear tonotopicity, resulting in electric frequency-to-place mismatches. The strict place-based mapping procedure assigns the electric filter frequencies to match the cochlear place frequencies for electrodes in the low-to-mid frequency region and distributes the remaining high-frequency information across electrodes in the basal region. The rationale for this procedure is that eliminating mismatches will improve speech recognition since 1) critical speech information is provided by the mid frequencies and 2) better spectral resolution of low-frequency cues may support better performance in noise. EAS simulation studies find that acute masked speech recognition is significantly better with strict place-based maps than with maps with spectral shifts. In the present work, the first experiment compared the effectiveness of the strict place-based mapping procedure with an alternative full-frequency place-based mapping procedure using simulations of short electrode arrays at shallow angular insertion depths (AIDs). Recipients of short arrays (e.g., ≤ 24 mm) may experience limited benefit with strict place-based maps since speech information below the frequency of the most apical electrode is discarded. The full-frequency place-based map would provide more low-frequency information yet present spectral shifts for the electrodes below the 1 kHz cochlear region. For the EAS simulations, performance with the strict map remained stable across cases, while performance with the full-frequency map improved with decreases in AID. The second experiment assessed the pattern of speech recognition acclimatization for EAS users listening with either a strict place-based map or a default map. Poorer performance was observed for EAS users with larger magnitudes of electric mismatch out to 6 months post-activation. Taken together, the results from this dissertation suggest that eliminating electric frequency-to-place mismatches, as with the strict place-based mapping procedure, supports better early speech recognition for EAS users than alternative mapping procedures.
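
    An electric frequency-to-place mismatch can be made concrete with Greenwood's place-frequency map. The sketch below compares a hypothetical default filter assignment against place frequencies estimated from electrode insertion angles; the linear angle-to-distance conversion, the electrode positions, and the default frequencies are all illustrative assumptions, not clinical data or the dissertation's procedure.

```python
# Illustrative mismatch calculation with Greenwood's (1990) human map
# f = A * (10**(a*x) - k), x = fractional distance from the apex.
# Electrode angles and default frequencies below are made-up examples.
import numpy as np

def greenwood_hz(angle_deg, total_deg=910.0):
    """Characteristic frequency at a cochlear place, using a crude linear
    angle-to-distance conversion (imaging-based maps are more accurate)."""
    x_from_apex = 1.0 - angle_deg / total_deg  # 0 at apex, 1 at base
    return 165.4 * (10 ** (2.1 * x_from_apex) - 0.88)

# Hypothetical 12-electrode array, listed from most apical to most basal,
# paired with a hypothetical default map's center frequencies.
angles = np.linspace(420, 60, 12)          # angular insertion depths (deg)
default_cf = np.geomspace(250, 7000, 12)   # default filter frequencies (Hz)
place_cf = greenwood_hz(angles)
mismatch_semitones = 12 * np.log2(default_cf / place_cf)
for e, (dft, plc, m) in enumerate(
        zip(default_cf, place_cf, mismatch_semitones), 1):
    print(f"electrode {e:2d}: default {dft:7.0f} Hz, "
          f"place {plc:7.0f} Hz, mismatch {m:+6.1f} semitones")
```

    Negative values indicate electric filters tuned below the cochlear place frequency, the downward spectral shift that the strict place-based procedure is designed to eliminate.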


    Analysis and correction of the helium speech effect by autoregressive signal processing

    SIGLE LD:D48902/84 / BLDSC - British Library Document Supply Centre, United Kingdom