
    A physiologically inspired model for solving the cocktail party problem.

    At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (an analog of the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. Applying the algorithm to single- and multi-talker speech, we demonstrate that it achieves intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to those of humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects. Funding: R01 DC000100 (NIDCD, NIH HHS). Published version.
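    The binaural front end described above can be sketched in simplified form: a band-pass "cochlear" filterbank applied to each ear, followed by per-band interaural cross-correlation to estimate interaural time differences, the cue a Jeffress-style midbrain spatial-localization stage would exploit. This is a minimal, non-spiking sketch; the filter design, function names, and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def cochlear_filterbank(x, fs, cfs, bw_factor=0.25):
    """Band-pass x around each centre frequency; returns an (n_bands, n_samples) array."""
    bands = []
    for cf in cfs:
        sos = butter(2, [cf * (1 - bw_factor), cf * (1 + bw_factor)],
                     btype="band", fs=fs, output="sos")
        bands.append(sosfiltfilt(sos, x))
    return np.array(bands)

def itd_per_band(left_bands, right_bands, fs, max_itd=0.8e-3):
    """Estimate the ITD in each band from the peak of the interaural cross-correlation."""
    max_lag = int(round(max_itd * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    itds = []
    for l, r in zip(left_bands, right_bands):
        cc = [np.dot(l, np.roll(r, k)) for k in lags]  # circular shift is fine for small lags
        itds.append(lags[int(np.argmax(cc))] / fs)
    return np.array(itds)

# Toy usage: broadband noise delayed by 0.5 ms at the right ear (left ear leads).
fs = 16000
noise = np.random.default_rng(0).standard_normal(fs // 2)
left, right = noise, np.roll(noise, int(0.5e-3 * fs))
cfs = np.geomspace(200, 5000, 16)                      # log-spaced centre frequencies
itds = itd_per_band(cochlear_filterbank(left, fs, cfs),
                    cochlear_filterbank(right, fs, cfs), fs)
print(itds * 1e3)                                      # ITD estimate per band, in ms
```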

    Heschl's gyrus is more sensitive to tone level than non-primary auditory cortex

    Previous neuroimaging studies generally demonstrate a growth in the cortical response with an increase in sound level. However, the details of the shape and topographic location of such growth remain largely unknown. One limiting methodological factor has been the relatively sparse sampling of sound intensities. Additionally, most studies have either analysed the entire auditory cortex without differentiating primary and non-primary regions, or have limited their analyses to Heschl's gyrus (HG). Here, we characterise the pattern of responses, as a function of sound level, to a 300-Hz tone presented in 6-dB steps from 42 to 96 dB sound pressure level, within three anatomically defined auditory areas: the primary area, on HG, and two non-primary areas, consisting of a small area lateral to the axis of HG (the anterior lateral area, ALA) and the posterior part of auditory cortex (the planum temporale, PT). The extent and magnitude of auditory activation increased non-linearly with sound level. In HG, the extent and magnitude were more sensitive to increasing level than in ALA and PT. Thus, HG appears to have a larger involvement in sound-level processing than does ALA or PT.
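    As a rough illustration of the stimulus set, the snippet below generates the 300-Hz tone at the ten levels spanning 42-96 dB SPL in 6-dB steps. Absolute calibration depends on the playback chain, so amplitudes are expressed relative to an assumed full-scale tone at 96 dB SPL; this is an illustrative sketch, not the study's presentation code.

```python
import numpy as np

fs = 44100
t = np.arange(0, 1.0, 1 / fs)
tone = np.sin(2 * np.pi * 300 * t)                 # 1-s, 300-Hz pure tone

levels_db = np.arange(42, 97, 6)                   # 42, 48, ..., 96 dB SPL (10 levels)
gains = 10 ** ((levels_db - 96) / 20.0)            # amplitude relative to the 96-dB tone
stimuli = {int(db): g * tone for db, g in zip(levels_db, gains)}

for db, g in zip(levels_db, gains):
    print(f"{db} dB SPL -> relative amplitude {g:.4f}")
```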

    Simple acoustic features can explain phoneme-based predictions of cortical responses to speech

    When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that in order to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information-theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1-h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results, the improved performance of an encoding model in which annotated linguistic and acoustic features were combined, and the decoding of phoneme subgroups from phoneme-locked responses, can be explained by an encoding model that is based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we believe that in order to further our understanding of human cortical responses to speech, we should also explore low-level and parsimonious explanations for apparent high-level phenomena.
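    The kind of linear encoding model discussed above can be sketched as ridge regression from time-lagged acoustic features to a recorded MEG/EEG channel. The features here (a broadband envelope plus a rectified envelope derivative standing in for "acoustic edges"), the synthetic data, and all parameters are illustrative assumptions, not the authors' exact model.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(features, max_lag):
    """Stack time-lagged copies of each feature column (lags 0..max_lag samples)."""
    n, d = features.shape
    cols = []
    for lag in range(max_lag + 1):
        shifted = np.zeros((n, d))
        shifted[lag:] = features[:n - lag]
        cols.append(shifted)
    return np.hstack(cols)

fs = 100                                              # feature/response rate (Hz)
n = 60 * fs                                           # one minute of synthetic data
rng = np.random.default_rng(0)

envelope = np.convolve(rng.random(n), np.ones(10) / 10, mode="same")  # stand-in envelope
edges = np.maximum(np.gradient(envelope), 0)          # rectified derivative ("acoustic edges")
X = lagged_design(np.column_stack([envelope, edges]), max_lag=30)     # 0-300 ms lags

# Synthetic "MEG channel": a delayed, noisy mixture of the two features.
y = np.roll(envelope, 10) + 0.5 * np.roll(edges, 15) + 0.1 * rng.standard_normal(n)

model = Ridge(alpha=1.0).fit(X[:n // 2], y[:n // 2])  # train on the first half
r = np.corrcoef(model.predict(X[n // 2:]), y[n // 2:])[0, 1]
print(f"held-out prediction correlation: {r:.2f}")
```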

    Neural encoding of the speech envelope by children with developmental dyslexia.

    Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological-age-matched (CA) and reading-level-matched (RL) younger children. Participants listened to semantically unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10 Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0-2 Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low-frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages. Funding: Medical Research Council (Grant ID: G0902375). This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.bandl.2016.06.00
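    A minimal sketch of extracting the low-frequency envelope bands analysed above (0-2 Hz and 2-10 Hz): take the Hilbert envelope of the audio, downsample it, and band-filter. The cut-offs, filter orders, and sampling rates are illustrative assumptions, not necessarily the paper's exact pre-processing.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt, resample_poly

def envelope_band(audio, fs, band, env_fs=64):
    """Broadband Hilbert envelope, downsampled to env_fs, filtered to `band` (Hz)."""
    env = np.abs(hilbert(audio))                      # instantaneous amplitude
    env = resample_poly(env, env_fs, fs)              # e.g. 16 kHz -> 64 Hz
    low, high = band
    if low <= 0:                                      # 0-2 Hz band: low-pass only
        sos = butter(3, high, btype="low", fs=env_fs, output="sos")
    else:
        sos = butter(3, [low, high], btype="band", fs=env_fs, output="sos")
    return sosfiltfilt(sos, env)

fs = 16000
speech = np.random.randn(10 * fs)                     # stand-in for a spoken sentence
delta_env = envelope_band(speech, fs, (0, 2))         # prosodic-rate band
theta_env = envelope_band(speech, fs, (2, 10))        # syllable-rate band
print(delta_env.shape, theta_env.shape)
```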

    Physiology-based model of multi-source auditory processing

    Our auditory systems have evolved to process a myriad of acoustic environments. In complex listening scenarios, we can tune our attention to one sound source (e.g., a conversation partner), while monitoring the entire acoustic space for cues we might be interested in (e.g., our names being called, or the fire alarm going off). While normal-hearing listeners handle complex listening scenarios remarkably well, hearing-impaired listeners experience difficulty even when wearing hearing-assist devices. This thesis presents theoretical work on understanding the neural mechanisms behind this process, as well as the application of neural models to segregate mixed sources and potentially help the hearing-impaired population. On the theoretical side, auditory spatial processing has been studied primarily up to the midbrain region, and studies have shown how individual neurons can localize sounds using spatial cues. Yet, how higher brain regions such as the cortex use this information to process multiple sounds in competition is not clear. This thesis demonstrates a physiology-based spiking neural network model, which provides a mechanism illustrating how the auditory cortex may organize upstream spatial information when there are multiple competing sound sources in space. Based on this model, an engineering solution to help hearing-impaired listeners segregate mixed auditory inputs is proposed. Using the neural model to perform sound segregation in the neural domain, the neural outputs (representing the source of interest) are reconstructed back to the acoustic domain using a novel stimulus reconstruction method.
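    The final step mentioned above, mapping neural output back to an acoustic signal, rests on linear stimulus-reconstruction ideas. The sketch below estimates a least-squares reconstruction filter that maps the binned spike counts of a toy ON/OFF population back to the stimulus; the toy encoder and all parameters are illustrative assumptions, not the thesis's reconstruction method.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_cells, n_taps = 4000, 20, 5            # time bins, model neurons, filter taps

# A smooth stand-in "stimulus" (e.g. a fluctuating speech envelope).
stimulus = np.convolve(rng.standard_normal(n), np.ones(8) / 8, mode="same")

# Toy encoder: half the population fires for positive excursions (ON),
# half for negative excursions (OFF), with Poisson spike counts.
rates = np.column_stack([np.maximum(stimulus, 0)] * (n_cells // 2)
                        + [np.maximum(-stimulus, 0)] * (n_cells // 2))
spikes = rng.poisson(5.0 * rates)

def lagged(x, taps):
    """Stack time-lagged copies of every column (lags 0..taps-1 bins)."""
    out = np.zeros((x.shape[0], taps * x.shape[1]))
    for k in range(taps):
        out[k:, k * x.shape[1]:(k + 1) * x.shape[1]] = x[:x.shape[0] - k]
    return out

# Least-squares reconstruction filter from spike counts back to the stimulus.
X = lagged(spikes, n_taps)
w, *_ = np.linalg.lstsq(X, stimulus, rcond=None)
reconstruction = X @ w
print("reconstruction r =", round(float(np.corrcoef(reconstruction, stimulus)[0, 1]), 2))
```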

    Representation of Instantaneous and Short-Term Loudness in the Human Cortex.

    Acoustic signals pass through numerous transforms in the auditory system before perceptual attributes such as loudness and pitch are derived. However, relatively little is known about exactly when these transformations happen, and where, cortically or sub-cortically, they occur. In an effort to examine this, we investigated the latencies and locations of cortical entrainment to two transforms predicted by a model of loudness perception for time-varying sounds: instantaneous loudness and short-term loudness, where the latter is hypothesized to be derived from the former and should therefore occur later in time. Entrainment of cortical activity was estimated from electro- and magneto-encephalographic (EMEG) activity, recorded while healthy subjects listened to continuous speech. There was entrainment to instantaneous loudness bilaterally at 45, 100, and 165 ms, in Heschl's gyrus, dorsal lateral sulcus, and Heschl's gyrus, respectively. Entrainment to short-term loudness was found in both the dorsal lateral sulcus and the superior temporal sulcus at 275 ms. These results suggest that short-term loudness is derived from instantaneous loudness, and that this derivation occurs after processing in sub-cortical structures. This work was supported by an ERC Advanced Grant (230570, ‘Neurolex’) to WMW and by MRC Cognition and Brain Sciences Unit (CBU) funding to WMW (U.1055.04.002.00001.01); computing resources were provided by the MRC-CBU. This is the final version of the article. It first appeared from Frontiers via http://dx.doi.org/10.3389/fnins.2016.0018
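    The relation between the two transforms studied above can be illustrated with asymmetric (attack/release) exponential smoothing, which derives a short-term loudness trace from an instantaneous loudness trace in the spirit of the Glasberg and Moore time-varying loudness model. The time constants and test signal below are illustrative assumptions, not the exact values used to generate the paper's predictors.

```python
import numpy as np

def short_term_loudness(instantaneous, fs, attack_ms=22.0, release_ms=50.0):
    """One-pole smoothing that rises quickly (attack) and decays slowly (release)."""
    a_attack = 1.0 - np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_release = 1.0 - np.exp(-1.0 / (fs * release_ms / 1000.0))
    out = np.zeros_like(instantaneous)
    prev = 0.0
    for i, x in enumerate(instantaneous):
        a = a_attack if x > prev else a_release
        prev = prev + a * (x - prev)
        out[i] = prev
    return out

fs = 1000                                   # loudness frames per second
t = np.arange(0, 1.0, 1 / fs)
inst = ((t > 0.2) & (t < 0.4)).astype(float)  # a 200-ms burst of "instantaneous loudness"
stl = short_term_loudness(inst, fs)
print("peak STL:", stl.max().round(2),
      "| STL 100 ms after offset:", stl[int(0.5 * fs)].round(3))
```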
