
    A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications

    Auditory models are commonly used as feature extractors for automatic speech-recognition systems or as front-ends for robotics, machine-hearing and hearing-aid applications. Although auditory models can capture the biophysical and nonlinear properties of human hearing in great detail, these biophysical models are computationally expensive and cannot be used in real-time applications. We present a hybrid approach in which convolutional neural networks are combined with computational neuroscience to yield a real-time end-to-end model of human cochlear mechanics, including level-dependent filter tuning (CoNNear). The CoNNear model was trained on acoustic speech material and its performance and applicability were evaluated using (unseen) sound stimuli commonly employed in cochlear mechanics research. The CoNNear model accurately simulates human cochlear frequency selectivity and its dependence on sound intensity, an essential quality for robust speech intelligibility at negative speech-to-background-noise ratios. The CoNNear architecture is based on parallel and differentiable computations and has the power to achieve real-time human performance. These unique CoNNear features will enable the next generation of human-like machine-hearing applications.
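    The abstract describes a convolutional encoder-decoder that maps an audio waveform directly to basilar-membrane responses. The PyTorch sketch below illustrates that general shape only; the layer counts, kernel sizes, channel count, and tanh nonlinearities are assumptions for illustration, not the published CoNNear architecture.

        import torch
        import torch.nn as nn

        class CochlearCNN(nn.Module):
            """Toy 1-D encoder-decoder: waveform in, per-channel BM displacement out."""
            def __init__(self, n_channels=64, width=128, kernel=15):
                super().__init__()
                # Strided convolutions downsample the waveform (encoder)...
                self.encoder = nn.Sequential(
                    nn.Conv1d(1, width, kernel, stride=2, padding=kernel // 2),
                    nn.Tanh(),
                    nn.Conv1d(width, width, kernel, stride=2, padding=kernel // 2),
                    nn.Tanh(),
                )
                # ...transposed convolutions upsample back to the audio rate (decoder),
                # producing one output waveform per simulated cochlear channel.
                self.decoder = nn.Sequential(
                    nn.ConvTranspose1d(width, width, kernel, stride=2,
                                       padding=kernel // 2, output_padding=1),
                    nn.Tanh(),
                    nn.ConvTranspose1d(width, n_channels, kernel, stride=2,
                                       padding=kernel // 2, output_padding=1),
                )

            def forward(self, audio):                     # audio: (batch, 1, time)
                return self.decoder(self.encoder(audio))  # (batch, n_channels, time)

        model = CochlearCNN()
        bm = model(torch.randn(1, 1, 2048))               # dummy 2048-sample input
        print(bm.shape)                                    # torch.Size([1, 64, 2048])

    Because every stage is a (transposed) convolution, the whole mapping is differentiable and can be evaluated in parallel over time, which is the property the abstract credits for real-time operation.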

    A physiologically inspired model for solving the cocktail party problem.

    At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (an analog of the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single- and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects. (R01 DC000100 - NIDCD NIH HHS)
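    The model is described as a cascade: cochlear filter bank, midbrain spatial-localization network, cortical selection, and stimulus reconstruction. The numpy/scipy sketch below mimics that cascade with simple signal-processing stand-ins rather than spiking networks; the band edges, cross-correlation localizer, and ITD tolerance are illustrative assumptions, not the paper's model.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        FS = 16000  # sample rate (Hz)

        def filterbank(x, edges=(100, 300, 700, 1500, 3000, 6000)):
            """Split a waveform into band-limited channels (cochlear stand-in)."""
            bands = []
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
                bands.append(sosfiltfilt(sos, x))
            return np.array(bands)                         # (channels, time)

        def channel_itd(left_band, right_band, max_lag=32):
            """Per-channel interaural time difference by cross-correlation
            (stand-in for the midbrain spatial-localization network)."""
            lags = np.arange(-max_lag, max_lag + 1)
            corr = [np.dot(left_band, np.roll(right_band, k)) for k in lags]
            return lags[int(np.argmax(corr))] / FS         # seconds

        def attend(left, right, target_itd, tol=100e-6):
            """Keep channels whose ITD matches the attended direction and sum them
            back into a waveform (stand-in for cortical selection + reconstruction)."""
            lb, rb = filterbank(left), filterbank(right)
            keep = [i for i in range(lb.shape[0])
                    if abs(channel_itd(lb[i], rb[i]) - target_itd) < tol]
            return lb[keep].sum(axis=0) if keep else np.zeros_like(left)

        # Dummy binaural mixture: a zero-ITD "target" tone plus diffuse noise.
        t = np.arange(FS) / FS
        left = np.sin(2 * np.pi * 500 * t) + 0.3 * np.random.randn(FS)
        right = np.sin(2 * np.pi * 500 * t) + 0.3 * np.random.randn(FS)
        recovered = attend(left, right, target_itd=0.0)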

    On the origins of the compressive cochlear nonlinearity

    Various simple mathematical models of the dynamics of the organ of Corti in the mammalian cochlea are analysed. The models are assessed against their ability to explain the compressive nonlinear response of the basilar membrane. The specific models considered are: phenomenological Hopf and cusp normal forms, a recently proposed description combining active hair-bundle motility and somatic motility, a reduction thereof, and finally a new model highlighting the importance of the coupling between the nonlinear transduction current and somatic motility. The overall conclusion is that neither a Hopf bifurcation nor a cusp bifurcation is necessary for realistic compressive nonlinearity. Moreover, two physiological models are discussed that show compressive nonlinearities similar to experimental observations without the need for tuning near any bifurcation.
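    For reference, the phenomenological Hopf normal form assessed here is usually written as below; the cube-root growth of the on-resonance response is the textbook source of its compressive nonlinearity. The notation is the generic one, not necessarily the paper's.

        % Hopf normal form for one cochlear section: mu is the distance from the
        % bifurcation, omega_0 the characteristic frequency, F e^{i omega t} the forcing.
        \begin{equation}
          \dot{z} = (\mu + i\omega_0)\,z - |z|^{2}z + F e^{i\omega t}
        \end{equation}
        % At the bifurcation (mu = 0) and on resonance (omega = omega_0), the
        % steady-state amplitude grows with the cube root of the forcing,
        % i.e. roughly 1 dB of output per 3 dB of input:
        \begin{equation}
          |z| \propto F^{1/3}
        \end{equation}

    The paper's point is that this tuning to a bifurcation is not required: the physiological models it analyses produce comparable compression away from any bifurcation.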

    Decoding neural responses to temporal cues for sound localization

    The activity of sensory neural populations carries information about the environment. This information may be extracted from neural activity using different strategies. In the auditory brainstem, a recent theory proposes that sound location in the horizontal plane is decoded from the relative summed activity of two populations in each hemisphere, whereas earlier theories hypothesized that the location was decoded from the identity of the most active cells. We tested the performance of various decoders of neural responses in increasingly complex acoustical situations, including spectrum variations, noise, and sound diffraction. We demonstrate that there is insufficient information in the pooled activity of each hemisphere to estimate sound direction in a reliable way consistent with behavior, whereas robust estimates can be obtained from neural activity by taking into account the heterogeneous tuning of cells. These estimates can still be obtained when only contralateral neural responses are used, consistent with unilateral lesion studies. DOI: http://dx.doi.org/10.7554/eLife.01312.001
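    The two decoder families compared in the abstract (summed activity per hemisphere versus a readout that exploits heterogeneous tuning) can be illustrated schematically; the Gaussian tuning curves, Poisson spiking, and parameter values below are illustrative assumptions, not the models used in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        directions = np.linspace(-90, 90, 181)          # candidate azimuths (degrees)
        best_dirs = rng.uniform(-90, 90, size=200)      # heterogeneous preferred directions
        widths = rng.uniform(20, 60, size=200)          # heterogeneous tuning widths

        def rates(direction):
            """Mean firing rate of each model neuron for a given source direction."""
            return 20 * np.exp(-0.5 * ((direction - best_dirs) / widths) ** 2) + 1

        def hemispheric_decode(counts):
            """Read direction from the left-right difference of summed activity,
            via a lookup table of the expected difference."""
            left, right = best_dirs < 0, best_dirs >= 0
            template = np.array([rates(d)[left].sum() - rates(d)[right].sum()
                                 for d in directions])
            observed = counts[left].sum() - counts[right].sum()
            return directions[int(np.argmin(np.abs(template - observed)))]

        def pattern_decode(counts):
            """Maximum-likelihood readout over the full heterogeneous population
            (Poisson log-likelihood of the observed spike counts)."""
            loglik = [np.sum(counts * np.log(rates(d)) - rates(d)) for d in directions]
            return directions[int(np.argmax(loglik))]

        true_direction = 30.0
        counts = rng.poisson(rates(true_direction))     # spike counts for one trial
        print(hemispheric_decode(counts), pattern_decode(counts))

    Repeating such trials under spectrum variations, added noise, or diffraction is the kind of comparison used to argue that the pooled-activity readout is not reliable while the heterogeneous readout is.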

    Population Coding of Interaural Time Differences in Gerbils and Barn Owls

    Interaural time differences (ITDs) are the primary cue for the localization of low-frequency sound sources in the azimuthal plane. For decades, it was assumed that the coding of ITDs in the mammalian brain was similar to that in the avian brain, where information is sparsely distributed across individual neurons, but recent studies have suggested otherwise. In this study, we characterized the representation of ITDs in adult male and female gerbils. First, we performed behavioral experiments to determine the acuity with which gerbils can use ITDs to localize sounds. Next, we used different decoders to infer ITDs from the activity of a population of neurons in the central nucleus of the inferior colliculus. These results show that ITDs are not represented in a distributed manner, but rather in the summed activity of the entire population. To contrast these results with those from a population in which the representation of ITDs is known to be sparsely distributed, we performed the same analysis on activity from the external nucleus of the inferior colliculus of adult male and female barn owls. Together, our results support the idea that, unlike the avian brain, the mammalian brain represents ITDs in the overall activity of a homogeneous population of neurons within each hemisphere.

    Time Domain Computation of a Nonlinear Nonlocal Cochlear Model with Applications to Multitone Interaction in Hearing

    A nonlinear nonlocal cochlear model of the transmission-line type is studied in order to capture multitone interactions and the resulting tonal suppression effects. The model, which can serve as a module for voice-signal processing, is a one-dimensional (in space) damped dispersive nonlinear PDE based on the mechanics and phenomenology of hearing. It describes the motion of the basilar membrane (BM) in the cochlea driven by input pressure waves. Both elastic damping and selective longitudinal fluid damping are present. The former is nonlinear and nonlocal in BM displacement, and plays a key role in capturing tonal interactions. The latter is active only near the exit boundary (helicotrema), and is built in to damp out the remaining long waves. The initial boundary value problem is numerically solved with a semi-implicit second-order finite difference method. Solutions reach a multi-frequency quasi-steady state. Numerical results are shown on two-tone suppression from both the high-frequency and low-frequency sides, consistent with the known behavior of two-tone suppression. Suppression effects among three tones are demonstrated by showing how the response magnitudes of two fixed tones are reduced as the third tone is varied in frequency and amplitude. We observe qualitative agreement of our model solutions with existing cat auditory neural data. The model is thus simple and efficient as a processing tool for voice signals.
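    A transmission-line cochlea of the kind described here couples a one-dimensional fluid pressure equation to local basilar-membrane dynamics. The equations below are a standard sketch of that structure only: the paper's nonlinear, nonlocal elastic damping is abstracted into the term d(u), and its selective longitudinal fluid damping near the helicotrema is omitted.

        % p(x,t): pressure difference across the BM; u(x,t): BM displacement;
        % rho: fluid density; H: duct height; m, s(x): BM mass and stiffness;
        % d(u): displacement-dependent (nonlinear, nonlocal) damping.
        \begin{equation}
          \frac{\partial^{2} p}{\partial x^{2}}
            = \frac{2\rho}{H}\,\frac{\partial^{2} u}{\partial t^{2}},
          \qquad
          m\,\frac{\partial^{2} u}{\partial t^{2}}
            + d(u)\,\frac{\partial u}{\partial t}
            + s(x)\,u = p(x,t).
        \end{equation}

    In time-domain schemes for this class of model, "semi-implicit" usually means the stiff linear terms are advanced implicitly while the nonlinear damping is handled explicitly; the exact splitting used here is not spelled out in the abstract.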