A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
Auditory models are commonly used as feature extractors for automatic
speech-recognition systems or as front-ends for robotics, machine-hearing and
hearing-aid applications. Although auditory models can capture the biophysical
and nonlinear properties of human hearing in great detail, these biophysical
models are computationally expensive and cannot be used in real-time
applications. We present a hybrid approach where convolutional neural networks
are combined with computational neuroscience to yield a real-time end-to-end
model for human cochlear mechanics, including level-dependent filter tuning
(CoNNear). The CoNNear model was trained on acoustic speech material and its
performance and applicability were evaluated using (unseen) sound stimuli
commonly employed in cochlear mechanics research. The CoNNear model accurately
simulates human cochlear frequency selectivity and its dependence on sound
intensity, an essential quality for robust speech intelligibility at negative
speech-to-background-noise ratios. The CoNNear architecture is based on
parallel and differentiable computations and has the power to achieve real-time
human performance. These unique CoNNear features will enable the next
generation of human-like machine-hearing applications.
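The core idea, cochlear filtering expressed as parallel differentiable convolutions, can be illustrated with a short NumPy-only sketch. This is not the CoNNear architecture itself (the paper describes a trained encoder-decoder CNN); the gammatone-shaped kernels, the two centre frequencies, and every parameter value below are illustrative assumptions:

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """Valid 1-D convolution of a mono signal with a bank of kernels."""
    k = kernels.shape[1]
    n_out = (len(x) - k) // stride + 1
    out = np.empty((kernels.shape[0], n_out))
    for i in range(n_out):
        seg = x[i * stride : i * stride + k]
        out[:, i] = kernels @ seg  # all channels computed in parallel
    return out

fs = 16000
t = np.arange(128) / fs  # 8 ms of kernel support

def gammatone(cf, b=150.0):
    """Toy gammatone-like impulse response at centre frequency cf (Hz)."""
    g = t**3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * cf * t)
    return g / np.linalg.norm(g)

# Two cochlear "sections" as convolution kernels.
kernels = np.stack([gammatone(500.0), gammatone(2000.0)])
signal = np.sin(2 * np.pi * 500.0 * np.arange(800) / fs)  # 500 Hz tone
bm = conv1d(signal, kernels, stride=4)  # channels x time, like BM sections
```

Because every operation is a differentiable array computation, such a filter bank can be trained end-to-end and evaluated in parallel, which is the property the abstract highlights.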
A physiologically inspired model for solving the cocktail party problem.
At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analogous to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects.
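The paper's spatial-localization stage operates on spike trains; as a rough signal-level stand-in, the following sketch estimates the interaural time difference (ITD) by cross-correlating the two ear signals. The function name, the sampling rate, and the ±800 µs lag range are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=800e-6):
    """Estimate ITD as the lag maximizing the left/right cross-correlation.
    Positive result: the right-ear signal lags the left-ear signal."""
    max_lag = int(round(max_itd * fs))

    def cc(l):
        # Correlate left[n] with right[n + l] over their valid overlap.
        a = left[max(0, -l): len(left) - max(0, l)]
        b = right[max(0, l): len(right) - max(0, -l)]
        return float(np.dot(a, b))

    best = max(range(-max_lag, max_lag + 1), key=cc)
    return best / fs

fs = 16000
rng = np.random.default_rng(1)
src = rng.standard_normal(4000)
delay = 5                                         # samples, ~312 us at 16 kHz
left = src
right = np.concatenate([np.zeros(delay), src[:-delay]])
itd = estimate_itd(left, right, fs)               # recovers ~delay / fs
```

A spiking midbrain model replaces this dot product with coincidence-detecting neurons, but the underlying computation is the same delay-and-compare operation.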
On the origins of the compressive cochlear nonlinearity
Various simple mathematical models of the dynamics of the organ of Corti in the mammalian cochlea are analysed. The models are assessed against their ability to explain the compressive nonlinear response of the basilar membrane. The specific models considered are: phenomenological Hopf and cusp normal forms, a recently proposed description combining active hair-bundle motility and somatic motility, a reduction thereof, and finally a new model highlighting the importance of the coupling between the nonlinear transduction current and somatic motility. The overall conclusion is that neither a Hopf bifurcation nor a cusp bifurcation is necessary for realistic compressive nonlinearity. Moreover, two physiological models are discussed showing compressive nonlinearities similar to experimental observations without the need for tuning near any bifurcation.
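For context on the normal forms being assessed: a forced Hopf oscillator at criticality produces the classic cube-root compression. At resonance the steady-state amplitude r satisfies F = mu*r + r**3 (mu is the distance from the bifurcation), so r = F**(1/3) when mu = 0. The abstract's point is that this bifurcation is not necessary for compression; the sketch below only illustrates the textbook Hopf behaviour, with detuning ignored:

```python
import numpy as np

def hopf_amplitude(F, mu=0.0):
    """Steady-state response amplitude r of a resonantly forced Hopf
    normal form, from the amplitude relation F = mu * r + r**3."""
    roots = np.roots([1.0, 0.0, mu, -F])  # r**3 + mu*r - F = 0
    real_pos = roots[np.isclose(roots.imag, 0.0) & (roots.real > 0)].real
    return float(real_pos[0])

# At criticality (mu = 0) the response compresses as F**(1/3):
# a 1000-fold change in forcing gives only a 10-fold change in response.
r_lo = hopf_amplitude(1e-3)   # ~0.1
r_hi = hopf_amplitude(1.0)    # ~1.0
```

The abstract's physiological models achieve a comparably compressive input-output relation without sitting near this (or any) bifurcation.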
Decoding neural responses to temporal cues for sound localization
The activity of sensory neural populations carries information about the environment. This may be extracted from neural activity using different strategies. In the auditory brainstem, a recent theory proposes that sound location in the horizontal plane is decoded from the relative summed activity of two populations in each hemisphere, whereas earlier theories hypothesized that the location was decoded from the identity of the most active cells. We tested the performance of various decoders of neural responses in increasingly complex acoustical situations, including spectrum variations, noise, and sound diffraction. We demonstrate that there is insufficient information in the pooled activity of each hemisphere to estimate sound direction in a reliable way consistent with behavior, whereas robust estimates can be obtained from neural activity by taking into account the heterogeneous tuning of cells. These estimates can still be obtained when only contralateral neural responses are used, consistent with unilateral lesion studies. DOI: http://dx.doi.org/10.7554/eLife.01312.001
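The two decoding strategies being compared can be caricatured with a noise-free rate model: a "most active cell" readout versus a readout that matches the full heterogeneous response pattern against templates. The Gaussian tuning curves and every parameter below are illustrative assumptions, not the study's fitted model:

```python
import numpy as np

best_itds = np.linspace(-400e-6, 400e-6, 40)   # cells' preferred ITDs (s)

def population_rates(itd, sigma=150e-6):
    """Noise-free Gaussian ITD tuning curves for a toy population."""
    return np.exp(-0.5 * ((itd - best_itds) / sigma) ** 2)

def peak_decoder(rates):
    """'Most active cell' readout: return that cell's preferred ITD."""
    return best_itds[np.argmax(rates)]

def pattern_decoder(rates, grid=np.linspace(-300e-6, 300e-6, 61)):
    """Heterogeneous-tuning readout: match the whole response pattern
    against a template for each candidate ITD."""
    templates = np.stack([population_rates(i) for i in grid])
    return grid[np.argmin(np.sum((templates - rates) ** 2, axis=1))]

true_itd = 100e-6
est = pattern_decoder(population_rates(true_itd))   # recovers ~100 us
```

In this idealised setting both readouts work; the study's point is that under realistic spectrum variations, noise, and diffraction, only decoders exploiting the heterogeneous tuning remain reliable.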
Population Coding of Interaural Time Differences in Gerbils and Barn Owls
Interaural time differences (ITDs) are the primary cue for the localization of low-frequency sound sources in the azimuthal plane. For decades, it was assumed that the coding of ITDs in the mammalian brain was similar to that in the avian brain, where information is sparsely distributed across individual neurons, but recent studies have suggested otherwise. In this study, we characterized the representation of ITDs in adult male and female gerbils. First, we performed behavioral experiments to determine the acuity with which gerbils can use ITDs to localize sounds. Next, we used different decoders to infer ITDs from the activity of a population of neurons in the central nucleus of the inferior colliculus. These results show that ITDs are not represented in a distributed manner, but rather in the summed activity of the entire population. To contrast these results with those from a population where the representation of ITDs is known to be sparsely distributed, we performed the same analysis on activity from the external nucleus of the inferior colliculus of adult male and female barn owls. Together, our results support the idea that, unlike the avian brain, the mammalian brain represents ITDs in the overall activity of a homogeneous population of neurons within each hemisphere.
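The summed-activity readout supported here can be sketched as a hemispheric-difference decoder: each hemisphere contributes one pooled rate, and their difference is inverted by table lookup. The Gaussian pooled tuning, the ±200 µs best ITDs, and the noise-free setting are all illustrative assumptions; with noise-free rates the difference signal is monotonic over the tested range, so the lookup recovers the ITD:

```python
import numpy as np

def hemisphere_sum(itd, best_itd, sigma=300e-6, n_cells=30):
    """Summed rate of one hemisphere's homogeneously tuned population."""
    return n_cells * np.exp(-0.5 * ((itd - best_itd) / sigma) ** 2)

def hemispheric_decoder(left_sum, right_sum,
                        grid=np.linspace(-300e-6, 300e-6, 601)):
    """Read the ITD from the difference of the two hemispheric sums,
    inverted by lookup over candidate ITDs."""
    diff = hemisphere_sum(grid, 200e-6) - hemisphere_sum(grid, -200e-6)
    return grid[np.argmin(np.abs(diff - (left_sum - right_sum)))]

true_itd = 100e-6
est = hemispheric_decoder(hemisphere_sum(true_itd, 200e-6),
                          hemisphere_sum(true_itd, -200e-6))
```

Note this is exactly the readout that the eLife study above argues is unreliable under realistic acoustics; the two abstracts reach opposing conclusions about it.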
Time Domain Computation of a Nonlinear Nonlocal Cochlear Model with Applications to Multitone Interaction in Hearing
A nonlinear nonlocal cochlear model of the transmission line type is studied
in order to capture the multitone interactions and resulting tonal suppression
effects. The model can serve as a module for voice-signal processing; it is a
one-dimensional (in space) damped, dispersive, nonlinear PDE based on the
mechanics and phenomenology of hearing. It describes the motion of the basilar
membrane (BM)
in the cochlea driven by input pressure waves. Both elastic damping and
selective longitudinal fluid damping are present. The former is nonlinear and
nonlocal in BM displacement, and plays a key role in capturing tonal
interactions. The latter is active only near the exit boundary (helicotrema),
and is built in to damp out the remaining long waves. The initial boundary
value problem is numerically solved with a semi-implicit second order finite
difference method. Solutions reach a multi-frequency quasi-steady state.
Numerical results are shown on two tone suppression from both high-frequency
and low-frequency sides, consistent with known behavior of two tone
suppression. Suppression effects among three tones are demonstrated by showing
how the response magnitudes of the fixed two tones are reduced as we vary the
third tone in frequency and amplitude. We observe qualitative agreement of our
model solutions with existing cat auditory neural data. The model is thus
simple and efficient as a processing tool for voice signals.
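The PDE model itself is beyond a short sketch, but the two-tone suppression it reproduces can be illustrated with a generic saturating nonlinearity (tanh here, standing in for the model's nonlinear damping): adding a strong second tone reduces the output component at the probe frequency. The frequencies and amplitudes below are arbitrary illustrative choices, not the paper's parameters:

```python
import numpy as np

fs = 16000
t = np.arange(1600) / fs                 # 0.1 s: integer cycles of both tones
f_probe, f_supp = 1000.0, 1300.0

def probe_component(a_probe, a_supp):
    """Amplitude of the f_probe Fourier component after a saturating
    (tanh) nonlinearity, a stand-in for the model's compressive damping."""
    x = (a_probe * np.sin(2 * np.pi * f_probe * t)
         + a_supp * np.sin(2 * np.pi * f_supp * t))
    y = np.tanh(x)
    # Demodulate at the probe frequency over an integer number of cycles.
    return 2 * abs(np.mean(y * np.exp(-2j * np.pi * f_probe * t)))

base = probe_component(0.5, 0.0)         # probe alone
suppressed = probe_component(0.5, 2.0)   # strong second tone added
```

The suppressor drives the nonlinearity into its saturating region, lowering the effective gain seen by the probe, which is the qualitative mechanism behind the two- and three-tone suppression results reported in the abstract.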