538 research outputs found

    Decoding neural responses to temporal cues for sound localization

    The activity of sensory neural populations carries information about the environment. This information may be extracted from neural activity using different strategies. In the auditory brainstem, a recent theory proposes that sound location in the horizontal plane is decoded from the relative summed activity of two populations in each hemisphere, whereas earlier theories hypothesized that the location was decoded from the identity of the most active cells. We tested the performance of various decoders of neural responses in increasingly complex acoustical situations, including spectrum variations, noise, and sound diffraction. We demonstrate that there is insufficient information in the pooled activity of each hemisphere to estimate sound direction reliably enough to be consistent with behavior, whereas robust estimates can be obtained from neural activity by taking into account the heterogeneous tuning of cells. These estimates can still be obtained when only contralateral neural responses are used, consistent with unilateral lesion studies. DOI: http://dx.doi.org/10.7554/eLife.01312.001
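    As a rough illustration of the two decoding strategies contrasted above, the sketch below (Python) pits a pooled two-channel (hemispheric-difference) readout against a pattern decoder that exploits the heterogeneous tuning of the population. The tuning curves, noise model, and linear readout are invented placeholders, not the models or data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heterogeneous population of ITD-tuned cells (Gaussian tuning).
n_cells = 40
best_itd = rng.uniform(-300e-6, 300e-6, n_cells)   # preferred ITDs (s)
width = rng.uniform(100e-6, 250e-6, n_cells)       # tuning widths (s)

def rates(itd):
    """Mean firing rate of every cell for a given ITD (arbitrary units)."""
    return np.exp(-0.5 * ((itd - best_itd) / width) ** 2)

itd_grid = np.linspace(-300e-6, 300e-6, 121)
templates = np.stack([rates(itd) for itd in itd_grid])   # noiseless templates

def decode_hemispheric(r):
    """Two-channel decoder: map the difference of the summed activity of the
    right- and left-preferring subpopulations onto ITD via a linear readout."""
    diff_grid = templates[:, best_itd > 0].sum(1) - templates[:, best_itd < 0].sum(1)
    slope, intercept = np.polyfit(diff_grid, itd_grid, 1)
    diff = r[best_itd > 0].sum() - r[best_itd < 0].sum()
    return slope * diff + intercept

def decode_pattern(r):
    """Pattern decoder: nearest noiseless template over the full population."""
    return itd_grid[np.argmin(((templates - r) ** 2).sum(1))]

true_itd = 150e-6
r_obs = rng.poisson(50 * rates(true_itd)) / 50.0   # one noisy population response
print("hemispheric:", decode_hemispheric(r_obs), " pattern:", decode_pattern(r_obs))
```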

    Spike-Timing-Based Computation in Sound Localization

    Spike timing is precise in the auditory system, and it has been argued that this precision conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and a spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model that exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies, which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing for extracting spatial information about sources independently of the source signal.
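    The toy sketch below (not the paper's model) illustrates only the core principle: two ear signals generated from the same unknown source through location-specific filters produce spike synchrony at the internal delay that matches the location, largely independently of the source waveform. The "HRIRs" are reduced to a pure interaural delay and level difference, and the spiking nonlinearity and coincidence window are simplistic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 44100

# Crude stand-ins for the left/right impulse responses of one source location:
# a pure interaural delay and level difference (not measured HRIRs).
delay_samples = 12
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[delay_samples] = 0.8

def spike_times(x, thresh=1.5):
    """Crude spiking nonlinearity: upward threshold crossings of the signal."""
    x = (x - x.mean()) / x.std()
    return np.where((x[1:] > thresh) & (x[:-1] <= thresh))[0] / fs

def best_internal_delay(source):
    """Count near-coincident left/right spikes for a range of internal delays;
    the winning delay reflects the source location, not the source signal."""
    left = np.convolve(source, hrir_l)[:len(source)]
    right = np.convolve(source, hrir_r)[:len(source)]
    t_l, t_r = spike_times(left), spike_times(right)
    counts = []
    for d in range(25):                               # candidate delays (samples)
        shifted = t_l + d / fs
        hits = np.abs(t_r[None, :] - shifted[:, None]) < 1e-5   # ~0.4-sample window
        counts.append(int(hits.any(axis=1).sum()))
    return int(np.argmax(counts))

for _ in range(3):
    src = rng.standard_normal(fs // 2)                # a different source each time
    print("best internal delay:", best_internal_delay(src), "samples")
```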

    Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses

    Spatial impulse response analysis techniques are commonly used in the field of acoustics, as they help to characterise the interaction of sound with an enclosed environment. This paper presents a novel approach to the spatial analysis of binaural impulse responses, using a neural network fronted by a binaural model. The proposed method uses binaural cues utilised by the human auditory system, which are mapped by the neural network to azimuth direction-of-arrival classes. A cascade-correlation neural network was trained using a multi-conditional training dataset of head-related impulse responses with added noise. The neural network was tested using a set of binaural impulse responses captured using two dummy head microphones in an anechoic chamber, with a reflective boundary positioned to produce a reflection with a known direction of arrival. Results showed that the neural network generalised to the direct sound of the binaural room impulse responses for both dummy head microphones. However, it was found to be less accurate at predicting the direction of arrival of the reflections. The work indicates the potential of using such an algorithm for the spatial analysis of binaural impulse responses, while indicating where the method needs to be made more robust for more general application.
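    A minimal sketch of the described data flow, under stated assumptions: extract interaural cues from an impulse-response pair and map them to azimuth classes with a neural network. Cascade-correlation networks are not available in common libraries, so a stock scikit-learn MLP stands in, and the "impulse responses" are synthetic placeholders rather than dummy-head or HRIR measurements.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

def binaural_cues(left_ir, right_ir):
    """Interaural lag (samples) from the cross-correlation peak, plus broadband ILD (dB)."""
    lag = np.argmax(np.correlate(left_ir, right_ir, mode="full")) - (len(right_ir) - 1)
    ild = 10 * np.log10(np.sum(left_ir ** 2) / np.sum(right_ir ** 2))
    return [lag, ild]

def synthetic_pair(cls, length=256, noise=0.02):
    """Toy impulse-response pair whose interaural delay and level depend on the
    azimuth class (a placeholder for measured HRIRs with added noise)."""
    left = noise * rng.standard_normal(length)
    right = noise * rng.standard_normal(length)
    left[10] += 1.0
    right[10 + 3 * cls] += 10 ** (-cls / 20)
    return left, right

classes = rng.integers(0, 8, 300)                     # 8 azimuth classes
X = np.array([binaural_cues(*synthetic_pair(c)) for c in classes])

# The paper trains a cascade-correlation network; a plain MLP serves here only
# as a readily available substitute for the cue-to-class mapping.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
clf.fit(X, classes)
print("training accuracy:", clf.score(X, classes))
```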

    A binaural grouping model for predicting speech intelligibility in multitalker environments

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers, both for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires minimal computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured improvements, which may reflect the lack of internal noise in this initial version of the model. (Funding: R01 DC000100, NIDCD/NIH)
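    The mask-estimation step described above can be sketched compactly under strong simplifying assumptions: if the target sits at zero interaural delay and gain, equalization reduces to the identity and cancellation is a left-minus-right subtraction, so the per-unit energy change becomes the target-dominance feature. The scene, STFT settings, and threshold below are arbitrary illustration choices, not the model's calibrated values.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
rng = np.random.default_rng(3)

# Toy binaural mixture: target at zero interaural delay, interferer delayed by
# 0.5 ms in the right ear (hypothetical geometry, noise instead of speech).
target = rng.standard_normal(fs)
interferer = rng.standard_normal(fs)
d = int(0.5e-3 * fs)
left = target + interferer
right = target + np.roll(interferer, d)

f, t, L = stft(left, fs, nperseg=512)
_, _, R = stft(right, fs, nperseg=512)

# EC aimed at the target: equalize (identity here), then cancel.
residual_energy = np.abs(L - R) ** 2
input_energy = np.abs(L) ** 2 + np.abs(R) ** 2 + 1e-12

# Feature: how much energy the cancellation removed in each T-F unit.
# Target-dominated units lose nearly all of it; a binary decision rule on the
# ratio gives the estimated mask.
residual_ratio = residual_energy / input_energy
binary_mask = (residual_ratio < 0.25).astype(float)    # ad-hoc threshold
print(f"fraction of units labelled target-dominated: {binary_mask.mean():.2f}")
```

    The threshold trades missed target-dominated units against interferer leakage; the article's binary decision rule may differ from the fixed constant used here.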

    Localization in Reverberation with Cochlear Implants

    Users of bilateral cochlear implants (CIs) experience difficulties localizing sounds in reverberant rooms, even in rooms where normal-hearing listeners would hardly notice the reverberation. We measured the localization ability of seven bilateral CI users listening with their own devices in anechoic space and in a simulated reverberant room. To determine factors affecting performance in reverberant space, we measured the sensitivity to interaural time differences (ITDs), interaural level differences (ILDs), and forward masking in the same participants using direct computer control of the electric stimulation in their CIs. Localization performance, quantified by the coefficient of determination (r²) and the root-mean-squared error, was significantly worse in the reverberant room than in anechoic conditions. Localization performance in the anechoic room, expressed as r², was best predicted by the subjects' sensitivity to ILDs. However, the decrease in localization performance caused by reverberation was better predicted by the sensitivity to envelope ITDs measured on single electrode pairs, with a correlation coefficient of 0.92. The CI users who were highly sensitive to envelope ITDs also better maintained their localization ability in reverberant space. Results in the forward masking task added only marginally to the predictions of localization performance in both environments. The results indicate that envelope ITDs provided by CI processors support localization in reverberant space. Thus, methods that improve perceptual access to envelope ITDs could help improve localization with bilateral CIs in everyday listening situations.
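    For reference, the two performance metrics named above can be computed as follows; the response data are invented solely to show the calculation.

```python
import numpy as np

# Hypothetical trials for one listener: loudspeaker azimuths and the azimuths
# the listener reported (degrees). Values are made up for illustration.
target = np.array([-60, -30, 0, 30, 60, -60, -30, 0, 30, 60], dtype=float)
response = np.array([-48, -35, 5, 22, 51, -70, -20, -3, 38, 66], dtype=float)

# Root-mean-squared localization error.
rmse = np.sqrt(np.mean((response - target) ** 2))

# Coefficient of determination: squared correlation of responses with targets.
r2 = np.corrcoef(target, response)[0, 1] ** 2
print(f"RMSE = {rmse:.1f} deg, r^2 = {r2:.2f}")
```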

    Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping


    Computational models for listener-specific predictions of spatial audio quality

    Millions of people use headphones every day for listening to music, watching movies, or communicating with others. Nevertheless, sounds presented via headphones are usually perceived inside the head instead of being localized at a naturally external position. Besides externalization and localization, spatial hearing also involves perceptual attributes like apparent source width, listener envelopment, and the ability to segregate sounds. The acoustic basis for spatial hearing is described by the listener-specific head-related transfer functions (HRTFs; Møller et al., 1995). Binaural virtual acoustics based on listener-specific HRTFs can make sounds presented via headphones indistinguishable from natural sounds (Langendijk and Bronkhorst, 2000). In this talk, we will focus on the dimensions of sound localization that are particularly sensitive to listener-specific HRTFs, that is, along sagittal planes (i.e., vertical planes orthogonal to the interaural axis) and at near distances (sound externalization/internalization). We will discuss recent findings from binaural virtual acoustics and models aiming to predict sound externalization (Hassager et al., 2016) and localization in sagittal planes (Baumgartner et al., 2014) based on the listener's HRTFs. Sagittal-plane localization seems to be well understood, and its model can already reliably predict localization performance in many listening situations (e.g., Marelli et al., 2015; Baumgartner and Majdak, 2015). In contrast, more investigation is required to better understand and build a valid model of sound externalization (Baumgartner et al., 2017). We aim to shed light on the diversity of cues causing degraded sound externalization under spectral distortions by conducting a model-based meta-analysis of psychoacoustic studies. As potential cues we consider monaural and interaural spectral shapes, spectral and temporal fluctuations of interaural level differences, interaural coherence, and broadband inconsistencies between interaural time and level differences, all within a highly comparable template-based modeling framework. Mere differences in sound pressure level between target and reference stimuli were used as a control cue. Our investigations revealed that monaural spectral shapes and the strength of time-intensity trading are potent cues for explaining previous results under anechoic conditions. However, future experiments will be required to unveil the actual essence of these cues.
    References:
    Baumgartner, R., and Majdak, P. (2015). "Modeling Localization of Amplitude-Panned Virtual Sources in Sagittal Planes," Journal of the Audio Engineering Society 63, 562–569.
    Baumgartner, R., Majdak, P., and Laback, B. (2014). "Modeling sound-source localization in sagittal planes for human listeners," The Journal of the Acoustical Society of America 136, 791–802.
    Baumgartner, R., Reed, D. K., Tóth, B., Best, V., Majdak, P., Colburn, H. S., and Shinn-Cunningham, B. (2017). "Asymmetries in behavioral and neural responses to spectral cues demonstrate the generality of auditory looming bias," Proceedings of the National Academy of Sciences 114, 9743–9748.
    Hassager, H. G., Gran, F., and Dau, T. (2016). "The role of spectral detail in the binaural transfer function on perceived externalization in a reverberant environment," The Journal of the Acoustical Society of America 139, 2992–3000.
    Langendijk, E. H., and Bronkhorst, A. W. (2000). "Fidelity of three-dimensional-sound reproduction using a virtual auditory display," The Journal of the Acoustical Society of America 107, 528–537.
    Marelli, D., Baumgartner, R., and Majdak, P. (2015). "Efficient Approximation of Head-Related Transfer Functions in Subbands for Accurate Sound Localization," IEEE Transactions on Audio, Speech, and Language Processing 23, 1130–1143.
    Møller, H., Sørensen, M. F., Hammershøi, D., and Jensen, C. B. (1995). "Head-related transfer functions of human subjects," Journal of the Audio Engineering Society 43, 300–321.
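    As a loose illustration of the template-based comparison stage such models share, the snippet below scores an incoming monaural magnitude spectrum against a set of direction-specific template spectra by a spectral-shape distance. The templates, the distortion, and the distance metric are placeholders; this is not the comparison stage of Baumgartner et al. (2014) or of the meta-analysis itself.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical template set: dB magnitude spectra of a listener's HRTFs at a
# few polar angles (random placeholders standing in for measured data).
n_angles, n_bands = 7, 32
templates_db = 6 * rng.standard_normal((n_angles, n_bands))

# Incoming (target) spectrum: one template plus a spectral distortion of the
# kind manipulated in the externalization experiments revisited here.
true_idx = 3
target_db = templates_db[true_idx] + 2 * rng.standard_normal(n_bands)

def spectral_shape_distance(a_db, b_db):
    """Compare spectral shapes: remove the broadband level difference, then
    take the RMS of what remains (a deliberately simplified comparison)."""
    d = a_db - b_db
    return np.sqrt(np.mean((d - d.mean()) ** 2))

distances = [spectral_shape_distance(target_db, t) for t in templates_db]
print("best-matching template:", int(np.argmin(distances)), "| true:", true_idx)
```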

    A hemispheric two-channel code accounts for binaural unmasking in humans

    The ability to localize sound sources relies on differences between the signals at the two ears. These differences are also the basis for binaural unmasking, an improvement in detecting or understanding a sound masked by sources at other locations. The neurocomputational operation that underlies binaural unmasking is still a matter of debate. Current models rely on the cross-correlation function of the signals at the two ears, the neuronal substrate of which has been observed in the barn owl but not in mammals. This disagreement led to the formulation of an alternative coding mechanism in which interaural differences are encoded in the neuronal activity of two hemispheric channels. This mechanism agrees with mammalian physiology but has not yet been shown to account for binaural unmasking in humans. This study introduces a new mathematical formulation of the two-channel model, which is then used to explain the outcomes of an extensive library of psychoacoustic experiments.
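    The paper's new mathematical formulation is not reproduced here. The sketch below only conveys the general flavor of a two-channel code: each hemispheric channel responds broadly to ITD, and a decision variable is read from the change in channel activity when a target is added to the masker. All parameters and the "effective ITD shift" are invented for illustration.

```python
import numpy as np

def channel_rates(itd_us):
    """Generic two-channel (hemispheric) code: each channel's rate is a broad
    sigmoid of ITD with opposite preferred sides (illustrative parameters only,
    not the formulation introduced in the paper)."""
    k = 1 / 200.0                                # slope (1/microsecond)
    left = 1 / (1 + np.exp(-k * itd_us))         # prefers right-leading ITDs
    right = 1 / (1 + np.exp(+k * itd_us))        # prefers left-leading ITDs
    return np.array([left, right])

# Masker alone (diotic, ITD = 0) versus masker plus an antiphasic target that
# shifts the effective interaural difference of the mixture: the change in
# hemispheric-channel activity is the kind of decision variable a two-channel
# account of binaural unmasking relies on.
masker_alone = channel_rates(0.0)
masker_plus_target = channel_rates(150.0)        # hypothetical effective shift (us)
decision_variable = np.abs(masker_plus_target - masker_alone).sum()
print("change in hemispheric-channel activity:", round(float(decision_variable), 3))
```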

    IIR modeling of interpositional transfer functions with a genetic algorithm aided by an adaptive filter for the purpose of altering free-field sound localization

    Sound localization is a complex psychoacoustic process. Scientists have found evidence that both binaural and monaural cues are responsible for determining the elevation and azimuth angles of a sound source. Engineers have successfully used these cues to build mathematical localization systems, and research has indicated that spectral cues play an important role in 3-D localization. It therefore seems conceivable to design a filtering system that can alter the localization of a sound source, either for corrective purposes or for listener preference. Such filters, known as interpositional transfer functions (IPTFs), can be formed by dividing head-related transfer functions (HRTFs) in the z-domain. HRTFs represent the free-field response of the human body to sound as processed at the ears. In filtering applications, IIR filters are often favored over FIR filters because they preserve resolution while minimizing the number of required coefficients. Several methods exist for creating IIR filters from their representative FIR counterparts; for complicated filters, genetic algorithms (GAs) have proven effective. The research summarized in this thesis combines past efforts in the fields of sound localization, genetic algorithms, and adaptive filtering. It represents the initial stage in the development of a practical system, intended for future hardware implementation, that uses a genetic algorithm as its driving engine. Under ideal conditions, an IIR filter design system has been demonstrated to successfully model several IPTF pairs that alter sound localization when applied to non-minimum-phase HRTFs obtained from free-field measurements.
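    A self-contained sketch of the overall idea, with invented stand-ins for the measured HRTFs and a deliberately small genetic algorithm (the adaptive-filter aid is omitted): form the IPTF as a ratio of transfer functions evaluated on the unit circle, then evolve IIR coefficients to match its magnitude response while rejecting unstable candidates.

```python
import numpy as np
from scipy.signal import freqz

rng = np.random.default_rng(5)

# Two short FIR responses standing in for measured HRTFs at a reference and a
# desired direction (random placeholders, not real measurements).
h_ref = rng.standard_normal(32) * np.hanning(32)
h_des = rng.standard_normal(32) * np.hanning(32)

# The IPTF is the ratio of the two transfer functions; evaluate it on a grid of
# frequencies (z-domain division on the unit circle). A small floor keeps the
# toy target bounded where the reference response has deep nulls.
w = np.linspace(0.01, np.pi, 256)
_, H_ref = freqz(h_ref, worN=w)
_, H_des = freqz(h_des, worN=w)
target_mag = np.abs(H_des) / np.maximum(np.abs(H_ref), 0.05)

order = 6   # IIR order: b has order+1 coefficients, a has order coefficients after the fixed leading 1

def fitness(coeffs):
    """Magnitude-response error of an IIR candidate; unstable filters are
    penalised heavily so the search steers away from them."""
    b = coeffs[:order + 1]
    a = np.concatenate(([1.0], coeffs[order + 1:]))
    if np.any(np.abs(np.roots(a)) >= 1.0):
        return 1e6
    _, H = freqz(b, a, worN=w)
    return np.mean((np.abs(H) - target_mag) ** 2)

# A deliberately small GA: truncation selection, uniform crossover, Gaussian mutation.
pop = 0.1 * rng.standard_normal((60, 2 * order + 1))
for generation in range(120):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[:20]]                    # keep the best 20
    children = []
    while len(children) < len(pop) - len(parents):
        p1, p2 = parents[rng.integers(20, size=2)]
        mask = rng.random(p1.size) < 0.5                      # uniform crossover
        children.append(np.where(mask, p1, p2) + 0.02 * rng.standard_normal(p1.size))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmin([fitness(c) for c in pop])]
print("best magnitude-response MSE:", fitness(best))
```

    Penalising unstable candidates keeps the evolved denominators usable; a phase-aware error term or the adaptive-filter aid described in the thesis would be natural refinements.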

    Acoustic source separation based on target equalization-cancellation

    Normal-hearing listeners are good at focusing on a target talker while ignoring interferers in a multi-talker environment. Efforts have therefore been devoted to building psychoacoustic models for understanding binaural processing in multi-talker environments and to developing bio-inspired source separation algorithms for hearing-assistive devices. This thesis presents a target-Equalization-Cancellation (target-EC) approach to the source separation problem. The idea of the target-EC approach is to use the energy change before and after cancelling the target to estimate a time-frequency (T-F) mask in which each entry estimates the strength of the target signal in the original mixture. Once the mask is calculated, it is applied to the original mixture to preserve the target-dominant T-F units and to suppress the interferer-dominant T-F units. On the psychoacoustic modeling side, when the output of the target-EC approach is evaluated with the Coherence-based Speech Intelligibility Index (CSII), the predicted binaural advantage closely matches the pattern of the measured data. On the application side, the performance of the target-EC source separation algorithm was evaluated by psychoacoustic measurements using both a closed-set and an open-set speech corpus, and it was shown that the target-EC cue is a better cue for source separation than the interaural difference cues.
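    To make the separation step concrete, here is a hedged sketch that estimates a target-EC style binary mask on a toy two-source mixture (target assumed at zero interaural delay and gain, so equalization is the identity), applies it to one ear's signal, resynthesizes the estimate, and reports a rough SNR change using the known toy components. The geometry, STFT settings, and threshold are illustrative assumptions, not the thesis's configuration or corpus.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(6)

# Toy mixture: target at zero interaural delay, interferer delayed in the right ear.
target = rng.standard_normal(fs)
interferer = rng.standard_normal(fs)
d = int(0.6e-3 * fs)
left = target + interferer
right = target + np.roll(interferer, d)

f, t, L = stft(left, fs, nperseg=512)
_, _, R = stft(right, fs, nperseg=512)

# Target-EC: cancel the target (L - R for this geometry); the energy surviving
# cancellation flags interferer-dominated units, and thresholding the ratio
# gives the binary mask.
residual_ratio = np.abs(L - R) ** 2 / (np.abs(L) ** 2 + np.abs(R) ** 2 + 1e-12)
mask = (residual_ratio < 0.25).astype(float)            # ad-hoc threshold

# Apply the mask to one ear's mixture and resynthesize the estimated target.
_, target_estimate = istft(mask * L, fs, nperseg=512)

# Rough check of the SNR change, using the known components of the toy mixture.
_, _, T = stft(target, fs, nperseg=512)
_, _, I = stft(interferer, fs, nperseg=512)
snr_in = 10 * np.log10(np.sum(np.abs(T) ** 2) / np.sum(np.abs(I) ** 2))
snr_out = 10 * np.log10(np.sum(np.abs(mask * T) ** 2) / (np.sum(np.abs(mask * I) ** 2) + 1e-12))
print(f"SNR: {snr_in:.1f} dB -> {snr_out:.1f} dB, output samples: {len(target_estimate)}")
```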