2,467 research outputs found

    A binaural grouping model for predicting speech intelligibility in multitalker environments

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers, both for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping, or source-separation, benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception thresholds match the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.
    R01 DC000100 - NIDCD NIH HH
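The core EC idea in this abstract, cancelling the target and using the input-to-output energy change as a target-dominance feature, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the STFT representation, the function name, and the fixed 10 dB decision threshold are all assumptions.

```python
import numpy as np

def ec_binary_mask(X_left, X_right, freqs, itd_s, ild_db, threshold_db=10.0):
    """Estimate a time-frequency binary mask via EC processing.

    X_left, X_right: complex STFTs of the two ear signals, shape (n_freq, n_time).
    freqs: centre frequency of each STFT row, in Hz.
    itd_s, ild_db: interaural delay and level difference of the *target* direction.
    """
    # Equalization: align the right-ear signal with the target direction.
    gain = 10.0 ** (ild_db / 20.0)
    phase = np.exp(-2j * np.pi * np.asarray(freqs)[:, None] * itd_s)
    # Cancellation: subtracting the equalized ears removes target energy,
    # so target-dominated units show a large input-to-output energy drop.
    residual = X_left - gain * X_right * phase
    e_in = np.abs(X_left) ** 2 + 1e-12
    e_out = np.abs(residual) ** 2 + 1e-12
    energy_change_db = 10.0 * np.log10(e_in / e_out)
    return (energy_change_db > threshold_db).astype(float)
```

Units where cancellation removes most of the energy (large positive change) are marked as target-dominated; units where a masker from another direction survives cancellation are discarded.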

    Effects of cochlear implantation on binaural hearing in adults with unilateral hearing loss

    An FDA clinical trial was carried out to evaluate the potential benefit of cochlear implant (CI) use for adults with unilateral moderate-to-profound sensorineural hearing loss. Subjects were 20 adults with moderate-to-profound unilateral sensorineural hearing loss and normal or near-normal hearing in the other ear. A MED-EL standard electrode was implanted in the impaired ear. Outcome measures included: (a) sound localization on the horizontal plane (11 positions, −90° to 90°), (b) word recognition in quiet with the CI alone, and (c) masked sentence recognition with the target at 0° and the masker at −90°, 0°, or 90°. This battery was completed preoperatively and at 1, 3, 6, 9, and 12 months after CI activation. Normative data were also collected for 20 age-matched control subjects with normal or near-normal hearing bilaterally. The CI improved localization accuracy and reduced side bias. Word recognition with the CI alone was similar to the performance of traditional CI recipients. The CI improved masked sentence recognition when the masker was presented from the front or from the side with normal or near-normal hearing. The binaural benefits observed with the CI increased between the 1- and 3-month intervals but appeared stable thereafter. In contrast to previous reports on localization and speech perception in patients with unilateral sensorineural hearing loss, CI benefits were consistently observed across individual subjects, and performance was at asymptote by the 3-month test interval. Cochlear implant settings, consistent CI use, and short duration of deafness could all play a role in this result.

    A physiologically inspired model for solving the cocktail party problem.

    At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analogous to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single- and multi-talker speech to demonstrate that the physiologically inspired algorithm achieves intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing-assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects.
    R01 DC000100 - NIDCD NIH HHS
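A toy end-to-end sketch of such an encode/decode pipeline is given below. Everything here is a deliberate simplification standing in for the paper's stages: a Gaussian FFT-domain filterbank replaces the cochlear model, a Poisson-like spiking rule replaces the midbrain and cortical networks, and envelope re-modulation replaces the paper's stimulus reconstruction technique.

```python
import numpy as np

rng = np.random.default_rng(0)

def cochlear_bank(x, fs, centers, bw=200.0):
    # Gaussian band-pass filterbank in the FFT domain (a crude stand-in
    # for the cochlear filter-bank stage).
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    return np.array([np.fft.irfft(X * np.exp(-0.5 * ((f - fc) / bw) ** 2),
                                  n=len(x)) for fc in centers])

def spike_encode(bands, p_max=0.5):
    # Poisson-like spiking driven by the half-wave rectified channel
    # outputs (stand-in for the midbrain/cortical spiking stages).
    drive = np.maximum(bands, 0.0)
    drive = drive / (drive.max() + 1e-12)
    return (rng.random(drive.shape) < p_max * drive).astype(float)

def decode(spikes, centers, fs, width=0.005):
    # Naive stimulus reconstruction: smooth each spike train into an
    # envelope and re-modulate it at the channel's centre frequency.
    n = spikes.shape[1]
    t = np.arange(n) / fs
    k = np.exp(-0.5 * (np.arange(-200, 201) / (width * fs)) ** 2)
    k /= k.sum()
    y = np.zeros(n)
    for s, fc in zip(spikes, centers):
        y += np.convolve(s, k, mode="same") * np.cos(2.0 * np.pi * fc * t)
    return y
```

Even this crude version preserves the key property exploited by the model: the spectral content of the input survives a round trip through the spiking domain, so a channel-selective (i.e., "attentive") readout of the spike trains selects what ends up in the reconstructed waveform.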

    Improving elevation perception with a tool for image-guided head-related transfer function selection

    This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The proposed procedure is designed to be quick to apply and easy to use for a user without prior knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available from previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predicted localization performance for the selected HRTFs compared to the KEMAR HRTF, which is a common standard in many commercial binaural audio solutions; moreover, the proposed analysis provides useful indications for refining the perceptually motivated metric that guides the selection.
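The pinna-to-notch matching idea can be illustrated with a minimal sketch. The single-reflection model f = c/(2d) mapping a pinna contour distance d to a notch frequency, and the log-frequency L2 cost, are assumptions chosen for illustration, not the exact procedure of the paper.

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def notch_freqs_from_contour(distances_m):
    # Simple negative-reflection model: a pinna reflection path of
    # length d metres puts the first spectral notch near f = c / (2 d).
    return C / (2.0 * np.asarray(distances_m, dtype=float))

def select_hrtf(subject_notches, database_notches):
    # Pick the database entry whose notch frequencies are closest to the
    # subject's predicted notches (L2 distance on a log-frequency axis).
    s = np.log(np.asarray(subject_notches, dtype=float))
    costs = [np.sum((np.log(np.asarray(n, dtype=float)) - s) ** 2)
             for n in database_notches]
    return int(np.argmin(costs))
```

In this sketch, the 2D pinna image contributes only the contour distances; everything downstream is a nearest-neighbour search over notch frequencies measured from the database HRTFs.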

    Relative Auditory Distance Discrimination With Virtual Nearby Sound Sources

    This paper describes a psychophysical experiment exploring relative distance discrimination thresholds for binaurally rendered virtual sound sources in the near field. Pairs of virtual sources are spatialized around 6 different spatial locations (2 directions × 3 reference distances) through a set of generic far-field Head-Related Transfer Functions (HRTFs) coupled with a near-field correction model proposed in the literature, known as the DVF (Distance Variation Function). Individual discrimination thresholds for each spatial location and for each of the two orders of presentation of the stimuli (approaching or receding) are calculated for 20 subjects through an adaptive procedure. Results show that thresholds are higher than those reported in the literature for real sound sources, and that approaching and receding stimuli behave differently. In particular, when the virtual source is close (< 25 cm), thresholds for the approaching condition are significantly lower than thresholds for the receding condition, while the opposite behaviour appears at greater distances (~ 1 m). We hypothesize that this asymmetric bias is due to variations in the absolute stimulus level.
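Discrimination thresholds of this kind are commonly estimated with a transformed up-down staircase. The following is a generic 1-up/2-down sketch converging on the 70.7%-correct point, paired with a simulated two-alternative observer; the step rule, psychometric function, and parameter values are illustrative assumptions, not the exact adaptive procedure used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def staircase_threshold(respond, start, step, n_reversals=10):
    # Transformed 1-up / 2-down staircase: two correct responses in a row
    # make the task harder, one error makes it easier, converging on the
    # 70.7%-correct point. The threshold estimate is the mean of the last
    # half of the reversal levels.
    level, in_a_row, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            in_a_row += 1
            if in_a_row == 2:
                in_a_row = 0
                if direction == 1:          # was moving up -> reversal
                    reversals.append(level)
                direction = -1
                level = max(level - step, step)
        else:
            in_a_row = 0
            if direction == -1:             # was moving down -> reversal
                reversals.append(level)
            direction = 1
            level += step
    return float(np.mean(reversals[n_reversals // 2:]))

def observer(level, true_threshold=5.0):
    # Simulated 2AFC listener: chance is 50%, and performance rises with
    # the distance increment via a logistic psychometric function.
    p = 0.5 + 0.5 / (1.0 + np.exp(-(level - true_threshold)))
    return rng.random() < p
```

Running the staircase on the simulated observer returns an estimate near the observer's 70.7%-correct level, which is how per-location, per-direction thresholds like those in the study are typically obtained.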