56 research outputs found

    Speech intelligibility in virtual restaurants

    Speech reception thresholds (SRTs) for a target voice on the same virtual table were measured in various restaurant simulations under conditions of masking by between 1 and 8 interferers at other tables. Results for different levels of reverberation and different simulation techniques were qualitatively similar. SRTs increased steeply with the number of interferers, reflecting progressive failure to perceptually unmask the target speech as the acoustic scene became more complex. For a single interferer, continuous noise was the most effective masker, and a single interfering voice of either gender was least effective. With two interferers, evidence of informational masking emerged as a difference in SRT between forward and reversed speech, but SRTs for all interferer types progressively converged at 4 and 8 interferers. In the simulation based on a real room, this convergence occurred at a signal-to-noise ratio of around −5 dB.
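The abstract does not state the tracking rule used, but SRTs of this kind are typically estimated with an adaptive staircase on the signal-to-noise ratio. A minimal sketch, assuming a 1-up/1-down rule (function and parameter names are illustrative, not taken from the study):

```python
def run_staircase(respond, start_snr=0.0, step=2.0, n_reversals=8):
    """Adaptive 1-up/1-down track: the SNR is lowered after a correct
    response and raised after an incorrect one; the SRT estimate is the
    mean SNR at the direction reversals."""
    snr = start_snr
    direction = None
    reversals = []
    while len(reversals) < n_reversals:
        new_dir = -1 if respond(snr) else +1  # harder after correct, easier after wrong
        if direction is not None and new_dir != direction:
            reversals.append(snr)
        direction = new_dir
        snr += new_dir * step
    return sum(reversals) / len(reversals)

# A deterministic "listener" who is correct at or above -5 dB SNR
# produces a track that oscillates around that threshold.
srt = run_staircase(lambda snr: snr >= -5.0)
```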

    Speech intelligibility prediction in reverberation: Towards an integrated model of speech transmission, spatial unmasking, and binaural de-reverberation

    Room acoustic indicators of intelligibility have focused on the effects of temporal smearing of speech by reverberation and masking by diffuse ambient noise. In the presence of a discrete noise source, these indicators neglect the binaural listener's ability to separate target speech from noise. Lavandier and Culling [(2010). J. Acoust. Soc. Am. 127, 387–399] proposed a model that incorporates this ability but neglects the temporal smearing of speech, so that predictions hold for near-field targets. An extended model based on useful-to-detrimental (U/D) ratios is presented here that accounts for temporal smearing, spatial unmasking, and binaural de-reverberation in reverberant environments. The influence of the model parameters was tested by comparing the model predictions with speech reception thresholds measured in three experiments from the literature. Accurate predictions were obtained by adjusting the parameters to each room. Room-independent parameters did not lead to similar performances, suggesting that a single U/D model cannot be generalized to any room. Despite this limitation, the model framework allows a unified interpretation of spatial unmasking, temporal smearing, and binaural de-reverberation to be proposed.
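At the core of a useful-to-detrimental (U/D) model is a split of the room impulse response into an early, "useful" part and a late, "detrimental" part that is pooled with the noise energy. A minimal, broadband sketch of that split (the published model is binaural and frequency-weighted; the 50-ms boundary and all names here are illustrative assumptions):

```python
import numpy as np

def useful_to_detrimental(ir, fs, early_ms=50.0, noise_energy=0.0):
    """Broadband U/D ratio in dB: energy of the impulse response before
    an early-time boundary, over late energy plus ambient-noise energy."""
    n_early = int(round(early_ms * 1e-3 * fs))
    useful = np.sum(ir[:n_early] ** 2)
    detrimental = np.sum(ir[n_early:] ** 2) + noise_energy
    return 10.0 * np.log10(useful / detrimental)
```

With a unit direct impulse and a single late reflection at half amplitude, the early energy is four times the late energy, giving a U/D ratio of about 6 dB.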

    Psychoacoustic measurement of phase and level for cross-talk cancellation using bilateral bone transducers: Comparison of methods

    Two bone-conduction hearing aids (BCHAs) could deliver improved stereo separation using cross-talk cancellation. Sound vibrations from each BCHA would be cancelled at the contralateral cochlea by an out-of-phase signal of the same level from the ipsilateral BCHA. A method to measure the level and phase required for these cancellation signals was developed and cross-validated with an established technique that combines air- and bone-conducted sound. Three participants with normal hearing wore bone transducers (BTs) on each mastoid and insert earphones. Both BTs produced a pure tone and the level and phase were adjusted in the right BT in order to cancel all perceived sound at that ear. To cross-validate, one BT was stimulated with a pure tone and participants cancelled the resultant signal at both cochleae via adjustment of the phase and level of signals from the earphones. Participants achieved cancellation using both methods between 1.5 and 8 kHz. Levels measured with each method differed by <1 dB between 3 and 5 kHz. The phase results also corresponded well for the cancelled ear (11° mean difference) but poorly for the contralateral ear (38.4° mean difference). The first method is transferable to patients with middle-ear dysfunction, but covers a limited frequency range.
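Why both level and phase must be matched is clear from a phasor sum: the cancellation residual grows quickly with either a gain or a phase error at the cochlea. A small numeric sketch (illustrative only; this is not the study's measurement procedure):

```python
import numpy as np

def residual_db(level_err_db, phase_err_deg):
    """Residual level after adding a unit tone to its intended canceller
    (nominally equal level, opposite phase), relative to the uncancelled
    tone, for a given gain error (dB) and phase error (degrees)."""
    canceller = -(10.0 ** (level_err_db / 20.0)) * np.exp(1j * np.deg2rad(phase_err_deg))
    return 20.0 * np.log10(abs(1.0 + canceller))

# With perfect phase but a 1-dB level error, only about 18 dB of
# cancellation is achieved; a 10-degree phase error alone limits
# cancellation to roughly 15 dB.
```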

    Measurements of inter-cochlear level and phase differences of bone-conducted sound

    Bone-anchored hearing aids are a widely used method of treating conductive hearing loss, but the benefit of bilateral implantation is limited due to interaural cross-talk. The present study measured the phase and level of pure tones reaching each cochlea from a single, mastoid-placed bone transducer on normal hearing participants. In principle, the technique could be used to implement a cross-talk cancellation system in those with bilateral bone conductors. The phase and level of probe tones over two insert earphones was adjusted until they canceled sound from a bone transducer (i.e., resulting in perceived silence). Testing was performed in 50-Hz steps between 0.25 and 8 kHz. Probe phase and level results were used to calculate inter-cochlear level and phase differences. The inter-cochlear phase differences of the bone-conducted sound were similar for all three participants, showing a relatively linear increase between 4 and 8 kHz. The attenuation characteristics were highly variable over the frequency range as well as between participants. This variability was thought to be related to differences in skull dynamics across the ears. Repeated measurements of cancellation phase and level at the same frequency produced good consistency across sessions from the same participant.
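Given the cancellation settings at each ear, the inter-cochlear differences reduce to a level subtraction and a phase subtraction wrapped into [−180°, 180°). The wrapping is a small but easy-to-get-wrong step; a sketch (function and parameter names are illustrative):

```python
def inter_cochlear_differences(level_l_db, level_r_db, phase_l_deg, phase_r_deg):
    """Inter-cochlear level difference (dB) and phase difference,
    wrapped to the interval [-180, 180) degrees."""
    dlevel = level_r_db - level_l_db
    dphase = (phase_r_deg - phase_l_deg + 180.0) % 360.0 - 180.0
    return dlevel, dphase

# e.g. settings of 350 deg and 10 deg differ by 20 deg, not -340 deg.
```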

    Timing of head turns to upcoming talkers in triadic conversation: evidence for prediction of turn-ends and interruptions

    In conversation, people are able to listen to an utterance and respond within only a few hundred milliseconds. It takes substantially longer to prepare even a simple utterance, suggesting that interlocutors may make use of predictions about when the talker is about to end. But it is not only the upcoming talker that needs to anticipate the prior talker ending—listeners that are simply following the conversation could also benefit from predicting the turn end in order to shift attention appropriately with the turn switch. In this paper, we examined whether people predict upcoming turn ends when watching conversational turns switch between others by analysing natural conversations. These conversations were between triads of older adults in different levels and types of noise. The analysis focused on the observer during turn switches between the other two parties using head orientation (i.e. saccades from one talker to the next) to identify when their focus moved from one talker to the next. For non-overlapping utterances, observers started to turn to the upcoming talker before the prior talker had finished speaking in 17% of turn switches (going up to 26% when accounting for motor-planning time). For overlapping utterances, observers started to turn towards the interrupter before they interrupted in 18% of turn switches (going up to 33% when accounting for motor-planning time). The timing of head turns was more precise at lower than higher noise levels, and was not affected by noise type. These findings demonstrate that listeners in natural group conversation situations often exhibit head movements that anticipate the end of one conversational turn and the beginning of another. Furthermore, this work demonstrates the value of analysing head movement as a cue to social attention, which could be relevant for advancing communication technology such as hearing devices.
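The headline percentages are simple event counts: a switch counts as anticipatory if the observer's head turn began before the prior talker stopped, optionally also crediting turns that began within a motor-planning delay after the turn end. A sketch of that tally (names and the delay handling are illustrative, not the paper's exact analysis):

```python
def anticipatory_fraction(turn_onsets, turn_ends, motor_delay=0.0):
    """Fraction of turn switches in which the observer's head turn began
    before (prior talker's turn end + motor-planning delay).
    turn_onsets and turn_ends are paired times in seconds."""
    early = sum(1 for onset, end in zip(turn_onsets, turn_ends)
                if onset < end + motor_delay)
    return early / len(turn_onsets)
```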

    Cochlear implant simulator with independent representation of the full spiral ganglion

    In cochlear implant simulation with vocoders, narrow-band carriers deliver the envelopes from each analysis band to the cochlear positions of the simulated electrodes. However, this approach does not faithfully represent the continuous nature of the spiral ganglion. The proposed “SPIRAL” vocoder simulates current spread by mixing all envelopes across many tonal carriers. SPIRAL demonstrated that the classic finding of reduced speech-intelligibility benefit with additional electrodes could be due to current spread. SPIRAL produced lower speech reception thresholds than an equivalent noise vocoder. These thresholds are stable for between 20 and 160 carriers.
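The core of such a vocoder is a weight matrix that spreads each electrode's envelope, with exponential decay in dB, across a dense bank of tonal carriers, so that every carrier receives a mixture of all envelopes. A minimal sketch of that mixing stage (the decay rate, spacing, and names are illustrative assumptions, not SPIRAL's actual parameters):

```python
import numpy as np

def spread_weights(n_electrodes, n_carriers, decay_db_per_carrier=1.0):
    """Electrode-to-carrier weights with exponential (in dB) current
    spread around each simulated electrode position."""
    elec_pos = np.linspace(0, n_carriers - 1, n_electrodes)
    dist = np.abs(np.arange(n_carriers)[None, :] - elec_pos[:, None])
    return 10.0 ** (-decay_db_per_carrier * dist / 20.0)

def mix_envelopes(envelopes, weights):
    """Each carrier's envelope is a weighted sum of all electrode
    envelopes (envelopes: n_electrodes x n_samples)."""
    return weights.T @ envelopes
```

Each weight is 1 at the electrode's own position and falls off by the chosen number of dB per carrier step on either side.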

    Interaural correlation sensitivity

    Sensitivity to differences in interaural correlation was measured as a function of reference interaural correlation and frequency (250 to 1500 Hz) for narrowband-noise stimuli (1.3 ERBs wide) and for the same stimuli spectrally fringed by broadband correlated noise. d′ was measured for two-interval discriminations between fixed pairs of correlation values, and these measurements were used to generate cumulative d′ versus correlation curves for each stimulus frequency and type. The perceptual cue reported by subjects was perceived intracranial breadth for narrowband stimuli (wider image for lower correlation) and loudness of a whistling sound heard at the frequency of the decorrelated band for the fringed stimuli (louder for lower correlation). At low correlations, sensitivity was greater for fringed than for narrowband stimuli at all frequencies, but at higher correlations, sensitivity was often greater for narrowband stimuli. For fringed stimuli, cumulative sensitivity was greater at low frequencies than at high frequencies, but listeners produced varied patterns for narrowband stimuli. The forms of cumulative d′ curves as a function of frequency were interpolated using an eight-parameter fitted function. Such functions may be used to predict listeners' perceptions of stimuli that vary across frequency in interaural correlation.
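A reference interaural correlation is commonly imposed by mixing a common noise with an independent one in each ear. A minimal generator for such stimulus pairs (broadband Gaussian noise only; the study's narrowband filtering and spectral fringing are omitted):

```python
import numpy as np

def correlated_noise_pair(n_samples, rho, seed=None):
    """Two Gaussian noise channels whose expected normalized interaural
    correlation is rho, via the standard mixing
    right = rho * common + sqrt(1 - rho^2) * independent."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal(n_samples)
    independent = rng.standard_normal(n_samples)
    left = common
    right = rho * common + np.sqrt(1.0 - rho ** 2) * independent
    return left, right
```

The scaling by sqrt(1 − rho²) keeps the two channels at equal expected power while setting their correlation.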

    The role of head-induced interaural time and level differences in the speech reception threshold for multiple interfering sound sources

    Three experiments investigated the roles of interaural time differences (ITDs) and level differences (ILDs) in spatial unmasking in multi-source environments. In experiment 1, speech reception thresholds (SRTs) were measured in virtual-acoustic simulations of an anechoic environment with three interfering sound sources of either speech or noise. The target source lay directly ahead, while three interfering sources were (1) all at the target's location (0°, 0°, 0°), (2) at locations distributed across both hemifields (−30°, 60°, 90°), (3) at locations in the same hemifield (30°, 60°, 90°), or (4) co-located in one hemifield (90°, 90°, 90°). Sounds were convolved with head-related impulse responses (HRIRs) that were manipulated to remove individual binaural cues. Three conditions used HRIRs with (1) both ILDs and ITDs, (2) only ILDs, and (3) only ITDs. The ITD-only condition produced the same pattern of results across spatial configurations as the combined cues, but with smaller differences between spatial configurations. The ILD-only condition yielded similar SRTs for the (−30°, 60°, 90°) and (0°, 0°, 0°) configurations, as expected for best-ear listening. In experiment 2, pure-tone BMLDs were measured at third-octave frequencies against the ITD-only, speech-shaped noise interferers of experiment 1. These BMLDs were 4–8 dB at low frequencies for all spatial configurations. In experiment 3, SRTs were measured for speech in diotic, speech-shaped noise. Noises were filtered to reduce the spectrum level at each frequency according to the BMLDs measured in experiment 2. SRTs were as low or lower than those of the corresponding ITD-only conditions from experiment 1. Thus, an explanation of speech understanding in complex listening environments based on the combination of best-ear listening and binaural unmasking (without involving sound localization) cannot be excluded.
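Removing one binaural cue from an HRIR pair can be sketched in the frequency domain: equalize the two magnitude spectra to keep only interaural phase (ITD-only), or impose a common phase spectrum to keep only interaural level (ILD-only). A simplified sketch of such manipulations (the study's actual HRIR processing followed its own procedure):

```python
import numpy as np

def itd_only(hrir_l, hrir_r):
    """Keep interaural phase; equalize magnitudes (geometric mean)."""
    n = len(hrir_l)
    L, R = np.fft.rfft(hrir_l), np.fft.rfft(hrir_r)
    mag = np.sqrt(np.abs(L) * np.abs(R))
    return (np.fft.irfft(mag * np.exp(1j * np.angle(L)), n),
            np.fft.irfft(mag * np.exp(1j * np.angle(R)), n))

def ild_only(hrir_l, hrir_r):
    """Keep interaural magnitudes; impose the left ear's phase on both."""
    n = len(hrir_l)
    L, R = np.fft.rfft(hrir_l), np.fft.rfft(hrir_r)
    phase = np.exp(1j * np.angle(L))
    return (np.fft.irfft(np.abs(L) * phase, n),
            np.fft.irfft(np.abs(R) * phase, n))
```

Applied to a pair differing only in level, itd_only returns identical channels; applied to a pair differing only by a delay, ild_only aligns the two channels in time.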