
    Effect of Reverberation Context on Spatial Hearing Performance of Normally Hearing Listeners

    Previous studies provide evidence that listening experience in a particular reverberant environment improves speech intelligibility and localization performance in that environment. Such studies, however, are few, and little is known about the underlying mechanisms. The experiments presented in this thesis explored the effect of reverberation context, in particular the similarity in interaural coherence within a context, on listeners' performance in sound localization, speech perception in a spatially separated noise, spatial release from speech-on-speech masking, and target location identification in a multi-talker configuration. All experiments were conducted in simulated reverberant environments created with a loudspeaker array in an anechoic chamber. The reflections comprising the reverberation in each environment had the same temporal and relative amplitude patterns but varied in their lateral spread, which affected the interaural coherence of reverberated stimuli. The effect of reverberation context was examined by comparing performance in two reverberation contexts, mixed and fixed. In the mixed context, the reverberation environment applied to each stimulus varied trial by trial, whereas in the fixed context, the reverberation environment was held constant within a block of trials. In Experiment I (absolute judgement of sound location), variability in azimuth judgments was lower in the fixed than in the mixed context, suggesting that sound localization did not depend solely on the cues presented in isolated trials. In Experiment II, the intelligibility of speech in a spatially separated noise was found to be similar in both reverberation contexts. That result contrasts with other studies and suggests that the fixed context did not assist listeners in compensating for degraded interaural coherence.
In Experiment III, speech intelligibility in multi-talker configurations was found to be better in the fixed context, but only when the talkers were spatially separated. That is, the fixed context improved spatial release from masking. However, in the presence of speech maskers, consistent reverberation did not improve the localizability of the target talker in a three-alternative location-identification task. These results suggest that in multi-talker situations, consistent coherence may not improve target localizability, but rather that a consistent context may facilitate the buildup of spatial selective attention.
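The interaural coherence manipulated in these experiments can be illustrated with a minimal sketch: the broadband coherence of a two-channel signal taken as the peak of the normalized interaural cross-correlation within the physiological ITD range. This is an illustrative measure only, not the thesis's own analysis; the function name and the 1 ms lag window are assumptions.

```python
import numpy as np

def interaural_coherence(left, right, fs, max_lag_ms=1.0):
    """Broadband interaural coherence: the peak of the normalized
    interaural cross-correlation within +/- max_lag_ms of lag
    (roughly the physiological ITD range)."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if denom == 0.0:
        return 0.0
    n = len(left)
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlation sum_i left[i] * right[i + lag], zero-padded ends dropped.
        if lag >= 0:
            val = np.dot(left[: n - lag], right[lag:])
        else:
            val = np.dot(left[-lag:], right[: n + lag])
        best = max(best, val)
    return best / denom

# A diotic signal (identical at both ears) is fully coherent; adding
# independent noise to one ear, as laterally spread reflections do,
# reduces the coherence.
rng = np.random.default_rng(0)
s = rng.standard_normal(48000)
ic_diotic = interaural_coherence(s, s, 48000)
ic_noisy = interaural_coherence(s, s + rng.standard_normal(48000), 48000)
```

With equal-power independent noise added to one ear, the coherence drops toward 1/sqrt(2), mimicking the decorrelating effect of lateral reverberant energy.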

    Acoustic source separation based on target equalization-cancellation

    Normal-hearing listeners are good at focusing on a target talker while ignoring interferers in a multi-talker environment. Efforts have therefore been devoted to building psychoacoustic models that capture binaural processing in multi-talker environments and to developing bio-inspired source separation algorithms for hearing-assistive devices. This thesis presents a target-Equalization-Cancellation (target-EC) approach to the source separation problem. The idea of the target-EC approach is to use the energy change before and after cancelling the target to estimate a time-frequency (T-F) mask, in which each entry estimates the strength of the target signal in the original mixture. Once the mask is calculated, it is applied to the original mixture to preserve the target-dominant T-F units and to suppress the interferer-dominant T-F units. On the psychoacoustic modeling side, when the output of the target-EC approach is evaluated with the Coherence-based Speech Intelligibility Index (CSII), the predicted binaural advantage closely matches the pattern of the measured data. On the application side, the performance of the target-EC source separation algorithm was evaluated by psychoacoustic measurements using both a closed-set and an open-set speech corpus, and the target-EC cue was shown to be a better cue for source separation than the interaural difference cues.
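The mask-estimation step described above can be sketched as follows, under the simplifying assumption that the equalization step has already made the target identical in the two channels, so that subtracting them cancels it. The function name and the toy values are illustrative, not from the thesis.

```python
import numpy as np

def target_ec_mask(X_left, X_right, floor=1e-12):
    """Soft time-frequency mask from the energy drop caused by
    cancelling the target. X_left, X_right: complex STFT matrices
    (freq x frames) in which the target component is assumed identical
    across channels after equalization."""
    mix_energy = np.abs(X_left) ** 2 + np.abs(X_right) ** 2
    cancel_energy = np.abs(X_left - X_right) ** 2
    # A large energy drop after cancellation marks a target-dominated
    # T-F unit (mask near 1); little or no drop marks an
    # interferer-dominated unit (mask clipped to 0).
    mask = 1.0 - cancel_energy / np.maximum(mix_energy, floor)
    return np.clip(mask, 0.0, 1.0)

# Toy mixture: one T-F unit holds only target (identical across
# channels), the other only an interferer (opposite phase across ears).
X_l = np.array([[2.0 + 0j, 1.0 + 0j]])
X_r = np.array([[2.0 + 0j, -1.0 + 0j]])
mask = target_ec_mask(X_l, X_r)
```

Applying `mask` pointwise to the mixture STFT then preserves the target-dominant units and suppresses the interferer-dominant ones, as the abstract describes.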

    Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function

    This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular, we are interested in locating speech sources, as they are of high interest for human-robot interaction. The microphone-pair response corresponding to the direct-path sound propagation is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions (ATFs) of the two microphones, and it is an important feature for SSL. We propose a method to estimate the DP-RTF from noisy and reverberant signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function (CTF) approximation is adopted to accurately represent the impulse response of the microphone array, so that the first coefficient of the CTF is mainly composed of the direct-path ATF. At each frequency, the frame-wise speech auto- and cross-power spectral densities (PSDs) are obtained by spectral subtraction. A set of linear equations is then constructed from the speech auto- and cross-PSDs of multiple frames, in which the DP-RTF is the unknown variable, and is estimated by solving the equations. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for SSL. Experiments with a robot placed in various reverberant environments show that the proposed method outperforms two state-of-the-art methods.
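The core of the per-frequency estimation, stripped of the CTF modeling and spectral subtraction that handle reverberation and noise, is a least-squares fit of a relative transfer function over frames. The sketch below shows only that noise-free core; the function name is an assumption.

```python
import numpy as np

def estimate_rtf(X_ref, X_mic):
    """Per-frequency relative-transfer-function estimate by least
    squares over frames. X_ref, X_mic: complex STFT matrices
    (freq x frames). The frame-wise model X_mic[f, t] ~ RTF[f] * X_ref[f, t]
    is stacked over frames t and solved in closed form: the summed
    cross-PSD divided by the summed auto-PSD of the reference channel."""
    num = np.sum(X_mic * np.conj(X_ref), axis=1)   # cross-PSD, summed over frames
    den = np.sum(np.abs(X_ref) ** 2, axis=1)       # auto-PSD, summed over frames
    return num / np.maximum(den, 1e-12)

# Noise-free sanity check: the second channel is the first scaled by a
# per-frequency complex gain h, which the estimator should recover.
rng = np.random.default_rng(1)
X1 = rng.standard_normal((4, 32)) + 1j * rng.standard_normal((4, 32))
h = np.array([1.0 + 0j, 0.5 - 0.5j, -2.0 + 1j, 0.1 + 0j])
X2 = h[:, None] * X1
h_hat = estimate_rtf(X1, X2)
```

In the paper's method, the equations are instead built from noise-subtracted speech PSDs so that the estimate converges to the direct-path RTF rather than the reverberant one.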

    Assessing HRTF preprocessing methods for Ambisonics rendering through perceptual models

    Binaural rendering of Ambisonics signals is a common way to reproduce spatial audio content. Processing Ambisonics signals at low spatial orders is desirable in order to reduce complexity, although it may degrade the perceived quality, in part due to the mismatch that occurs when a low-order Ambisonics signal is paired with a spatially dense head-related transfer function (HRTF). In order to alleviate this issue, the HRTF may be preprocessed so that its spatial order is reduced. Several preprocessing methods have been proposed, but they have not yet been thoroughly compared. In this study, nine HRTF preprocessing methods were used to render anechoic binaural signals from Ambisonics representations of orders 1 to 44, and these were compared through perceptual hearing models in terms of localisation performance, externalisation, and speech reception. This assessment was supported by numerical analyses of HRTF interpolation errors, interaural differences, perceptually relevant spectral differences, and loudness stability. Models predicted that the binaural renderings' accuracy increased with spatial order, as expected. A notable effect of the preprocessing method was observed: whereas all methods performed similarly at the highest spatial orders, some were considerably better at lower orders. A newly proposed method, BiMagLS, displayed the best performance overall and is recommended for the rendering of bilateral Ambisonics signals. The results, which were in line with previous literature, indirectly validate the perceptual models' ability to predict listeners' responses in a consistent and explicable manner.
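The order reduction at the heart of this comparison can be sketched with a minimal spherical-harmonic truncation. To stay self-contained, the sketch handles only degrees 0 and 1, using the fact that the degree-1 real spherical harmonics are proportional to the Cartesian direction components x, y, z; the actual study works with orders up to 44 and with full HRTF sets, so the function names and scope here are assumptions.

```python
import numpy as np

def low_order_basis(order, azi, zen):
    """Real spherical-harmonic basis up to degree 1, one row per
    direction. Degree 0 spans constants; degree 1 spans x, y, z on the
    unit sphere. Normalization is omitted because only the spanned
    subspace matters for a least-squares fit."""
    x = np.sin(zen) * np.cos(azi)
    y = np.sin(zen) * np.sin(azi)
    z = np.cos(zen)
    cols = [np.ones_like(azi)]
    if order >= 1:
        cols += [x, y, z]
    return np.stack(cols, axis=1)

def order_limit(values, azi, zen, order):
    """Least-squares fit of directionally sampled data (e.g., one
    frequency bin of an HRTF measured on a grid) to spherical harmonics
    of the given degree, re-evaluated on the same grid: an
    order-limited version of the data."""
    Y = low_order_basis(order, azi, zen)
    coeffs, *_ = np.linalg.lstsq(Y, values, rcond=None)
    return Y @ coeffs

# A degree-1 field is preserved by degree-1 truncation; truncating to
# degree 0 discards its directional part and leaves a large error.
rng = np.random.default_rng(2)
azi = rng.uniform(0, 2 * np.pi, 200)
zen = rng.uniform(0.1, np.pi - 0.1, 200)
f = low_order_basis(1, azi, zen) @ rng.standard_normal(4)
err1 = np.linalg.norm(order_limit(f, azi, zen, 1) - f)
err0 = np.linalg.norm(order_limit(f, azi, zen, 0) - f)
```

The preprocessing methods compared in the study (such as BiMagLS) aim to reshape the HRTF before this kind of truncation so that the perceptually important cues survive at low orders.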

    Movements in Binaural Space: Issues in HRTF Interpolation and Reverberation, with applications to Computer Music

    This thesis deals broadly with the topic of binaural audio. After reviewing the literature, a reappraisal of the minimum-phase plus linear delay model for HRTF representation and interpolation is offered. A rigorous analysis of threshold-based phase unwrapping is also performed. The results and conclusions drawn from these analyses motivate the development of two novel methods for HRTF representation and interpolation. Empirical data is used directly in a Phase Truncation method. A Functional Model for phase is used in the second method, based on the psychoacoustical nature of Interaural Time Differences. Both methods are validated; most significantly, both perform better than a minimum-phase method in subjective testing. The accurate, artefact-free dynamic source processing afforded by the above methods is harnessed in a binaural reverberation model, based on an early-reflection image model and a Feedback Delay Network diffuse field, with accurate interaural coherence. In turn, these flexible environmental processing algorithms are used in the development of a multi-channel binaural application, which allows the audition of multi-channel setups in headphones. Both source and listener are dynamic in this paradigm. A GUI is offered for intuitive use of the application. HRTF processing is thus re-evaluated and updated after a review of accepted practice. Novel solutions are presented and validated. Binaural reverberation is recognised as a crucial tool for convincing artificial spatialisation, and is developed on similar principles. Emphasis is placed on transparency of development practices, with the aim of wider dissemination and uptake of binaural technology.
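The minimum-phase plus linear delay model that the thesis reappraises decomposes each HRIR into a minimum-phase filter (interpolated across directions) and a pure delay carrying the ITD (interpolated separately). A standard way to obtain the minimum-phase part is the homomorphic (cepstral) construction sketched below; this is the textbook method, not necessarily the thesis's exact implementation.

```python
import numpy as np

def minimum_phase(h, nfft=1024):
    """Homomorphic construction of the minimum-phase impulse response
    with the same magnitude spectrum as h: take the real cepstrum of
    the log-magnitude spectrum, fold it onto its causal part, and
    exponentiate back to the frequency domain."""
    H = np.fft.fft(h, nfft)
    cep = np.fft.ifft(np.log(np.maximum(np.abs(H), 1e-12))).real
    fold = np.zeros(nfft)
    fold[0] = 1.0
    fold[1:nfft // 2] = 2.0       # double the causal part
    fold[nfft // 2] = 1.0         # Nyquist cepstral bin kept once
    h_min = np.fft.ifft(np.exp(np.fft.fft(cep * fold))).real
    return h_min[: len(h)]

# A pure delay has a flat magnitude spectrum, so its minimum-phase
# version is an impulse at time zero: the excess linear phase (the
# delay/ITD component of the model) has been stripped out.
h = np.zeros(64)
h[8] = 1.0
h_min = minimum_phase(h)
```

In the model, interpolating `h_min` between measured directions and re-applying an interpolated ITD avoids the comb-filter artefacts of interpolating raw HRIRs directly.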

    The effects of monaural and binaural cues on perceived reverberation by normal hearing and hearing-impaired listeners.

    This dissertation is a quantitative and qualitative examination of how young normal hearing and young hearing-impaired listeners perceive reverberation. A primary complaint among hearing-impaired listeners is difficulty understanding speech in noisy or reverberant environments. This work was motivated by a desire to better understand reverberation perception and processing so that this knowledge might be used to improve outcomes for hearing-impaired listeners in these environments. This dissertation is written in six chapters. Chapter One is an introduction to the field and a review of the relevant literature. Chapter Two describes a motivating experiment from laboratory work completed before the dissertation. This experiment asked human subjects to rate the amount of reverberation they perceived in a sound relative to another sound. This experiment showed a significant effect of listening condition on how listeners made their judgments. Chapter Three follows up on this experiment, seeking a better understanding of how listeners perform the task in Chapter Two. Chapter Three shows that listeners can use limited information to make their judgments. Chapter Four compares reverberation perception in normal hearing and hearing-impaired listeners and examines the effect of speech intelligibility on reverberation perception. This experiment finds no significant differences between cues used by normal hearing and hearing-impaired listeners when judging perceptual aspects of reverberation. Chapter Five describes and uses a quantitative model to examine the results of Chapters Two and Four. Chapter Six summarizes the data presented in the dissertation and discusses potential implications and future directions. 
This work finds that the perceived amount of reverberation relies primarily on two factors: 1) the listening condition (i.e., binaural, monaural, or a listening condition in which reverberation is present only in one ear) and 2) the sum of reverberant energy present at the two ears. Listeners do not need the reverberant tail to estimate the perceived amount of reverberation, meaning that listeners are able to extract information about reverberation from the ongoing signal. The precise mechanism underlying this process is not explicitly found in this work; however, a potential framework is presented in Chapter Six.
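The second factor, the sum of reverberant energy at the two ears, can be sketched from a pair of binaural room impulse responses by summing the energy after the direct sound. The function name and the 2.5 ms direct-sound window are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def reverberant_energy_sum(brir_left, brir_right, fs, direct_ms=2.5):
    """Sum of reverberant energy at the two ears: the energy in each
    binaural room impulse response after a short window following the
    direct-sound peak (the 2.5 ms split point is an assumption for
    illustration)."""
    def tail_energy(brir):
        split = int(np.argmax(np.abs(brir))) + int(fs * direct_ms / 1000.0)
        return float(np.sum(brir[split:] ** 2))
    return tail_energy(brir_left) + tail_energy(brir_right)

# Toy BRIR: a direct impulse followed by an exponentially decaying tail.
fs = 48000
t = np.arange(fs // 2) / fs
brir = np.zeros(fs // 2)
brir[100] = 1.0
brir[200:] += (np.exp(-t / 0.2) * 0.1)[: len(brir) - 200]
e_binaural = reverberant_energy_sum(brir, 0.5 * brir, fs)
```

By construction the measure is additive across ears, matching the finding that monaural and binaural conditions differ mainly in how much reverberant energy reaches the listener in total.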

    A binaural auditory model for evaluating quality aspects in reproduced sound

    Binaural cues, describing the differences between the signals at the left and right ears in terms of phase and power, enable our auditory system to localize and segregate sound sources spatially, even in the presence of multiple overlapping sound stimuli. Recent publications and binaural auditory models have illustrated how interaural coherence can be used to estimate these cues and thus model the capability of our auditory system to localize sounds. In this Master's thesis, this approach is developed further and a new binaural auditory model is presented, built on some of the existing auditory models. The aim is to use the model to evaluate binaural recordings of reproduced sound in terms of spatial and timbral aspects. The binaural cue estimation is based on the cross-correlation model by Jeffress, taking into account the frequency selectivity of peripheral hearing. The purpose of this approach is to localize sound sources from a broadband signal and to evaluate spatial aspects based on these localizations. Composite loudness level spectra are also calculated by modeling the transfer functions of the peripheral auditory system; these spectra enable analysis of the frequency balance of reproduced sound. Consequently, this Master's thesis illustrates the application of a binaural auditory model to the analysis of reproduced sound in terms of loudness, timbral, and spatial aspects.
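The Jeffress cross-correlation idea underlying the model's cue estimation can be sketched directly: the ITD is read off as the lag at which the interaural cross-correlation peaks, searched over the plausible ITD range. The full model applies this per auditory filter band; the broadband single-band version and the function name below are simplifications.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Jeffress-style ITD estimate: the lag (in seconds) at which the
    interaural cross-correlation peaks, searched within +/- max_itd_s.
    A positive value means the right-ear signal lags, i.e., the source
    is toward the left."""
    n = len(left)
    max_lag = int(round(max_itd_s * fs))
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlation sum_i left[i] * right[i + lag].
        if lag >= 0:
            val = np.dot(left[: n - lag], right[lag:])
        else:
            val = np.dot(left[-lag:], right[: n + lag])
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs

# Toy check: delay the right-ear copy of a noise burst by 10 samples,
# i.e., an ITD of 10/48000 s with the source toward the left.
rng = np.random.default_rng(3)
s = rng.standard_normal(4800)
right = np.concatenate([np.zeros(10), s[:-10]])
itd = estimate_itd(s, right, 48000)
```

Running the same estimator within gammatone-like filter bands, as the thesis's model does, accounts for the frequency selectivity of the basilar membrane in the localization estimates.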