
    Influence of vision on short-term sound localization training with non-individualized HRTF

    Previous studies have demonstrated that humans can adapt to new HRTFs, whether non-individualized or altered, within a short time period. While natural adaptation through sound exposure takes several weeks [1], training programs have been employed to accelerate adaptation and improve sound localization performance within a few days (see [2] for a review). The majority of these training programs are based on audio-visual positional or response feedback learning [3] (participants correct their answer after seeing the target position), or on active learning, for example through audio-proprioceptive manipulation [4] (blindfolded participants actively explore the sphere around them by playing a small sonified version of the hot-and-cold game). While these training programs are based on a bimodal coupling (audio-visual [3] or audio-proprioceptive [4]), they are rarely based on a trimodal one. Given that vision is not necessary for adaptation [4], and that audio-visual training can even be less efficient than other methods [1,2], the role of vision in short-term auditory localization training remains unclear, especially when action and proprioception are already involved. Our study compares two versions of active training: an audio-proprioceptive one and an audio-visuo-proprioceptive one. We hypothesize that combining all modalities leads to better adaptation, inducing better performance and a longer-lasting effect.
    The experiment is developed in virtual reality using an HTC Vive as a head- and hand-tracker. 3D audio spatialization is obtained through Steam Audio's non-individualized built-in HRTF. When applicable, 3D visual information is displayed directly on the Vive screen. A total of 36 participants, equally distributed into 3 groups (G1 to G3), take part in this between-subject design study. G1 is a control group receiving no training session, while the two other groups receive a 12-minute training session on 3 consecutive days. All participants also perform 5 sound localization tests (no feedback, hand-pointing technique, 2 repetitions × 33 positions, frontal space): one before the experiment, one after each training session, and the last one 1 week after the first day in order to evaluate the remaining effect. G2 receives an audio-proprioceptive training as described in [4]: participants freely scan the space around them with their hand-held Vive controller to find an animal sound hidden around them; the controller-to-target angular distance is sonified and spatialized at the controller position, and no visual information is provided. G3 receives the same task as G2, but a visual representation of a sphere is also displayed at the hand position during all training sessions (audio-visuo-proprioceptive situation). We measure the angular error in azimuth and elevation during the localization tests. Performance is also analyzed in the interaural polar coordinate system to discuss front/back and up/down confusion errors. Data from the training sessions are logged (total number of animals found and detailed sequence of hand positions) to evaluate how training and vision influence the scanning strategy. The experimental phase is currently under way (10 participants have completed it so far) and extends until the end of April; complete results will be available for the final version of the paper in June.
    References
    [1] Carlile, S., and Blackman, T. Relearning auditory spectral cues for locations inside and outside the visual field. J. Assoc. Res. Otolaryngol. 15, 249–263 (2014)
    [2] Strelnikov, K., Rosito, M., and Barone, P. Effect of audiovisual training on monaural spatial hearing in horizontal plane. PLoS ONE 6:e18344 (2011)
    [3] Mendonça, C. A review on auditory space adaptation to altered head-related cues. Front. Neurosci. 8, 219 (2014)
    [4] Parseihian, G., and Katz, B.F.G. Rapid head-related transfer function adaptation using a virtual auditory environment. J. Acoust. Soc. Am. 131, 2948–2957 (2012)
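    As a hedged illustration of the analysis described above (an illustrative sketch, not the authors' actual code), the following Python snippet converts a localization response from vertical-polar coordinates (azimuth, elevation) to the interaural polar system (lateral, polar angles) used to separate left/right error from front/back and up/down confusions. Function names, variable names and the angle conventions are assumptions.

        import numpy as np

        def vertical_to_interaural_polar(azimuth_deg, elevation_deg):
            """Convert vertical-polar (azimuth, elevation) to interaural-polar
            (lateral, polar) angles, all in degrees.

            Assumed convention: azimuth positive to the left, elevation positive
            upward, (0, 0) pointing straight ahead.
            """
            az = np.radians(azimuth_deg)
            el = np.radians(elevation_deg)

            # Cartesian direction: x = front, y = left, z = up
            x = np.cos(el) * np.cos(az)
            y = np.cos(el) * np.sin(az)
            z = np.sin(el)

            lateral = np.degrees(np.arcsin(np.clip(y, -1.0, 1.0)))  # left/right angle
            polar = np.degrees(np.arctan2(z, x))                    # angle around the interaural axis

            return lateral, polar

        # Example: a response at 30 deg azimuth, 45 deg elevation
        print(vertical_to_interaural_polar(30.0, 45.0))

    In this representation, a front/back confusion appears as a polar angle mirrored about 90 degrees while the lateral angle stays roughly unchanged.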

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    The Effects of Decreasing the Magnitude of Elevation-Dependent Notches in HRTFs on Median Plane Localisation

    Accurate sound localisation in the median plane is known to require spectral cues produced by the pinnae and upper body, which provide important information about a source's location in space, with pinna cues in particular producing prominent peaks and notches in the spectrum. Iida et al. (2007) suggest that the first peak and the first two pinna-related spectral notches (> 5 kHz) provide enough information for a source to be accurately localised. The importance of the magnitude of spectral peaks and notches in median plane localisation is thought to depend on their relative rather than absolute distribution across the frequency spectrum (Macpherson and Sabin, 2013). The present experiment tested three subjects and explored whether a necessary magnitude of pinna-related spectral notches exists and, if so, what happens to the perceived source position once that magnitude is reduced. The results generally showed that for elevated sources, as pinna-related notches were reduced in magnitude, the perceived source position moved upwards in space. This appears to be because the overall frequency spectrum becomes closer to that of sources located higher in space. For a non-elevated frontal source in the median plane, on the other hand, an increase in front-back confusion occurred as a result of the notch magnitude manipulation. This again is considered to be due to the altered frequency distribution mimicking that of the HRTF for the opposite direction (i.e., relative dominance of frequencies around 1 kHz over those around 4 kHz).
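    As a hedged sketch of this kind of spectral manipulation (not the exact procedure used in the experiment), one simple way to attenuate pinna-related notches is to smooth the log-magnitude HRTF spectrum and scale down the residual above roughly 5 kHz. The smoothing choice, parameter names and threshold are assumptions.

        import numpy as np

        def reduce_notch_magnitude(hrtf_mag, freqs, scale=0.5, f_min=5000.0, win=17):
            """Attenuate spectral peaks/notches above f_min by compressing the
            deviation of the log-magnitude spectrum from a smoothed baseline.

            hrtf_mag : linear magnitude spectrum of one HRTF (1-D array)
            freqs    : frequency of each bin in Hz
            scale    : 0 removes peaks/notches entirely, 1 leaves them unchanged
            """
            log_mag = 20.0 * np.log10(np.maximum(hrtf_mag, 1e-12))

            # Smoothed baseline via a simple moving average (a crude stand-in
            # for cepstral or auditory-band smoothing).
            kernel = np.ones(win) / win
            baseline = np.convolve(log_mag, kernel, mode="same")

            residual = log_mag - baseline
            compressed = np.where(freqs >= f_min, scale * residual, residual)

            return 10.0 ** ((baseline + compressed) / 20.0)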

    The plastic ear and perceptual relearning in auditory spatial perception

    The auditory system of adult listeners has been shown to accommodate to altered spectral cues to sound location, which presumably provides the basis for recalibration to changes in the shape of the ear over a lifetime. Here we review the role of auditory and non-auditory inputs in the perception of sound location and consider a range of recent experiments looking at the role of non-auditory inputs in the process of accommodation to these altered spectral cues. A number of studies have used small ear moulds to modify the spectral cues, which results in significant degradation in localization performance. Following chronic exposure (10-60 days), performance recovers to some extent, and recent work has demonstrated that this occurs for both audio-visual and audio-only regions of space. This begs the question of what the teacher signal is for this remarkable functional plasticity in the adult nervous system. Following a brief review of the influence of motor state on auditory localisation, we consider the potential role of auditory-motor learning in the perceptual recalibration of the spectral cues. Several recent studies have considered how multi-modal and sensory-motor feedback might influence accommodation to altered spectral cues produced by ear moulds or through virtual auditory space stimulation using non-individualised spectral cues. The work with ear moulds demonstrates that a relatively short period of training involving sensory-motor feedback (5-10 days) significantly improves both the rate and extent of accommodation to altered spectral cues. This has significant implications not only for the mechanisms by which this complex sensory information is encoded to provide a spatial code but also for adaptive training to altered auditory inputs. The review concludes by considering the implications for rehabilitative training with hearing aids and cochlear prostheses.

    Anthropometric Individualization of Head-Related Transfer Functions: Analysis and Modeling

    Human sound localization helps to pay attention to spatially separated speakers by using interaural level and time differences as well as angle-dependent monaural spectral cues. In a monophonic teleconference, for instance, it is much more difficult to distinguish between different speakers due to missing binaural cues. Spatial positioning of the speakers by means of binaural reproduction methods using head-related transfer functions (HRTFs) enhances speech comprehension. These HRTFs are influenced by the torso, head and ear geometry, as they describe the propagation path of the sound from a source to the ear canal entrance. Through this geometry dependency, the HRTF is direction- and subject-dependent. To enable a sufficient reproduction, individual HRTFs should be used. However, measuring these HRTFs is tremendously difficult. For this reason, this thesis proposes approaches to adapt HRTFs using a user's individual anthropometric dimensions. Since localization at low frequencies is mainly influenced by the interaural time difference, two models to adapt this difference are developed and compared with existing models. Furthermore, two approaches to adapt the spectral cues at higher frequencies are studied, improved and compared. Although the localization performance with individualized HRTFs is slightly worse than with individual HRTFs, it is nevertheless still better than with non-individual HRTFs, taking into account the measurement effort.
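    By way of a hedged example of an anthropometry-driven ITD model (illustrative only, not necessarily one of the two models developed in the thesis), the classic Woodworth spherical-head formula predicts the interaural time difference from an individual head radius:

        import numpy as np

        def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
            """Woodworth spherical-head ITD estimate for a far-field source.

            azimuth_deg   : source azimuth in degrees (0 = front, valid up to ~90)
            head_radius_m : individual head radius, e.g. derived from head width
            c             : speed of sound in m/s
            Returns the ITD in seconds.
            """
            theta = np.radians(azimuth_deg)
            return (head_radius_m / c) * (theta + np.sin(theta))

        # Example: a head radius of 8.75 cm, source at 45 degrees azimuth
        print(woodworth_itd(45.0) * 1e6, "microseconds")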

    HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection

    An individualised head-related transfer function (HRTF) is essential for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make the measurement process more efficient, HRTF upsampling has been explored in the past, where a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for convenient use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against two baselines: barycentric upsampling and an HRTF selection approach. Experimental results show that the proposed method outperforms both baselines in terms of log-spectral distortion (LSD) and localisation performance using perceptual models when the input HRTF is sparse. (13 pages, 9 figures; preprint submitted to Transactions on Audio, Speech and Language Processing on 24 Feb 2023.)
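    As a hedged sketch (illustrative only, not the paper's evaluation code), the log-spectral distortion metric mentioned above can be computed per direction as the RMS difference between the log-magnitude spectra of the reference and upsampled HRTFs:

        import numpy as np

        def log_spectral_distortion(h_ref, h_est, eps=1e-12):
            """Log-spectral distortion (dB) between two HRTF magnitude spectra.

            h_ref, h_est : linear magnitude spectra of equal shape,
                           e.g. |FFT| of the measured and upsampled HRIRs
            Returns the RMS of the per-bin dB error.
            """
            diff_db = 20.0 * np.log10((np.abs(h_ref) + eps) / (np.abs(h_est) + eps))
            return np.sqrt(np.mean(diff_db ** 2))

    Averaging this value over all measured directions gives a single LSD score for an upsampled HRTF set.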

    High Frequency Reproduction in Binaural Ambisonic Rendering

    Humans can localise sounds in all directions using three main auditory cues: the differences in time and level between signals arriving at the left and right eardrums (interaural time difference and interaural level difference, respectively), and the spectral characteristics of the signals due to reflections and diffractions off the body and ears. These auditory cues can be recorded for a position in space using the head-related transfer function (HRTF), and binaural synthesis at this position can then be achieved through convolution of a sound signal with the measured HRTF. However, reproducing soundfields with multiple sources, or at multiple locations, requires a highly dense set of HRTFs. Ambisonics is a spatial audio technology that decomposes a soundfield into a weighted set of directional functions, which can be utilised binaurally in order to spatialise audio in any direction using far fewer HRTFs. A limitation of low-order Ambisonic rendering is poor high frequency reproduction, which reduces the accuracy of the resulting binaural synthesis. This thesis presents novel HRTF pre-processing techniques such that, when the augmented HRTFs are used in the binaural Ambisonic rendering stage, the high frequency reproduction is a closer approximation of direct HRTF rendering. These techniques include Ambisonic Diffuse-Field Equalisation, to improve spectral reproduction over all directions; Ambisonic Directional Bias Equalisation, to further improve spectral reproduction toward a specific direction; and Ambisonic Interaural Level Difference Optimisation, to improve lateralisation and interaural level difference reproduction. Evaluation of the presented techniques compares binaural Ambisonic rendering to direct HRTF rendering both numerically, using perceptually motivated spectral difference calculations, auditory cue estimations and localisation prediction models, and perceptually, using listening tests assessing similarity and plausibility. Results conclude that the individual pre-processing techniques produce modest improvements to the high frequency reproduction of binaural Ambisonic rendering, and that using multiple pre-processing techniques can produce cumulative, and statistically significant, improvements.
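    As a hedged illustration of the general idea behind diffuse-field equalisation (a generic sketch, not the thesis' Ambisonic-specific implementation), the diffuse-field response of an HRTF set can be estimated as the RMS magnitude over all measured directions, and its inverse used as an equalisation filter. Array shapes, parameter names and the uniform-weighting default are assumptions.

        import numpy as np

        def diffuse_field_eq_filter(hrtf_mags, weights=None, eps=1e-12):
            """Estimate a diffuse-field equalisation filter from an HRTF set.

            hrtf_mags : array of shape (num_directions, num_freq_bins) with
                        linear magnitude spectra for one ear
            weights   : optional per-direction quadrature weights (e.g. for a
                        non-uniform measurement grid); uniform if None
            Returns the inverse of the RMS diffuse-field magnitude per bin.
            """
            if weights is None:
                weights = np.full(hrtf_mags.shape[0], 1.0 / hrtf_mags.shape[0])
            weights = weights / np.sum(weights)

            # Energy average over directions, then square root -> RMS magnitude
            diffuse = np.sqrt(np.sum(weights[:, None] * np.abs(hrtf_mags) ** 2, axis=0))

            # Multiply each HRTF spectrum by this filter to equalise the set
            return 1.0 / (diffuse + eps)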