
    Perceptual thresholds for the effects of room modes as a function of modal decay

    Room modes cause audible artefacts in listening environments. Modal control approaches have emerged in the scientific literature over the years and, often, their performance is measured by criteria that may be perceptually unfounded. Previous research has identified modal decay as a key perceptual factor in detecting modal effects. In this work, perceptual thresholds for the effects of modes as a function of modal decay have been measured in the region between 32 Hz and 250 Hz. A test methodology has been developed to include modal interaction and temporal masking from musical events, which are important aspects in recreating an ecologically valid test regime. This method has been deployed in addition to artificial test stimuli traditionally used in psychometric studies, which provide unmasked, absolute thresholds. For artificial stimuli, thresholds decrease monotonically from 0.9 seconds at 32 Hz to 0.17 seconds at 200 Hz, with a knee at 63 Hz. For music stimuli, thresholds decrease monotonically from 0.51 seconds at 63 Hz to 0.12 seconds at 250 Hz. Perceptual thresholds are shown to depend on frequency and, to a much lesser extent, on level. The results define absolute and practical thresholds, which are useful as perceptually relevant optimization targets for modal control methods.
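    The decay-time thresholds reported above can serve directly as frequency-dependent optimization targets. Below is a minimal sketch using only the endpoint values quoted in the abstract; log-frequency interpolation between them is an illustrative assumption (the reported knee at 63 Hz is not reproduced, since its value is not given here).

```python
import numpy as np

# Threshold endpoints quoted in the abstract (Hz -> seconds of modal decay).
THRESHOLDS = {
    "artificial": ([32.0, 200.0], [0.90, 0.17]),
    "music":      ([63.0, 250.0], [0.51, 0.12]),
}

def decay_target(freq_hz, stimulus="music"):
    """Interpolated decay-time threshold (s) to use as a modal-control target."""
    freqs, times = THRESHOLDS[stimulus]
    # Log-frequency linear interpolation between the two reported points.
    return float(np.interp(np.log10(freq_hz), np.log10(freqs), times))

if __name__ == "__main__":
    for f in (63, 100, 160, 250):
        print(f, "Hz ->", round(decay_target(f), 2), "s")
```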

    Flexible binaural resynthesis of room impulse responses for augmented reality research

    A basic building block of audio for Augmented Reality (AR) is the use of virtual sound sources layered on top of any real sources present in an environment. In order to perceive these virtual sources as belonging to the natural scene, it is important to match their acoustic parameters to those of a real source with the same characteristics, i.e. radiation properties, sound propagation and head-related impulse response (HRIR). However, it is still unclear to what extent these parameters need to be matched in order to generate plausible scenes in which virtual sound sources blend seamlessly with real sound sources. This contribution presents an auralization framework that allows prototyping of augmented reality scenarios from measured multichannel room impulse responses, in order to better understand the relevance of individual acoustic parameters.

    A well-established approach for binaural measurement and reproduction of sound scenes is based on capturing binaural room impulse responses (BRIRs) using a head and torso simulator (HATS) and convolving these BRIRs dynamically with audio content according to the listener's head orientation. However, such measurements are laborious and time consuming, requiring measuring the scene with the HATS in multiple orientations. Additionally, the HATS HRIR is inherently encoded in the BRIRs, making them unsuitable for personalization to different listeners. The approach presented here consists of the resynthesis and dynamic binaural reproduction of multichannel room impulse responses (RIRs) using an arbitrary HRIR dataset. Using a compact microphone array, we obtained a pressure RIR and a set of auxiliary RIRs, and applied the Spatial Decomposition Method (SDM) to estimate the direction of arrival (DOA) of the different sound events in the RIR. The DOA information was used to map sound pressure to different locations by means of an HRIR dataset, generating a binaural room impulse response (BRIR) for a specific orientation. By rotating either the DOA data or the HRIR dataset, BRIRs for any direction may be obtained.

    Auralizations using SDM are known to whiten the spectrum of late reverberation. Available alternatives such as time-frequency equalization were not feasible in this case, as a different time-frequency filter would be necessary for each direction, resulting in a non-homogeneous equalization of the BRIRs. Instead, the resynthesized BRIRs were decomposed into sub-bands and the decay slope of each sub-band was modified independently to match the reverberation time of the original pressure RIR. In this way we could apply the same reverberation correction factor to all BRIRs. In addition, we used a direction-independent equalization to correct for timbral effects of the equipment, HRIR, and signal processing. Real-time reproduction was achieved by means of a custom Max/MSP patch, in which the direct sound, early reflections and late reverberation were convolved separately to allow real-time changes in the time-energy properties of the BRIRs. The mixing time of the reproduced BRIRs is configurable and a single direction-independent reverberation tail is used.

    To evaluate the quality of the resynthesis method in a real room, we conducted both objective and perceptual comparisons for a variety of source positions. The objective analysis was performed by comparing real measurements of a KEMAR mannequin with the resynthesis at the same receiver location using a simulated KEMAR HRIR.
    Typical room acoustic parameters of both the real and resynthesized acoustics were found to be in good agreement. The perceptual validation consisted of a comparison of a loudspeaker and its resynthesized counterpart. Non-occluding headphones with individual equalization were used to ensure that listeners were able to listen simultaneously to the real and the virtual samples. Subjects were allowed to listen to the sounds for as long as they desired and to switch freely between the real and virtual stimuli in real time. The integration of an OptiTrack motion tracking system allowed us to present world-locked audio, accounting for head rotations.

    We present here the results of this listening test (N = 14) in three sections: discrimination, identification, and qualitative ratings. Preliminary analysis revealed that, in these conditions, listeners were generally able to discriminate between real and virtual sources and were able to consistently identify which of the presented sources was real and which was virtual. The qualitative analysis revealed that timbral differences are the most prominent cues for discrimination and identification, while spatial cues are well preserved. All listeners reported good externalization of the binaural audio. Future work includes extending the presented validation to more environments, as well as implementing tools to arbitrarily modify BRIRs in the spatial, temporal, and frequency domains in order to study the perceptual requirements of room acoustics reproduction in AR.
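    To make the core resynthesis step concrete, here is a minimal sketch, under simplifying assumptions (nearest-neighbour HRIR selection, broadband mapping, no sub-band decay correction or equalization), of how per-sample DOA estimates can map a pressure RIR onto an HRIR dataset to form a BRIR. Array names and the HRIR layout are hypothetical, not taken from the paper.

```python
import numpy as np

def sdm_style_brir(p_rir, doa, hrir_dirs, hrir_l, hrir_r):
    """Map a pressure RIR onto an HRIR set using per-sample DOA estimates.

    p_rir     : (N,) pressure room impulse response
    doa       : (N, 3) unit direction-of-arrival vector for each sample
    hrir_dirs : (M, 3) unit vectors of the HRIR measurement grid
    hrir_l/r  : (M, L) left/right HRIRs for those directions
    Returns a (N + L - 1, 2) binaural room impulse response.
    """
    n_samples = len(p_rir)
    hrir_len = hrir_l.shape[1]
    brir = np.zeros((n_samples + hrir_len - 1, 2))
    for n in range(n_samples):
        if p_rir[n] == 0.0:
            continue
        # Nearest HRIR direction: maximum dot product with the DOA estimate.
        idx = int(np.argmax(hrir_dirs @ doa[n]))
        brir[n:n + hrir_len, 0] += p_rir[n] * hrir_l[idx]
        brir[n:n + hrir_len, 1] += p_rir[n] * hrir_r[idx]
    return brir
```

    Rotating either the DOA vectors or the HRIR grid before the lookup yields BRIRs for other head orientations, as described in the abstract.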

    Optimization and improvements in spatial sound reproduction systems through perceptual considerations

    The reproduction of the spatial properties of sound is an increasingly important concern in many emerging immersive applications. Whether in the reproduction of audiovisual content in home environments or in cinemas, in immersive video conferencing systems, or in virtual or augmented reality systems, spatial sound is crucial for a realistic sense of immersion. Hearing, beyond the physics of sound, is a perceptual phenomenon influenced by cognitive processes. The objective of this thesis is to contribute new methods and knowledge to the optimization and simplification of spatial sound systems, from a perceptual approach to the hearing experience. The first part of the dissertation deals with particular aspects of binaural spatial sound reproduction, such as listening with headphones and the customization of the Head-Related Transfer Function (HRTF). A study has been carried out on the influence of headphones on the perception of spatial impression and quality, with particular attention to the effects of equalization and the resulting non-linear distortion. With regard to the individualization of the HRTF, a complete implementation of an HRTF measurement system is presented, and a new method for measuring HRTFs in non-anechoic conditions is introduced. In addition, two different and complementary experiments have been carried out, resulting in two tools that can be used in HRTF individualization processes: a parametric model of the HRTF magnitude and an Interaural Time Difference (ITD) scaling adjustment. The second part, concerning loudspeaker reproduction, evaluates techniques such as Wave-Field Synthesis (WFS) and amplitude panning. Perceptual experiments studied the capacity of these systems to produce a sensation of distance, and the spatial acuity with which we can perceive sound sources when they are spectrally split and reproduced at different positions. The contributions of this research are intended to make these technologies more accessible to the general public, given the demand for audiovisual experiences and devices with increasing immersion.

    Gutiérrez Parera, P. (2020). Optimization and improvements in spatial sound reproduction systems through perceptual considerations [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/142696
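    The thesis mentions an ITD scaling adjustment as one HRTF individualization tool. As a hedged illustration only (the thesis' actual procedure is not detailed in this abstract), the sketch below scales the ITDs of a generic HRTF set using a spherical-head (Woodworth) model, with the listener's head radius as the free parameter; the function and variable names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_rad, head_radius_m):
    """Woodworth spherical-head ITD estimate for a horizontal-plane source (0..90 deg)."""
    return (head_radius_m / SPEED_OF_SOUND) * (azimuth_rad + np.sin(azimuth_rad))

def scale_generic_itds(generic_itds, generic_radius_m, listener_radius_m):
    """Rescale the ITDs of a generic HRTF set to a listener's head size.

    Under the Woodworth model the ITD is proportional to head radius, so a single
    scaling factor suffices; this proportionality is the illustrative assumption here.
    """
    return generic_itds * (listener_radius_m / generic_radius_m)

if __name__ == "__main__":
    az = np.deg2rad(np.array([0.0, 30.0, 60.0, 90.0]))
    generic = woodworth_itd(az, head_radius_m=0.0875)      # "average" head
    print(scale_generic_itds(generic, 0.0875, 0.082))      # smaller head
```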

    Perceptually Enhanced Spectral Distance Metric for Head-Related Transfer Function Quality Prediction

    Given the substantial time and complexity involved in the perceptual evaluation of head-related transfer function (HRTF) processing, there is considerable value in adopting numerical assessment. Although many numerical methods have been introduced in recent years, monaural spectral distance metrics such as log-spectral distortion (LSD) remain widely used despite their significant limitations. In this study, listening tests were conducted to investigate the correlation between LSD and the auditory perception of HRTFs. By distorting the magnitude spectra of HRTFs across 32 spatial directions at six levels of LSD, the perceived spatial and timbral attributes of these distorted HRTFs were measured. The results revealed the limitations of LSD in adequately assessing the perceptual performance of HRTFs. Based on the experimental results, a perceptually enhanced spectral distance metric for predicting HRTF quality has been developed, which processes HRTF data through spectral analysis, threshold discrimination, feature combination, binaural weighting, and perceptual outcome estimation. Compared to currently available methods for assessing spectral differences of HRTFs, the proposed method exhibited superior performance in prediction error and correlation with actual perceptual results. The method holds potential for assessing the effectiveness of HRTF-related research, such as modeling and individualization.
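    For reference, the baseline metric discussed above, log-spectral distortion, is commonly defined as the RMS difference in decibels between two magnitude responses. The sketch below computes it from two HRTF impulse responses for one ear and one direction; the FFT length and frequency range are illustrative assumptions, not parameters from the study.

```python
import numpy as np

def log_spectral_distortion(hrir_a, hrir_b, fs=48000, nfft=512, f_lo=200.0, f_hi=18000.0):
    """RMS difference in dB between two HRTF magnitude spectra (one ear, one direction)."""
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)          # evaluate only in the chosen band
    mag_a = np.abs(np.fft.rfft(hrir_a, nfft))[band]
    mag_b = np.abs(np.fft.rfft(hrir_b, nfft))[band]
    diff_db = 20.0 * np.log10(mag_a / mag_b)
    return float(np.sqrt(np.mean(diff_db ** 2)))
```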

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recently popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge National Natural Science Foundation of China (NSFC) No. 61671380 and Australian Research Council Discovery Scheme DE 150100363.

    Auditory navigation with a tubular acoustic model for interactive distance cues and personalized head-related transfer functions: an auditory target-reaching task

    This paper presents a novel spatial auditory display that combines a virtual environment based on a Digital Waveguide Mesh (DWM) model of a small tubular shape with a binaural rendering system using personalized head-related transfer functions (HRTFs), allowing interactive selection of absolute 3D spatial cues of direction as well as egocentric distance. The tube metaphor in particular minimizes loudness changes with distance, providing mainly direct-to-reverberant and spectral cues. The proposed display was assessed through a target-reaching task in which participants explore a 2D virtual map with a pen tablet and hit a sound source (the target) using auditory information only; subjective time to hit and traveled distance were analyzed for three experiments. The first experiment assessed the proposed HRTF selection method for personalization and the dimensionality of the reaching task, with particular attention to elevation perception; most subjects performed better when they had to reach a vertically unbounded (2D) target rather than an elevated (3D) one. The second experiment analyzed the interaction between the tube metaphor and the HRTF, showing a dominant effect of the DWM model over binaural rendering. In the last experiment, participants using absolute distance cues from the tube model performed comparably to when they could rely on more robust, although relative, intensity cues. These results suggest that participants made proficient use of both binaural and reverberation cues during the task, displayed as part of a coherent 3D sound model, in spite of the known difficulty of using both such cues. HRTF personalization was beneficial for participants who were able to perceive the vertical dimension of a virtual sound. Further work is needed to add full physical consistency to the proposed auditory display.
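    The rendering engine behind this display is a digital waveguide mesh. As a hedged illustration (the paper's tubular mesh geometry, boundary handling and excitation are not specified in this abstract), the sketch below runs the standard update of a small 2D rectilinear DWM, in which each junction's pressure is half the sum of its four neighbours at the previous step minus its own value two steps earlier.

```python
import numpy as np

def run_dwm_2d(nx=40, ny=12, steps=200, src=(5, 6), rec=(34, 6)):
    """Standard 2D rectilinear digital waveguide mesh update (pressure-release edges).

    p[n] = 0.5 * (sum of 4 neighbours at n-1) - p[n-2] at every interior junction.
    Geometry, excitation and positions are illustrative only.
    """
    p_prev2 = np.zeros((nx, ny))
    p_prev1 = np.zeros((nx, ny))
    p_prev1[src] = 1.0  # impulse excitation at the source junction
    out = []
    for _ in range(steps):
        p = np.zeros((nx, ny))
        p[1:-1, 1:-1] = 0.5 * (p_prev1[2:, 1:-1] + p_prev1[:-2, 1:-1] +
                               p_prev1[1:-1, 2:] + p_prev1[1:-1, :-2]) - p_prev2[1:-1, 1:-1]
        out.append(p[rec])
        p_prev2, p_prev1 = p_prev1, p
    return np.array(out)  # impulse response at the receiver junction
```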

    Virtual Reality Exploration with Different Head-Related Transfer Functions

    One of the main challenges of spatial audio rendering over headphones is the crucial work behind the personalization of the so-called head-related transfer functions (HRTFs). HRTFs capture the acoustic effects of the listener's body, allowing a personal perception of immersion in a virtual reality context. This paper investigates the possible benefits of personalized HRTFs that were individually selected based on anthropometric data (pinna shapes). Personalized audio rendering was compared to a generic HRTF and a stereo sound condition. Two studies were performed: the first was a screening test aimed at evaluating the participants' localization performance with HRTFs for a non-visible spatialized audio source; in the second, participants freely explored a VR scene with five audiovisual sources for two minutes each, under both HRTF and stereo conditions. A questionnaire with items for spatial audio quality, presence and attention was used for the evaluation. Results indicate that the audio rendering methods made no difference to the questionnaire responses within the two minutes of free exploration.

    Examining the Interrelation and Perceptual Influence of Head-Related Transfer Functions Distance Metrics

    Head-Related Transfer Functions (HRTFs) become necessary when headphones are used to create virtual audio scenes. They approximate the interaction of sound with the body of the listener. They are also used in hearing aid fitting and psychoacoustics, where individually measured HRTFs are better suited than generic ones. For the comparison of HRTF datasets, distance metrics that take different types of errors into account can be used. Five distance metrics were examined in the present work. The main objective of this thesis was to examine the mutual information and interaction of these metrics and reduce them to a smaller set of measures suitable for a listening experiment. A further objective was to design and conduct a listening experiment paradigm to examine just noticeable differences (JND), as well as to gain first insights into various perceptual attributes using the respective distance metrics for prediction. The mutual information and interactions between the objective distance metrics were analysed using correlation analysis, principal component analysis and factor analysis. Three distance metrics providing the most diverse information (Inter-subject Spectral Difference: ISSD, Mean Squared Error: MSE, and Mel-frequency Cepstral Distortion: MFCD) were selected for the listening test. To examine an audible threshold for these metrics, a 3AFC listening test paradigm was proposed. The test also examined the relation between the selected distance metrics and subjective perceptual attributes (coloration and localization). These attributes were inspected under different conditions, enabling observations in alignment with the right-ear-advantage presumption. Based on the present work, suggestions were given for further steps in metric development.
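    Of the three selected metrics, the Inter-subject Spectral Difference is often formulated (following Middlebrooks) as the variance, across frequency, of the dB difference between two directional transfer functions, averaged over directions. The sketch below follows that common formulation as an assumption about what the thesis computes, with a plain MSE on the same spectra for comparison; array shapes are illustrative.

```python
import numpy as np

def issd(hrtf_a_db, hrtf_b_db):
    """Inter-subject Spectral Difference between two HRTF magnitude sets.

    hrtf_a_db, hrtf_b_db : (directions, frequency_bins) log-magnitude spectra in dB.
    Per direction: variance across frequency of the dB difference; the result is
    the mean of those variances over all directions (in dB^2).
    """
    diff = hrtf_a_db - hrtf_b_db
    per_direction_var = np.var(diff, axis=1)
    return float(np.mean(per_direction_var))

def mse_db(hrtf_a_db, hrtf_b_db):
    """Plain mean squared error on the same dB spectra, for comparison."""
    return float(np.mean((hrtf_a_db - hrtf_b_db) ** 2))
```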

    Spatial Analysis and Synthesis Methods: Subjective and Objective Evaluations Using Various Microphone Arrays in the Auralization of a Critical Listening Room

    Parametric sound field synthesis methods, such as the Spatial Decomposition Method (SDM) and Higher-Order Spatial Impulse Response Rendering (HO-SIRR), are widely used for the analysis and auralization of sound fields. This paper studies the performance of various sound field synthesis methods in the context of the auralization of a critical listening room. The influence of the following factors on perceived spatial and timbral fidelity is considered: the rendering framework, the direction-of-arrival (DOA) estimation method, the microphone array structure, and the use of a dedicated center reference microphone with SDM. Listening tests compare the synthesized sound fields to a reference binaural rendering condition. Several acoustic parameters are measured to gain insight into objective differences between the methods. A high-quality pressure microphone improves the timbral fidelity of the SDM framework. Additionally, SDM and HO-SIRR show similarities in spatial fidelity. Performance variation between SDM configurations is influenced by the DOA estimation method and the microphone array construction. The binaural SDM (BSDM) presentations display temporal artifacts that impact sound quality.
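    As an example of the kind of objective comparison the abstract mentions, reverberation time is typically estimated from a measured impulse response via Schroeder backward integration and a line fit over a decay range. The sketch below implements that standard procedure (T30 from the -5 to -35 dB range); it is a generic illustration, not the paper's evaluation code.

```python
import numpy as np

def t30_from_rir(rir, fs):
    """Reverberation time (T30) from an impulse response via Schroeder integration."""
    energy = rir.astype(float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # Schroeder backward integral
    edc_db = 10.0 * np.log10(edc / edc[0])         # normalized energy decay curve in dB
    t = np.arange(len(rir)) / fs
    fit = (edc_db <= -5.0) & (edc_db >= -35.0)     # -5 dB to -35 dB decay range
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)  # decay rate in dB per second
    return -60.0 / slope                           # extrapolate to a 60 dB decay
```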