
    Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction

    This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For visualisation, various methods for displaying the relative distribution of sound energy as seen from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm, which integrates a new side-lobe suppression technique. For auralisation, listening tests were conducted to compare ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe suppressed CroPaC method offers greater spatial selectivity in reverberant conditions than other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy than the ambisonics method.

    Parametric first-order ambisonic decoding for headphones utilising the cross-pattern coherence algorithm

    Regarding the reproduction of recorded or synthesised spatial sound scenes, perhaps the most convenient and flexible approach is to employ the Ambisonics framework. The Ambisonics framework allows for linear and non-parametric storage, manipulation and reproduction of sound-fields, described using spherical harmonics up to a given order of expansion. Binaural Ambisonic reproduction can be realised by matching the spherical harmonic patterns to a set of binaural filters, in a manner which is frequency-dependent, linear and time-invariant. However, the perceptual performance of this approach is largely dependent on the spatial resolution of the input format. When employing lower-order material as input, perceptual deficiencies may easily occur, such as poor localisation accuracy and colouration. This is especially problematic, as the vast majority of existing Ambisonic recordings are available as first-order only. The detrimental effects associated with lower-order Ambisonics reproduction have been well studied and documented. To improve upon the perceived spatial accuracy of the method, the simplest solution is to increase the spherical harmonic order at the recording stage. However, microphone arrays capable of capturing higher-order components are generally much more expensive than first-order arrays, while more affordable options tend to offer higher-order components only over limited frequency ranges. Additionally, an increase in spherical harmonic order also requires an increase in the number of channels and storage and, in the case of transmission, more bandwidth. Furthermore, it is important to note that this solution does not aid in the reproduction of existing lower-order recordings. It is for these reasons that this work focuses on alternative methods which improve the reproduction of first-order material for headphone playback. 
For the task of binaural sound-field reproduction, an alternative is to employ a parametric approach, which divides the sound-field decoding into analysis and synthesis stages. Unlike Ambisonic reproduction, which operates via a linear combination of the input signals, parametric approaches operate in the time-frequency domain and rely on the extraction of spatial parameters during their analysis stage. These spatial parameters are then utilised to conduct a more informed reproduction in the synthesis stage. Parametric methods are capable of reproducing sounds at a spatial resolution that far exceeds their linear and time-invariant counterparts, as they are not bounded by the resolution of the input format. For example, they can elect to directly convolve the analysed source signals with Head-Related Transfer Functions (HRTF), which correspond to their analysed directions. An infinite order of spherical harmonic components would be required to attain the same resolution with a binaural Ambisonic decoder. The most well-known and established parametric reproduction method is Directional Audio Coding (DirAC), which employs a sound-field model consisting of one plane-wave and one diffuseness estimate per time-frequency tile. These parameters are derived from the active-intensity vector, in the case of first-order input. More recent formulations allow for multiple plane-wave and diffuseness estimates via spatially-localised active-intensity vectors, using higher-order input. Another parametric method is High Angular Resolution plane-wave Expansion (HARPEX), which extracts two plane-waves per frequency and is first-order only. The Sparse-Recovery method extracts a number of plane-waves, which corresponds to up to half the number of input channels of arbitrary order. 
The COding and Multi-Parameterisation of Ambisonic Sound Scenes (COMPASS) method also extracts source components up to half the number of input channels, but employs an additional residual stream that encapsulates the remaining diffuse and ambient components in the scene. In this paper, a new binaural parametric decoder for first-order input is proposed. The method employs a sound-field model of one plane-wave and one diffuseness estimate per frequency, much like the DirAC model. However, the source component directions are identified via a plane-wave decomposition using a dense scanning grid and peak-finding, which is shown to be more robust than the active-intensity vector for multiple narrow-band sources. The source and ambient components per time-frequency tile are then segregated, and their relative energetic contributions are established, using the Cross-Pattern Coherence (CroPaC) spatial-filter. This approach is shown to be more robust than deriving this energy information from the active-intensity-based diffuseness estimates. A real-time audio plug-in implementation of the proposed approach is also described. A multiple-stimulus listening test was conducted to evaluate the perceived spatial accuracy and fidelity of the proposed method, alongside both first-order and third-order Ambisonics reproduction. The listening test results indicate that the proposed parametric decoder, using only first-order signals, is capable of delivering perceptual accuracy that matches or surpasses that of third-order Ambisonics decoding.
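For orientation, the DirAC-style baseline that the abstract compares against (one direction and one diffuseness estimate per time-frequency tile, derived from the active-intensity vector) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes a first-order convention in which a plane wave arriving from unit direction u produces dipole signals equal to the omnidirectional signal multiplied by u, and it assumes that averaging over a handful of frames stands in for the temporal smoothing. The function name `dirac_parameters` is hypothetical.

```python
import numpy as np

def dirac_parameters(W, XYZ):
    """Estimate a direction and a diffuseness for one frequency bin.

    W   : complex array, shape (frames,)   omnidirectional signal
    XYZ : complex array, shape (frames, 3) dipole signals
    Assumed convention: a plane wave from unit direction u gives XYZ = W * u.
    """
    # Intensity-like vector per frame; under the assumed convention it
    # points towards the source rather than along the propagation direction.
    intensity = np.real(np.conj(W)[:, None] * XYZ)
    # Energy per frame, scaled so that a single plane wave yields |W|^2.
    energy = 0.5 * (np.abs(W) ** 2 + np.sum(np.abs(XYZ) ** 2, axis=1))

    mean_intensity = intensity.mean(axis=0)
    mean_energy = energy.mean()
    norm = np.linalg.norm(mean_intensity)

    doa = mean_intensity / norm if norm > 0 else np.zeros(3)
    diffuseness = 1.0 - norm / mean_energy if mean_energy > 0 else 1.0
    return doa, float(np.clip(diffuseness, 0.0, 1.0))
```

Under these assumptions a single plane wave yields a diffuseness close to 0, while mutually uncorrelated channels yield a value close to 1. With several simultaneous narrow-band sources the averaged vector points somewhere between them, which is the kind of weakness the abstract attributes to the active-intensity approach.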

    DOA ESTIMATION WITH HISTOGRAM ANALYSIS OF SPATIALLY CONSTRAINED ACTIVE INTENSITY VECTORS

    The active intensity vector (AIV) is a common descriptor of the sound field. In microphone array processing, AIV is commonly approximated with beamforming operations and utilized as a direction of arrival (DOA) estimator. However, in its original form, it provides inaccurate estimates in sound field conditions where coherent sound sources are simultaneously active. In this work we utilize a higher order intensity-based DOA estimator on spatially-constrained regions (SCR) to overcome such limitations. We then apply 1-dimensional (1D) histogram processing on the noisy estimates for multiple DOA estimation. The performance of the estimator is shown with a 7-channel microphone array, fitted on a rigid mobile-like device, in reverberant conditions and under different signal-to-noise ratios.
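The histogram step mentioned in this abstract can be illustrated with a small sketch. The abstract does not specify the histogram variable, bin width, smoothing, or peak-picking rule, so everything below is an assumption: the estimates are taken to be azimuth angles in degrees, the histogram is treated as circular, and the function name `histogram_doas` and its parameters are hypothetical.

```python
import numpy as np

def histogram_doas(azimuths_deg, num_sources, bin_width=5, smooth_bins=3):
    """Pick the dominant directions from noisy azimuth estimates (degrees).

    Assumes azimuths lie in [-180, 180] and that smooth_bins is odd.
    """
    edges = np.arange(-180, 180 + bin_width, bin_width)
    counts, _ = np.histogram(azimuths_deg, bins=edges)

    # Circular moving average, so that peaks near +/-180 degrees are kept whole.
    pad = smooth_bins // 2
    padded = np.pad(counts.astype(float), pad, mode="wrap")
    smooth = np.convolve(padded, np.ones(smooth_bins) / smooth_bins, mode="valid")

    # Circular local maxima: higher than the left neighbour, not lower than the right.
    is_peak = (smooth > np.roll(smooth, 1)) & (smooth >= np.roll(smooth, -1))
    peaks = np.flatnonzero(is_peak)

    # Keep the tallest peaks and report their bin centres.
    strongest = peaks[np.argsort(smooth[peaks])[::-1][:num_sources]]
    centres = edges[:-1] + bin_width / 2
    return np.sort(centres[strongest])
```

The resolution of the result is limited to the bin width, so a finer grid or interpolation around each peak would be needed for sub-bin accuracy.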

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    Auralization of Air Vehicle Noise for Community Noise Assessment

    This paper serves as an introduction to air vehicle noise auralization and documents the current state-of-the-art. Auralization of flyover noise considers the source, path, and receiver as part of a time marching simulation. Two approaches are offered: a time-domain approach performs synthesis followed by propagation, while a frequency-domain approach performs propagation followed by synthesis. Source noise description methods are offered for isolated and installed propulsion system and airframe noise sources for a wide range of air vehicles. Methods for synthesis of broadband, discrete tone, steady and unsteady periodic, and aperiodic sources are presented, and propagation methods and receiver considerations are discussed. Auralizations applied to vehicles ranging from large transport aircraft to small unmanned aerial systems demonstrate current capabilities.

    Optimization and improvements in spatial sound reproduction systems through perceptual considerations

    The reproduction of the spatial properties of sound is an increasingly important concern in many emerging immersive applications. Whether in the reproduction of audiovisual content in home environments or cinemas, in immersive video conferencing systems, or in virtual or augmented reality systems, spatial sound is crucial for a realistic sense of immersion. Hearing, beyond the physics of sound, is a perceptual phenomenon influenced by cognitive processes. The objective of this thesis is to contribute new methods and knowledge to the optimization and simplification of spatial sound systems, from a perceptual approach to the hearing experience. The first part of this dissertation deals with particular aspects of binaural spatial sound reproduction, such as listening with headphones and the customization of the Head Related Transfer Function (HRTF). A study has been carried out on the influence of headphones on the perception of spatial impression and quality, with particular attention to the effects of equalization and subsequent non-linear distortion. With regard to the individualization of the HRTF, a complete implementation of an HRTF measurement system is presented, and a new method for the measurement of HRTFs in non-anechoic conditions is introduced. In addition, two different and complementary experiments have been carried out, resulting in two tools that can be used in HRTF individualization processes: a parametric model of the HRTF magnitude and an Interaural Time Difference (ITD) scaling adjustment. In the second part, concerning loudspeaker reproduction, different techniques such as Wave-Field Synthesis (WFS) or amplitude panning have been evaluated. Perceptual experiments were used to study the capacity of these systems to produce a sensation of distance, and the spatial acuity with which we can perceive sound sources if they are spectrally split and reproduced in different positions. The contributions of this research are intended to make these technologies more accessible to the general public, given the demand for audiovisual experiences and devices with increasing immersion.
    Gutiérrez Parera, P. (2020). Optimization and improvements in spatial sound reproduction systems through perceptual considerations [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/142696

    High Frequency Reproduction in Binaural Ambisonic Rendering

    Humans can localise sounds in all directions using three main auditory cues: the differences in time and level between signals arriving at the left and right eardrums (interaural time difference and interaural level difference, respectively), and the spectral characteristics of the signals due to reflections and diffractions off the body and ears. These auditory cues can be recorded for a position in space using the head-related transfer function (HRTF), and binaural synthesis at this position can then be achieved through convolution of a sound signal with the measured HRTF. However, reproducing soundfields with multiple sources, or at multiple locations, requires a highly dense set of HRTFs. Ambisonics is a spatial audio technology that decomposes a soundfield into a weighted set of directional functions, which can be utilised binaurally in order to spatialise audio at any direction using far fewer HRTFs. A limitation of low-order Ambisonic rendering is poor high frequency reproduction, which reduces the accuracy of the resulting binaural synthesis. This thesis presents novel HRTF pre-processing techniques, such that when using the augmented HRTFs in the binaural Ambisonic rendering stage, the high frequency reproduction is a closer approximation of direct HRTF rendering. These techniques include Ambisonic Diffuse-Field Equalisation, to improve spectral reproduction over all directions; Ambisonic Directional Bias Equalisation, to further improve spectral reproduction toward a specific direction; and Ambisonic Interaural Level Difference Optimisation, to improve lateralisation and interaural level difference reproduction. Evaluation of the presented techniques compares binaural Ambisonic rendering to direct HRTF rendering numerically, using perceptually motivated spectral difference calculations, auditory cue estimations and localisation prediction models, and perceptually, using listening tests assessing similarity and plausibility. 
The results indicate that the individual pre-processing techniques produce modest improvements to the high frequency reproduction of binaural Ambisonic rendering, and that using multiple pre-processing techniques can produce cumulative, and statistically significant, improvements.
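The direct HRTF rendering that serves as the reference in this abstract amounts to convolving a source signal with the left-ear and right-ear impulse responses for its direction. A minimal time-domain sketch follows. The function name `binaural_synthesis` is hypothetical, and the impulse responses in the usage note are toy values chosen for illustration, not measured HRTFs.

```python
import numpy as np

def binaural_synthesis(signal, hrir_left, hrir_right):
    """Convolve a mono signal with a pair of head-related impulse responses.

    Returns an array of shape (2, len(signal) + len(hrir) - 1),
    with row 0 for the left ear and row 1 for the right ear.
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right impulse responses must have equal length")
    left = np.convolve(signal, hrir_left)
    right = np.convolve(signal, hrir_right)
    return np.stack([left, right])
```

As a usage example, passing a left response of `[1, 0, 0, 0]` and a right response of `[0, 0, 0, 0.5]` produces a right-ear signal that is delayed by three samples and halved in amplitude, a crude stand-in for the interaural time and level differences described above.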

    Perceptual Evaluation of Spatial Room Impulse Response Extrapolation by Direct and Residual Subspace Decomposition

    Six-degrees-of-freedom rendering of an acoustic environment can be achieved by interpolating a set of measured spatial room impulse responses (SRIRs). However, the measurement effort and computational expense involved are high. This work compares novel ways of extrapolating a single measured SRIR to a target position. The novel extrapolation techniques are based on a recently proposed subspace method that decomposes SRIRs into a direct part, comprising direct sound and salient reflections, and a residual. We evaluate extrapolations between different positions in a shoebox-shaped room in a multi-stimulus comparison test. Extrapolation using a residual SRIR and salient reflections that match the reflections at the target position is rated as perceptually most similar to the measured reference.