
    Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction

    This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For the former, various methods for visualising the relative distribution of sound energy from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm, which integrates a new side-lobe suppression technique. For auralisation applications, listening tests were conducted to compare Ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe-suppressed CroPaC method offers greater spatial selectivity in reverberant conditions than other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy than the Ambisonics method.
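A minimal sketch of the first-order (B-format) analysis that DirAC-style methods build on: a direction-of-arrival estimate from the active-intensity vector and a diffuseness estimate. The unit-gain B-format convention, the 0.5 energy weighting, and the sign conventions are assumptions for illustration; real implementations differ in normalisation details.

```python
import numpy as np

def dirac_analysis(W, X, Y, Z, eps=1e-12):
    """DOA and diffuseness per frequency bin from first-order STFT signals.

    W, X, Y, Z: complex arrays of shape (n_bins, n_frames).
    Returns (azimuth in radians, diffuseness in [0, 1]) per bin."""
    # Active-intensity vector: real part of the pressure-velocity cross-spectra
    I = np.stack([np.real(np.conj(W) * X),
                  np.real(np.conj(W) * Y),
                  np.real(np.conj(W) * Z)])
    I_mean = I.mean(axis=-1)                # average over frames to stabilise
    azi = np.arctan2(I_mean[1], I_mean[0])  # sign convention is an assumption
    # Energy density, assuming a unit-gain B-format convention
    E = 0.5 * (np.abs(W)**2 + np.abs(X)**2
               + np.abs(Y)**2 + np.abs(Z)**2).mean(axis=-1)
    psi = 1.0 - np.linalg.norm(I_mean, axis=0) / (E + eps)
    return azi, psi
```

For a single encoded plane wave the diffuseness tends to zero and the azimuth matches the encoding direction; a fully diffuse field drives the averaged intensity, and hence the numerator, toward zero and the diffuseness toward one.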

    Parametric first-order ambisonic decoding for headphones utilising the cross-pattern coherence algorithm

    Regarding the reproduction of recorded or synthesised spatial sound scenes, perhaps the most convenient and flexible approach is to employ the Ambisonics framework. The Ambisonics framework allows for linear and non-parametric storage, manipulation and reproduction of sound-fields, described using spherical harmonics up to a given order of expansion. Binaural Ambisonic reproduction can be realised by matching the spherical harmonic patterns to a set of binaural filters, in a manner which is frequency-dependent, linear and time-invariant. However, the perceptual performance of this approach is largely dependent on the spatial resolution of the input format. When employing lower-order material as input, perceptual deficiencies may easily occur, such as poor localisation accuracy and colouration. This is especially problematic, as the vast majority of existing Ambisonic recordings are available as first-order only. The detrimental effects associated with lower-order Ambisonic reproduction have been well studied and documented. To improve upon the perceived spatial accuracy of the method, the simplest solution is to increase the spherical harmonic order at the recording stage. However, microphone arrays capable of capturing higher-order components are generally much more expensive than first-order arrays, while more affordable options tend to offer higher-order components only over limited frequency ranges. Additionally, an increase in spherical harmonic order also requires an increase in the number of channels and storage, and in the case of transmission, more bandwidth is needed. Furthermore, it is important to note that this solution does not aid in the reproduction of existing lower-order recordings. It is for these reasons that this work focuses on alternative methods which improve the reproduction of first-order material for headphone playback.
For the task of binaural sound-field reproduction, an alternative is to employ a parametric approach, which divides the sound-field decoding into analysis and synthesis stages. Unlike Ambisonic reproduction, which operates via a linear combination of the input signals, parametric approaches operate in the time-frequency domain and rely on the extraction of spatial parameters during their analysis stage. These spatial parameters are then utilised to conduct a more informed reproduction in the synthesis stage. Parametric methods are capable of reproducing sounds at a spatial resolution that far exceeds their linear and time-invariant counterparts, as they are not bounded by the resolution of the input format. For example, they can elect to directly convolve the analysed source signals with Head-Related Transfer Functions (HRTF), which correspond to their analysed directions. An infinite order of spherical harmonic components would be required to attain the same resolution with a binaural Ambisonic decoder. The most well-known and established parametric reproduction method is Directional Audio Coding (DirAC), which employs a sound-field model consisting of one plane-wave and one diffuseness estimate per time-frequency tile. These parameters are derived from the active-intensity vector, in the case of first-order input. More recent formulations allow for multiple plane-wave and diffuseness estimates via spatially-localised active-intensity vectors, using higher-order input. Another parametric method is High Angular Resolution plane-wave Expansion (HARPEX), which extracts two plane-waves per frequency and is first-order only. The Sparse-Recovery method extracts a number of plane-waves, which corresponds to up to half the number of input channels of arbitrary order. 
The COding and Multi-Parameterisation of Ambisonic Sound Scenes (COMPASS) method also extracts source components up to half the number of input channels, but employs an additional residual stream that encapsulates the remaining diffuse and ambient components in the scene. In this paper, a new binaural parametric decoder for first-order input is proposed. The method employs a sound-field model of one plane-wave and one diffuseness estimate per frequency, much like the DirAC model. However, the source component directions are identified via a plane-wave decomposition using a dense scanning grid and peak-finding, which is shown to be more robust than the active-intensity vector for multiple narrow-band sources. The source and ambient components per time-frequency tile are then segregated, and their relative energetic contributions are established, using the Cross-Pattern Coherence (CroPaC) spatial filter. This approach is shown to be more robust than deriving this energy information from the active-intensity-based diffuseness estimates. A real-time audio plug-in implementation of the proposed approach is also described. A multiple-stimulus listening test was conducted to evaluate the perceived spatial accuracy and fidelity of the proposed method, alongside both first-order and third-order Ambisonics reproduction. The listening test results indicate that the proposed parametric decoder, using only first-order signals, is capable of delivering perceptual accuracy that matches or surpasses that of third-order Ambisonics decoding.
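The Cross-Pattern Coherence idea at the heart of the proposed decoder can be sketched as a soft masker derived from the normalised cross-spectrum of two coincident beamformers steered to the same direction (e.g. a cardioid and a dipole). The temporal smoothing and side-lobe suppression used in practice are omitted, so this is an illustration of the principle rather than the published algorithm:

```python
import numpy as np

def cropac_mask(B1, B2, eps=1e-12):
    """Soft mask in [0, 1] from two coincident beamformer STFT signals.

    The cross-spectrum is large and positive only when a coherent source
    lies in the shared look direction of both beam patterns."""
    num = np.real(B1 * np.conj(B2))
    den = 0.5 * (np.abs(B1)**2 + np.abs(B2)**2) + eps
    return np.clip(num / den, 0.0, 1.0)   # half-wave rectify and bound
```

Applied as a post-filter, the mask passes time-frequency tiles dominated by sound from the look direction and attenuates the rest.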

    Spatial analysis of the sound field for parametric spatial audio reproduction using sparse microphone arrays

    In spatial audio capturing the aim is to store information about the sound field so that the sound field can be reproduced without a perceptual difference to the original. The need for this is in applications like virtual reality and teleconferencing. Traditionally the sound field has been captured with a B-format microphone, but it is not always a feasible solution due to size and cost constraints. Alternatively, arrays of omnidirectional microphones can be utilized; these are often found in devices like mobile phones. If the microphone array is sparse, i.e., the microphone spacings are relatively large, the analysis of the sound Direction of Arrival (DoA) becomes ambiguous at higher frequencies. This is due to spatial aliasing, which is a common problem in narrowband DoA estimation. In this thesis the spatial aliasing problem was examined and its effect on DoA estimation and spatial sound synthesis with Directional Audio Coding (DirAC) was studied. The aim was to find methods for unambiguous narrowband DoA estimation. Current state-of-the-art methods can remove aliased estimates but are not capable of estimating the DoA with the optimal time-frequency resolution. In this thesis, similar results were obtained with parameter extrapolation when only a single broadband source exists. The main contribution of this thesis was the development of a correlation-based method. The developed method utilizes pre-known, array-specific information on aliasing in each DoA and frequency. The correlation-based method was tested and found to be the best option to overcome the problem of spatial aliasing. This method was able to resolve spatial aliasing even with multiple sources or when the source’s frequency content is completely above the spatial aliasing frequency.
In a listening test it was found that the correlation-based method could provide a major improvement to the DirAC-synthesized spatial image quality when compared to an aliased estimator.
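The spatial-aliasing ambiguity discussed above is easy to demonstrate for a single microphone pair: above c/(2d) the wrapped inter-microphone phase difference maps to more than one arrival angle. A small sketch (far-field model, angles measured from the pair axis; illustrative only):

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def aliasing_frequency(d):
    """Spatial aliasing frequency of a microphone pair with spacing d (m)."""
    return C / (2.0 * d)

def doa_candidates(phase_diff, d, f):
    """All angles (degrees, relative to the pair axis) consistent with a
    wrapped inter-mic phase difference, enumerating 2*pi ambiguities."""
    beta = 2.0 * np.pi * f * d / C
    cands = set()
    for k in range(-3, 4):
        cos_theta = (phase_diff + 2.0 * np.pi * k) / beta
        if -1.0 <= cos_theta <= 1.0:
            cands.add(round(float(np.degrees(np.arccos(cos_theta))), 3))
    return sorted(cands)
```

For d = 10 cm the aliasing frequency is about 1.7 kHz: at 1 kHz a source at 60 degrees yields a single candidate angle, while at 4 kHz the same source yields two, which is exactly the ambiguity the correlation-based method above is designed to resolve.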

    Parametric spatial audio processing utilising compact microphone arrays

    This dissertation focuses on the development of novel parametric spatial audio techniques using compact microphone arrays. Compact arrays are of special interest since they can be adapted to fit in portable devices, opening the possibility of exploiting the potential of immersive spatial audio algorithms in our daily lives. The techniques developed in this thesis consider the use of signal processing algorithms adapted for human listeners, thus exploiting the capabilities and limitations of human spatial hearing. The findings of this research are in the following three areas of spatial audio processing: directional filtering, spatial audio reproduction, and direction of arrival estimation.  In directional filtering, two novel algorithms have been developed based on the cross-pattern coherence (CroPaC). The method essentially exploits the directional response of two different types of beamformers by using their cross-spectrum to estimate a soft masker. The soft masker provides a probability-like parameter that indicates whether there is sound present in specific directions. It is then used as a post-filter to provide further suppression of directionally distributed noise at the output of a beamformer. The performance of these algorithms represents a significant improvement over previous state-of-the-art methods.  In parametric spatial audio reproduction, an algorithm is developed for multi-channel loudspeaker and headphone rendering. Current limitations in spatial audio reproduction are related to high inter-channel coherence between the channels, which is common in signal-independent systems, or time-frequency artefacts in parametric systems. The developed algorithm focuses on solving these limitations by utilising two sets of beamformers.
The first set of beamformers, namely analysis beamformers, is used to estimate a set of perceptually-relevant sound-field parameters, such as the separate channel energies, inter-channel time differences and inter-channel coherences of the target-output-setup signals. The directionality of the analysis beamformers is defined so that it follows that of typical loudspeaker panning functions and, for headphone reproduction, that of the head-related transfer functions (HRTFs). The directionality of the second set of high audio quality beamformers is then enhanced with the parametric information derived from the analysis beamformers. Listening tests confirm the perceptual benefit of this type of processing. In direction of arrival (DOA) estimation, histogram analysis of beamforming and active-intensity-based DOA estimators has been proposed. Numerical simulations and experiments with prototype and commercial microphone arrays show that the accuracy of DOA estimation is improved.
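A minimal sketch of the histogram idea for DOA estimation: steer a delay-and-sum beamformer over an azimuth grid, take the per-frame steered-response-power maximum, and accumulate the winning directions into a histogram whose peak gives the estimate. The array geometry, frequency, and grid are illustrative assumptions, not those of the dissertation:

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def srp_doa_histogram(X, mic_xy, f, grid_deg):
    """DOA histogram from one STFT bin.

    X: (n_mics, n_frames) complex STFT coefficients at frequency f (Hz).
    mic_xy: (n_mics, 2) microphone positions in metres.
    grid_deg: candidate azimuths in degrees."""
    az = np.radians(np.asarray(grid_deg, dtype=float))
    u = np.stack([np.cos(az), np.sin(az)])     # (2, n_dirs) unit vectors
    delays = mic_xy @ u / C                    # (n_mics, n_dirs) plane-wave delays
    W = np.exp(2j * np.pi * f * delays)        # delay-and-sum steering weights
    P = np.abs(W.conj().T @ X) ** 2            # (n_dirs, n_frames) steered power
    return np.bincount(P.argmax(axis=0), minlength=len(az))
```

Accumulating argmax directions over many frames (and bins) makes the estimate robust to frames dominated by noise or reverberation, which is the point of the histogram analysis.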

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    Sound field planarity characterized by superdirective beamforming

    The ability to replicate a plane wave represents an essential element of spatial sound field reproduction. In sound field synthesis, the desired field is often formulated as a plane wave and the error minimized; for other sound field control methods, the energy density or energy ratio is maximized. In all cases and further to the reproduction error, it is informative to characterize how planar the resultant sound field is. This paper presents a method for quantifying a region's acoustic planarity by superdirective beamforming with an array of microphones, which analyzes the azimuthal distribution of impinging waves and hence derives the planarity. Estimates are obtained for a variety of simulated sound field types, tested with respect to array orientation, wavenumber, and number of microphones. A range of microphone configurations is examined. Results are compared with delay-and-sum beamforming, which is equivalent to spatial Fourier decomposition. The superdirective beamformer provides better characterization of sound fields, and is effective with a moderate number of omni-directional microphones over a broad frequency range. Practical investigation of planarity estimation in real sound fields is needed to demonstrate its validity as a physical sound field evaluation measure. © 2013 Acoustical Society of America
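The planarity measure itself can be illustrated independently of the beamformer used to obtain the azimuthal energy distribution: take the magnitude of the energy-weighted mean direction, normalised by the total energy. This is a simplified reading of the idea, not the paper's exact definition, and the energy distribution is assumed given (e.g. from a superdirective beam scan):

```python
import numpy as np

def planarity(energies, azimuths_deg):
    """Planarity of an azimuthal energy distribution.

    Returns 1.0 for a single plane wave and a value near 0 for an
    isotropic (diffuse) field, where the direction vectors cancel."""
    az = np.radians(azimuths_deg)
    u = np.stack([np.cos(az), np.sin(az)])   # (2, n_dirs) unit vectors
    return float(np.linalg.norm(u @ energies) / energies.sum())
```

A sharper beamformer concentrates the estimated energy around the true arrival direction, which is why the superdirective scan characterises planarity better than delay-and-sum at low frequencies.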

    Unbiased coherent-to-diffuse ratio estimation for dereverberation

    We investigate the estimation of the time- and frequency-dependent coherent-to-diffuse ratio (CDR) from the measured spatial coherence between two omnidirectional microphones. We illustrate the relationship between several known CDR estimators using a geometric interpretation in the complex plane, discuss the problem of estimator bias, and propose unbiased versions of the estimators. Furthermore, we show that knowledge of either the direction of arrival (DOA) of the target source or the coherence of the noise field is sufficient for an unbiased CDR estimation. Finally, we apply the CDR estimators to the problem of dereverberation, using automatic speech recognition word error rate as objective performance measure.
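The mixing model behind CDR estimation can be sketched directly: the measured coherence is a CDR-weighted mean of the target (plane-wave) and diffuse-field coherences, which can be inverted for the CDR when the target TDOA is known. Note that this naive inversion is precisely the kind of estimator whose bias the paper analyses; the ideal sinc noise coherence and the real-part clamping are simplifying assumptions:

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def cdr_known_doa(gamma_x, f, d, tdoa):
    """CDR from the measured complex coherence gamma_x of a mic pair.

    Inverts gamma_x = (CDR * gamma_s + gamma_n) / (CDR + 1), where
    gamma_s is the target plane-wave coherence (known TDOA, seconds) and
    gamma_n the ideal diffuse-field (sinc) coherence for spacing d (m)."""
    gamma_s = np.exp(2j * np.pi * f * tdoa)   # target coherence
    gamma_n = np.sinc(2.0 * f * d / C)        # np.sinc(x) = sin(pi*x)/(pi*x)
    cdr = (gamma_n - gamma_x) / (gamma_x - gamma_s)
    return max(float(np.real(cdr)), 0.0)      # clamp to a physical value
```

With noisy coherence estimates, gamma_x leaves the line segment between gamma_s and gamma_n in the complex plane, which is where the bias discussed in the paper originates.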

    Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses

    Psychoacoustic experiments have shown that directional properties of the direct sound, salient reflections, and the late reverberation of an acoustic room response can have a distinct influence on the auditory perception of a given room. Spatial room impulse responses (SRIRs) capture those properties and thus are used for direction-dependent room acoustic analysis and virtual acoustic rendering. This work proposes a subspace method that decomposes SRIRs into a direct part, which comprises the direct sound and the salient reflections, and a residual, to facilitate enhanced analysis and rendering methods by providing individual access to these components. The proposed method is based on the generalized singular value decomposition and interprets the residual as noise that is to be separated from the other components of the reverberation. Large generalized singular values are attributed to the direct part, which is then obtained as a low-rank approximation of the SRIR. By advancing from the end of the SRIR toward the beginning while iteratively updating the residual estimate, the method adapts to spatio-temporal variations of the residual. The method is evaluated using a spatio-spectral error measure and simulated SRIRs of different rooms, microphone arrays, and ratios of direct sound to residual energy. The proposed method yields lower errors than existing approaches in all tested scenarios, including a scenario with two simultaneous reflections. A case study with measured SRIRs shows the applicability of the method under real-world acoustic conditions. A reference implementation is provided.
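The core low-rank idea can be sketched with a plain (non-generalised) SVD: treat the components with large singular values in a windowed SRIR segment as the direct part and the remainder as residual. The paper's method additionally uses the generalised SVD against an iteratively updated residual estimate, which is omitted here:

```python
import numpy as np

def direct_residual_split(H, rank):
    """Split an SRIR segment H (n_channels x n_samples) into a low-rank
    'direct' part (direct sound plus salient reflections) and a residual
    via truncated SVD."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    direct = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # truncated reconstruction
    return direct, H - direct
```

A plane-wave reflection arriving from one direction is (ideally) a rank-one event across the array channels, which is why a truncated decomposition captures the direct part while late, spatially diffuse reverberation spreads over many components.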

    Environmental sound monitoring using machine listening and spatial audio

    This thesis investigates how the technologies of machine listening and spatial audio can be utilised and combined to develop new methods of environmental sound monitoring for the soundscape approach. The majority of prior work on the soundscape approach has necessitated time-consuming, costly, and non-repeatable subjective listening tests, and one of the aims of this work was to produce robust systems reducing this need. The EigenScape database of Ambisonic acoustic scene recordings, containing eight classes encompassing a variety of urban and natural locations, is presented and used as a basis for this research. Using this data it was found that it is possible to classify acoustic scenes with a high level of accuracy based solely on features describing the spatial distribution of sounds within them. Further improvements were made when combining spatial and spectral features for a more complete characterisation of each scene class. A system is also presented using spherical harmonic beamforming and unsupervised clustering to estimate the onsets, offsets, and direction-of-arrival of sounds in synthesised scenes with up to three overlapping sources. It is shown that performance is enhanced using higher-order Ambisonics, but whilst there is a large increase in performance between first and second order, increases at subsequent orders are more modest. Finally, a mobile application developed using the EigenScape data is presented, and is shown to produce plausible estimates for the relative prevalence of natural and mechanical sound in the various locations at which it was tested.
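The unsupervised-clustering step for grouping frame-wise direction estimates into sources can be sketched as k-means on unit-vector embeddings of the azimuths, which sidesteps the 0/360 degree wrap-around. This is an illustrative stand-in, not the thesis's exact clustering method:

```python
import numpy as np

def cluster_azimuths(azi_deg, k, iters=50, seed=0):
    """Cluster azimuth estimates (degrees) into k groups; returns the
    cluster centres in degrees. Embedding angles as unit vectors makes
    the distance wrap-safe (359 deg is close to 1 deg)."""
    pts = np.stack([np.cos(np.radians(azi_deg)),
                    np.sin(np.radians(azi_deg))], axis=1)
    rng = np.random.default_rng(seed)
    centres = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmax(pts @ centres.T, axis=1)  # nearest by cosine similarity
        for j in range(k):
            members = pts[labels == j]
            if len(members):                         # keep old centre if empty
                m = members.mean(axis=0)
                centres[j] = m / np.linalg.norm(m)
    return np.degrees(np.arctan2(centres[:, 1], centres[:, 0])) % 360.0
```

Each resulting centre corresponds to one putative source direction; onsets and offsets can then be read off from the frames assigned to each cluster over time.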