378 research outputs found

    Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model

    Get PDF
    The direct-to-reverberant ratio (DRR), which describes the energy ratio between the direct and reverberant component of a soundfield, is an important parameter in many audio applications. In this paper, we present a multichannel algorithm, which utilizes the blind recordings of a spherical microphone array to estimate the DRR of interest. The algorithm is developed based on a spatial correlation model formulated in the spherical harmonics domain. This model expresses the cross correlation matrix of the recorded soundfield coefficients in terms of two spatial correlation matrices, one for direct sound and the other for reverberation. While the direct path arrives from the source, the reverberant path is considered to be a nondiffuse soundfield with varying directional gains. The direct and reverberant sound energies are estimated from the aforementioned spatial correlation model, which then leads to the DRR estimation. The practical feasibility of the proposed algorithm was evaluated using the speech corpus of the acoustic characterization of environments challenge. The experimental results revealed that the proposed method was able to effectively estimate the DRR of a large collection of reverberant speech recordings including various environmental noise types, room types and speakers.DP14010341

    PSD Estimation of Multiple Sound Sources in a Reverberant Room Using a Spherical Microphone Array

    Full text link
    We propose an efficient method to estimate source power spectral densities (PSDs) in a multi-source reverberant environment using a spherical microphone array. The proposed method utilizes the spatial correlation between the spherical harmonics (SH) coefficients of a sound field to estimate source PSDs. The use of the spatial cross-correlation of the SH coefficients allows us to employ the method in an environment with a higher number of sources compared to conventional methods. Furthermore, the orthogonality property of the SH basis functions saves the effort of designing specific beampatterns of a conventional beamformer-based method. We evaluate the performance of the algorithm with different number of sources in practical reverberant and non-reverberant rooms. We also demonstrate an application of the method by separating source signals using a conventional beamformer and a Wiener post-filter designed from the estimated PSDs.Comment: Accepted for WASPAA 201

    PSD Estimation and Source Separation in a Noisy Reverberant Environment using a Spherical Microphone Array

    Get PDF
    In this paper, we propose an efficient technique for estimating individual power spectral density (PSD) components, i.e., PSD of each desired sound source as well as of noise and reverberation, in a multi-source reverberant sound scene with coherent background noise. We formulate the problem in the spherical harmonics domain to take the advantage of the inherent orthogonality of the spherical harmonics basis functions and extract the PSD components from the cross-correlation between the different sound field modes. We also investigate an implementation issue that occurs at the nulls of the Bessel functions and offer an engineering solution. The performance evaluation takes place in a practical environment with a commercial microphone array in order to measure the robustness of the proposed algorithm against all the deviations incurred in practice. We also exhibit an application of the proposed PSD estimator through a source septation algorithm and compare the performance with a contemporary method in terms of different objective measures

    Direct-to-Reverberant Energy Ratio Estimation Using a First-Order Microphone

    Get PDF
    The direct-to-reverberant ratio (DRR) is an important characterization of a reverberant environment. This paper presents a novel blind DRR estimation method based on the coherence function between the sound pressure and particle velocity at a point. First, a general expression of coherence function and DRR is derived in the spherical harmonic domain, without imposing assumptions on the reverberation. In this paper, DRR is expressed in terms of the coherence function as well as two parameters that are related to statistical characteristics of the reverberant environment. Then, a method to estimate the values of these two parameters using a microphone system capable of capturing first-order spherical harmonics is proposed, under three assumptions which are more realistic than the diffuse field model. Furthermore, a theoretical analysis on the use of plane wave model for direct path signal and its effect on DRR estimation is presented, and a rule of thumb is provided for determining whether the point source model should be used for the direct path signal. Finally, the ACE challenge dataset is used to validate the proposed DRR estimation method. The results show that the average full band estimation error is within 2 dB, with no clear trend of bias.DP14010341

    Spatial dissection of a soundfield using spherical harmonic decomposition

    Get PDF
    A real-world soundfield is often contributed by multiple desired and undesired sound sources. The performance of many acoustic systems such as automatic speech recognition, audio surveillance, and teleconference relies on its ability to extract the desired sound components in such a mixed environment. The existing solutions to the above problem are constrained by various fundamental limitations and require to enforce different priors depending on the acoustic condition such as reverberation and spatial distribution of sound sources. With the growing emphasis and integration of audio applications in diverse technologies such as smart home and virtual reality appliances, it is imperative to advance the source separation technology in order to overcome the limitations of the traditional approaches. To that end, we exploit the harmonic decomposition model to dissect a mixed soundfield into its underlying desired and undesired components based on source and signal characteristics. By analysing the spatial projection of a soundfield, we achieve multiple outcomes such as (i) soundfield separation with respect to distinct source regions, (ii) source separation in a mixed soundfield using modal coherence model, and (iii) direction of arrival (DOA) estimation of multiple overlapping sound sources through pattern recognition of the modal coherence of a soundfield. We first employ an array of higher order microphones for soundfield separation in order to reduce hardware requirement and implementation complexity. Subsequently, we develop novel mathematical models for modal coherence of noisy and reverberant soundfields that facilitate convenient ways for estimating DOA and power spectral densities leading to robust source separation algorithms. The modal domain approach to the soundfield/source separation allows us to circumvent several practical limitations of the existing techniques and enhance the performance and robustness of the system. The proposed methods are presented with several practical applications and performance evaluations using simulated and real-life dataset

    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    Get PDF
    We tackle the multi-party speech recovery problem through modeling the acoustic of the reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustic from the unknown competing speech sources relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting joint sparsity model formulated upon spatio-spectral sparsity of concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.Comment: 31 page

    Room geometry blind inference based on the localization of real sound source and first order reflections

    Full text link
    The conventional room geometry blind inference techniques with acoustic signals are conducted based on the prior knowledge of the environment, such as the room impulse response (RIR) or the sound source position, which will limit its application under unknown scenarios. To solve this problem, we have proposed a room geometry reconstruction method in this paper by using the geometric relation between the direct signal and first-order reflections. In addition to the information of the compact microphone array itself, this method does not need any precognition of the environmental parameters. Besides, the learning-based DNN models are designed and used to improve the accuracy and integrity of the localization results of the direct source and first-order reflections. The direction of arrival (DOA) and time difference of arrival (TDOA) information of the direct and reflected signals are firstly estimated using the proposed DCNN and TD-CNN models, which have higher sensitivity and accuracy than the conventional methods. Then the position of the sound source is inferred by integrating the DOA, TDOA and array height using the proposed DNN model. After that, the positions of image sources and corresponding boundaries are derived based on the geometric relation. Experimental results of both simulations and real measurements verify the effectiveness and accuracy of the proposed techniques compared with the conventional methods under different reverberant environments

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    Get PDF
    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering is designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener’s two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recent popular area, multi-zone reproduction, is also briefly reviewed in the paper. The paper is concluded with a discussion of the current state of the field and open problemsThe authors acknowledge National Natural Science Foundation of China (NSFC) No. 61671380 and Australian Research Council Discovery Scheme DE 150100363
    • …
    corecore