337 research outputs found
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
We present a novel learning-based approach to estimate the
direction-of-arrival (DOA) of a sound source using a convolutional recurrent
neural network (CRNN) trained via regression on synthetic data and Cartesian
labels. We also describe an improved method to generate synthetic data to train
the neural network using state-of-the-art sound propagation algorithms that
model specular as well as diffuse reflections of sound. We compare our model
against three other CRNNs trained using different formulations of the same
problem: classification on categorical labels, and regression on spherical
coordinate labels. In practice, our model achieves up to 43% decrease in
angular error over prior methods. The use of diffuse reflection results in 34%
and 41% reduction in angular prediction errors on LOCATA and SOFA datasets,
respectively, over prior methods based on image-source methods. Our method
results in an additional 3% error reduction over prior schemes that use
classification based networks, and we use 36% fewer network parameters
Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments
We propose a spatial diffuseness feature for deep neural network (DNN)-based
automatic speech recognition to improve recognition accuracy in reverberant and
noisy environments. The feature is computed in real-time from multiple
microphone signals without requiring knowledge or estimation of the direction
of arrival, and represents the relative amount of diffuse noise in each time
and frequency bin. It is shown that using the diffuseness feature as an
additional input to a DNN-based acoustic model leads to a reduced word error
rate for the REVERB challenge corpus, both compared to logmelspec features
extracted from noisy signals, and features enhanced by spectral subtraction.Comment: accepted for ICASSP201
Spatial dissection of a soundfield using spherical harmonic decomposition
A real-world soundfield is often contributed by multiple desired and undesired sound sources. The performance of many acoustic systems such as automatic speech recognition, audio surveillance, and teleconference relies on its ability to extract the desired sound components in such a mixed environment. The existing solutions to the above problem are constrained by various fundamental limitations and require to enforce different priors depending on the acoustic condition such as reverberation and spatial distribution of sound sources. With the growing emphasis and integration of audio applications in diverse technologies such as smart home and virtual reality appliances, it is imperative to advance the source separation technology in order to overcome the limitations of the traditional approaches.
To that end, we exploit the harmonic decomposition model to dissect a mixed soundfield into its underlying desired and undesired components based on source and signal characteristics. By analysing the spatial projection of a soundfield, we achieve multiple outcomes such as (i) soundfield separation with respect to distinct source regions, (ii) source separation in a mixed soundfield using modal coherence model, and (iii) direction of arrival (DOA) estimation of multiple overlapping sound sources through pattern recognition of the modal coherence of a soundfield.
We first employ an array of higher order microphones for soundfield separation in order to reduce hardware requirement and implementation complexity. Subsequently, we develop novel mathematical models for modal coherence of noisy and reverberant soundfields that facilitate convenient ways for estimating DOA and power spectral densities leading to robust source separation algorithms. The modal domain approach to the soundfield/source separation allows us to circumvent several practical limitations of the existing techniques and enhance the performance and robustness of the system. The proposed methods are presented with several practical applications and performance evaluations using simulated and real-life dataset
- …