

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

Authors: Hogan, Kevin; Kanu, John D.; Manocha, Dinesh; Tang, Zhenyu
Publication venue: 'International Speech Communication Association'
Publication date: 09/07/2019

We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels. We also describe an improved method to generate synthetic data to train the neural network using state-of-the-art sound propagation algorithms that model specular as well as diffuse reflections of sound. We compare our model against three other CRNNs trained using different formulations of the same problem: classification on categorical labels, and regression on spherical coordinate labels. In practice, our model achieves up to a 43% decrease in angular error over prior methods. The use of diffuse reflection results in 34% and 41% reductions in angular prediction error on the LOCATA and SOFA datasets, respectively, over prior methods based on image-source methods. Our method results in an additional 3% error reduction over prior schemes that use classification-based networks, and we use 36% fewer network parameters.
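The contrast between Cartesian and spherical regression labels, and the angular-error metric the abstract reports, can be illustrated with a minimal sketch. It is not the authors' code: the function names and the azimuth/elevation convention (radians, elevation measured from the horizontal plane) are assumptions made for illustration.

```python
import math

def spherical_to_cartesian(azimuth, elevation):
    """Unit vector for a DOA given in radians (assumed convention:
    azimuth in the x-y plane, elevation measured up from that plane)."""
    x = math.cos(elevation) * math.cos(azimuth)
    y = math.cos(elevation) * math.sin(azimuth)
    z = math.sin(elevation)
    return (x, y, z)

def angular_error_deg(pred, target):
    """Angle in degrees between two non-zero 3-D vectors.
    The prediction does not need to be unit length."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))

# Two directions on either side of the azimuth wrap-around point.
a = spherical_to_cartesian(math.radians(359.0), 0.0)
b = spherical_to_cartesian(math.radians(1.0), 0.0)
error = angular_error_deg(a, b)
```

In this example the two azimuth labels differ by 358 as raw numbers, yet the directions are only about 2 degrees apart. Cartesian labels have no such wrap-around discontinuity, which is one plausible reason to regress on them.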

arXiv.org e-Print Archive

Crossref

PSD Estimation of Multiple Sound Sources in a Reverberant Room Using a Spherical Microphone Array

Authors: Abhayapala, Thushara D.; Fahim, Abdullah; Samarasinghe, Prasanga N.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2017

We propose an efficient method to estimate source power spectral densities (PSDs) in a multi-source reverberant environment using a spherical microphone array. The proposed method utilizes the spatial correlation between the spherical harmonics (SH) coefficients of a sound field to estimate source PSDs. The use of the spatial cross-correlation of the SH coefficients allows us to employ the method in an environment with a higher number of sources compared to conventional methods. Furthermore, the orthogonality property of the SH basis functions saves the effort of designing specific beampatterns of a conventional beamformer-based method. We evaluate the performance of the algorithm with different numbers of sources in practical reverberant and non-reverberant rooms. We also demonstrate an application of the method by separating source signals using a conventional beamformer and a Wiener post-filter designed from the estimated PSDs. Comment: Accepted for WASPAA 201
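The step of designing a Wiener post-filter from estimated PSDs can be sketched as below. This uses the textbook per-frequency-bin Wiener gain (target PSD divided by total PSD) and is not necessarily the exact formulation in the paper. The function name, argument layout, and `floor` parameter are hypothetical.

```python
def wiener_gain(target_psd, interference_psds, floor=1e-12):
    """Per-bin Wiener gain from estimated PSDs.

    target_psd: list of non-negative PSD values, one per frequency bin.
    interference_psds: list of PSD lists (other sources, reverberation,
        noise), each the same length as target_psd.
    floor: small constant that avoids division by zero in empty bins.
    """
    gains = []
    for k, s in enumerate(target_psd):
        total = s + sum(psd[k] for psd in interference_psds)
        gains.append(s / max(total, floor))
    return gains

# Two bins: equal target and interference power, then interference only.
gains = wiener_gain([1.0, 0.0], [[1.0, 2.0]])
```

Each gain lies between 0 and 1 and would be multiplied with the beamformer output in the corresponding frequency bin.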

arXiv.org e-Print Archive

Crossref