945 research outputs found
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
We present a novel learning-based approach to estimate the
direction-of-arrival (DOA) of a sound source using a convolutional recurrent
neural network (CRNN) trained via regression on synthetic data and Cartesian
labels. We also describe an improved method to generate synthetic data to train
the neural network using state-of-the-art sound propagation algorithms that
model specular as well as diffuse reflections of sound. We compare our model
against three other CRNNs trained using different formulations of the same
problem: classification on categorical labels, and regression on spherical
coordinate labels. In practice, our model achieves up to 43% decrease in
angular error over prior methods. The use of diffuse reflection results in 34%
and 41% reduction in angular prediction errors on LOCATA and SOFA datasets,
respectively, over prior methods based on image-source methods. Our method
results in an additional 3% error reduction over prior schemes that use
classification based networks, and we use 36% fewer network parameters
Semi-Supervised Sound Source Localization Based on Manifold Regularization
Conventional speaker localization algorithms, based merely on the received
microphone signals, are often sensitive to adverse conditions, such as: high
reverberation or low signal to noise ratio (SNR). In some scenarios, e.g. in
meeting rooms or cars, it can be assumed that the source position is confined
to a predefined area, and the acoustic parameters of the environment are
approximately fixed. Such scenarios give rise to the assumption that the
acoustic samples from the region of interest have a distinct geometrical
structure. In this paper, we show that the high dimensional acoustic samples
indeed lie on a low dimensional manifold and can be embedded into a low
dimensional space. Motivated by this result, we propose a semi-supervised
source localization algorithm which recovers the inverse mapping between the
acoustic samples and their corresponding locations. The idea is to use an
optimization framework based on manifold regularization, that involves
smoothness constraints of possible solutions with respect to the manifold. The
proposed algorithm, termed Manifold Regularization for Localization (MRL), is
implemented in an adaptive manner. The initialization is conducted with only
few labelled samples attached with their respective source locations, and then
the system is gradually adapted as new unlabelled samples (with unknown source
locations) are received. Experimental results show superior localization
performance when compared with a recently presented algorithm based on a
manifold learning approach and with the generalized cross-correlation (GCC)
algorithm as a baseline
Automotive three-microphone voice activity detector and noise-canceller
This paper addresses issues in improving hands-free speech recognition performance in car
environments. A three-microphone array has been used to form a beamformer with leastmean
squares (LMS) to improve Signal to Noise Ratio (SNR). A three-microphone array
has been paralleled to a Voice Activity Detection (VAD). The VAD uses time-delay
estimation together with magnitude-squared coherence (MSC)
- …