Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates
This paper presents a novel approach for indoor acoustic source localization
using microphone arrays and based on a Convolutional Neural Network (CNN). The
proposed solution is, to the best of our knowledge, the first published work in
which the CNN is designed to directly estimate the three-dimensional position
of an acoustic source, using the raw audio signal as the input information and
avoiding the use of handcrafted audio features. Given the limited amount of
available localization data, we propose in this paper a training strategy based
on two steps. We first train our network using semi-synthetic data, generated
from close-talk speech recordings, in which we simulate the time delays and
distortion suffered by the signal as it propagates from the source to the array
of microphones. We then fine-tune this network using a small amount of real
data. Our experimental results show that this strategy is able to produce
networks that significantly improve on existing localization methods based on
SRP-PHAT strategies. In addition, our experiments show that our CNN
method is more robust to variations in speaker gender and window size than
the other methods.
Comment: 18 pages, 3 figures, 8 tables
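The semi-synthetic data step described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation: the function name `simulate_array_signals`, the integer-sample delay, the 1/r attenuation, and the 343 m/s speed of sound are all assumptions made here for illustration, and the distortion the abstract mentions is not modeled at all.

```python
import math

def simulate_array_signals(signal, source, mics, fs, c=343.0):
    """Return one delayed, attenuated copy of `signal` per microphone.

    Hypothetical helper: the delay is rounded to whole samples and the gain
    is a simple 1/r spreading loss. Reverberation and noise are ignored.
    """
    channels = []
    for mic in mics:
        distance = math.dist(source, mic)          # metres
        delay = round(distance / c * fs)           # propagation delay in samples
        gain = 1.0 / max(distance, 1e-3)           # avoid division by zero
        channels.append([0.0] * delay + [gain * s for s in signal])
    # Pad every channel with zeros to a common length.
    length = max(len(ch) for ch in channels)
    return [ch + [0.0] * (length - len(ch)) for ch in channels]

# Example: a unit impulse seen by two microphones at different distances.
close_talk = [1.0, 0.0]
channels = simulate_array_signals(
    close_talk, source=(0.0, 0.0, 0.0),
    mics=[(3.43, 0.0, 0.0), (6.86, 0.0, 0.0)], fs=100,
)
```

The farther microphone receives the impulse later and weaker, which is the kind of inter-channel time delay a network fed with raw audio would have to learn to exploit.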
Low-Complexity Steered Response Power Mapping based on Nyquist-Shannon Sampling
The steered response power (SRP) approach to acoustic source localization
computes a map of the acoustic scene from the frequency-weighted output power
of a beamformer steered towards a set of candidate locations. Equivalently, SRP
may be expressed in terms of time-domain generalized cross-correlations (GCCs)
at lags equal to the candidate locations' time-differences of arrival (TDOAs).
Due to the dense grid of candidate locations, each of which requires inverse
Fourier transform (IFT) evaluations, conventional SRP exhibits a high
computational complexity. In this paper, we propose a low-complexity SRP
approach based on Nyquist-Shannon sampling. Noting that on the one hand the
range of possible TDOAs is physically bounded, while on the other hand the GCCs
are bandlimited, we critically sample the GCCs around their TDOA interval and
approximate the SRP map by interpolation. In usual setups, the number of sample
points can be orders of magnitude less than the number of candidate locations
and frequency bins, yielding a significant reduction of IFT computations at a
limited interpolation cost. Simulations comparing the proposed approximation
with conventional SRP indicate low approximation errors and equal localization
performance. MATLAB and Python implementations are available online.
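The counting argument in this abstract can be made concrete with a small sketch. It is not the published implementation: the helper names, the 343 m/s speed of sound, and the example geometry are assumptions, and the count ignores any extra samples an actual interpolator might need beyond the TDOA interval.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def tdoa(candidate, mic_a, mic_b, c=SPEED_OF_SOUND):
    """TDOA in seconds between mic_a and mic_b for a source at `candidate`."""
    return (math.dist(candidate, mic_a) - math.dist(candidate, mic_b)) / c

def critical_sample_count(mic_a, mic_b, bandwidth_hz, c=SPEED_OF_SOUND):
    """Nyquist-rate lags needed to cover the physically possible TDOA range.

    The TDOA of any source is bounded by +/- (microphone spacing / c), and a
    GCC bandlimited to `bandwidth_hz` can be sampled every 1 / (2 * bandwidth).
    """
    max_tdoa = math.dist(mic_a, mic_b) / c
    sample_period = 1.0 / (2.0 * bandwidth_hz)
    return 2 * math.ceil(max_tdoa / sample_period) + 1

# Hypothetical setup: 20 cm microphone pair, 8 kHz bandwidth,
# 5 m x 5 m x 3 m room searched on a 5 cm grid.
lags_per_pair = critical_sample_count((0.0, 0.0, 0.0), (0.2, 0.0, 0.0), 8000.0)
grid_points = round(5 / 0.05) * round(5 / 0.05) * round(3 / 0.05)
```

Under these assumptions the pair needs 21 GCC samples, against 600,000 candidate locations, which is consistent with the orders-of-magnitude gap the abstract describes.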
Exploiting a geometrically sampled grid in the steered response power algorithm for localization improvement
The steered response power phase transform (SRP-PHAT) is a beamforming method that is very attractive for acoustic localization applications due to its robustness in reverberant environments. This paper presents a spatial grid design procedure, called the geometrically sampled grid (GSG), which computes the spatial grid by taking into account the discrete sampling of the time difference of arrival (TDOA) functions and the desired spatial resolution. An SRP-PHAT localization algorithm based on the GSG method is also introduced. The proposed method exploits the intersections of the discrete hyperboloids representing the TDOA information domain of the sensor array, and projects the whole TDOA information onto the spatial search grid. The GSG method thus allows one to design the sampled spatial grid that represents the best search grid for a given sensor array; it also allows one to perform a sensitivity analysis of the array and to characterize its spatial localization accuracy, and it may assist the system designer in reconfiguring the array. Experimental results using both simulated data and real recordings show that the localization accuracy is substantially improved for both high and low spatial resolution, and that it is closely related to the proposed power response sensitivity measure.
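Why the discrete sampling of TDOA functions matters for grid design can be seen in a short sketch. This is not the GSG procedure itself: it only quantizes each candidate point's TDOAs to whole samples and counts how many distinct lag combinations a set of points produces. The function names and the 343 m/s speed of sound are assumptions.

```python
import math
from itertools import combinations

def lag_signature(point, mics, fs, c=343.0):
    """Integer-sample TDOA of `point` for every microphone pair."""
    return tuple(
        round((math.dist(point, a) - math.dist(point, b)) / c * fs)
        for a, b in combinations(mics, 2)
    )

def count_distinct_signatures(points, mics, fs):
    """Number of candidate points the sampled TDOAs can actually tell apart."""
    return len({lag_signature(p, mics, fs) for p in points})

# Hypothetical two-microphone array sampled at 16 kHz.
mics = [(-0.1, 0.0), (0.1, 0.0)]
broadside = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
distinct = count_distinct_signatures(broadside, mics, 16000)
```

All three broadside points share the same integer lag, so a uniform grid spends several cells on locations the sampled TDOA data cannot distinguish; a grid derived from the discrete TDOA values avoids that redundancy.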
Sensitivity-based region selection in the steered response power algorithm
The steered response power (SRP) algorithm is a well-studied method for acoustic source localization using a microphone array. Recently, different improvements based on the accumulation of all time difference of arrival (TDOA) information have been proposed in order to achieve spatial resolution scalability of the grid search map and reduce the computational cost. However, the TDOA information distribution is not uniform with respect to the search grid, as it depends on the geometry of the array, the sampling frequency, and the spatial resolution. In this paper, we propose a sensitivity-based region selection SRP (R-SRP) algorithm that exploits the nonuniform TDOA information accumulation on the search grid. First, high and low sensitivity regions of the search space are identified using an array sensitivity estimation procedure; then, through the formulation of a peak-to-peak ratio (PPR) measuring the peak energy distribution in the two regions, the source is classified as belonging to a high or a low sensitivity region, and this information is used to design an ad hoc weighting function of the acoustic power map on which the grid search is performed. Simulated and real experiments show that the proposed method improves the localization performance in comparison to the state of the art.
Exploiting joint sparsity for far-field microphone array sound source localization
The presence of far-field noise and reverberation poses significant challenges to conventional microphone array sound source localization approaches. Considering the sparsity of the source direction vector, source localization can be transformed into a compressed sensing (CS) problem by constructing a redundant frequency-domain room impulse response (RIR) matrix as the CS measurement matrix. In this paper, a new sparse recovery model is derived by decomposing the RIR into a delay response term and a reverberation response term to facilitate reverberation mitigation via frequency-domain accumulation. Furthermore, the source direction vectors of adjacent speech frames tend to exhibit similar sparse patterns, since the source direction can be assumed to remain static over this short period; there is thus substantial correlation of spatial sparsity among adjacent speech frames. Under the framework of distributed compressed sensing (DCS), multiple source direction vectors are treated as sparse solutions with common spatial support to derive a joint sparse recovery algorithm for far-field source localization. Experimental results obtained with a uniform circular array (UCA) show that the proposed algorithm yields better estimation performance than the traditional algorithms.