2,033 research outputs found
Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization
Conventional approaches to sound source localization require at least two
microphones. It is known, however, that people with unilateral hearing loss can
also localize sounds. Monaural localization is possible thanks to the
scattering by the head, though it hinges on learning the spectra of the various
sources. We take inspiration from this human ability to propose algorithms for
accurate sound source localization using a single microphone embedded in an
arbitrary scattering structure. The structure modifies the frequency response
of the microphone in a direction-dependent way giving each direction a
signature. While knowing those signatures is sufficient to localize sources of
white noise, localizing speech is much more challenging: it is an ill-posed
inverse problem which we regularize by prior knowledge in the form of learned
non-negative dictionaries. We demonstrate a monaural speech localization
algorithm based on non-negative matrix factorization that does not depend on
sophisticated, designed scatterers. In fact, we show experimental results with
ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures
we can accurately localize arbitrary speakers; that is, we do not need to learn
the dictionary for the particular speaker to be localized. Finally, we discuss
multi-source localization and the related limitations of our approach.Comment: This article has been accepted for publication in IEEE/ACM
Transactions on Audio, Speech, and Language processing (TASLP
Euclidean Distance Matrices: Essential Theory, Algorithms and Applications
Euclidean distance matrices (EDM) are matrices of squared distances between
points. The definition is deceivingly simple: thanks to their many useful
properties they have found applications in psychometrics, crystallography,
machine learning, wireless sensor networks, acoustics, and more. Despite the
usefulness of EDMs, they seem to be insufficiently known in the signal
processing community. Our goal is to rectify this mishap in a concise tutorial.
We review the fundamental properties of EDMs, such as rank or
(non)definiteness. We show how various EDM properties can be used to design
algorithms for completing and denoising distance data. Along the way, we
demonstrate applications to microphone position calibration, ultrasound
tomography, room reconstruction from echoes and phase retrieval. By spelling
out the essential algorithms, we hope to fast-track the readers in applying
EDMs to their own problems. Matlab code for all the described algorithms, and
to generate the figures in the paper, is available online. Finally, we suggest
directions for further research.Comment: - 17 pages, 12 figures, to appear in IEEE Signal Processing Magazine
- change of title in the last revisio
Source localization and denoising: a perspective from the TDOA space
In this manuscript, we formulate the problem of denoising Time Differences of
Arrival (TDOAs) in the TDOA space, i.e. the Euclidean space spanned by TDOA
measurements. The method consists of pre-processing the TDOAs with the purpose
of reducing the measurement noise. The complete set of TDOAs (i.e., TDOAs
computed at all microphone pairs) is known to form a redundant set, which lies
on a linear subspace in the TDOA space. Noise, however, prevents TDOAs from
lying exactly on this subspace. We therefore show that TDOA denoising can be
seen as a projection operation that suppresses the component of the noise that
is orthogonal to that linear subspace. We then generalize the projection
operator also to the cases where the set of TDOAs is incomplete. We
analytically show that this operator improves the localization accuracy, and we
further confirm that via simulation.Comment: 25 pages, 9 figure
Localization using Distance Geometry : Minimal Solvers and Robust Methods for Sensor Network Self-Calibration
In this thesis, we focus on the problem of estimating receiver and sender node positions given some form of distance measurements between them. This kind of localization problem has several applications, e.g., global and indoor positioning, sensor network calibration, molecular conformations, data visualization, graph embedding, and robot kinematics. More concretely, this thesis makes contributions in three different areas.First, we present a method for simultaneously registering and merging maps. The merging problem occurs when multiple maps of an area have been constructed and need to be combined into a single representation. If there are no absolute references and the maps are in different coordinate systems, they also need to be registered. In the second part, we construct robust methods for sensor network self-calibration using both Time of Arrival (TOA) and Time Difference of Arrival (TDOA) measurements. One of the difficulties is that corrupt measurements, so-called outliers, are present and should be excluded from the model fitting. To achieve this, we use hypothesis-and-test frameworks together with minimal solvers, resulting in methods that are robust to noise, outliers, and missing data. Several new minimal solvers are introduced to accommodate a range of receiver and sender configurations in 2D and 3D space. These solvers are formulated as polynomial equation systems which are solvedusing methods from algebraic geometry.In the third part, we focus specifically on the problems of trilateration and multilateration, and we present a method that approximates the Maximum Likelihood (ML) estimator for different noise distributions. The proposed approach reduces to an eigendecomposition problem for which there are good solvers. This results in a method that is faster and more numerically stable than the state-of-the-art, while still being easy to implement. Furthermore, we present a robust trilateration method that incorporates a motion model. This enables the removal of outliers in the distance measurements at the same time as drift in the motion model is canceled
Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates
This paper presents a novel approach for indoor acoustic source localization
using microphone arrays and based on a Convolutional Neural Network (CNN). The
proposed solution is, to the best of our knowledge, the first published work in
which the CNN is designed to directly estimate the three dimensional position
of an acoustic source, using the raw audio signal as the input information
avoiding the use of hand crafted audio features. Given the limited amount of
available localization data, we propose in this paper a training strategy based
on two steps. We first train our network using semi-synthetic data, generated
from close talk speech recordings, and where we simulate the time delays and
distortion suffered in the signal that propagates from the source to the array
of microphones. We then fine tune this network using a small amount of real
data. Our experimental results show that this strategy is able to produce
networks that significantly improve existing localization methods based on
\textit{SRP-PHAT} strategies. In addition, our experiments show that our CNN
method exhibits better resistance against varying gender of the speaker and
different window sizes compared with the other methods.Comment: 18 pages, 3 figures, 8 table
Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds
In this paper we address the problems of modeling the acoustic space
generated by a full-spectrum sound source and of using the learned model for
the localization and separation of multiple sources that simultaneously emit
sparse-spectrum sounds. We lay theoretical and methodological grounds in order
to introduce the binaural manifold paradigm. We perform an in-depth study of
the latent low-dimensional structure of the high-dimensional interaural
spectral data, based on a corpus recorded with a human-like audiomotor robot
head. A non-linear dimensionality reduction technique is used to show that
these data lie on a two-dimensional (2D) smooth manifold parameterized by the
motor states of the listener, or equivalently, the sound source directions. We
propose a probabilistic piecewise affine mapping model (PPAM) specifically
designed to deal with high-dimensional data exhibiting an intrinsic piecewise
linear structure. We derive a closed-form expectation-maximization (EM)
procedure for estimating the model parameters, followed by Bayes inversion for
obtaining the full posterior density function of a sound source direction. We
extend this solution to deal with missing data and redundancy in real world
spectrograms, and hence for 2D localization of natural sound sources such as
speech. We further generalize the model to the challenging case of multiple
sound sources and we propose a variational EM framework. The associated
algorithm, referred to as variational EM for source separation and localization
(VESSL) yields a Bayesian estimation of the 2D locations and time-frequency
masks of all the sources. Comparisons of the proposed approach with several
existing methods reveal that the combination of acoustic-space learning with
Bayesian inference enables our method to outperform state-of-the-art methods.Comment: 19 pages, 9 figures, 3 table
- …