Spatial Sound Localization via Multipath Euclidean Distance Matrix Recovery
A novel localization approach is proposed to find the position of an individual source using recordings from a single microphone in a reverberant enclosure. The multipath propagation is modeled by multiple virtual microphones as images of the actual single microphone, and a multipath distance matrix is constructed whose components consist of the squared distances between the pairs of microphones (real or virtual) or the squared distances between the microphones and the source. The distances between the actual and virtual microphones are computed from the geometry of the enclosure. The microphone-source distances correspond to the support of the early reflections in the room impulse response associated with the source signal acquisition. The low-rank property of the Euclidean distance matrix is exploited to identify this correspondence. Source localization is achieved by optimizing the location of the source to match those measurements. The recording time of the microphone and the generation time of the source signal are asynchronous, and the resulting time offset is estimated by the proposed procedure. Furthermore, a theoretically optimal joint localization and synchronization algorithm is derived by formulating the source localization as the minimization of a quartic cost function. It is shown that the global minimum of the proposed cost function can be efficiently computed by converting it to a generalized trust region subproblem. Numerical simulations on synthetic data and real data recordings obtained in practical tests show the effectiveness of the proposed approach.
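The low-rank property mentioned in this abstract can be illustrated with a minimal sketch. It relies only on the standard fact that a matrix of squared Euclidean distances between points in d dimensions has rank at most d + 2. The point coordinates, the number of virtual microphones, and the helper name squared_edm are made up for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical geometry: one real microphone, five virtual (image) microphones
# and one source, as seven points in 3-D (coordinates in metres, made up).
points = rng.uniform(0.0, 5.0, size=(7, 3))


def squared_edm(x):
    """Return the matrix of squared Euclidean distances between the rows of x."""
    gram = x @ x.T
    sq_norms = np.diag(gram)
    return sq_norms[:, None] - 2.0 * gram + sq_norms[None, :]


D = squared_edm(points)

# For points in d = 3 dimensions the rank is at most d + 2 = 5,
# even though D is a 7 x 7 matrix.
rank = np.linalg.matrix_rank(D, tol=1e-9)
print(rank)
```

Because the rank stays bounded as more points are added, a candidate assignment of echo delays to virtual microphones can be checked by how well the completed matrix fits this low-rank structure.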
Compressive Matched-Field Processing
Source localization by matched-field processing (MFP) generally involves
solving a number of computationally intensive partial differential equations.
This paper introduces a technique that mitigates this computational workload by
"compressing" these computations. Drawing on key concepts from the recently
developed field of compressed sensing, it shows how a low-dimensional proxy for
the Green's function can be constructed by backpropagating a small set of
random receiver vectors. Then, the source can be located by performing a number
of "short" correlations between this proxy and the projection of the recorded
acoustic data in the compressed space. Numerical experiments in a Pekeris ocean
waveguide are presented which demonstrate that this compressed version of MFP
is as effective as traditional MFP even when the compression is significant.
The results are particularly promising in the broadband regime where using as
few as two random backpropagations per frequency performs almost as well as the
traditional broadband MFP, but with the added benefit of generic applicability.
That is, the computationally intensive backpropagations may be computed offline
independently from the received signals, and may be reused to locate any source
within the search grid area
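The idea of correlating in a compressed space can be sketched as follows. This is a toy illustration under loud assumptions: a free-space Green's function stands in for the Pekeris waveguide model, the proxy is formed by multiplying the random matrix with the full Green's matrix rather than by backpropagating random receiver vectors, and all sizes, frequencies and names (Phi, proxy, locate) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 32 receivers on a vertical line, 200 candidate positions.
n_rx, n_grid, k = 32, 200, 4
wavenumber = 2 * np.pi * 100.0 / 1500.0  # 100 Hz tone, 1500 m/s (assumed)
rx = np.stack([np.zeros(n_rx), np.linspace(10.0, 90.0, n_rx)], axis=1)
grid = np.stack(
    [rng.uniform(500.0, 3000.0, n_grid), rng.uniform(10.0, 90.0, n_grid)], axis=1
)

# Free-space Green's function as a stand-in for the waveguide model.
dist = np.linalg.norm(rx[:, None, :] - grid[None, :, :], axis=2)  # (n_rx, n_grid)
G = np.exp(1j * wavenumber * dist) / dist

# k random receiver vectors; the compressed proxy has only k rows.
Phi = (rng.standard_normal((k, n_rx)) + 1j * rng.standard_normal((k, n_rx))) / np.sqrt(2 * k)
proxy = Phi @ G  # (k, n_grid), computed once and reusable for any recording


def locate(y):
    """Return the grid index whose compressed replica best matches the data y."""
    z = Phi @ y  # project the recording into the compressed space
    scores = np.abs(proxy.conj().T @ z) / np.linalg.norm(proxy, axis=0)
    return int(np.argmax(scores))


true_idx = 57
y = (0.8 - 0.3j) * G[:, true_idx]  # noiseless recording with unknown complex gain
print(locate(y) == true_idx)
```

Each correlation here involves vectors of length k = 4 instead of 32, which is the sense in which the correlations are "short". In this noiseless toy case the normalized score is maximized at the true index by the Cauchy-Schwarz inequality; with noise or a mismatched model that guarantee no longer holds.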
Raking the Cocktail Party
We present the concept of an acoustic rake receiver---a microphone beamformer
that uses echoes to improve the noise and interference suppression. The rake
idea is well-known in wireless communications; it involves constructively
combining different multipath components that arrive at the receiver antennas.
Unlike spread-spectrum signals used in wireless communications, speech signals
are not orthogonal to their shifts. Therefore, we focus on the spatial
structure, rather than temporal. Instead of explicitly estimating the channel,
we create correspondences between early echoes in time and image sources in
space. These multiple sources of the desired and the interfering signal offer
additional spatial diversity that we can exploit in the beamformer design.
We present several "intuitive" and optimal formulations of acoustic rake
receivers, and show theoretically and numerically that the rake formulation of
the maximum signal-to-interference-and-noise beamformer offers significant
performance boosts in terms of noise and interference suppression. Beyond
signal-to-noise ratio, we observe gains in terms of the perceptual
evaluation of speech quality (PESQ) metric for the speech quality. We
accompany the paper by the complete simulation and processing chain written in
Python. The code and the sound samples are available online at
http://lcav.github.io/AcousticRakeReceiver/.
Comment: 12 pages, 11 figures, Accepted for publication in IEEE Journal on
Selected Topics in Signal Processing (Special Issue on Spatial Audio)
Identifying High-Traffic Patterns in the Workplace With Radio Tomographic Imaging in 3D Wireless Sensor Networks
The rapid progress of wireless communication and embedded micro-sensing electro-mechanical systems (MEMS) technologies has resulted in a growing confidence in the use of wireless sensor networks (WSNs) comprised of low-cost, low-power devices performing various monitoring tasks. Radio Tomographic Imaging (RTI) is a technology for localizing, tracking, and imaging device-free objects in a WSN using the change in received signal strength (RSS) of the radio links the object is obstructing. This thesis employs an experimental indoor three-dimensional (3-D) RTI network constructed of 80 wireless radios in a 100 square foot area. Experimental results are presented from a series of stationary target localization and target tracking experiments using one and two targets. Preliminary results demonstrate that a 3-D RTI network can be effectively used to generate 3-D RSS-based images to extract target features such as size and height, and to identify high-traffic patterns in the workplace by tracking asset movement.
Greedy routing and virtual coordinates for future networks
At the core of the Internet, routers are continuously struggling with
ever-growing routing and forwarding tables. Although hardware advances
do accommodate such a growth, we anticipate new requirements e.g. in
data-oriented networking where each content piece has to be referenced
instead of hosts, such that current approaches relying on global
information will not be viable anymore, no matter the hardware
progress. In this thesis, we investigate greedy routing methods that
can achieve similar routing performance as today but use much less
resources and which rely on local information only. To this end, we
add specially crafted name spaces to the network in which virtual
coordinates represent the addressable entities. Our scheme enables participating
routers to make forwarding decisions using only neighbourhood information,
as the overarching pseudo-geometric name space structure already
organizes and incorporates "vicinity" at a global level.
A first challenge to the application of greedy routing on virtual
coordinates to future networks is that of "routing dead-ends"
that are local minima due to the difficulty of consistent coordinates
attribution. In this context, we propose a routing recovery scheme
based on a multi-resolution embedding of the network in low-dimensional Euclidean spaces.
The recovery is performed by routing greedily on a blurrier view of the network. The
different network detail-levels are obtained through the embedding of
clustering-levels of the graph. When compared with
higher-dimensional embeddings of a given network, our method shows a
significant diminution of routing failures for similar header and
control-state sizes.
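The notion of a routing dead-end can be made concrete with a minimal sketch of greedy forwarding on virtual coordinates. The five-node topology, the coordinates and the function name greedy_route are hypothetical; the sketch shows only plain greedy forwarding and the local-minimum failure, not the multi-resolution recovery scheme proposed in the thesis.

```python
import math

# Hypothetical toy network: virtual coordinates and neighbour lists.
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 2), "E": (3, 0)}
links = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "E"], "D": ["A"], "E": ["C"]}


def greedy_route(src, dst):
    """Forward to the neighbour closest to dst, using only local information.

    Returns (path, delivered). delivered is False when the current node is
    closer to dst than all of its neighbours, i.e. a routing dead-end.
    """
    path = [src]
    current = src
    while current != dst:
        here = math.dist(coords[current], coords[dst])
        nxt = min(links[current], key=lambda n: math.dist(coords[n], coords[dst]))
        if math.dist(coords[nxt], coords[dst]) >= here:
            return path, False  # local minimum: no neighbour makes progress
        path.append(nxt)
        current = nxt
    return path, True


print(greedy_route("A", "E"))  # (['A', 'B', 'C', 'E'], True)
print(greedy_route("D", "E"))  # (['D'], False)
```

Node D is connected to E through A, but A is farther from E than D is in the coordinate space, so greedy forwarding stops at D. A recovery scheme is needed precisely for such cases.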
A second challenge to the application of virtual coordinates and
greedy routing to future networks is the support of
"customer-provider" as well as "peering" relationships between
participants, resulting in a differentiated services
environment. Although an application of greedy routing within such a
setting would combine two very common fields of today's networking
literature, such a scenario has, surprisingly, not been studied so
far. In this context we propose two approaches to address this scenario.
In a first approach we implement a path-vector protocol similar to
that of BGP on top of a greedy embedding of the network. This allows
each node to build a spatial map associated with each of its
neighbours indicating the accessible regions. Routing is then
performed through the use of a decision-tree classifier taking the
destination coordinates as input. When applied on a real-world dataset
(the CAIDA 2004 AS graph) we demonstrate an up to 40% compression ratio of
the routing control information at the network's core as well as a computationally efficient
decision process comparable to methods such as binary trees and tries.
In a second approach, we take inspiration from consensus-finding in social
sciences and transform the three-dimensional distance data structure
(where the third dimension encodes the service differentiation) into a
two-dimensional matrix on which classical embedding tools can be used.
This transformation is achieved by agreeing on a set of
constraints on the inter-node distances guaranteeing an
administratively-correct greedy routing. The computed distances are
also enhanced to encode multipath support. We demonstrate a good
greedy routing performance as well as an above 90% satisfaction of multipath constraints
when relying on the non-embedded obtained distances on synthetic datasets.
As various embeddings of the consensus distances do not fully exploit their multipath potential, the use of compression techniques such as transform coding to
approximate the obtained distance allows for better routing performances
A parallel hypothesis method of autonomous underwater vehicle navigation
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, June 2009.
This research presents a parallel hypothesis method for autonomous underwater vehicle
navigation that enables a vehicle to expand the operating envelope of existing
long baseline acoustic navigation systems by incorporating information that is not
normally used. The parallel hypothesis method allows the in-situ identification of
acoustic multipath time-of-flight measurements between a vehicle and an external
transponder and uses them in real-time to augment the navigation algorithm during
periods when direct-path time-of-flight measurements are not available. A proof of
concept was conducted using real-world data obtained by the Woods Hole Oceanographic
Institution Deep Submergence Lab's Autonomous Benthic Explorer (ABE)
and Sentry autonomous underwater vehicles during operations on the Juan de Fuca
Ridge.
This algorithm uses a nested architecture to break the navigation solution down
into basic building blocks for each type of available external information. The algorithm
classifies external information as either line of position or gridded observations.
For any line of position observation, the algorithm generates a multi-modal block
of parallel position estimate hypotheses. The multimodal hypotheses are input into
an arbiter which produces a single unimodal output. If a priori maps of gridded
information are available, they are used within the arbiter structure to aid in the
elimination of false hypotheses. For the proof of concept, this research uses ranges
from a single external acoustic transponder in the hypothesis generation process and
grids of low-resolution bathymetric data from a ship-based multibeam sonar in the
arbitration process.
The major contributions of this research include the in-situ identification of acoustic
multipath time-of-flight measurements, the multiscale utilization of a priori low-resolution
bathymetric data in a high-resolution navigation algorithm, and the design
of a navigation algorithm with a flexible architecture. This flexible architecture allows
the incorporation of multimodal beliefs without requiring a complex mechanism for
real-time hypothesis generation and culling, and it allows the real-time incorporation
of multiple types of external information as they become available in situ into the
overall navigation solution
Structured Sparsity Models for Reverberant Speech Separation
We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.
Listening to Distances and Hearing Shapes: Inverse Problems in Room Acoustics and Beyond
A central theme of this thesis is using echoes to achieve useful, interesting, and sometimes surprising results. One should have no doubts about the echoes' constructive potential; it is, after all, demonstrated masterfully by Nature. Just think about the bat's intriguing ability to navigate in unknown spaces and hunt for insects by listening to echoes of its calls, or about similar (albeit less well-known) abilities of toothed whales, some birds, shrews, and ultimately people. We show that, perhaps contrary to conventional wisdom, multipath propagation resulting from echoes is our friend. When we think about it the right way, it reveals essential geometric information about the sources–channel–receivers system. The key idea is to think of echoes as being more than just delayed and attenuated peaks in 1D impulse responses; they are actually additional sources with their corresponding 3D locations. This transformation allows us to forget about the abstract room, and to replace it by more familiar point sets. We can then engage the powerful machinery of Euclidean distance geometry. A problem that always arises is that we do not know a priori the matching between the peaks and the points in space, and solving the inverse problem is achieved by echo sorting, a tool we developed for learning correct labelings of echoes. This has applications beyond acoustics, whenever one deals with waves and reflections, or more generally, time-of-flight measurements. Equipped with this perspective, we first address the "Can one hear the shape of a room?" question, and we answer it with a qualified "yes". Even a single impulse response uniquely describes a convex polyhedral room, whereas a more practical algorithm to reconstruct the room's geometry uses only first-order echoes and a few microphones. Next, we show how different problems of localization benefit from echoes. The first one is multiple indoor sound source localization.
Assuming the room is known, we show that discretizing the Helmholtz equation yields a system of sparse reconstruction problems linked by the common sparsity pattern. By exploiting the full bandwidth of the sources, we show that it is possible to localize multiple unknown sound sources using only a single microphone. We then look at indoor localization with known pulses from the geometric echo perspective introduced previously. Echo sorting enables localization in non-convex rooms without a line-of-sight path, and localization with a single omni-directional sensor, which is impossible without echoes. A closely related problem is microphone position calibration; we show that echoes can help even without assuming that the room is known. Using echoes, we can localize arbitrary numbers of microphones at unknown locations in an unknown room using only one source at an unknown location---for example a finger snap---and get the room's geometry as a byproduct. Our study of source localization outgrew the initial form factor when we looked at source localization with spherical microphone arrays. Spherical signals appear well beyond spherical microphone arrays; for example, any signal defined on Earth's surface lives on a sphere. This resulted in the first slight departure from the main theme: We develop the theory and algorithms for sampling sparse signals on the sphere using finite rate-of-innovation principles and apply it to various signal processing problems on the sphere
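The view of echoes as additional sources with 3D locations can be sketched with the classical image-source construction for first-order reflections. The shoebox room dimensions, the source and microphone positions, the speed of sound and the function name first_order_images are all assumptions chosen for illustration; the thesis itself treats more general convex polyhedral rooms.

```python
import numpy as np


def first_order_images(source, room_dims):
    """Mirror `source` across the six walls of a shoebox room.

    The room is assumed to occupy [0, Lx] x [0, Ly] x [0, Lz]; each wall
    contributes one image source, returned as a row of a (6, 3) array.
    """
    source = np.asarray(source, dtype=float)
    images = []
    for axis, length in enumerate(room_dims):
        for wall in (0.0, float(length)):
            img = source.copy()
            img[axis] = 2.0 * wall - source[axis]  # reflect across the wall plane
            images.append(img)
    return np.array(images)


room = (5.0, 4.0, 3.0)           # hypothetical room size in metres
src = (1.0, 2.0, 1.5)            # hypothetical source position
mic = np.array([4.0, 1.0, 1.0])  # hypothetical microphone position
speed_of_sound = 343.0           # m/s, assumed

images = first_order_images(src, room)
direct_delay = np.linalg.norm(np.asarray(src) - mic) / speed_of_sound
echo_delays = np.linalg.norm(images - mic, axis=1) / speed_of_sound
print(echo_delays)
```

Each echo delay is simply the distance from the microphone to one image source divided by the speed of sound, so a set of correctly labeled echoes becomes a set of point-to-point distances.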
Source localization via time difference of arrival
Accurate localization of a signal source, based on the signals collected by a number of receiving sensors deployed in the area surrounding the source, is a problem of interest in various fields. This dissertation aims at exploring different techniques to improve the localization accuracy of non-cooperative sources, i.e., sources for which the specific transmitted symbols and the time of the transmitted signal are unknown to the receiving sensors. For the localization of non-cooperative sources, the time difference of arrival (TDOA) of the signals received at pairs of sensors is typically employed.
A two-stage localization method in multipath environments is proposed. During the first stage, the TDOA of the signals received at pairs of sensors is estimated. In the second stage, the actual location is computed from the TDOA estimates. This latter stage is referred to as hyperbolic localization and it generally involves a non-convex optimization. For the first stage, a TDOA estimation method that exploits the sparsity of multipath channels is proposed. This is formulated as an ℓ1-regularization problem, where the ℓ1-norm is used as a channel sparsity constraint. For the second stage, three methods are proposed to offer high accuracy at different computational costs. The first method takes a semi-definite relaxation (SDR) approach to relax the hyperbolic localization to a convex optimization. The second method follows a linearized formulation of the problem and seeks a biased estimate of improved accuracy. A third method is proposed to exploit the source sparsity. With this, the hyperbolic localization is formulated as an ℓ1-regularization problem, where the ℓ1-norm is used as a source sparsity constraint. The proposed methods compare favorably to other existing methods, each of them having its own advantages. The SDR method has the advantage of simplicity and low computational cost. The second method may perform better than the SDR approach in some situations, but at the price of higher computational cost. The ℓ1-regularization may outperform the first two methods, but is sensitive to the choice of a regularization parameter. The proposed two-stage localization approach is shown to deliver higher accuracy and robustness to noise, compared to existing TDOA localization methods.
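The second stage, computing a location from TDOA estimates, can be illustrated with a minimal sketch. It uses a brute-force grid search over a least-squares TDOA misfit as a plainly swapped-in stand-in; it is not the SDR, linearized or ℓ1-regularized method of the dissertation. The sensor layout, source position, propagation speed and grid spacing are hypothetical, and the TDOAs are noiseless.

```python
import numpy as np

# Hypothetical 2-D scenario: four sensors at the corners of a 10 m square.
sensors = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
true_source = np.array([3.0, 7.0])
c = 343.0  # propagation speed in m/s (assumed)


def tdoas(positions):
    """TDOAs of sensors 1..3 relative to sensor 0 for each candidate position."""
    positions = np.atleast_2d(positions)
    ranges = np.linalg.norm(positions[:, None, :] - sensors[None, :, :], axis=2)
    return (ranges[:, 1:] - ranges[:, :1]) / c


# Stage 1 would estimate these from the received signals; here they are exact.
measured = tdoas(true_source)[0]

# Stage 2 stand-in: evaluate the squared TDOA misfit on a 0.1 m grid.
xs = np.linspace(0.0, 10.0, 101)
ys = np.linspace(0.0, 10.0, 101)
gx, gy = np.meshgrid(xs, ys, indexing="ij")
candidates = np.stack([gx.ravel(), gy.ravel()], axis=1)
cost = np.sum((tdoas(candidates) - measured) ** 2, axis=1)
estimate = candidates[np.argmin(cost)]
print(estimate)
```

Each measured TDOA constrains the source to one branch of a hyperbola with two sensors as foci, which is why the cost surface is non-convex. The grid search avoids that difficulty only at a cost that grows with the grid resolution and the search area.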
A single-stage source localization method is explored. The approach is coherent in the sense that, in addition to the TDOA information, it utilizes the relative carrier phases of the received signals among pairs of sensors. A location estimator is constructed based on a maximum likelihood metric. The potential for accuracy improvement by the coherent approach is shown through the Cramér-Rao lower bound (CRB). However, the technique has to contend with high peak sidelobes in the localization metric, especially at low signal-to-noise ratio (SNR). Employing a small antenna array at each sensor is shown to lower the sidelobe level in the localization metric.
Finally, the performance of time delay and amplitude estimation from samples of the received signal taken at rates lower than the conventional Nyquist rate is evaluated. To this end, a CRB is developed and its variation with system parameters is analyzed. It is shown that while with noiseless low rate sampling there is no estimation accuracy loss compared to Nyquist sampling, in the presence of additive noise the performance degrades significantly. However, increasing the low sampling rate by a small factor leads to significant performance improvement, especially for time delay estimation
Mathematical modelling and optimization strategies for acoustic source localization in reverberant environments
This thesis focuses on the use of modern optimization and audio processing techniques for the accurate and robust localization of people in a reverberant environment equipped with microphone arrays. It studies several aspects of sound localization, including modeling, algorithms, and the prior calibration that allows the localization algorithms to be used even when the geometry of the sensors (microphones) is unknown a priori.
Existing techniques required a large number of microphones to achieve high localization accuracy. In this thesis, however, a new method has been developed that improves localization accuracy by more than 30% with a reduced number of microphones. Reducing the number of microphones is important because it translates directly into a drastic decrease in cost and an increase in the versatility of the final system.
In addition, an exhaustive study has been carried out of the phenomena affecting the signal acquisition and processing system, with the aim of improving the previously proposed model. This study deepens the understanding and modeling of PHAT filtering (widely used in acoustic localization) and of the aspects that make it especially suitable for localization.
As a result of that study, and in collaboration with researchers from the IDIAP institute (Switzerland), a system has been developed for self-calibrating the microphone positions from the diffuse noise present in a silent room. This contribution is related to previous coherence-based methods. However, it is able to reduce noise by taking into account previously known physical parameters (the maximum distance between the microphones). As a result, better accuracy is achieved with less computation time.
Knowledge of the effects of the PHAT filter has made it possible to create a new model that allows a sparse representation of the typical localization scenario. This type of representation has proven very convenient for localization, allowing a simple approach to the case of multiple simultaneous sources.
The final contribution of this thesis is the characterization of TDOA (time difference of arrival) matrices. Matrices of this type are especially useful in audio but are not limited to it. Moreover, this study goes beyond sound localization, since it proposes methods for denoising TDOA measurements based on a low-rank matrix representation, which are useful not only in localization but also in techniques such as beamforming and self-calibration.
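The low-rank structure of a TDOA matrix can be shown with a minimal sketch. A noiseless matrix with entries t_i - t_j is skew-symmetric and has rank 2, and a noisy skew-symmetric measurement can be projected back onto that structure by a least-squares fit, which reduces to taking row means. The arrival times, the noise level and the variable names are hypothetical, and the row-mean fit is a generic illustration that is not claimed to be the thesis's exact denoising method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical arrival times (seconds) of one source at six microphones.
arrival_times = rng.uniform(0.0, 0.02, size=6)

# Noiseless TDOA matrix: entry (i, j) is t_i - t_j, hence skew-symmetric, rank 2.
M = arrival_times[:, None] - arrival_times[None, :]
rank_clean = np.linalg.matrix_rank(M, tol=1e-12)

# Noisy pairwise TDOA measurements, kept skew-symmetric.
noise = 1e-3 * rng.standard_normal(M.shape)
M_noisy = M + (noise - noise.T) / 2.0

# Least-squares fit of a valid TDOA matrix: the row means give arrival times
# up to a common offset, and their pairwise differences rebuild the matrix.
t_hat = M_noisy.mean(axis=1)
M_denoised = t_hat[:, None] - t_hat[None, :]

err_noisy = np.linalg.norm(M_noisy - M)
err_denoised = np.linalg.norm(M_denoised - M)
print(rank_clean, err_noisy, err_denoised)
```

The fit is an orthogonal projection onto the linear subspace of valid TDOA matrices, which contains the true matrix, so the Frobenius error of the denoised matrix cannot exceed that of the noisy one.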