Optimized Acoustic Localization with SRP-PHAT for Monitoring in Distributed Sensor Networks
Acoustic localization by means of sensor arrays has a variety of applications, from conference telephony to environment monitoring. Many of these tasks are appealing for implementation on embedded systems; however, the large dataflows and computational complexity of multi-channel signal processing impede the development of such systems. This paper proposes a method of acoustic localization targeted at distributed systems, such as Wireless Sensor Networks (WSN). The method builds on an optimized Steered Response Power with Phase Transform (SRP-PHAT) localization algorithm and simplifies it further by reducing the initial search region in which the sound source is contained. The sensor array is partitioned into sub-blocks, which may be implemented as independent WSN nodes. Two approaches to region reduction are considered: one is based on Direction of Arrival estimation and the other on multilateration. Both approaches are tested on real signals for speaker localization and industrial machinery monitoring applications. Experimental results indicate the method's effectiveness in both tasks.
Speaker Localization and Detection in Videoconferencing Environments Using a Modified SRP-PHAT Algorithm
[EN] The Steered Response Power - Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. However, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this paper, we introduce an effective strategy which performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost required in a practical implementation. The modified SRP-PHAT functional has been successfully implemented in a real-time speaker localization system for multiparticipant videoconferencing environments. Moreover, a localization-based speech/non-speech frame discriminator is presented.
This work was supported by the Ministry of Education and Science under the project TEC2009-14414-C03-01.
Martí Guerola, A.; Cobos Serrano, M.; Aguilera Martí, E.; López Monfort, JJ. (2011). Speaker Localization and Detection in Videoconferencing Environments Using a Modified SRP-PHAT Algorithm. Waves. 3:40-47. http://hdl.handle.net/10251/57648
Real-Time Sound Source Localization in Videoconferencing Environments
[EN] Sound Source Localization (SSL) mechanisms have been extensively studied. Many applications like
teleconferencing or speech enhancement systems require the localization of one or more acoustic
sources. Moreover, it is essential to localize sources even in noisy and reverberant environments. It
has been shown that computing the Steered Response Power (SRP) is a more robust approach than two-stage,
direct time-difference-of-arrival methods. The problem with computing the SRP is that a fine
grid search procedure is needed, which is too expensive for a real-time system. To this end, a new
strategy (the modified SRP-PHAT functional) has been introduced which can be used in a real-time system
with a low computational cost. Moreover, it has been demonstrated that the statistical distribution of
location estimates when a speaker is active can be successfully used to discriminate between speech and
non-speech frames. The main objective of this work is to describe our new localization approach and
integrate it into a real-time speaker localization and detection system. The applicability of the method
will be shown for a real videoconferencing environment using an acoustically driven steering camera.
Martí Guerola, A. (2010). Real-Time Sound Source Localization in Videoconferencing Environments. http://hdl.handle.net/10251/27143
Low-Complexity Steered Response Power Mapping based on Nyquist-Shannon Sampling
The steered response power (SRP) approach to acoustic source localization
computes a map of the acoustic scene from the frequency-weighted output power
of a beamformer steered towards a set of candidate locations. Equivalently, SRP
may be expressed in terms of time-domain generalized cross-correlations (GCCs)
at lags equal to the candidate locations' time-differences of arrival (TDOAs).
Due to the dense grid of candidate locations, each of which requires inverse
Fourier transform (IFT) evaluations, conventional SRP exhibits a high
computational complexity. In this paper, we propose a low-complexity SRP
approach based on Nyquist-Shannon sampling. Noting that on the one hand the
range of possible TDOAs is physically bounded, while on the other hand the GCCs
are bandlimited, we critically sample the GCCs around their TDOA interval and
approximate the SRP map by interpolation. In usual setups, the number of sample
points can be orders of magnitude less than the number of candidate locations
and frequency bins, yielding a significant reduction of IFT computations at a
limited interpolation cost. Simulations comparing the proposed approximation
with conventional SRP indicate low approximation errors and equal localization
performance. MATLAB and Python implementations are available online.
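The conventional formulation summarised in this abstract, where the SRP map is the sum over microphone pairs of each pair's GCC read at the candidate location's TDOA, can be sketched as follows. This is a minimal illustration and not the authors' released implementation: the function names (`gcc_phat`, `srp_phat`, `delay_signal`), free-field propagation, the speed of sound of 343 m/s, nearest-sample rounding of the TDOA, and the toy four-microphone geometry are all assumptions made for the example.

```python
import numpy as np

def gcc_phat(x1, x2, n_fft):
    """PHAT-weighted cross-correlation of x1 and x2, with lag 0 at index n_fft // 2."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12        # phase transform: discard magnitude
    return np.fft.fftshift(np.fft.irfft(cross, n_fft))

def srp_phat(signals, mic_pos, grid, fs, c=343.0, n_fft=None):
    """Conventional SRP map: sum of pairwise GCCs at each candidate's TDOA."""
    n_mics, n_samples = signals.shape
    if n_fft is None:
        n_fft = 2 * n_samples             # zero-pad to avoid circular wrap-around
    # Distance from every candidate location to every microphone: shape (G, M).
    dists = np.linalg.norm(grid[:, None, :] - mic_pos[None, :, :], axis=2)
    power = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j], n_fft)
            # TDOA in samples (arrival at mic i minus arrival at mic j), rounded.
            lags = np.rint((dists[:, i] - dists[:, j]) / c * fs).astype(int)
            power += cc[n_fft // 2 + lags]
    return power

def delay_signal(s, delay_samples):
    """Circularly delay s by a possibly fractional number of samples (toy model)."""
    n = len(s)
    phase = np.exp(-2j * np.pi * np.fft.rfftfreq(n) * delay_samples)
    return np.fft.irfft(np.fft.rfft(s) * phase, n)

# Toy simulation (all values hypothetical): four microphones at the corners of a
# 2 m x 2 m area, one white-noise source placed on a point of a 10 cm grid.
fs, c, n = 16000, 343.0, 2048
mic_pos = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
xs = np.linspace(0.1, 1.9, 19)
grid = np.array([(x, y) for x in xs for y in xs])
true_idx = 6 * 19 + 11
source = grid[true_idx]

rng = np.random.default_rng(0)
s = rng.standard_normal(n)
delays = np.linalg.norm(mic_pos - source, axis=1) / c * fs
signals = np.stack([delay_signal(s, d) for d in delays])

# The simulated delays are circular, so the demo uses n_fft = n (no zero padding).
power = srp_phat(signals, mic_pos, grid, fs, c=c, n_fft=n)
estimate = grid[np.argmax(power)]
print("true source:", source, "estimate:", estimate)
```

The inner loop makes the cost structure visible: every candidate location needs one GCC lookup per microphone pair, which is why dense grids become expensive and why the abstract's approach samples the GCCs only over the physically possible TDOA range.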
A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling
The Steered Response Power – Phase Transform
(SRP-PHAT) algorithm has been shown to be one of the most robust
sound source localization approaches operating in noisy and
reverberant environments. However, its practical implementation
is usually based on a costly fine grid-search procedure, making
the computational cost of the method a real issue. In this letter,
we introduce an effective strategy that extends the conventional
SRP-PHAT functional with the aim of considering the volume
surrounding the discrete locations of the spatial grid. As a result,
the modified functional performs a full exploration of the sampled
space rather than computing the SRP at discrete spatial positions,
increasing its robustness and allowing for a coarser spatial grid.
To this end, the Generalized Cross-Correlation (GCC) function
corresponding to each microphone pair must be properly accumulated
according to the defined microphone setup. Experiments
carried out under different acoustic conditions confirm the validity
of the proposed approach.
Manuscript received September 06, 2010; revised October 22, 2010; accepted October 27, 2010. Date of publication November 11, 2010; date of current version December 16, 2010. This work was supported by the Spanish Ministry of Science and Innovation under the project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Constantine L. Kotropoulos.
Cobos Serrano, M.; Martí Guerola, A.; López Monfort, JJ. (2011). A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling. IEEE Signal Processing Letters. 18:71-74. doi:10.1109/LSP.2010.2091502
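The idea of accumulating each pair's GCC over the volume around a grid point, rather than reading it at a single lag, can be sketched as below. This is only an illustration under stated assumptions: the accumulation limits here are taken from the TDOAs at the corners of a cubic (or square) cell, which is a simplification chosen for the example and not the bound derivation used in the letter. The names `cell_tdoa_bounds` and `accumulate_gcc` are hypothetical, and the GCC array is assumed to be stored with zero lag at index `len(cc) // 2`.

```python
import itertools
import numpy as np

def cell_tdoa_bounds(centers, half_width, mic_a, mic_b, fs, c=343.0):
    """Integer lag interval [lo, hi] covered by each cell for one microphone pair.

    Assumption: the TDOA extremes inside a cell are approximated by the values
    at the cell's corners.
    """
    centers = np.asarray(centers, dtype=float)
    mic_a = np.asarray(mic_a, dtype=float)
    mic_b = np.asarray(mic_b, dtype=float)
    dim = centers.shape[1]
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=dim)))
    corners = centers[:, None, :] + half_width * signs[None, :, :]
    tdoa = (np.linalg.norm(corners - mic_a, axis=2)
            - np.linalg.norm(corners - mic_b, axis=2)) / c * fs
    lo = np.floor(tdoa.min(axis=1)).astype(int)
    hi = np.ceil(tdoa.max(axis=1)).astype(int)
    return lo, hi

def accumulate_gcc(cc, lo, hi):
    """Sum the lag-centred GCC over lags lo..hi (inclusive) for every cell."""
    center = len(cc) // 2
    csum = np.concatenate(([0.0], np.cumsum(cc)))   # prefix sums for O(1) ranges
    return csum[center + hi + 1] - csum[center + lo]

# Usage with placeholder data: one 10 cm cell centred at (1, 1) and a pair of
# microphones 2 m apart; a real system would pass an actual GCC-PHAT sequence.
lo, hi = cell_tdoa_bounds([[1.0, 1.0]], 0.05, [0.0, 0.0], [2.0, 0.0], fs=16000)
cc = np.zeros(513)
cc[256 + 2] = 1.0                                   # fake GCC peak at lag +2
print(lo, hi, accumulate_gcc(cc, lo, hi))
```

With a coarse grid, a point-wise lookup can miss a narrow GCC peak that falls between the lags of neighbouring grid points; summing over the whole interval means every lag is covered by some cell, which is the property the abstract describes as a full exploration of the sampled space.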
Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays
In the analysis of acoustic scenes, often the occurring sounds have to be
detected in time, recognized, and localized in space. Usually, each of these
tasks is done separately. In this paper, a model-based approach to jointly
carry them out for the case of multiple simultaneous sources is presented and
tested. The recognized event classes and their respective room positions are
obtained with a single system that maximizes the combination of a large set of
scores, each one resulting from a different acoustic event model and a
different beamformer output signal, which comes from one of several
arbitrarily-located small microphone arrays. By using a two-step method, the
experimental work for a specific scenario consisting of meeting-room acoustic
events, either isolated or overlapped with speech, is reported. Tests carried
out with two datasets show the advantage of the proposed approach with respect
to some usual techniques, and that the inclusion of estimated priors brings a
further performance improvement.
Comment: Computational acoustic scene analysis, microphone array signal processing, acoustic event detection
Acoustic source localisation and tracking using microphone arrays
This thesis considers the domain of acoustic source localisation and tracking in an indoor environment.
Acoustic tracking has applications in security, human-computer interaction, and the
diarisation of meetings. Source localisation and tracking is typically a computationally expensive
task, making it hard to process on-line, especially as the number of speakers to track increases.
Much of the literature considers single-source localisation; however, a practical system
must be able to cope with multiple speakers, possibly active simultaneously, without knowing
beforehand how many speakers are present. Techniques are explored for reducing the computational
requirements of an acoustic localisation system. Techniques to localise and track
multiple active sources are also explored, and developed to be more computationally efficient
than current state-of-the-art algorithms, whilst being able to track more speakers.
The first contribution is the modification of a recent single-speaker source localisation technique,
which improves the localisation speed. This is achieved by formalising the implicit assumption
by the modified algorithm that speaker height is uniformly distributed on the vertical
axis. Estimating height information effectively reduces the search space where speakers have
previously been detected, but who may have moved over the horizontal-plane, and are unlikely
to have significantly changed height. This is developed to allow multiple non-simultaneously
active sources to be located. This is applicable when the system is given information from a
secondary source such as a set of cameras allowing the efficient identification of active speakers
rather than just the locations of people in the environment.
The next contribution of the thesis is the application of a particle swarm technique to significantly
further decrease the computational cost of localising a single source in an indoor environment,
compared to the state of the art. Several variants of the particle swarm technique are
explored, including novel variants designed specifically for localising acoustic sources. Each
method is characterised in terms of its computational complexity as well as the average localisation
error. The techniques’ responses to acoustic noise are also considered, and they are
found to be robust.
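To illustrate why a particle swarm can be cheaper than an exhaustive grid search, the sketch below shows a generic global-best particle swarm optimiser that maximises an objective over a bounded region. It is not one of the thesis's variants: the function name `pso_maximize`, the inertia and acceleration values, and the toy quadratic objective standing in for an acoustic objective such as an SRP map are all assumptions made for the example.

```python
import numpy as np

def pso_maximize(objective, lower, upper, n_particles=30, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic global-best PSO; returns the best position and value found."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    pos = rng.uniform(lower, upper, size=(n_particles, len(lower)))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                              # per-particle best
    best_val = np.array([objective(p) for p in pos])
    g = int(np.argmax(best_val))
    gbest_pos, gbest_val = best_pos[g].copy(), best_val[g]

    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Pull each particle towards its own best and the swarm's best position.
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, lower, upper)         # stay inside the room
        vals = np.array([objective(p) for p in pos])

        improved = vals > best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = int(np.argmax(best_val))
        if best_val[g] > gbest_val:
            gbest_pos, gbest_val = best_pos[g].copy(), best_val[g]

    return gbest_pos, gbest_val

# Toy objective (hypothetical): a single smooth peak at the "source" position.
target = np.array([0.7, 1.2])
found_pos, found_val = pso_maximize(lambda p: -np.sum((p - target) ** 2),
                                    lower=[0.0, 0.0], upper=[2.0, 2.0])
print("best position:", found_pos)
```

With these settings the objective is evaluated 30 + 30 x 100 = 3030 times regardless of how fine a localisation is wanted, whereas a grid search costs one evaluation per grid point. A real acoustic objective has multiple local maxima, so convergence to the true source is not guaranteed by this plain variant.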
A further contribution is made by using multi-optima swarm techniques to localise multiple
simultaneously active sources. This makes use of techniques which extend the single-source
particle swarm techniques to finding multiple optima of the acoustic objective function. Several
techniques are investigated and their performance in terms of localisation accuracy and computational
complexity is characterised. Consideration is also given to how these metrics change
when an increasing number of active speakers are to be localised.
Finally, the application of the multi-optima localisation methods as an input to a multi-target
tracking system is presented. Tracking multiple speakers is a more complex task than tracking
a single acoustic source, as observations of audio activity must be associated in some way with
distinct speakers. The tracker used is known to be a relatively efficient technique, and the nature
of the multi-optima output format is modified to allow the application of this technique to the
task of speaker tracking.