
    Optimized Acoustic Localization with SRP-PHAT for Monitoring in Distributed Sensor Networks

    Acoustic localization by means of sensor arrays has a variety of applications, from conference telephony to environment monitoring. Many of these tasks are appealing for implementation on embedded systems; however, the large dataflows and computational complexity of multi-channel signal processing impede the development of such systems. This paper proposes a method of acoustic localization targeted at distributed systems, such as Wireless Sensor Networks (WSN). The method builds on an optimized Steered Response Power with Phase Transform (SRP-PHAT) localization algorithm and simplifies it further by reducing the initial search region in which the sound source is contained. The sensor array is partitioned into sub-blocks, which may be implemented as independent WSN nodes. Two approaches to region reduction are considered: one based on Direction of Arrival estimation and the other on multilateration. Both approaches are tested on real signals for speaker localization and industrial machinery monitoring applications. Experimental results indicate the method's effectiveness in both tasks.

    Speaker Localization and Detection in Videoconferencing Environments Using a Modified SRP-PHAT Algorithm

    The Steered Response Power - Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. However, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this paper, we introduce an effective strategy which performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost required in a practical implementation. The modified SRP-PHAT functional has been successfully implemented in a real-time speaker localization system for multiparticipant videoconferencing environments. Moreover, a localization-based speech/non-speech frame discriminator is presented.
    This work was supported by the Ministry of Education and Science under the project TEC2009-14414-C03-01.
    Martí Guerola, A.; Cobos Serrano, M.; Aguilera Martí, E.; López Monfort, JJ. (2011). Speaker Localization and Detection in Videoconferencing Environments Using a Modified SRP-PHAT Algorithm. Waves. 3:40-47. http://hdl.handle.net/10251/57648

    Real-Time Sound Source Localization in Videoconferencing Environments

    Sound Source Localization (SSL) mechanisms have been extensively studied. Many applications, such as teleconferencing or speech enhancement systems, require the localization of one or more acoustic sources. Moreover, it is essential to localize sources even in noisy and reverberant environments. It has been shown that computing the Steered Response Power (SRP) is a more robust approach than two-stage, direct time-difference-of-arrival methods. The problem with computing the SRP is that a fine grid-search procedure is needed, which is too expensive for a real-time system. To this end, a new strategy (the modified SRP-PHAT functional) has been introduced, which can be used in a real-time system at a low computational cost. Moreover, it has been demonstrated that the statistical distribution of location estimates when a speaker is active can be successfully used to discriminate between speech and non-speech frames. The main objective of this work is to describe our new localization approach and integrate it into a real-time speaker localization and detection system. The applicability of the method is shown in a real videoconferencing environment using an acoustically driven steering camera.
    Martí Guerola, A. (2010). Real-Time Sound Source Localization in Videoconferencing Environments. http://hdl.handle.net/10251/27143
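
The fine grid search described in this abstract can be illustrated with a minimal sketch of conventional SRP-PHAT, not the authors' implementation. It assumes free-field propagation, a 2-D room, a speed of sound of 343 m/s, and microphone spectra that are synthesised directly in the frequency domain; the function names (`phat_cross_spectra`, `srp_phat`, `grid_search`, `simulate_spectra`) are hypothetical. A practical system would obtain the spectra from an FFT of recorded frames.

```python
import cmath
import itertools
import math
import random

C = 343.0  # assumed speed of sound in m/s


def simulate_spectra(source, mics, freqs, seed=0):
    """Synthetic free-field spectra: one random source spectrum, delayed per microphone."""
    rng = random.Random(seed)
    src = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in freqs]
    return [[s * cmath.exp(-2j * math.pi * f * math.dist(source, m) / C)
             for s, f in zip(src, freqs)] for m in mics]


def phat_cross_spectra(spectra):
    """PHAT-normalised cross-spectrum (unit magnitude, phase only) for every microphone pair."""
    out = {}
    for i, j in itertools.combinations(range(len(spectra)), 2):
        pair = []
        for xi, xj in zip(spectra[i], spectra[j]):
            cross = xi * xj.conjugate()
            mag = abs(cross)
            pair.append(cross / mag if mag > 1e-12 else 0j)
        out[(i, j)] = pair
    return out


def srp_phat(point, mics, cross, freqs):
    """Steered response power at one candidate point, summed over pairs and frequency bins."""
    dists = [math.dist(point, m) for m in mics]
    power = 0.0
    for (i, j), pair in cross.items():
        tau = (dists[i] - dists[j]) / C  # TDOA this point would produce for pair (i, j)
        for f, g in zip(freqs, pair):
            power += (g * cmath.exp(2j * math.pi * f * tau)).real
    return power


def grid_search(mics, cross, freqs, xs, ys):
    """Exhaustive search: evaluate the SRP at every grid point and keep the maximum."""
    return max(itertools.product(xs, ys),
               key=lambda p: srp_phat(p, mics, cross, freqs))


mics = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
freqs = [k * 8000 / 256 for k in range(1, 128)]  # bins of an assumed 256-point frame at 8 kHz
axis = [0.25 * i for i in range(9)]              # 0.25 m grid over a 2 m x 2 m area
spectra = simulate_spectra((1.25, 0.5), mics, freqs)
best = grid_search(mics, phat_cross_spectra(spectra), freqs, axis, axis)
print(best)
```

The cost is one inner loop over pairs and frequency bins per grid point, so it grows linearly with the number of candidate locations; this is the expense that the coarser-grid strategy in the abstract aims to reduce.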

    Low-Complexity Steered Response Power Mapping based on Nyquist-Shannon Sampling

    The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that, on the one hand, the range of possible TDOAs is physically bounded, while, on the other hand, the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online.
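
The sampling-and-interpolation idea in this abstract can be sketched as follows. This is an illustrative approximation, not the authors' published code: the GCC is modelled as an ideal sinc-shaped, bandlimited function with a single delay, the 20-sample margin beyond the TDOA interval is an assumed value, and the names `sample_gcc` and `interpolate_gcc` are hypothetical.

```python
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sample_gcc(gcc, tau_max, bandwidth, margin=20):
    """Sample a bandlimited GCC at the Nyquist period over the TDOA interval.

    `margin` extra samples on each side (an assumed value) limit the
    truncation error of the finite sinc sum used for interpolation.
    """
    period = 1.0 / (2.0 * bandwidth)
    half = math.ceil(tau_max / period) + margin
    lags = [n * period for n in range(-half, half + 1)]
    return period, lags, [gcc(t) for t in lags]


def interpolate_gcc(tau, period, lags, samples):
    """Shannon (sinc) interpolation of the sampled GCC at an arbitrary lag."""
    return sum(s * sinc((tau - t) / period) for t, s in zip(lags, samples))


B = 4000.0             # assumed GCC bandwidth in Hz
TAU_MAX = 0.5 / 343.0  # largest TDOA for an assumed 0.5 m microphone spacing
TAU_0 = 0.4e-3         # true TDOA of the simulated source


def ideal_gcc(tau):
    """Idealised GCC-PHAT of a single delay over a flat band [0, B]."""
    return sinc(2.0 * B * (tau - TAU_0))


period, lags, samples = sample_gcc(ideal_gcc, TAU_MAX, B)
candidates = [-TAU_MAX + 2.0 * TAU_MAX * i / 200 for i in range(201)]
approx = [interpolate_gcc(t, period, lags, samples) for t in candidates]
worst = max(abs(a - ideal_gcc(t)) for a, t in zip(approx, candidates))
peak = candidates[max(range(len(candidates)), key=approx.__getitem__)]
print(len(samples), "samples serve", len(candidates), "candidate lags; max error", worst)
```

Each GCC sample stands for one inverse-transform evaluation, so a few dozen samples per microphone pair replace one evaluation per candidate location, and the interpolation then costs only a short weighted sum.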

    A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling

    The Steered Response Power – Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. However, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this letter, we introduce an effective strategy that extends the conventional SRP-PHAT functional with the aim of considering the volume surrounding the discrete locations of the spatial grid. As a result, the modified functional performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid. To this end, the Generalized Cross-Correlation (GCC) function corresponding to each microphone pair must be properly accumulated according to the defined microphone setup. Experiments carried out under different acoustic conditions confirm the validity of the proposed approach.
    Manuscript received September 06, 2010; revised October 22, 2010; accepted October 27, 2010. Date of publication November 11, 2010; date of current version December 16, 2010. This work was supported by the Spanish Ministry of Science and Innovation under the project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Constantine L. Kotropoulos.
    Cobos Serrano, M.; Martí Guerola, A.; López Monfort, JJ. (2011). A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling. IEEE Signal Processing Letters. 18:71-74. doi:10.1109/LSP.2010.2091502
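
The accumulation idea in this abstract can be sketched roughly as below: instead of reading each pair's GCC at the single lag of a grid point, the GCC is summed over every lag that the surrounding cell can produce. The sketch is an assumption-laden illustration, not the published functional. The lag bounds are estimated by probing a few points inside a square 2-D cell, whereas the letter derives its own bounds from the microphone setup; the GCCs are stored as sparse `{lag: value}` dictionaries; and `FS`, `C`, `lag_bounds`, and `modified_srp` are hypothetical names and values.

```python
import itertools
import math

C = 343.0   # assumed speed of sound (m/s)
FS = 16000  # assumed sampling rate (Hz)


def tdoa(point, mic_a, mic_b):
    """Time-difference of arrival (s) between two microphones for a point."""
    return (math.dist(point, mic_a) - math.dist(point, mic_b)) / C


def lag_bounds(centre, half, mic_a, mic_b, steps=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Integer lag interval spanned by a square cell, rounded outwards.

    The bounds are estimated by probing a few points inside the cell,
    which is a simplification chosen for this sketch.
    """
    taus = [tdoa((centre[0] + half * u, centre[1] + half * v), mic_a, mic_b)
            for u, v in itertools.product(steps, repeat=2)]
    return math.floor(min(taus) * FS), math.ceil(max(taus) * FS)


def modified_srp(centre, half, mics, gccs):
    """Accumulate each pair's GCC over all lags covered by the cell."""
    total = 0.0
    for (i, j), gcc in gccs.items():
        lo, hi = lag_bounds(centre, half, mics[i], mics[j])
        total += sum(gcc.get(lag, 0.0) for lag in range(lo, hi + 1))
    return total


mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
source = (1.25, 2.75)  # lies off-centre inside the cell centred at (1.5, 2.5)

# Idealised GCCs: a single unit peak at the true (rounded) lag of each pair.
gccs = {(i, j): {round(tdoa(source, mics[i], mics[j]) * FS): 1.0}
        for i, j in itertools.combinations(range(len(mics)), 2)}

centres = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]  # coarse 1 m cells
scores = {c: modified_srp(c, 0.5, mics, gccs) for c in centres}
print(scores[(1.5, 2.5)], max(scores.values()))
```

Because every cell integrates the GCC over its whole lag range, a source that sits between grid points still contributes its full peak to the cell that contains it, which is what allows the grid to be coarsened.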

    Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays

    In the analysis of acoustic scenes, the occurring sounds often have to be detected in time, recognized, and localized in space. Usually, each of these tasks is done separately. In this paper, a model-based approach to carrying them out jointly for the case of multiple simultaneous sources is presented and tested. The recognized event classes and their respective room positions are obtained with a single system that maximizes the combination of a large set of scores, each one resulting from a different acoustic event model and a different beamformer output signal, which comes from one of several arbitrarily located small microphone arrays. Experimental work using a two-step method is reported for a specific scenario consisting of meeting-room acoustic events, either isolated or overlapped with speech. Tests carried out with two datasets show the advantage of the proposed approach over some usual techniques, and show that the inclusion of estimated priors brings a further performance improvement.
    Comment: Computational acoustic scene analysis, microphone array signal processing, acoustic event detection

    Acoustic source localisation and tracking using microphone arrays

    This thesis considers the domain of acoustic source localisation and tracking in an indoor environment. Acoustic tracking has applications in security, human-computer interaction, and the diarisation of meetings. Source localisation and tracking is typically a computationally expensive task, making it hard to process on-line, especially as the number of speakers to track increases. Much of the literature considers single-source localisation; however, a practical system must be able to cope with multiple speakers, possibly active simultaneously, without knowing beforehand how many speakers are present. Techniques are explored for reducing the computational requirements of an acoustic localisation system. Techniques to localise and track multiple active sources are also explored, and developed to be more computationally efficient than the current state-of-the-art algorithms, whilst being able to track more speakers. The first contribution is the modification of a recent single-speaker source localisation technique, which improves the localisation speed. This is achieved by formalising the implicit assumption of the modified algorithm that speaker height is uniformly distributed on the vertical axis. Estimating height information effectively reduces the search space for speakers who have previously been detected and who may have moved over the horizontal plane but are unlikely to have changed height significantly. This is developed to allow multiple non-simultaneously active sources to be located. This is applicable when the system is given information from a secondary source, such as a set of cameras, allowing the efficient identification of active speakers rather than just the locations of people in the environment. The next contribution of the thesis is the application of a particle swarm technique to further decrease, significantly, the computational cost of localising a single source in an indoor environment, compared to the state of the art. Several variants of the particle swarm technique are explored, including novel variants designed specifically for localising acoustic sources. Each method is characterised in terms of its computational complexity as well as the average localisation error. The techniques' responses to acoustic noise are also considered, and they are found to be robust. A further contribution is made by using multi-optima swarm techniques to localise multiple simultaneously active sources. This makes use of techniques which extend the single-source particle swarm techniques to finding multiple optima of the acoustic objective function. Several techniques are investigated, and their performance in terms of localisation accuracy and computational complexity is characterised. Consideration is also given to how these metrics change when an increasing number of active speakers are to be localised. Finally, the application of the multi-optima localisation methods as an input to a multi-target tracking system is presented. Tracking multiple speakers is a more complex task than tracking a single acoustic source, as observations of audio activity must be associated in some way with distinct speakers. The tracker used is known to be a relatively efficient technique, and the nature of the multi-optima output format is modified to allow the application of this technique to the task of speaker tracking.