12 research outputs found

    Low-Complexity Steered Response Power Mapping based on Nyquist-Shannon Sampling

    The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP has high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that the range of possible TDOAs is physically bounded and that the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude smaller than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online.
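
    The sampling-and-interpolation idea in this abstract can be illustrated with a minimal sketch for a single microphone pair. This is not the authors' implementation: the function names, the PHAT weighting, the band limit of a quarter of the sampling rate, the synthetic white-noise signal and the 20 extra samples on each side of the TDOA interval are all assumptions chosen for illustration. Lags are expressed in samples.

```python
import numpy as np

def gcc_at_lags(cross_spec, omegas, lags):
    """Evaluate the GCC at arbitrary lags by direct summation over frequency bins."""
    return np.real(np.exp(1j * np.outer(lags, omegas)) @ cross_spec)

def sampled_gcc(cross_spec, omegas, tau_max, n_extra=20):
    """Critically sample the GCC around the physically possible TDOA interval."""
    period = np.pi / omegas.max()  # Nyquist period for band limit omegas.max()
    n_max = int(np.ceil(tau_max / period)) + n_extra  # n_extra: assumed margin
    sample_lags = period * np.arange(-n_max, n_max + 1)
    return sample_lags, gcc_at_lags(cross_spec, omegas, sample_lags)

def interpolate_gcc(sample_lags, samples, lags):
    """Approximate the GCC at candidate lags by truncated sinc interpolation."""
    period = sample_lags[1] - sample_lags[0]
    return np.sinc((lags[:, None] - sample_lags[None, :]) / period) @ samples

# Synthetic pair of signals: x2 is x1 delayed by 3.4 samples (assumed test case).
rng = np.random.default_rng(0)
n, true_delay, tau_max = 1024, 3.4, 10.0
x1 = rng.standard_normal(n)
omega_all = 2 * np.pi * np.arange(n // 2 + 1) / n  # rad/sample
x2 = np.fft.irfft(np.fft.rfft(x1) * np.exp(-1j * omega_all * true_delay), n=n)

# PHAT-weighted cross-spectrum, restricted to bins up to a quarter of the sampling rate.
band = slice(1, n // 4 + 1)
cross = (np.fft.rfft(x2) * np.conj(np.fft.rfft(x1)))[band]
cross = cross / np.maximum(np.abs(cross), 1e-12)
omegas = omega_all[band]

candidates = np.linspace(-tau_max, tau_max, 401)    # dense grid of candidate TDOAs
direct = gcc_at_lags(cross, omegas, candidates)     # conventional evaluation
sample_lags, samples = sampled_gcc(cross, omegas, tau_max)
approx = interpolate_gcc(sample_lags, samples, candidates)

estimated_delay = candidates[np.argmax(approx)]
print(len(sample_lags), "samples instead of", len(candidates), "direct evaluations")
print("estimated delay (samples):", estimated_delay)
```

    In this toy setup the GCC is evaluated directly at 51 lags rather than 401, and the remaining values come from a single matrix product. A full SRP map would sum such interpolated GCCs over all microphone pairs.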

    Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

    This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three-dimensional position of an acoustic source, using the raw audio signal as input and avoiding hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network using semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network using a small amount of real data. Our experimental results show that this strategy produces networks that significantly improve on existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that our CNN method is more robust to variations in speaker gender and window size than the other methods. Comment: 18 pages, 3 figures, 8 tables

    Exploiting joint sparsity for far-field microphone array sound source localization

    The presence of far-field noise and reverberation poses significant challenges to conventional microphone array sound source localization approaches. Considering the sparsity of the source direction vector, source localization can be cast as a compressed sensing (CS) problem by constructing a redundant frequency-domain room impulse response (RIR) matrix as the CS measurement matrix. In this paper, a new sparse recovery model is derived by decomposing the RIR into a delay response term and a reverberation response term, which facilitates reverberation mitigation via frequency-domain accumulation. Furthermore, because the source direction can be assumed static over a short period, the source direction vectors of adjacent speech frames tend to exhibit similar sparse patterns, so there is substantial correlation in spatial sparsity among adjacent frames. Under the framework of distributed compressed sensing (DCS), multiple source direction vectors are treated as sparse solutions with common spatial support to derive a joint sparse recovery algorithm for far-field source localization. Experimental results obtained with a uniform circular array (UCA) show that the proposed algorithm yields better estimation performance than traditional algorithms.

    Mathematical modelling and optimization strategies for acoustic source localization in reverberant environments

    This thesis focuses on the use of modern optimization and audio processing techniques for the accurate and robust localization of people within a reverberant environment equipped with microphone arrays. Several aspects of sound localization have been studied, including modelling, algorithms, and the prior calibration that allows localization algorithms to be used even when the geometry of the sensors (microphones) is unknown a priori. Existing techniques required a large number of microphones to achieve high localization accuracy. In this thesis, however, a new method has been developed that improves localization accuracy by more than 30% with a reduced number of microphones. Reducing the number of microphones is important because it translates directly into a drastic decrease in cost and an increase in the versatility of the final system. Additionally, an exhaustive study has been carried out of the phenomena that affect the signal acquisition and processing system, with the aim of improving the previously proposed model. This study deepens the understanding and modelling of PHAT filtering (widely used in acoustic localization) and of the aspects that make it especially suitable for localization. As a result of this study, and in collaboration with researchers from the IDIAP institute (Switzerland), a system has been developed for self-calibrating the microphone positions from the diffuse noise present in a silent room. This contribution is related to previous coherence-based methods. However, it is able to reduce noise by taking into account previously known physical parameters (the maximum distance between microphones). As a result, better accuracy is achieved with less computation time. Knowledge of the effects of the PHAT filter has made it possible to create a new model that allows a sparse representation of the typical localization scenario. This type of representation has proved very convenient for localization, allowing a simple approach to the case of multiple simultaneous sources. The last contribution of this thesis is the characterization of TDOA (time difference of arrival) matrices. These matrices are especially useful in audio but are not limited to it. Moreover, this study goes beyond sound localization, since it proposes methods for denoising TDOA measurements based on a low-rank matrix representation, which is useful not only in localization but also in techniques such as beamforming or self-calibration.
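
    The low-rank structure of TDOA matrices mentioned in this abstract can be illustrated with a short sketch. A noise-free TDOA matrix has entries t_i - t_j, so it is skew-symmetric and has rank at most 2. The code below denoises a measured matrix by least-squares projection onto matrices of that form. This is one simple option chosen for illustration, not necessarily the method developed in the thesis, and the function names, sensor count and noise level are assumptions.

```python
import numpy as np

def tdoa_matrix(arrival_times):
    """Build the matrix of pairwise differences M[i, j] = t_i - t_j."""
    t = np.asarray(arrival_times, dtype=float)
    return t[:, None] - t[None, :]

def denoise_tdoa_matrix(noisy):
    """Least-squares projection of a noisy matrix onto the set of TDOA matrices."""
    skew = 0.5 * (noisy - noisy.T)   # keep only the skew-symmetric part
    t_hat = skew.mean(axis=1)        # row means give arrival times up to a common offset
    return tdoa_matrix(t_hat)

# Synthetic example (assumed values): 6 sensors, arrival times within 5 ms.
rng = np.random.default_rng(0)
arrival_times = rng.uniform(0.0, 5e-3, size=6)
clean = tdoa_matrix(arrival_times)
noisy = clean + 2e-4 * rng.standard_normal(clean.shape)

denoised = denoise_tdoa_matrix(noisy)
err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(denoised - clean)
print("error before:", err_before, "error after:", err_after)
```

    Because the set of valid TDOA matrices is a linear subspace that contains the clean matrix, this projection cannot increase the Frobenius error.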

    An Adaptive Neural Mechanism for Acoustic Motion Perception with Varying Sparsity

    Biological motion-sensitive neural circuits are adept at perceiving the relative motion of a relevant stimulus. Motion perception is a fundamental ability in neural sensory processing and crucial in target tracking tasks. Tracking a stimulus entails the ability to perceive its motion, i.e., extracting information about its direction and velocity. Here we focus on auditory motion perception of sound stimuli, which is poorly understood compared to its visual counterpart. In earlier work we developed a bio-inspired neural learning mechanism for acoustic motion perception. The mechanism extracts directional information via a model of the peripheral auditory system of lizards. It uses only this directional information, obtained via specific motor behaviour, to learn the angular velocity of unoccluded sound stimuli in motion. In nature, however, the stimulus being tracked may be occluded by artefacts in the environment, such as an escaping prey momentarily disappearing behind a cover of trees. This article extends the earlier work by presenting a comparative investigation of auditory motion perception for unoccluded and occluded tonal sound stimuli with a frequency of 2.2 kHz, in both simulation and practice. Three instances of each stimulus are employed, differing in their movement velocities: 0.5°/time step, 1.0°/time step and 1.5°/time step. To validate the approach in practice, we implement the proposed neural mechanism on a wheeled mobile robot and evaluate its performance in auditory tracking.

    LOCALIZATION OF STATIONARY SOURCE OF FLOOR VIBRATION USING THE STEERED RESPONSE POWER METHOD

    If the vibration generated in a building exceeds the acceptable design limit for a floor system, it is necessary to identify the source of vibration, a process known as localization. The objective of this study is the localization of stationary vibration sources, and the approach used is the steered response power (SRP) method. This method has already been shown to work well in wireless and acoustical applications to locate transmitters and sound sources, respectively. To the writer's knowledge, this study is the first application of the SRP method to locate vibration sources using floor vibration measurements. However, because waves behave differently when propagating through a concrete floor than through air, the method has been significantly modified for the application presented herein. The key prerequisite parameter for most vibration-sensing localization approaches is the wave propagation speed (WPS). The accuracy of these approaches therefore depends on the accuracy of the WPS estimate. The WPS of a concrete floor system is a function of parameters with high variability due to the mechanical and dynamic properties of the floor. This makes vibration-sensing localization challenging for those approaches. The SRP method has been employed because it post-processes all received signals together, so such structural variability is less likely to affect its accuracy; the SRP method is therefore more robust. Most localization approaches assume ideal wave propagation, e.g., constant propagation speed in all directions and vibration energy decreasing predictably as the source-sensor distance increases. However, such ideal propagation does not occur in many real-world structural systems such as a concrete floor. In this study, the WPS was estimated empirically in orthogonal directions using the cross-correlation function. The SRP method used herein was adapted to take the estimated WPS in orthogonal directions as an input parameter and then automatically interpolate the corresponding propagation speed for all other directions. This is another advantage of this method over existing methods. The experiment was conducted on the second floor of a full-scale, concrete-framed building at the University of Kentucky. The WPS was estimated in orthogonal directions using an electrodynamic shaker and seven accelerometers. The shaker applied an excitation force and acted as the source of vibration, and the accelerometers were placed at various locations on the floor and measured the response. Using the estimated WPS and corresponding measurement data, the SRP method was able to locate the vibration source within 2.0 m on a floor approximately 13.4 m by 8.4 m in size.
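
    Estimating a propagation speed from the cross-correlation of two sensor signals, as described in this abstract, can be sketched as follows. The sampling rate, sensor spacing, delay and white-noise excitation are hypothetical values chosen for illustration and are not taken from the study; the sketch also assumes both sensors lie on a line through the source, so that the extra path length equals their spacing.

```python
import numpy as np

def estimate_delay_samples(sig_a, sig_b):
    """Lag (in samples) at which sig_b best matches a delayed copy of sig_a."""
    xcorr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of the full cross-correlation corresponds to lag -(len(sig_a) - 1).
    return int(np.argmax(xcorr)) - (len(sig_a) - 1)

# Hypothetical setup: two sensors 5 m apart along one direction, sampled at 10 kHz.
fs = 10_000.0
sensor_spacing_m = 5.0
true_delay = 25  # samples; the far sensor receives the same wave 2.5 ms later

rng = np.random.default_rng(1)
sig_a = rng.standard_normal(2000)                                     # near sensor
sig_b = np.concatenate([np.zeros(true_delay), sig_a[:-true_delay]])   # far sensor

lag = estimate_delay_samples(sig_a, sig_b)
speed = sensor_spacing_m / (lag / fs)  # propagation speed in m/s
print("delay (samples):", lag, "speed (m/s):", speed)
```

    Repeating the estimate for a second sensor pair along the orthogonal direction would give the two speeds that the abstract describes as inputs to the adapted SRP method. How intermediate directions are interpolated is not specified in the abstract and is not attempted here.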