163 research outputs found

    Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

    The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods

    Mathematical modelling and optimization strategies for acoustic source localization in reverberant environments

    This thesis focuses on the use of modern optimization and audio processing techniques for the accurate and robust localization of people in a reverberant environment equipped with microphone arrays. It studies several aspects of sound localization, including modelling, algorithms, and the prior calibration that allows localization algorithms to be used even when the geometry of the sensors (microphones) is unknown a priori. Existing techniques required a large number of microphones to achieve high localization accuracy. This thesis, however, develops a new method that improves localization accuracy by more than 30% with a reduced number of microphones. Reducing the number of microphones matters because it translates directly into a drastic decrease in cost and an increase in the versatility of the final system. In addition, an exhaustive study was carried out of the phenomena that affect the signal acquisition and processing system, with the aim of improving the previously proposed model. That study deepens the understanding and modelling of PHAT filtering (widely used in acoustic localization) and of the aspects that make it especially suitable for localization. As a result of that study, and in collaboration with researchers at the IDIAP institute (Switzerland), a system was developed for self-calibrating the microphone positions from the diffuse noise present in a silent room. This contribution is related to previous coherence-based methods, but it is able to reduce noise by taking into account previously known physical parameters (the maximum distance between microphones). As a result, better accuracy is achieved with less computation time.
Knowledge of the effects of the PHAT filter made it possible to create a new model that allows a sparse representation of the typical localization scenario. This kind of representation has proved very convenient for localization, allowing a simple approach to the case of multiple simultaneous sources. The final contribution of this thesis is the characterization of TDOA (time difference of arrival) matrices. Such matrices are especially useful in audio but are not limited to it. Moreover, this study goes beyond sound localization, since it proposes noise reduction methods for TDOA measurements based on a low-rank matrix representation, which is useful not only in localization but also in techniques such as beamforming and self-calibration
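
The low-rank structure of a TDOA matrix can be illustrated with a short sketch. It assumes the usual definition M[i, j] = t[i] - t[j] for per-sensor arrival times t, which makes M skew-symmetric with rank at most 2. The denoiser below is a plain least-squares projection onto that structure, shown for illustration only and not necessarily the method developed in the thesis. The function names `tdoa_matrix` and `denoise_tdoa_matrix` are hypothetical.

```python
import numpy as np

def tdoa_matrix(arrival_times):
    """Build M with M[i, j] = t[i] - t[j] (skew-symmetric, rank <= 2)."""
    t = np.asarray(arrival_times, dtype=float)
    return t[:, None] - t[None, :]

def denoise_tdoa_matrix(measured):
    """Least-squares projection of a noisy matrix onto the set of TDOA matrices.

    Assumption: noise is handled in the Frobenius-norm sense. The closest
    TDOA matrix is generated by the row means of the skew-symmetric part,
    which recovers the arrival times up to a common offset.
    """
    m = np.asarray(measured, dtype=float)
    skew = 0.5 * (m - m.T)          # enforce M = -M^T
    t_hat = skew.mean(axis=1)       # arrival times with zero mean
    return tdoa_matrix(t_hat)

# Illustrative data only: six sensors, arrival times in arbitrary units.
rng = np.random.default_rng(0)
clean = tdoa_matrix(rng.uniform(0.0, 1.0, size=6))
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
denoised = denoise_tdoa_matrix(noisy)
print("error before:", np.linalg.norm(noisy - clean))
print("error after: ", np.linalg.norm(denoised - clean))
```

Because the map is an orthogonal projection onto a linear subspace that contains every valid TDOA matrix, the denoised matrix is never further from the clean one than the noisy input was.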

    Exploiting CNNs for Improving Acoustic Source Localization in Noisy and Reverberant Conditions

    This paper discusses the application of convolutional neural networks (CNNs) to minimum variance distortionless response localization schemes. We investigate the direction of arrival estimation problems in noisy and reverberant conditions using a uniform linear array (ULA). CNNs are used to process the multichannel data from the ULA and to improve the data fusion scheme, which is performed in the steered response power computation. CNNs improve the incoherent frequency fusion of the narrowband response power by weighting the components, reducing the deleterious effects of those components affected by artifacts due to noise and reverberation. The use of CNNs avoids the need to first encode the multichannel data into selected acoustic cues, with the advantage of exploiting their ability to recognize geometrical pattern similarity. Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-of-the-art techniques
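
The narrowband response power that this abstract refers to can be sketched as an MVDR (Capon) scan over candidate angles on a ULA. This is not the paper's CNN-weighted method. The sketch assumes a far-field source, a single frequency bin, half-wavelength spacing, and a small diagonal loading term, and the names `steering_vector`, `simulate_snapshots`, and `mvdr_power_map` are hypothetical.

```python
import numpy as np

def steering_vector(theta_rad, n_mics, spacing_over_wavelength=0.5):
    """Far-field ULA steering vector; angle is measured from broadside."""
    m = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_over_wavelength * m * np.sin(theta_rad))

def simulate_snapshots(theta_rad, n_mics, n_snapshots, noise_std, rng):
    """One narrowband source plus white sensor noise (illustrative data only)."""
    s = (rng.standard_normal(n_snapshots)
         + 1j * rng.standard_normal(n_snapshots)) / np.sqrt(2)
    noise = noise_std * (rng.standard_normal((n_mics, n_snapshots))
                         + 1j * rng.standard_normal((n_mics, n_snapshots))) / np.sqrt(2)
    return np.outer(steering_vector(theta_rad, n_mics), s) + noise

def mvdr_power_map(snapshots, thetas_rad, loading=1e-3):
    """MVDR power P(theta) = 1 / (a^H R^-1 a) for each candidate angle."""
    n_mics, n_snapshots = snapshots.shape
    R = snapshots @ snapshots.conj().T / n_snapshots    # sample covariance
    R_inv = np.linalg.inv(R + loading * np.eye(n_mics))  # diagonal loading (assumed)
    power = np.empty(len(thetas_rad))
    for i, theta in enumerate(thetas_rad):
        a = steering_vector(theta, n_mics)
        power[i] = 1.0 / np.real(a.conj() @ R_inv @ a)
    return power

rng = np.random.default_rng(0)
grid = np.deg2rad(np.arange(-90, 91))  # 1-degree scan grid
x = simulate_snapshots(np.deg2rad(20.0), n_mics=8, n_snapshots=200,
                       noise_std=0.3, rng=rng)
doa_deg = np.rad2deg(grid[np.argmax(mvdr_power_map(x, grid))])
print(f"estimated DOA: {doa_deg:.0f} degrees")
```

A broadband map would be obtained by combining such narrowband maps across frequency bins. The weighting of that combination is the paper's contribution, and this sketch does not attempt it.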

    Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors

    Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments

    Three-Dimensional Geometry Inference of Convex and Non-Convex Rooms using Spatial Room Impulse Responses

    This thesis presents research focused on the problem of geometry inference for both convex- and non-convex-shaped rooms, through the analysis of spatial room impulse responses. Current geometry inference methods are only applicable to convex-shaped rooms, requiring between 6 and 78 discretely spaced measurement positions, and are only accurate under certain conditions, such as a first-order reflection for each boundary being identifiable across all, or some subset of, these measurements. This thesis proposes that by using compact microphone arrays capable of capturing spatiotemporal information, boundary locations, and hence room shape for both convex and non-convex cases, can be inferred, using only a sufficient number of measurement positions to ensure each boundary has a first-order reflection attributable to, and identifiable in, at least one measurement. To support this, three research areas are explored. Firstly, the accuracy of direction-of-arrival estimation for reflections in binaural room impulse responses is explored, using a state-of-the-art methodology based on binaural model fronted neural networks. This establishes whether a two-microphone array can produce accurate enough direction-of-arrival estimates for geometry inference. Secondly, a spherical microphone array based spatiotemporal decomposition workflow for analysing reflections in room impulse responses is explored. This establishes that simultaneously arriving reflections can be individually detected, relaxing constraints on measurement positions. Finally, a geometry inference method applicable to both convex and more complex non-convex shaped rooms is proposed. Therefore, this research expands the possible scenarios in which geometry inference can be successfully applied at a level of accuracy comparable to existing work, through the use of commonly used compact microphone arrays. Based on these results, future improvements to this approach are presented and discussed in detail

    3-D Beamspace ML Based Bearing Estimator Incorporating Frequency Diversity and Interference Cancellation

    The problem of low-angle radar tracking utilizing an array of antennas is considered. In the low-angle environment, echoes return from a low flying target via a specular path as well as a direct path. The problem is compounded by the fact that the two signals arrive within a beamwidth of each other and are usually fully correlated, or coherent. In addition, the SNR at each antenna element is typically low and only a small number of data samples, or snapshots, is available for processing due to the rapid movement of the target. Theoretical studies indicate that the Maximum Likelihood (ML) method is the only reliable estimation procedure in this type of scenario. However, the classical ML estimator involves a multi-dimensional search over a multi-modal surface and is consequently computationally burdensome. In order to facilitate real time processing, we here propose the idea of beamspace domain processing in which the element space snapshot vectors are first operated on by a reduced Butler matrix composed of three orthogonal beamforming weight vectors facilitating a simple, closed-form Beamspace Domain ML (BDML) estimator for the direct and specular path angles. The computational simplicity of the method arises from the fact that the respective beams associated with the three columns of the reduced Butler matrix have all but three nulls in common. The performance of the BDML estimator is enhanced by incorporating the estimation of the complex reflection coefficient and the bisector angle, respectively, for the symmetric and nonsymmetric multipath cases. To minimize the probability of track breaking, the use of frequency diversity is incorporated. The concept of coherent signal subspace processing is invoked as a means for retaining the computational simplicity of single frequency operation. With proper selection of the auxiliary frequencies, it is shown that perfect focusing may be achieved without iterating. In order to combat the effects of strong interfering sources, a novel scheme is presented for adaptively forming the three beams which retains the feature of common nulls

    A weighted MVDR beamformer based on SVM learning for sound source localization

    A weighted minimum variance distortionless response (WMVDR) algorithm for near-field sound localization in a reverberant environment is presented. The steered response power computation of the WMVDR is based on a machine learning component which improves the incoherent frequency fusion of the narrowband power maps. A support vector machine (SVM) classifier is adopted to select the components of the fusion. The skewness measure of the narrowband power map marginal distribution is shown to be an effective feature for the supervised learning of the power map selection. Experiments with both simulated and real data demonstrate the improvement of the WMVDR beamformer localization accuracy with respect to other state-of-the-art techniques. Salvati, Daniele; Drioli, Carlo; Foresti, Gian Luca

    Robust Near-Field Adaptive Beamforming with Distance Discrimination

    This paper proposes a robust near-field adaptive beamformer for microphone array applications in small rooms. Robustness against location errors is crucial for near-field adaptive beamforming due to the difficulty in estimating near-field signal locations, especially the radial distances. A near-field regionally constrained adaptive beamformer is proposed to design a set of linear constraints by filtering on a low rank subspace of the near-field signal over a spatial region and frequency band such that the beamformer response over the designed spatial-temporal region can be accurately controlled by a small number of linear constraint vectors. The proposed constraint design method is a systematic approach which guarantees real arithmetic implementation and direct time domain algorithms for broadband beamforming. It improves the robustness against large errors in distance and directions of arrival, and achieves good distance discrimination simultaneously. We show with a nine-element uniform linear array that the proposed near-field adaptive beamformer is robust against distance errors as large as ±32% of the presumed radial distance and angle errors up to ±20°. It can suppress a far field interfering signal with the same angle of incidence as a near-field target by more than 20 dB with no loss of the array gain at the near-field target. The significant distance discrimination of the proposed near-field beamformer also helps to improve the dereverberation gain and reduce the desired signal cancellation in reverberant environments

    Sensor array signal processing : two decades later

    Caption title. Includes bibliographical references (p. 55-65). Supported by Army Research Office. DAAL03-92-G-115. Supported by the Air Force Office of Scientific Research. F49620-92-J-2002. Supported by the National Science Foundation. MIP-9015281. Supported by the ONR. N00014-91-J-1967. Supported by the AFOSR. F49620-93-1-0102. Hamid Krim, Mats Viberg