1,648 research outputs found
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem through modeling the
acoustic of the reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustic from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.Comment: 31 page
A Geometric Approach to Sound Source Localization from Time-Delay Estimates
This paper addresses the problem of sound-source localization from time-delay
estimates using arbitrarily-shaped non-coplanar microphone arrays. A novel
geometric formulation is proposed, together with a thorough algebraic analysis
and a global optimization solver. The proposed model is thoroughly described
and evaluated. The geometric analysis, stemming from the direct acoustic
propagation model, leads to necessary and sufficient conditions for a set of
time delays to correspond to a unique position in the source space. Such sets
of time delays are referred to as feasible sets. We formally prove that every
feasible set corresponds to exactly one position in the source space, whose
value can be recovered using a closed-form localization mapping. Therefore we
seek for the optimal feasible set of time delays given, as input, the received
microphone signals. This time delay estimation problem is naturally cast into a
programming task, constrained by the feasibility conditions derived from the
geometric analysis. A global branch-and-bound optimization technique is
proposed to solve the problem at hand, hence estimating the best set of
feasible time delays and, subsequently, localizing the sound source. Extensive
experiments with both simulated and real data are reported; we compare our
methodology to four state-of-the-art techniques. This comparison clearly shows
that the proposed method combined with the branch-and-bound algorithm
outperforms existing methods. These in-depth geometric understanding, practical
algorithms, and encouraging results, open several opportunities for future
work.Comment: 13 pages, 2 figures, 3 table, journa
Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees
This paper addresses the problem of ad hoc microphone array calibration where
only partial information about the distances between microphones is available.
We construct a matrix consisting of the pairwise distances and propose to
estimate the missing entries based on a novel Euclidean distance matrix
completion algorithm by alternative low-rank matrix completion and projection
onto the Euclidean distance space. This approach confines the recovered matrix
to the EDM cone at each iteration of the matrix completion algorithm. The
theoretical guarantees of the calibration performance are obtained considering
the random and locally structured missing entries as well as the measurement
noise on the known distances. This study elucidates the links between the
calibration error and the number of microphones along with the noise level and
the ratio of missing distances. Thorough experiments on real data recordings
and simulated setups are conducted to demonstrate these theoretical insights. A
significant improvement is achieved by the proposed Euclidean distance matrix
completion algorithm over the state-of-the-art techniques for ad hoc microphone
array calibration.Comment: In Press, available online, August 1, 2014.
http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal
Processing, 201
Mathematical modelling ano optimization strategies for acoustic source localization in reverberant environments
La presente Tesis se centra en el uso de técnicas modernas de optimización y de procesamiento de audio para la localización precisa y robusta de personas dentro de un entorno reverberante dotado con agrupaciones (arrays) de micrófonos. En esta tesis se han estudiado diversos aspectos de la localización sonora, incluyendo el modelado, la algoritmia, asà como el calibrado previo que permite usar los algoritmos de localización incluso cuando la geometrÃa de los sensores (micrófonos) es desconocida a priori.
Las técnicas existentes hasta ahora requerÃan de un número elevado de micrófonos para obtener una alta precisión en la localización. Sin embargo, durante esta tesis se ha desarrollado un nuevo método que permite una mejora de más del 30\% en la precisión de la localización con un número reducido de micrófonos. La reducción en el número de micrófonos es importante ya que se traduce directamente en una disminución drástica del coste y en un aumento de la versatilidad del sistema final.
Adicionalmente, se ha realizado un estudio exhaustivo de los fenómenos que afectan al sistema de adquisición y procesado de la señal, con el objetivo de mejorar el modelo propuesto anteriormente. Dicho estudio profundiza en el conocimiento y modelado del filtrado PHAT (ampliamente utilizado en localización acústica) y de los aspectos que lo hacen especialmente adecuado para localización.
Fruto del anterior estudio, y en colaboración con investigadores del instituto IDIAP (Suiza), se ha desarrollado un sistema de auto-calibración de las posiciones de los micrófonos a partir del ruido difuso presente en una sala en silencio. Esta aportación relacionada con los métodos previos basados en la coherencia. Sin embargo es capaz de reducir el ruido atendiendo a parámetros fÃsicos previamente conocidos (distancia máxima entre los micrófonos). Gracias a ello se consigue una mejor precisión utilizando un menor tiempo de cómputo.
El conocimiento de los efectos del filtro PHAT ha permitido crear un nuevo modelo que permite la representación 'sparse' del tÃpico escenario de localización. Este tipo de representación se ha demostrado ser muy conveniente para localización, permitiendo un enfoque sencillo del caso en el que existen múltiples fuentes simultáneas.
La última aportación de esta tesis, es el de la caracterización de las Matrices TDOA (Time difference of arrival -Diferencia de tiempos de llegada, en castellano-). Este tipo de matrices son especialmente útiles en audio pero no están limitadas a él. Además, este estudio transciende a la localización con sonido ya que propone métodos de reducción de ruido de las medias TDOA basados en una representación matricial 'low-rank', siendo útil, además de en localización, en técnicas tales como el beamforming o el autocalibrado
Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates
This paper presents a novel approach for indoor acoustic source localization
using microphone arrays and based on a Convolutional Neural Network (CNN). The
proposed solution is, to the best of our knowledge, the first published work in
which the CNN is designed to directly estimate the three dimensional position
of an acoustic source, using the raw audio signal as the input information
avoiding the use of hand crafted audio features. Given the limited amount of
available localization data, we propose in this paper a training strategy based
on two steps. We first train our network using semi-synthetic data, generated
from close talk speech recordings, and where we simulate the time delays and
distortion suffered in the signal that propagates from the source to the array
of microphones. We then fine tune this network using a small amount of real
data. Our experimental results show that this strategy is able to produce
networks that significantly improve existing localization methods based on
\textit{SRP-PHAT} strategies. In addition, our experiments show that our CNN
method exhibits better resistance against varying gender of the speaker and
different window sizes compared with the other methods.Comment: 18 pages, 3 figures, 8 table
Surround by Sound: A Review of Spatial Audio Recording and Reproduction
In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering is designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener’s two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recent popular area, multi-zone reproduction, is also briefly reviewed in the paper. The paper is concluded with a discussion of the current state of the field and open problemsThe authors acknowledge National Natural Science Foundation of China (NSFC) No.
61671380 and Australian Research Council Discovery Scheme DE 150100363
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.Comment: IEEE Journal of Selected Topics in Signal Processing, 201
- …