133 research outputs found
Blind Sparse-nonnegative (BSN) Channel Identification for Acousitic Time-Difference-of-Arrival Estimation
Estimating time-difference-of-arrival (TDOA) remains a challenging task when acoustic environments are reverberant and noisy. Blind channel identification approaches for TDOA estimation explicitly model multipath reflections and have been demonstrated to be effective in dealing with reverberation. Unfortunately, existing blind channel identification algorithms are sensitive to ambient noise. This papers hows how to resolve the noise sensitivity issue by exploiting prior knowledge about an acoustic room impulse response (RIR), namely, an acoustic RIR can be modeled by a sparse-nonnegative FIR filter. This paper shows how to formulate a single-input two-output blind channel identification into a least square convex optimization, and how to incorporate the sparsity and nonnegativity priors so that the resulting optimization remains convex and can be solved efficiently. The proposed blind sparse-nonnegative (BSN) channel identification approach for TDOA estimation is not only robust to reverberation, but also robust to ambient noise, as demonstrated by simulations and experiments in real acoustic environments
Indoor Source Localization of Radio Frequency Transmitters Using Blind Channel Identification Techniques
Locating transmitters is a research area that is becoming increasingly relevant as technology advances. It is especially useful for determining the location of livestock, drones, keys, phones, tablets, etc. As a result of this push for locating devices, many algorithms have been developed to determine source locations. Most source location algorithms and techniques rely on a line of sight , or a direct path between the source and the receivers to provide accurate results.
Indoor environments pose a challenge to locating transmitters due to the many surfaces that allow radio waves to interact (reflect, refract, and generally distort) with them. Because of the effects of the radio wave interactions, a direct path from the transmitter to the receivers may not be possible inside, increasing the difficulty. This problem is further augmented when the transmitter is transmitting an unknown signal in an unknown environment.
This research derives algorithms to address these issues. The algorithms are tested via simulations and real-world environmental testing
Mathematical modelling ano optimization strategies for acoustic source localization in reverberant environments
La presente Tesis se centra en el uso de técnicas modernas de optimización y de procesamiento de audio para la localización precisa y robusta de personas dentro de un entorno reverberante dotado con agrupaciones (arrays) de micrófonos. En esta tesis se han estudiado diversos aspectos de la localización sonora, incluyendo el modelado, la algoritmia, así como el calibrado previo que permite usar los algoritmos de localización incluso cuando la geometría de los sensores (micrófonos) es desconocida a priori.
Las técnicas existentes hasta ahora requerían de un número elevado de micrófonos para obtener una alta precisión en la localización. Sin embargo, durante esta tesis se ha desarrollado un nuevo método que permite una mejora de más del 30\% en la precisión de la localización con un número reducido de micrófonos. La reducción en el número de micrófonos es importante ya que se traduce directamente en una disminución drástica del coste y en un aumento de la versatilidad del sistema final.
Adicionalmente, se ha realizado un estudio exhaustivo de los fenómenos que afectan al sistema de adquisición y procesado de la señal, con el objetivo de mejorar el modelo propuesto anteriormente. Dicho estudio profundiza en el conocimiento y modelado del filtrado PHAT (ampliamente utilizado en localización acústica) y de los aspectos que lo hacen especialmente adecuado para localización.
Fruto del anterior estudio, y en colaboración con investigadores del instituto IDIAP (Suiza), se ha desarrollado un sistema de auto-calibración de las posiciones de los micrófonos a partir del ruido difuso presente en una sala en silencio. Esta aportación relacionada con los métodos previos basados en la coherencia. Sin embargo es capaz de reducir el ruido atendiendo a parámetros físicos previamente conocidos (distancia máxima entre los micrófonos). Gracias a ello se consigue una mejor precisión utilizando un menor tiempo de cómputo.
El conocimiento de los efectos del filtro PHAT ha permitido crear un nuevo modelo que permite la representación 'sparse' del típico escenario de localización. Este tipo de representación se ha demostrado ser muy conveniente para localización, permitiendo un enfoque sencillo del caso en el que existen múltiples fuentes simultáneas.
La última aportación de esta tesis, es el de la caracterización de las Matrices TDOA (Time difference of arrival -Diferencia de tiempos de llegada, en castellano-). Este tipo de matrices son especialmente útiles en audio pero no están limitadas a él. Además, este estudio transciende a la localización con sonido ya que propone métodos de reducción de ruido de las medias TDOA basados en una representación matricial 'low-rank', siendo útil, además de en localización, en técnicas tales como el beamforming o el autocalibrado
Under-modelled blind system identification for time delay estimation in reverberant environments
In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.Comment: IEEE Journal of Selected Topics in Signal Processing, 201
Spatial features of reverberant speech: estimation and application to recognition and diarization
Distant talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and they are being increasingly used in multiple contexts. The acquired speech signal in such scenarios is reverberant and affected by additive noise. This signal distortion degrades the performance of speech recognition and diarization systems creating troublesome human-machine interactions.This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated with speech recognition degradation: clarity index. In addition, a method to provide information regarding the estimation accuracy is proposed. An analysis of the phoneme recognition performance for multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Additionally, room acoustic parameters can as well be used in speech recognition to provide robustness against reverberation. A method to exploit clarity index estimates in order to perform reverberant speech recognition is introduced.
Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed to be used as an additional source of information for single-channel diarization purposes in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech, however the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival in a robust manner so that speaker diarization is more accurately performed.Open Acces
System Identification with Applications in Speech Enhancement
As the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of selective-tap updating scheme
on the normalized least mean squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. Through demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate such
effect, two algorithms are developed where the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially; and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subbandbased
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis
- …