503 research outputs found

Comparison of Binaural RTF-Vector-Based Direction of Arrival Estimation Methods Exploiting an External Microphone

Authors: Doclo Simon; Fejgin Daniel
Publication date: 11/04/2021

In this paper we consider a binaural hearing aid setup, where in addition to the head-mounted microphones an external microphone is available. For this setup, we investigate the performance of several relative transfer function (RTF) vector estimation methods to estimate the direction of arrival (DOA) of the target speaker in a noisy and reverberant acoustic environment. In particular, we consider the state-of-the-art covariance whitening (CW) and covariance subtraction (CS) methods, either incorporating the external microphone or not, and the recently proposed spatial coherence (SC) method, requiring the external microphone. To estimate the DOA from the estimated RTF vector, we propose to minimize the frequency-averaged Hermitian angle between the estimated head-mounted RTF vector and a database of prototype head-mounted RTF vectors. Experimental results with stationary and moving speech sources in a reverberant environment with diffuse-like noise show that the SC method outperforms the CS method and yields a DOA estimation accuracy similar to that of the CW method at a lower computational complexity.

Comment: Submitted to EUSIPCO 202
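The DOA selection step described in this abstract (pick the prototype whose frequency-averaged Hermitian angle to the estimated RTF vector is smallest) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two-microphone free-field prototype model, the microphone spacing and the frequency grid are all assumptions made for the example, whereas the paper uses a database of head-mounted prototype RTF vectors.

```python
import cmath
import math


def hermitian_angle(a, b):
    """Hermitian angle (radians) between two complex vectors a and b."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    # Clamp to 1.0 to guard against rounding slightly above 1.
    cos_theta = min(1.0, abs(inner) / (norm_a * norm_b))
    return math.acos(cos_theta)


def estimate_doa(estimated_rtfs, prototypes):
    """Return the DOA whose prototypes minimize the frequency-averaged angle.

    estimated_rtfs: list over frequency bins of RTF vectors.
    prototypes: dict mapping a DOA to a list over bins of prototype RTF vectors.
    """
    def mean_angle(candidate):
        angles = [hermitian_angle(e, p) for e, p in zip(estimated_rtfs, candidate)]
        return sum(angles) / len(angles)

    return min(prototypes, key=lambda doa: mean_angle(prototypes[doa]))


def toy_prototypes(doas_deg, freqs_hz, spacing_m=0.15, speed=343.0):
    """Hypothetical free-field two-microphone database (assumption, for illustration)."""
    database = {}
    for doa in doas_deg:
        delay = spacing_m * math.sin(math.radians(doa)) / speed
        database[doa] = [[1 + 0j, cmath.exp(-2j * math.pi * f * delay)] for f in freqs_hz]
    return database


database = toy_prototypes(range(-90, 91, 30), [500.0, 1000.0, 1500.0])
# Scale the 30-degree prototype by a complex gain; the Hermitian angle ignores such scaling.
estimate = [[(0.9 + 0.1j) * x for x in vec] for vec in database[30]]
print(estimate_doa(estimate, database))
```

Because the Hermitian angle depends only on the magnitude of the normalized inner product, the match is insensitive to a common complex scaling of the estimated vector, which is why the scaled test vector still maps to its original direction.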

arXiv.org e-Print Archive

Relative Transfer Function Vector Estimation for Acoustic Sensor Networks Exploiting Covariance Matrix Structure

Authors: Doclo Simon; Gode Henri; Middelberg Wiebke
Publication date: 27/10/2023

In many multi-microphone algorithms for noise reduction, an estimate of the relative transfer function (RTF) vector of the target speaker is required. The state-of-the-art covariance whitening (CW) method estimates the RTF vector as the principal eigenvector of the whitened noisy covariance matrix, where whitening is performed using an estimate of the noise covariance matrix. In this paper, we consider an acoustic sensor network consisting of multiple microphone nodes. Assuming uncorrelated noise between the nodes but not within the nodes, we propose two RTF vector estimation methods that leverage the block-diagonal structure of the noise covariance matrix. The first method modifies the CW method by considering only the diagonal blocks of the estimated noise covariance matrix. In contrast, the second method only considers the off-diagonal blocks of the noisy covariance matrix, but cannot be solved using a simple eigenvalue decomposition. When applying the estimated RTF vector in a minimum variance distortionless response beamformer, simulation results for real-world recordings in a reverberant environment with multiple noise sources show that the modified CW method performs slightly better than the CW method in terms of SNR improvement, while the off-diagonal selection method outperforms a biased RTF vector estimate obtained as the principal eigenvector of the noisy covariance matrix.

Comment: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz NY, USA, Oct. 202
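The CW estimate summarized in this abstract can be written out as a short worked formula. The notation below is an assumption introduced for illustration (it does not come from the listing): $\mathbf{R}_y$ and $\mathbf{R}_n$ are the noisy and noise covariance matrices, $\phi_s$ the target speech power, $\mathbf{h}$ the RTF vector, $\mathbf{L}$ a square-root factor of the estimated noise covariance matrix (for example a Cholesky factor), $\mathbf{v}_{\max}$ the principal eigenvector of the whitened matrix, and $\mathbf{e}_{\mathrm{ref}}$ the selection vector of the reference microphone.

```latex
\begin{align}
\mathbf{R}_y &= \phi_s\,\mathbf{h}\mathbf{h}^{H} + \mathbf{R}_n, \\
\hat{\mathbf{R}}_n &= \mathbf{L}\mathbf{L}^{H}, \qquad
\hat{\mathbf{R}}_y^{w} = \mathbf{L}^{-1}\,\hat{\mathbf{R}}_y\,\mathbf{L}^{-H}, \\
\hat{\mathbf{h}}_{\mathrm{CW}} &=
\frac{\mathbf{L}\,\mathbf{v}_{\max}}{\mathbf{e}_{\mathrm{ref}}^{T}\,\mathbf{L}\,\mathbf{v}_{\max}},
\qquad \mathbf{v}_{\max} = \text{principal eigenvector of } \hat{\mathbf{R}}_y^{w}.
\end{align}
```

Under the model in the first line, the whitened matrix equals $\phi_s\,\mathbf{L}^{-1}\mathbf{h}\mathbf{h}^{H}\mathbf{L}^{-H} + \mathbf{I}$, so its principal eigenvector is proportional to $\mathbf{L}^{-1}\mathbf{h}$ and de-whitening followed by normalization to the reference entry recovers $\mathbf{h}$. In this notation, the modified CW method described in the abstract would presumably keep only the diagonal blocks of $\hat{\mathbf{R}}_n$ before factorizing it.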

arXiv.org e-Print Archive

RTF-Based Binaural MVDR Beamformer Exploiting an External Microphone in a Diffuse Noise Field

Authors: Doclo S.; Gößling N.
Publication date: 12/07/2018

Besides suppressing all undesired sound sources, an important objective of a binaural noise reduction algorithm for hearing devices is to preserve the binaural cues and hence the spatial perception of the acoustic scene. A well-known binaural noise reduction algorithm is the binaural minimum variance distortionless response beamformer, which can be steered using the relative transfer function (RTF) vector of the desired source, relating the acoustic transfer functions between the desired source and all microphones to a reference microphone. In this paper, we propose a computationally efficient method to estimate the RTF vector in a diffuse noise field, requiring an additional microphone that is spatially separated from the head-mounted microphones. Assuming that the spatial coherence between the noise components in the head-mounted microphone signals and the additional microphone signal is zero, we show that an unbiased estimate of the RTF vector can be obtained. Based on real-world recordings, experimental results for several reverberation times show that the proposed RTF estimator outperforms the widely used RTF estimator based on covariance whitening and a simple biased RTF estimator in terms of noise reduction and binaural cue preservation performance.

Comment: Accepted at ITG Conference on Speech Communication 201
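The idea in this abstract, that noise uncorrelated between the head-mounted microphones and the external microphone drops out of their cross-correlation, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the paper's estimator as published: the function names, the single-frequency-bin signal model, the transfer function values and the noise levels are all hypothetical, and the estimate is taken as the cross-correlation with the external microphone normalized to the reference microphone.

```python
import random


def sc_rtf_estimate(head_frames, ext_frames, ref=0):
    """Estimate the head-mounted RTF vector from cross-correlations with the external mic.

    head_frames: list of frames, each a list of complex coefficients (one per head mic).
    ext_frames: list of complex coefficients of the external mic, one per frame.
    """
    num_mics = len(head_frames[0])
    cross = [0j] * num_mics
    for y, y_ext in zip(head_frames, ext_frames):
        for m in range(num_mics):
            cross[m] += y[m] * y_ext.conjugate()
    # Normalize so that the entry of the reference microphone equals 1.
    return [c / cross[ref] for c in cross]


def simulate(atf_head, atf_ext, num_frames=5000, noise_scale=0.5, seed=0):
    """Toy single-bin model (assumption): speech plus noise in every microphone."""
    rng = random.Random(seed)

    def cgauss(scale=1.0):
        return complex(rng.gauss(0, scale), rng.gauss(0, scale))

    head_frames, ext_frames = [], []
    for _ in range(num_frames):
        s = cgauss()
        common = cgauss(noise_scale)  # noise shared by the head mics (correlated)
        head_frames.append([a * s + common + cgauss(noise_scale) for a in atf_head])
        ext_frames.append(atf_ext * s + cgauss(noise_scale))  # independent noise
    return head_frames, ext_frames


atf_head = [0.8 + 0.2j, 0.5 - 0.4j]
head, ext = simulate(atf_head, 1.0 + 0j)
estimate = sc_rtf_estimate(head, ext)
true_rtf = [a / atf_head[0] for a in atf_head]
print(estimate, true_rtf)
```

In this toy model the noise in the head-mounted microphones is deliberately correlated across those microphones, yet the estimate still approaches the true RTF as the number of frames grows, because only the correlation with the external microphone enters the estimate.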

arXiv.org e-Print Archive

Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

Authors: Bohac Marek; Koldovsky Zbynek; Malek Jiri
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 11/12/2019

This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The proposed method is evaluated experimentally on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust to a short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.

Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion

arXiv.org e-Print Archive

DSpace@TUL

First Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Angular Power Spectrum

Authors: Barnes C.; Bennett C. L.; Halpern M.; Hill R. S.; Hinshaw G.; Jarosik N.; Kogut A.; Komatsu E.; Limon M.; Meyer S. S.; Page L.; Spergel D. N.; Tucker G. S.; Verde L.; Weiland J.; Wollack E.; Wright E. L.
Publication venue: University of Chicago Press
Publication date: 11/02/2003

We present the angular power spectrum derived from the first-year Wilkinson Microwave Anisotropy Probe (WMAP) sky maps. We study a variety of power spectrum estimation methods and data combinations and demonstrate that the results are robust. The data are modestly contaminated by diffuse Galactic foreground emission, but we show that a simple Galactic template model is sufficient to remove the signal. Point sources produce a modest contamination in the low frequency data. After masking ~700 known bright sources from the maps, we estimate residual sources contribute ~3500 uK^2 at 41 GHz, and ~130 uK^2 at 94 GHz, to the power spectrum l*(l+1)*C_l/(2*pi) at l=1000. Systematic errors are negligible compared to the (modest) level of foreground emission. Our best estimate of the power spectrum is derived from 28 cross-power spectra of statistically independent channels. The final spectrum is essentially independent of the noise properties of an individual radiometer. The resulting spectrum provides a definitive measurement of the CMB power spectrum, with uncertainties limited by cosmic variance, up to l~350. The spectrum clearly exhibits a first acoustic peak at l=220 and a second acoustic peak at l~540 and it provides strong support for adiabatic initial conditions. Kogut et al. (2003) analyze the C_l^TE power spectrum, and present evidence for a relatively high optical depth, and an early period of cosmic reionization. Among other things, this implies that the temperature power spectrum has been suppressed by ~30% on degree angular scales, due to secondary scattering.

Comment: One of thirteen companion papers on first-year WMAP results submitted to ApJ; 44 pages, 14 figures; a version with higher quality figures is also available at http://lambda.gsfc.nasa.gov/product/map/map_bibliography.htm

arXiv.org e-Print Archive

Crossref

CERN Document Server

Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

Authors: Gómez García Ángel Manuel; López Espejo Iván; Martín Doñas Juan M.; Peinado Herreros Antonio Miguel
Publication venue: MDPI AG
Publication date: 01/01/2019

This paper deals with speech enhancement in dual-microphone smartphones using beamforming along with postfiltering techniques. The performance of these algorithms relies on a good estimation of the acoustic channel and speech and noise statistics. In this work we present a speech enhancement system that combines the estimation of the relative transfer function (RTF) between microphones using an extended Kalman filter framework with a novel speech presence probability estimator intended to track the noise statistics’ variability. The available dual-channel information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction is further improved by means of postfiltering techniques that take advantage of the speech presence estimation. Our proposal is evaluated in different reverberant and noisy environments when the smartphone is used in both close-talk and far-talk positions. The experimental results show that our system achieves improvements in terms of noise reduction, low speech distortion and better speech intelligibility compared to other state-of-the-art approaches.

Spanish MINECO/FEDER Project TEC2016-80141-P
Spanish Ministry of Education through the National Program FPU under Grant FPU15/0416

Multidisciplinary Digital Publishing Institute

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada

VBN