
449 research outputs found

Covariance Blocking and Whitening Method for Successive Relative Transfer Function Vector Estimation in Multi-Speaker Scenarios

Author: Doclo Simon
Gode Henri
Publication date: 24/10/2023

This paper addresses the challenge of estimating the relative transfer function (RTF) vectors of multiple speakers in a noisy and reverberant environment. More specifically, we consider a scenario where two speakers activate successively. In this scenario, the RTF vector of the first speaker can be estimated in a straightforward way and the main challenge lies in estimating the RTF vector of the second speaker during segments where both speakers are simultaneously active. To estimate the RTF vector of the second speaker, the so-called blind oblique projection (BOP) method determines the oblique projection operator that optimally blocks the second speaker. Instead of blocking the second speaker, in this paper we propose a covariance blocking and whitening (CBW) method, which first blocks the first speaker and applies whitening using the estimated noise covariance matrix and then estimates the RTF vector of the second speaker based on a singular value decomposition. When using the estimated RTF vectors of both speakers in a linearly constrained minimum variance beamformer, simulation results using real-world recordings for multiple speaker positions demonstrate that the proposed CBW method outperforms the conventional BOP and covariance whitening methods in terms of signal-to-interferer-and-noise ratio improvement. Comment: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct 22-25, 202
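As a rough illustration of the covariance whitening baseline named in the abstract above (not the proposed CBW method, whose blocking step is not detailed here), the following sketch estimates a single-speaker RTF vector in one frequency bin. It assumes the rank-1 model R_y = phi_s * a a^H + R_n; the function name `covariance_whitening_rtf` and its arguments are hypothetical.

```python
import numpy as np

def covariance_whitening_rtf(R_y, R_n, ref=0):
    """Sketch of covariance-whitening RTF estimation for one frequency bin.

    Assumes R_y = phi_s * a a^H + R_n with a single active speaker.
    R_y, R_n: (M, M) Hermitian covariance matrices; R_n positive definite.
    ref: index of the reference microphone.
    """
    # Square-root factor of the noise covariance: R_n = L @ L^H
    L = np.linalg.cholesky(R_n)
    L_inv = np.linalg.inv(L)
    # Whitened covariance: identity plus a rank-1 speech term
    R_w = L_inv @ R_y @ L_inv.conj().T
    R_w = 0.5 * (R_w + R_w.conj().T)  # remove numerical asymmetry
    # eigh returns eigenvalues in ascending order; take the principal eigenvector
    _, eigvecs = np.linalg.eigh(R_w)
    u = eigvecs[:, -1]
    # De-whiten and normalize by the reference microphone entry
    a = L @ u
    return a / a[ref]
```

With exact covariance matrices the principal eigenvector of the whitened matrix is proportional to the whitened steering vector, so the returned vector equals the true RTF vector; with estimated covariances it is only an approximation.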

arXiv.org e-Print Archive

Comparison of Binaural RTF-Vector-Based Direction of Arrival Estimation Methods Exploiting an External Microphone

Author: Doclo Simon
Fejgin Daniel
Publication date: 11/04/2021

In this paper we consider a binaural hearing aid setup, where in addition to the head-mounted microphones an external microphone is available. For this setup, we investigate the performance of several relative transfer function (RTF) vector estimation methods to estimate the direction of arrival (DOA) of the target speaker in a noisy and reverberant acoustic environment. More specifically, we consider the state-of-the-art covariance whitening (CW) and covariance subtraction (CS) methods, either incorporating the external microphone or not, and the recently proposed spatial coherence (SC) method, requiring the external microphone. To estimate the DOA from the estimated RTF vector, we propose to minimize the frequency-averaged Hermitian angle between the estimated head-mounted RTF vector and a database of prototype head-mounted RTF vectors. Experimental results with stationary and moving speech sources in a reverberant environment with diffuse-like noise show that the SC method outperforms the CS method and yields a DOA estimation accuracy similar to that of the CW method at a lower computational complexity. Comment: Submitted to EUSIPCO 202
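The DOA selection rule described in the abstract above can be sketched as follows. This is a minimal illustration, assuming the Hermitian angle is defined as arccos(|a^H b| / (||a|| ||b||)) and that the prototype database is stored as an array indexed by candidate direction; the names `hermitian_angle`, `estimate_doa`, and the array layouts are hypothetical.

```python
import numpy as np

def hermitian_angle(a, b):
    """Hermitian angle between two complex vectors, in radians."""
    # np.vdot conjugates its first argument, giving a^H b
    cos_theta = np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_theta, 0.0, 1.0))

def estimate_doa(rtf_est, prototypes, angles):
    """Pick the candidate angle whose prototype RTFs best match the estimate.

    rtf_est:    (F, M) estimated RTF vectors, one per frequency bin
    prototypes: (D, F, M) prototype RTF vectors for D candidate directions
    angles:     (D,) candidate directions (e.g. in degrees)
    """
    n_freq = rtf_est.shape[0]
    costs = [
        np.mean([hermitian_angle(rtf_est[f], proto[f]) for f in range(n_freq)])
        for proto in prototypes
    ]
    return angles[int(np.argmin(costs))]
```

Because the Hermitian angle is invariant to complex scaling of either vector, the match does not depend on how the estimated and prototype RTF vectors are normalized.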

arXiv.org e-Print Archive

RTF-Based Binaural MVDR Beamformer Exploiting an External Microphone in a Diffuse Noise Field

Author: Doclo S.
Gößling N.
Publication date: 12/07/2018

Besides suppressing all undesired sound sources, an important objective of a binaural noise reduction algorithm for hearing devices is the preservation of the binaural cues, aiming at preserving the spatial perception of the acoustic scene. A well-known binaural noise reduction algorithm is the binaural minimum variance distortionless response beamformer, which can be steered using the relative transfer function (RTF) vector of the desired source, relating the acoustic transfer functions between the desired source and all microphones to a reference microphone. In this paper, we propose a computationally efficient method to estimate the RTF vector in a diffuse noise field, requiring an additional microphone that is spatially separated from the head-mounted microphones. Assuming that the spatial coherence between the noise components in the head-mounted microphone signals and the additional microphone signal is zero, we show that an unbiased estimate of the RTF vector can be obtained. Based on real-world recordings, experimental results for several reverberation times show that the proposed RTF estimator outperforms the widely used RTF estimator based on covariance whitening and a simple biased RTF estimator in terms of noise reduction and binaural cue preservation performance. Comment: Accepted at ITG Conference on Speech Communication 201
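The zero-coherence assumption stated in the abstract above suggests one simple estimator, sketched here under assumptions: if the noise in the external microphone is uncorrelated with the noise in the head-mounted microphones, the cross-correlation between each head-mounted signal and the external signal contains only the speech term, so dividing by the reference entry yields the RTF vector. The function name `spatial_coherence_rtf` and the array layout are hypothetical, and this is not claimed to be the paper's exact formulation.

```python
import numpy as np

def spatial_coherence_rtf(Y, y_ext, ref=0):
    """Sketch of an RTF estimate using an external microphone, one frequency bin.

    Y:     (M, T) STFT coefficients of the M head-mounted microphones over T frames
    y_ext: (T,) STFT coefficients of the external microphone
    ref:   index of the reference microphone
    """
    # Sample estimate of the cross-correlation vector E{y * conj(y_ext)}
    cross = Y @ y_ext.conj() / Y.shape[1]
    # Normalizing by the reference entry cancels the external-microphone
    # transfer function and the speech power
    return cross / cross[ref]
```

In the noise-free case the result is exact; with noise, its accuracy depends on how well the zero-coherence assumption holds and on the number of frames averaged.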

arXiv.org e-Print Archive

First Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Angular Power Spectrum

Author: Barnes C.
Bennett C. L.
Halpern M.
Hill R. S.
Hinshaw G.
Jarosik N.
Kogut A.
Komatsu E.
Limon M.
Meyer S. S.
Page L.
Spergel D. N.
Tucker G. S.
Verde L.
Weiland J.
Wollack E.
Wright E. L.
Publication venue: 'University of Chicago Press'
Publication date: 11/02/2003

We present the angular power spectrum derived from the first-year Wilkinson Microwave Anisotropy Probe (WMAP) sky maps. We study a variety of power spectrum estimation methods and data combinations and demonstrate that the results are robust. The data are modestly contaminated by diffuse Galactic foreground emission, but we show that a simple Galactic template model is sufficient to remove the signal. Point sources produce a modest contamination in the low frequency data. After masking ~700 known bright sources from the maps, we estimate residual sources contribute ~3500 uK^2 at 41 GHz, and ~130 uK^2 at 94 GHz, to the power spectrum l*(l+1)*C_l/(2*pi) at l=1000. Systematic errors are negligible compared to the (modest) level of foreground emission. Our best estimate of the power spectrum is derived from 28 cross-power spectra of statistically independent channels. The final spectrum is essentially independent of the noise properties of an individual radiometer. The resulting spectrum provides a definitive measurement of the CMB power spectrum, with uncertainties limited by cosmic variance, up to l~350. The spectrum clearly exhibits a first acoustic peak at l=220 and a second acoustic peak at l~540 and it provides strong support for adiabatic initial conditions. Kogut et al. (2003) analyze the C_l^TE power spectrum, and present evidence for a relatively high optical depth, and an early period of cosmic reionization. Among other things, this implies that the temperature power spectrum has been suppressed by ~30% on degree angular scales, due to secondary scattering. Comment: One of thirteen companion papers on first-year WMAP results submitted to ApJ; 44 pages, 14 figures; a version with higher quality figures is also available at http://lambda.gsfc.nasa.gov/product/map/map_bibliography.htm

arXiv.org e-Print Archive

Crossref

CERN Document Server

First Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Data Processing Methods and Systematic Errors Limits

Author: A. Kogut
C. Barnes
C. L. Bennett
D. N. Spergel
E. Komatsu
E. L. Wright
G. Hinshaw
G. S. Tucker
M. Halpern
M. Limon
M. R. Greason
M. R. Nolta
Ms H
N. Jarosik
N. Odegard
O. Doré
R. Bean
R. S. Hill
S. S. Meyer
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2003

We describe the calibration and data processing methods used to generate full-sky maps of the cosmic microwave background (CMB) from the first year of Wilkinson Microwave Anisotropy Probe (WMAP) observations. Detailed limits on residual systematic errors are assigned based largely on analyses of the flight data supplemented, where necessary, with results from ground tests. The data are calibrated in flight using the dipole modulation of the CMB due to the observatory's motion around the Sun. This constitutes a full-beam calibration source. An iterative algorithm simultaneously fits the time-ordered data to obtain calibration parameters and pixelized sky map temperatures. The noise properties are determined by analyzing the time-ordered data with this sky signal estimate subtracted. Based on this, we apply a pre-whitening filter to the time-ordered data to remove a low level of 1/f noise. We infer and correct for a small ~1% transmission imbalance between the two sky inputs to each differential radiometer, and we subtract a small sidelobe correction from the 23 GHz (K band) map prior to further analysis. No other systematic error corrections are applied to the data. Calibration and baseline artifacts, including the response to environmental perturbations, are negligible. Systematic uncertainties are comparable to statistical uncertainties in the characterization of the beam response. Both are accounted for in the covariance matrix of the window function and are propagated to uncertainties in the final power spectrum. We characterize the combined upper limits to residual systematic uncertainties through the pixel covariance matrix. Comment: One of 13 companion papers on first-year WMAP results submitted to ApJ; 58 pages with 14 figures; a version with higher quality figures is at http://lambda.gsfc.nasa.gov/product/map/map_bibliography.htm

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

Author: Bohac Marek
Koldovsky Zbynek
Malek Jiri
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 11/12/2019

This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms. Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion
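To make the block-online idea in the abstract above concrete, here is a minimal sketch that cuts a multi-channel recording into independent 250 ms blocks and computes one spatial covariance estimate per block. It works on time-domain samples for simplicity, which is an assumption (a practical system would more likely work per frequency bin in the STFT domain); the function names and the choice to drop trailing samples are hypothetical.

```python
import numpy as np

def split_into_blocks(x, fs, block_ms=250):
    """Split a (channels, samples) signal into non-overlapping blocks.

    Trailing samples that do not fill a whole block are dropped.
    """
    block_len = int(fs * block_ms / 1000)
    n_blocks = x.shape[1] // block_len
    return [x[:, i * block_len:(i + 1) * block_len] for i in range(n_blocks)]

def block_covariances(x, fs, block_ms=250):
    """One sample covariance matrix per block, each estimated independently."""
    return [
        blk @ blk.conj().T / blk.shape[1]
        for blk in split_into_blocks(x, fs, block_ms)
    ]
```

At a 16 kHz sampling rate each 250 ms block holds 4000 samples, so every covariance estimate is averaged over far fewer samples than in batch processing, which is the source of the estimation inaccuracy the abstract mentions.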

arXiv.org e-Print Archive

DSpace@TUL

Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

Author: Gómez García Ángel Manuel
López Espejo Iván
Martín Doñas Juan M.
Peinado Herreros Antonio Miguel
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

This paper deals with speech enhancement in dual-microphone smartphones using beamforming along with postfiltering techniques. The performance of these algorithms relies on a good estimation of the acoustic channel and speech and noise statistics. In this work we present a speech enhancement system that combines the estimation of the relative transfer function (RTF) between microphones using an extended Kalman filter framework with a novel speech presence probability estimator intended to track the noise statistics' variability. The available dual-channel information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction is further improved by means of postfiltering techniques that take advantage of the speech presence estimation. Our proposal is evaluated in different reverberant and noisy environments when the smartphone is used in both close-talk and far-talk positions. The experimental results show that our system achieves improvements in terms of noise reduction, low speech distortion and better speech intelligibility compared to other state-of-the-art approaches. Spanish MINECO/FEDER Project TEC2016-80141-P; Spanish Ministry of Education through the National Program FPU under Grant FPU15/0416

Multidisciplinary Digital Publishing Institute

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada

VBN