Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to the short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for a block length of 250 ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in the IET Signal Processing journal. Original results
unchanged; additional experiments presented; refined discussion and
conclusion.
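The per-block pipeline described above (block-wise covariance estimation, RTF-based steering, MVDR beamforming) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' system: the DNN-based VAD is replaced by a given time-frequency noise mask, the post-filter is omitted, and all names and shapes are ours. The RTF is obtained by covariance whitening, one common choice for this step.

```python
import numpy as np

def block_mvdr(X, noise_mask, ref=0):
    """Block-wise MVDR beamformer steered by an RTF estimate (sketch).

    X          : (mics, frames, bins) complex STFT of one input block
    noise_mask : (frames, bins) in [0, 1]; stands in for the DNN-based VAD
    ref        : reference microphone index
    Each block is processed independently, so the covariance statistics
    are estimated from this block only, as in the abstract.
    """
    M, T, F = X.shape
    Y = np.zeros((T, F), dtype=complex)
    for f in range(F):
        Xf = X[:, :, f]                       # (M, T)
        w_n = noise_mask[:, f]
        w_s = 1.0 - w_n
        # Spatial covariance matrices from this block only
        Rn = (Xf * w_n) @ Xf.conj().T / max(w_n.sum(), 1e-6)
        Rs = (Xf * w_s) @ Xf.conj().T / max(w_s.sum(), 1e-6)
        Rn = Rn + 1e-6 * np.eye(M)            # diagonal loading
        # Covariance whitening: principal generalized eigenvector v of
        # (Rs, Rn); the RTF is then a ∝ Rn @ v, normalized at the ref mic
        evals, evecs = np.linalg.eig(np.linalg.solve(Rn, Rs))
        v = evecs[:, np.argmax(evals.real)]
        a = Rn @ v
        a = a / a[ref]
        # MVDR weights: w = Rn^{-1} a / (a^H Rn^{-1} a)
        num = np.linalg.solve(Rn, a)
        w = num / np.real(a.conj() @ num)
        Y[:, f] = w.conj() @ Xf               # beamformer output
    return Y
```

The distortionless constraint w^H a = 1 means the target component at the reference microphone passes unchanged; with short blocks, the main error source is the less precise covariance estimation discussed in the abstract.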
Comparison of Binaural RTF-Vector-Based Direction of Arrival Estimation Methods Exploiting an External Microphone
In this paper we consider a binaural hearing aid setup, where in addition to
the head-mounted microphones an external microphone is available. For this
setup, we investigate the performance of several relative transfer function
(RTF) vector estimation methods to estimate the direction of arrival (DOA) of
the target speaker in a noisy and reverberant acoustic environment. In
particular, we consider the state-of-the-art covariance whitening (CW) and
covariance subtraction (CS) methods, either incorporating the external
microphone or not, and the recently proposed spatial coherence (SC) method,
requiring the external microphone. To estimate the DOA from the estimated RTF
vector, we propose to minimize the frequency-averaged Hermitian angle between
the estimated head-mounted RTF vector and a database of prototype head-mounted
RTF vectors. Experimental results with stationary and moving speech sources in
a reverberant environment with diffuse-like noise show that the SC method
outperforms the CS method and yields a similar DOA estimation accuracy as the
CW method at a lower computational complexity.
Comment: Submitted to EUSIPCO 202
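The proposed DOA selection step, minimizing the frequency-averaged Hermitian angle between the estimated RTF vector and a database of prototype RTF vectors, can be sketched as follows. Shapes and names are our illustrative assumptions; the RTF estimation itself (CW, CS, or SC) is assumed to have happened upstream.

```python
import numpy as np

def hermitian_angle_doa(rtf_est, prototypes):
    """Pick the prototype direction with the smallest frequency-averaged
    Hermitian angle to the estimated head-mounted RTF vector (sketch).

    rtf_est    : (bins, mics) estimated RTF vector per frequency bin
    prototypes : (directions, bins, mics) database of prototype RTF vectors
    Returns the index of the best-matching direction.
    """
    # |<prototype, estimate>| per (direction, bin)
    num = np.abs(np.einsum('dfm,fm->df', prototypes.conj(), rtf_est))
    den = (np.linalg.norm(prototypes, axis=2)
           * np.linalg.norm(rtf_est, axis=1)[None, :])
    cos = np.clip(num / np.maximum(den, 1e-12), 0.0, 1.0)
    theta = np.arccos(cos)                  # Hermitian angle per bin
    return int(np.argmin(theta.mean(axis=1)))
```

Because the Hermitian angle depends only on |a^H b| and the vector norms, it is invariant to per-bin complex scaling of the RTF vectors, which is what makes it a convenient distance for this matching.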
Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and
the Generalized Eigenvalue (GEV) beamformer, are popular signal processing
techniques which can improve speech recognition performance. In this paper, we
present an experimental study on these linear filters in a specific speech
recognition task, namely the CHiME-4 challenge, which features real recordings
in multiple noisy environments. Specifically, the rank-1 MWF is employed for
noise reduction and a new constant residual noise power constraint is derived
which enhances the recognition performance. To fulfill the underlying rank-1
assumption, the speech covariance matrix is reconstructed based on eigenvectors
or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated with
alternative multichannel linear filters under the same framework, which
involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask
estimation. The proposed filter outperforms alternative ones, leading to a 40%
relative Word Error Rate (WER) reduction compared with the baseline Weighted
Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER
reduction compared with the GEV-BAN method. The results also suggest that the
speech recognition accuracy correlates more with the Mel-frequency cepstral
coefficients (MFCC) feature variance than with the noise reduction or the
speech distortion level.
Comment: for Computer Speech and Language
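The rank-1 construction at the core of this filter can be sketched compactly: reconstruct a rank-1 speech covariance matrix from its principal eigenvector (the eigenvector-based variant; the generalized-eigenvector variant is analogous), then form the parametric MWF. This is a minimal sketch under our assumptions; the covariance matrices are taken as given (estimated upstream, e.g., from BLSTM masks), and the paper's constant residual noise power constraint is not shown.

```python
import numpy as np

def rank1_mwf(Rs, Rn, mu=1.0, ref=0):
    """Rank-1 constrained multichannel Wiener filter (sketch).

    Rs, Rn : (M, M) Hermitian speech and noise covariance matrices
    mu     : speech distortion / noise reduction trade-off
    ref    : reference microphone index
    """
    # Rank-1 reconstruction: keep only the principal eigenpair of Rs,
    # which enforces the single-source (rank-1) assumption
    lam, U = np.linalg.eigh(Rs)
    u = U[:, -1]
    Rs1 = lam[-1] * np.outer(u, u.conj())
    # Parametric MWF: w = (Rs1 + mu * Rn)^{-1} Rs1 e_ref
    return np.linalg.solve(Rs1 + mu * Rn, Rs1[:, ref])
```

When the estimated Rs is already exactly rank-1, the reconstruction changes nothing and the filter coincides with the ordinary parametric MWF; the benefit appears when estimation noise spreads energy across the other eigenvectors.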
Optimal Binaural LCMV Beamforming in Complex Acoustic Scenarios: Theoretical and Practical Insights
Binaural beamforming algorithms for head-mounted assistive listening devices
are crucial to improve speech quality and speech intelligibility in noisy
environments, while maintaining the spatial impression of the acoustic scene.
While the well-known BMVDR beamformer is able to preserve the binaural cues of
one desired source, the BLCMV beamformer uses additional constraints to also
preserve the binaural cues of interfering sources. In this paper, we provide
theoretical and practical insights on how to optimally set the interference
scaling parameters in the BLCMV beamformer for an arbitrary number of
interfering sources. In addition, since in practice only a limited temporal
observation interval is available to estimate all required beamformer
quantities, we provide an experimental evaluation in a complex acoustic
scenario using measured impulse responses from hearing aids in a cafeteria for
different observation intervals. The results show that even rather short
observation intervals are sufficient to achieve a decent noise reduction
performance and that a proposed threshold on the optimal interference scaling
parameters leads to smaller binaural cue errors in practice.
Comment: To appear in Proc. IWAENC 201
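The constraint structure the paper builds on can be illustrated with a plain LCMV filter carrying one interference scaling parameter per interfering source. This sketch computes a single filter, whereas the binaural version applies the same construction once per ear with a shared reference; the steering/RTF vectors and noise covariance are assumed given, and all names are ours.

```python
import numpy as np

def lcmv_with_scaling(Rn, a_tgt, interferers, etas):
    """LCMV filter with interference scaling parameters (sketch).

    Rn          : (M, M) noise covariance matrix
    a_tgt       : (M,) target steering/RTF vector (passed undistorted)
    interferers : list of (M,) interferer steering/RTF vectors
    etas        : interference scaling parameters, one per interferer;
                  these are the quantities the paper optimizes
    """
    # Constraint matrix C and desired responses f: w^H a_tgt = 1,
    # w^H a_i = eta_i for each interferer
    C = np.column_stack([a_tgt] + list(interferers))
    f = np.concatenate(([1.0], np.asarray(etas, dtype=float)))
    # Closed form: w = Rn^{-1} C (C^H Rn^{-1} C)^{-1} f
    RinvC = np.linalg.solve(Rn, C)
    return RinvC @ np.linalg.solve(C.conj().T @ RinvC, f)
```

Setting eta_i = 0 removes an interferer entirely (a null constraint), while a small positive eta_i keeps an attenuated copy, which in the binaural case is what preserves the interferer's spatial cues; each constraint consumes a degree of freedom, which is why the number of interferers matters.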
Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation
This paper deals with speech enhancement in dual-microphone smartphones using
beamforming along with postfiltering techniques. The performance of these algorithms relies on
a good estimation of the acoustic channel and speech and noise statistics. In this work we present
a speech enhancement system that combines the estimation of the relative transfer function (RTF)
between microphones using an extended Kalman filter framework with a novel speech presence
probability estimator intended to track the noise statistics’ variability. The available dual-channel
information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction
is further improved by means of postfiltering techniques that take advantage of the speech presence
estimation. Our proposal is evaluated in different reverberant and noisy environments when the
smartphone is used in both close-talk and far-talk positions. The experimental results show that our
system achieves improvements in terms of noise reduction, low speech distortion and better speech
intelligibility compared to other state-of-the-art approaches.
Spanish MINECO/FEDER Project TEC2016-80141-P. Spanish Ministry of Education
through the National Program FPU under Grant FPU15/0416
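The RTF tracking idea can be illustrated, much simplified, with a per-bin Kalman filter under a random-walk state model. Because the observation y2 = h * y1 + v is linear in h, a plain Kalman filter suffices in this toy setting; the paper's extended Kalman framework with speech presence probability weighting is not reproduced here, and the noise variances q and r are illustrative tuning values.

```python
import numpy as np

def kalman_rtf_track(Y1, Y2, q=1e-5, r=5e-3):
    """Track the dual-channel RTF h in Y2 ≈ h * Y1 per frame (sketch).

    Y1, Y2 : (frames,) complex STFT values of one frequency bin at the
             primary and secondary microphone
    q, r   : process (random-walk) and observation noise variances
    Returns the per-frame RTF estimates.
    """
    h = 0.0 + 0.0j      # state: complex RTF at this bin
    P = 1.0             # state error variance
    est = np.empty(len(Y1), dtype=complex)
    for t, (y1, y2) in enumerate(zip(Y1, Y2)):
        # Predict: random walk h_t = h_{t-1} + w_t
        P = P + q
        # Update with observation y2 = y1 * h + v, v ~ CN(0, r)
        S = (abs(y1) ** 2) * P + r          # innovation variance
        K = P * np.conj(y1) / S             # Kalman gain
        h = h + K * (y2 - y1 * h)
        P = (1.0 - (K * y1).real) * P
        est[t] = h
    return est
```

Tracking h recursively frame by frame is what allows the estimate to follow a changing acoustic channel, e.g., when the smartphone moves between close-talk and far-talk positions.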