606 research outputs found

Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

Author: Bohac Marek
Koldovsky Zbynek
Malek Jiri
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 11/12/2019

This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for the block length of 250 ms.

Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion

arXiv.org e-Print Archive

DSpace@TUL

Knowledge-aided STAP in heterogeneous clutter using a hierarchical Bayesian algorithm

Author: Besson Olivier
Bidon Stéphanie
Tourneret Jean-Yves
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2011

This paper addresses the problem of estimating the covariance matrix of a primary vector from heterogeneous samples and some prior knowledge, under the framework of knowledge-aided space-time adaptive processing (KA-STAP). More precisely, a Gaussian scenario is considered where the covariance matrix of the secondary data may differ from the one of interest. Additionally, some knowledge on the primary data is supposed to be available and summarized into a prior matrix. Two KA-estimation schemes are presented in a Bayesian framework whereby the minimum mean square error (MMSE) estimates are derived. The first scheme is an extension of a previous work and takes into account the non-homogeneity via an original relation. In search of simplicity and to reduce the computational load, a second, less complex estimation scheme is proposed, which omits the fact that the environment may be heterogeneous. Along the estimation process, not only the covariance matrix is estimated but also some parameters representing the degree of a priori and/or the degree of heterogeneity. The performance of the two approaches is then compared using STAP synthetic data. STAP filter shapes are analyzed and also compared with a colored loading technique

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Real-Time Dual-Microphone Speech Enhancement

Author: Boyer François-Raymond
Savaria Yvon
Trabelsi Abdelaziz
Publication venue: 'IntechOpen'
Publication date: 14/03/2012

IntechOpen

Postfiltering Using Multichannel Spectral Estimation in Multispeaker Environments

Author: Dam Hai
Dam Hai Huyen Heidi
Low Siow Yong
Nordholm Sven
Publication venue: 'Hindawi Limited'
Publication date: 01/08/2007

This paper investigates the problem of enhancing a single desired speech source from a mixture of signals in multispeaker environments. A beamformer structure is proposed which combines a fixed beamformer with postfiltering. In the first stage, the fixed multiobjective optimal beamformer is designed to spatially extract the desired source by suppressing all other undesired sources. In the second stage, a multichannel power spectral estimator is proposed and incorporated in the postfilter, thus enabling further suppression capability. The combined scheme exploits both spatial and spectral characteristics of the signals. Two new multichannel spectral estimation methods are proposed for the postfiltering using, respectively, inner product and joint diagonalization. Evaluations using recordings from a real-room environment show that the proposed beamformer offers a good interference suppression level whilst maintaining a low-distortion level of the desired source

Southampton (e-Prints Soton)

Crossref

Directory of Open Access Journals

espace@Curtin

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Author: Barker J.
Marxer R.
Nugraha A.A.
Vincent E.
Watanabe S.
Publication venue: 'Elsevier BV'
Publication date: 18/11/2016

Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR techniques. In this article, we study this issue in the context of the CHiME-3 dataset, which consists of sentences spoken by talkers situated in challenging noisy environments recorded using a 6-channel tablet based microphone array. We provide a critical analysis of the results published on this dataset for various signal enhancement, feature extraction, and ASR backend techniques and perform a number of new experiments in order to separately assess the impact of different noise environments, different numbers and positions of microphones, or simulated vs. real data on speech enhancement and ASR performance. We show that, with the exception of minimum variance distortionless response (MVDR) beamforming, most algorithms perform consistently on real and simulated data and can benefit from training on simulated data. We also find that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact. Based on these results, we introduce the CHiME-4 Speech Separation and Recognition Challenge, which revisits the CHiME-3 dataset and makes it more challenging by reducing the number of microphones available for testing

Crossref

INRIA a CCSD electronic archive server

White Rose Research Online

HAL-Rennes 1

Convolutive Blind Source Separation Methods

Author: Kjems Ulrik
Larsen Jan
Parra Lucas C.
Pedersen Michael Syskind
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2008

In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks

CiteSeerX

Online Research Database In Technology

Enhancement of Periodic Signals: with Application to Speech Signals

Author: Jensen Jesper Rindom
Publication venue
Publication date: 01/01/2012

VBN