    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared with the regime in which each recording is treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria, such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively. Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front end, is evaluated. The results indicate that the proposed method is robust to short block lengths. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.
    Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in the IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion.
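
To make the block-wise idea concrete, the following is a minimal sketch of how one block of one frequency bin might be processed. It rests on assumptions the abstract does not confirm: an MVDR beamformer, a generic eigenvector-based RTF estimate, and a VAD decision supplied as a boolean array. The function name `rtf_mvdr_block` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def rtf_mvdr_block(X, vad, ref=0, eps=1e-6):
    """Sketch of RTF-steered MVDR for one frequency bin of one block.

    X   : (M, T) complex STFT frames from M microphones (assumed layout).
    vad : (T,) boolean, True where speech is active (hypothetical VAD output).
    ref : index of the reference microphone.
    eps : diagonal loading, since short blocks give poorly conditioned estimates.
    """
    M = X.shape[0]
    speech, noise = X[:, vad], X[:, ~vad]
    # Sample covariance matrices from speech-active and noise-only frames.
    phi_n = noise @ noise.conj().T / max(noise.shape[1], 1) + eps * np.eye(M)
    phi_x = speech @ speech.conj().T / max(speech.shape[1], 1)
    # Principal eigenvector of the speech covariance estimate (phi_x - phi_n),
    # normalized so that the reference channel equals 1, serves as the RTF.
    _, vecs = np.linalg.eigh(phi_x - phi_n)
    v = vecs[:, -1]
    h = v / v[ref]
    # MVDR weights: w = phi_n^{-1} h / (h^H phi_n^{-1} h).
    num = np.linalg.solve(phi_n, h)
    w = num / (h.conj() @ num)
    # Beamformer output for every frame in the block: y = w^H x.
    return w, h, w.conj() @ X
```

Because every quantity is re-estimated from the current block only, shorter blocks track a moving speaker more quickly but yield noisier covariance estimates, which is the trade-off the abstract describes.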

    Knowledge-aided STAP in heterogeneous clutter using a hierarchical bayesian algorithm

    This paper addresses the problem of estimating the covariance matrix of a primary vector from heterogeneous samples and some prior knowledge, under the framework of knowledge-aided space-time adaptive processing (KA-STAP). More precisely, a Gaussian scenario is considered in which the covariance matrix of the secondary data may differ from the one of interest. Additionally, some knowledge of the primary data is assumed to be available and summarized in a prior matrix. Two KA estimation schemes are presented in a Bayesian framework, for which the minimum mean square error (MMSE) estimates are derived. The first scheme extends previous work and takes the non-homogeneity into account via an original relation. To keep things simple and reduce the computational load, a second, less complex estimation scheme is proposed that ignores the possibility that the environment is heterogeneous. During the estimation process, not only the covariance matrix is estimated but also parameters representing the degree of a priori knowledge and/or the degree of heterogeneity. The performance of the two approaches is then compared using synthetic STAP data. STAP filter shapes are analyzed and also compared with a colored loading technique.
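
As a point of reference for the colored loading baseline mentioned at the end of the abstract, here is a minimal sketch of one common form of that technique: the sample covariance of the secondary data is loaded with a scaled prior matrix before the adaptive filter is computed. This is not the Bayesian MMSE estimator proposed in the paper. The function name, argument layout, and the default loading level `beta` are all hypothetical.

```python
import numpy as np

def colored_loading_filter(Z, R0, s, beta=1.0):
    """Sketch of a colored-loading STAP filter (assumed textbook form).

    Z    : (N, K) secondary data snapshots, one per column.
    R0   : (N, N) prior covariance matrix (the a priori knowledge).
    s    : (N,) space-time steering vector of the target.
    beta : loading level weighting the prior (hypothetical default).
    """
    K = Z.shape[1]
    # Sample covariance matrix of the secondary data.
    R_scm = Z @ Z.conj().T / K
    # Colored loading: add the scaled prior instead of a scaled identity.
    R = R_scm + beta * R0
    # Distortionless adaptive filter: w = R^{-1} s / (s^H R^{-1} s).
    Ri_s = np.linalg.solve(R, s)
    return Ri_s / (s.conj() @ Ri_s)
```

With fewer snapshots than degrees of freedom (K < N) the sample covariance alone is singular, so the prior term is what makes the system solvable in this sketch.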

    Postfiltering Using Multichannel Spectral Estimation in Multispeaker Environments

    This paper investigates the problem of enhancing a single desired speech source from a mixture of signals in multispeaker environments. A beamformer structure is proposed which combines a fixed beamformer with postfiltering. In the first stage, the fixed multiobjective optimal beamformer is designed to spatially extract the desired source by suppressing all other undesired sources. In the second stage, a multichannel power spectral estimator is proposed and incorporated in the postfilter, thus enabling further suppression capability. The combined scheme exploits both spatial and spectral characteristics of the signals. Two new multichannel spectral estimation methods are proposed for the postfiltering using, respectively, inner product and joint diagonalization. Evaluations using recordings from a real-room environment show that the proposed beamformer offers a good interference suppression level whilst maintaining a low-distortion level of the desired source

    An analysis of environment, microphone and data simulation mismatches in robust speech recognition

    Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR techniques. In this article, we study this issue in the context of the CHiME-3 dataset, which consists of sentences spoken by talkers situated in challenging noisy environments recorded using a 6-channel tablet-based microphone array. We provide a critical analysis of the results published on this dataset for various signal enhancement, feature extraction, and ASR backend techniques and perform a number of new experiments in order to separately assess the impact of different noise environments, different numbers and positions of microphones, or simulated vs. real data on speech enhancement and ASR performance. We show that, with the exception of minimum variance distortionless response (MVDR) beamforming, most algorithms perform consistently on real and simulated data and can benefit from training on simulated data. We also find that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact. Based on these results, we introduce the CHiME-4 Speech Separation and Recognition Challenge, which revisits the CHiME-3 dataset and makes it more challenging by reducing the number of microphones available for testing.

    Convolutive Blind Source Separation Methods

    In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks