
    Maximum-likelihood estimation of delta-domain model parameters from noisy output signals

    Fast sampling is desirable to describe signal transmission through wide-bandwidth systems. The delta-operator provides an ideal discrete-time modeling description for such fast-sampled systems. However, estimates of delta-domain model parameters are usually biased when the delta-transformations are applied directly to a sampled signal corrupted by additive measurement noise. This problem is solved here by expectation-maximization, where the delta-transformations of the true signal are estimated and then used to obtain the model parameters. A numerical example demonstrates that the method improves on the accuracy of a shift-operator approach when the sample rate is fast.
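
    As a rough illustration of why direct delta-transformation of a noisy signal is problematic, the sketch below applies a forward-difference delta operator, (x[k+1] - x[k]) / dt, to a clean ramp and to the same ramp with additive white noise. It is not the expectation-maximization method of the abstract, and it shows noise amplification rather than parameter bias itself. The function names, the ramp signal, the sampling interval `dt` and the noise level `sigma` are all assumptions chosen for the example.

```python
import random


def delta(x, dt):
    """Forward-difference delta operator: (x[k+1] - x[k]) / dt."""
    return [(x[k + 1] - x[k]) / dt for k in range(len(x) - 1)]


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


dt = 1e-3      # assumed sampling interval (fast sampling)
sigma = 0.01   # assumed standard deviation of the measurement noise
n = 20000      # number of samples

rng = random.Random(0)  # fixed seed so the run is repeatable

# Clean signal: a ramp with slope 2, so its delta-transform is constant.
clean = [2.0 * k * dt for k in range(n)]
noisy = [c + rng.gauss(0.0, sigma) for c in clean]

d_clean = delta(clean, dt)
d_noisy = delta(noisy, dt)

# Differencing white noise of variance sigma^2 and dividing by dt
# gives a variance of 2 * sigma^2 / dt^2, which grows as dt shrinks.
predicted_var = 2 * sigma ** 2 / dt ** 2
measured_var = variance(d_noisy)

print("delta of clean ramp (first value):", d_clean[0])
print("predicted noise variance after delta:", predicted_var)
print("measured variance of delta(noisy):", measured_var)
```

    Under these assumptions the noise contribution scales as 1/dt^2, which is one way to see why an approach that first estimates the delta-transformations of the true signal can be attractive at fast sample rates.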

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted by estimating relative transfer functions between microphones. Each block of the input signals is processed independently to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The proposed method is evaluated experimentally on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively). Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end solution, is evaluated. The results indicate that the proposed method is robust to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.
    Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion.