Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

Authors: Bohac Marek; Koldovsky Zbynek; Malek Jiri
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 11/12/2019

This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime in which recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on the large CHiME-4 datasets and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) and perceptual evaluation of speech quality (PESQ), respectively). Moreover, the word error rate (WER) achieved by a baseline automatic speech recognition system, for which the enhancement method serves as a front-end solution, is evaluated. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms.

Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion
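The trade-off between block-wise and batch estimation of beamformer statistics can be illustrated with a minimal NumPy sketch. This is not the authors' method: it omits the DNN-based VAD, the relative transfer function estimation, and the post-filter, and only shows how a spatial covariance matrix might be estimated per block versus over a whole recording. The function names `block_covariances` and `batch_covariance`, and the assumption that the input holds one frequency bin as a complex array of shape (channels, frames), are hypothetical.

```python
import numpy as np

def block_covariances(X, block_len):
    """Estimate one spatial covariance matrix per non-overlapping block.

    X is assumed to be a complex array of shape (n_channels, n_frames)
    holding a single frequency bin; trailing frames that do not fill a
    whole block are dropped. Requires n_frames >= block_len.
    """
    n_channels, n_frames = X.shape
    covs = []
    for start in range(0, n_frames - block_len + 1, block_len):
        block = X[:, start:start + block_len]
        # Sample covariance using only the frames inside this block.
        covs.append(block @ block.conj().T / block_len)
    return np.stack(covs)

def batch_covariance(X):
    """Estimate a single covariance matrix from all frames (batch regime)."""
    return X @ X.conj().T / X.shape[1]
```

Each block estimate averages fewer frames than the batch estimate, so it fluctuates more from block to block; when the blocks tile the recording exactly, the mean of the block estimates equals the batch estimate.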

arXiv.org e-Print Archive

DSpace@TUL

TasNet: time-domain audio separation network for real-time, single-channel speech separation

Authors: Luo Yi; Mesgarani Nima
Publication date: 17/04/2018

Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition results in inherent problems such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal and noncausal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where low-power, real-time implementation is desirable, such as in hearable and telecommunication devices.

Comment: Camera ready version for ICASSP 2018, Calgary, Canada
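The data flow described in this abstract (time-domain frames, a nonnegative encoder output, one mask per source, and a decoder) can be sketched in NumPy as follows. This is only an illustration of the pipeline's shape, not the TasNet model: the encoder and decoder weights are random rather than learned, and the mask estimator, which in the paper is a trained network, is replaced here by random logits passed through a softmax. The function name `tasnet_like_separation` and all parameter values are hypothetical.

```python
import numpy as np

def tasnet_like_separation(mixture, n_sources=2, frame_len=40, n_basis=64, seed=0):
    """Run a mixture through an untrained encoder / mask / decoder pipeline."""
    rng = np.random.default_rng(seed)

    # Cut the waveform into non-overlapping frames (the tail is dropped).
    n_frames = len(mixture) // frame_len
    frames = mixture[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Encoder: linear projection followed by ReLU gives nonnegative outputs.
    encoder = rng.standard_normal((frame_len, n_basis))
    weights = np.maximum(frames @ encoder, 0.0)          # (n_frames, n_basis)

    # Stand-in mask estimator (assumption): random logits, softmax over
    # sources, so the masks for each encoder output sum to one.
    logits = rng.standard_normal((n_sources, n_frames, n_basis))
    logits -= logits.max(axis=0, keepdims=True)
    masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

    # Decoder: map each masked encoder output back to a waveform frame.
    decoder = rng.standard_normal((n_basis, frame_len))
    sources = (masks * weights) @ decoder                # (n_sources, n_frames, frame_len)
    return sources.reshape(n_sources, -1), masks, weights
```

Because no time-frequency transform is involved, the frame length alone sets the minimum algorithmic latency in this sketch.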

arXiv.org e-Print Archive

Crossref

Algorithms for Source Separation - with Cocktail Party Applications

Author: Olsson Rasmus Kongsgaard
Publication date: 01/11/2007

Online Research Database In Technology

Blind image separation based on exponentiated transmuted Weibull distribution

Authors: Adam A. M.; El-aziz M. E. Abd; Farouk R. M.
Publication date: 11/05/2016

In recent years, blind image separation has been widely investigated. As a result, a number of feature extraction algorithms for direct application to such image structures have been developed. For example, a crime scene may yield a mixture of two or more fingerprints, which must be separated before identification. In this paper, we propose a new technique for separating multiple mixed images based on the exponentiated transmuted Weibull distribution. To adaptively estimate the parameters of such score functions, an efficient method based on maximum likelihood and a genetic algorithm is used. We also calculate the accuracy of the proposed distribution and compare the algorithmic performance of this efficient approach with that of other, previously proposed generalized distributions. We find from the numerical results that the proposed distribution is flexible and gives efficient results.

Comment: 14 pages, 12 figures, 4 tables. International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 3, March 2016 (pp. 423-433)

arXiv.org e-Print Archive