4,067 research outputs found
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on large datasets of CHiME-4 and on another
dataset featuring moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for the block length of 250
ms.Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
TasNet: time-domain audio separation network for real-time, single-channel speech separation
Robust speech processing in multi-talker environments requires effective
speech separation. Recent deep learning systems have made significant progress
toward solving this problem, yet it remains challenging particularly in
real-time, short latency applications. Most methods attempt to construct a mask
for each source in time-frequency representation of the mixture signal which is
not necessarily an optimal representation for speech separation. In addition,
time-frequency decomposition results in inherent problems such as
phase/magnitude decoupling and long time window which is required to achieve
sufficient frequency resolution. We propose Time-domain Audio Separation
Network (TasNet) to overcome these limitations. We directly model the signal in
the time-domain using an encoder-decoder framework and perform the source
separation on nonnegative encoder outputs. This method removes the frequency
decomposition step and reduces the separation problem to estimation of source
masks on encoder outputs which is then synthesized by the decoder. Our system
outperforms the current state-of-the-art causal and noncausal speech separation
algorithms, reduces the computational cost of speech separation, and
significantly reduces the minimum required latency of the output. This makes
TasNet suitable for applications where low-power, real-time implementation is
desirable such as in hearable and telecommunication devices.Comment: Camera ready version for ICASSP 2018, Calgary, Canad
Blind image separation based on exponentiated transmuted Weibull distribution
In recent years the processing of blind image separation has been
investigated. As a result, a number of feature extraction algorithms for direct
application of such image structures have been developed. For example,
separation of mixed fingerprints found in any crime scene, in which a mixture
of two or more fingerprints may be obtained, for identification, we have to
separate them. In this paper, we have proposed a new technique for separating a
multiple mixed images based on exponentiated transmuted Weibull distribution.
To adaptively estimate the parameters of such score functions, an efficient
method based on maximum likelihood and genetic algorithm will be used. We also
calculate the accuracy of this proposed distribution and compare the
algorithmic performance using the efficient approach with other previous
generalized distributions. We find from the numerical results that the proposed
distribution has flexibility and an efficient resultComment: 14 pages, 12 figures, 4 tables. International Journal of Computer
Science and Information Security (IJCSIS),Vol. 14, No. 3, March 2016 (pp.
423-433
- …