1,731 research outputs found
Speech Enhancement using Hmm and Snmf(Os)
The speech enhancement is the process to enhance the speech signal by reducing the noise from the signal as well as improving the quality of the signal. The speech signal enhancement requires various techniques associated with the signal noise removal as well as the signal patch fixation in order to enhance the frequency of the speech signal. In this paper, we have proposed the new speech enhancement model for the speech enhancement with the amalgamation of the various speech processing techniques together. The proposed model is equipped with the Supervised sparse non-negative matrix factorization (S-SNMF) along with hidden markov model (HMM) and noise reducing filter to overcome the problem of the signal enhancement by reducing the missing values and by enhancing the signal on the weak points detected under the application of the HMM. The experimental results have proved the efficiency of the proposed model in comparison with the existing model. The improvement of nearly 50% has been recorded from the parameters of peak signal to noise ratio (PSNR), mean squared error (MSE), signal to noise ratio (SNR) etc
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on large datasets of CHiME-4 and on another
dataset featuring moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for the block length of 250
ms.Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
Denoising sound signals in a bioinspired non-negative spectro-temporal domain
The representation of sound signals at the cochlea and auditory cortical level has been studied as an alternative to classical analysis methods. In this work, we put forward a recently proposed feature extraction method called approximate auditory cortical representation, based on an approximation to the statistics of discharge patterns at the primary auditory cortex. The approach here proposed estimates a non-negative sparse coding with a combined dictionary of atoms. These atoms represent the spectro-temporal receptive fields of the auditory cortical neurons, and are calculated from the auditory spectrograms of clean signal and noise. The denoising is carried out on noisy signals by the reconstruction of the signal discarding the atoms corresponding to the noise. Experiments are presented using synthetic (chirps) and real data (speech), in the presence of additive noise. For the evaluation of the new method and its variants, we used two objective measures: the perceptual evaluation of speech quality and the segmental signal-to-noise ratio. Results show that the proposed method improves the quality of the signals, mainly under severe degradation.Fil: MartÃnez, César Ernesto. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de IngenierÃa y Ciencias HÃdricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; ArgentinaFil: Goddard, J.. Universidad Autónoma Metropolitana; MéxicoFil: Di Persia, Leandro Ezequiel. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de IngenierÃa y Ciencias HÃdricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; ArgentinaFil: Milone, Diego Humberto. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de IngenierÃa y Ciencias HÃdricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; ArgentinaFil: Rufiner, Hugo Leonardo. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de IngenierÃa y Ciencias HÃdricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina. Universidad Nacional de Entre RÃos. Facultad de IngenierÃa; Argentin
- …