14,213 research outputs found
Enhancement of adaptive de-correlation filtering separation model for robust speech recognition
Thesis (Ph.D.), University of Missouri-Columbia, 2007.
The development of automatic speech recognition (ASR) technology has enabled an increasing number of applications. However, the robustness of ASR in real acoustic environments remains a challenge for practical deployment. Interfering speech and background noise severely degrade ASR performance. Speech source separation extracts target speech from interfering speech, but its performance suffers under adverse conditions of acoustic reverberation and background noise. This dissertation enhances a speech source separation technique, adaptive decorrelation filtering (ADF), for robust ASR applications. To overcome these difficulties and develop practical ADF separation algorithms for robust ASR, improvements are introduced in several respects. From the perspective of speech spectral characteristics, prewhitening procedures are applied to flatten the long-term speech spectrum, improving adaptation robustness and decreasing ADF estimation error. To speed up convergence, block-iterative implementation and variable step-size (VSS) methods are proposed. To exploit scenarios where multiple pairs of sensors are available, multi-ADF post-processing is developed. To overcome the limitations of the ADF separation model under background noise, noise-compensation (NC) and adaptive speech enhancement procedures are proposed to achieve improved robustness in diffuse noise. Speech separation simulations and speech recognition experiments are carried out on the TIMIT database and the ATR acoustic measurement database.
Evaluations of the methods presented in this dissertation demonstrate significant performance improvements over the baseline ADF algorithm in both speech separation and recognition. Includes bibliographical references.
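The prewhitening step mentioned above can be illustrated with a standard first-order pre-emphasis filter, a common way to flatten the long-term spectral tilt of speech; the dissertation's exact prewhitening procedure may differ, so this is only a minimal sketch of the general idea (the coefficient 0.97 is a conventional choice, not taken from the work):

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    Boosts high frequencies to flatten the long-term speech spectrum,
    a typical prewhitening step before adaptive filtering.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                    # first sample has no predecessor
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# A low-frequency-dominated signal becomes spectrally flatter:
x = np.sin(2 * np.pi * 0.01 * np.arange(1000))
y = preemphasis(x)
```

The inverse filter (adding back `alpha * y[n-1]`) can restore the original spectrum after separation if needed.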
Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition
In this paper, a modification to the training process of the popular SPLICE
algorithm has been proposed for noise robust speech recognition. The
modification is based on feature correlations, and enables this stereo-based
algorithm to improve the performance in all noise conditions, especially in
unseen cases. Further, the modified framework is extended to work for
non-stereo datasets where clean and noisy training utterances, but not stereo
counterparts, are required. Finally, an MLLR-based computationally efficient
run-time noise adaptation method in SPLICE framework has been proposed. The
modified SPLICE shows 8.6% absolute improvement over SPLICE in Test C of
Aurora-2 database, and 2.93% overall. Non-stereo method shows 10.37% and 6.93%
absolute improvements over Aurora-2 and Aurora-4 baseline models respectively.
Run-time adaptation shows 9.89% absolute improvement in modified framework as
compared to SPLICE for Test C, and 4.96% overall w.r.t. standard MLLR
adaptation on HMMs.
Comment: Submitted to Automatic Speech Recognition and Understanding (ASRU) 2013 Workshop
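Basic SPLICE learns one bias vector per mixture component from stereo (clean, noisy) feature pairs and applies a posterior-weighted correction at run time. The sketch below assumes hard mixture assignments are already available (e.g., from a codebook) and omits the modifications proposed in the paper:

```python
import numpy as np

def splice_train(clean, noisy, assign, n_mix):
    """Per-mixture bias r_k = mean(clean - noisy | mixture k),
    estimated from stereo (clean, noisy) feature pairs."""
    bias = np.zeros((n_mix, clean.shape[1]))
    for k in range(n_mix):
        mask = assign == k
        if mask.any():
            bias[k] = (clean[mask] - noisy[mask]).mean(axis=0)
    return bias

def splice_apply(noisy, posteriors, bias):
    """MMSE-style correction: x_hat = y + sum_k p(k | y) * r_k."""
    return noisy + posteriors @ bias
```

With one-hot posteriors this reduces to adding the bias of the assigned mixture; in practice the posteriors come from a GMM over the noisy features.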
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratio (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
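The ensemble idea can be sketched as follows: split each waveform into frequency subbands, train one classifier per band, and combine the band-level decisions by majority vote. The FFT-mask band splitting, the RBF kernel, and the two-band setup below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.svm import SVC  # any kernel classifier would do here

def to_subbands(waveforms, n_bands=2):
    """Split each waveform into n_bands frequency subbands by masking
    contiguous FFT bin ranges and transforming back to the time domain."""
    spec = np.fft.rfft(waveforms, axis=1)
    edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
    bands = []
    for b in range(n_bands):
        mask = np.zeros_like(spec)
        mask[:, edges[b]:edges[b + 1]] = spec[:, edges[b]:edges[b + 1]]
        bands.append(np.fft.irfft(mask, n=waveforms.shape[1], axis=1))
    return bands

def train_ensemble(waveforms, labels, n_bands=2):
    """Train one SVM per subband (RBF kernel as an illustrative choice)."""
    return [SVC(kernel="rbf").fit(b, labels)
            for b in to_subbands(waveforms, n_bands)]

def predict_vote(models, waveforms, n_bands=2):
    """Combine the subband classifiers by majority vote."""
    votes = np.stack([m.predict(b)
                      for m, b in zip(models, to_subbands(waveforms, n_bands))])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Because each classifier only sees one band, a narrowband noise corrupts one vote rather than the whole feature vector, which is the robustness argument behind the ensemble.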
Deep Denoising for Hearing Aid Applications
Reduction of unwanted environmental noises is an important feature of today's
hearing aids (HA), which is why noise reduction is nowadays included in almost
every commercially available device. The majority of these algorithms, however,
is restricted to the reduction of stationary noises. In this work, we propose a
denoising approach based on a three hidden layer fully connected deep learning
network that aims to predict a Wiener filtering gain with an asymmetric input
context, enabling real-time applications with high constraints on signal delay.
The approach is employing a hearing instrument-grade filter bank and complies
with typical hearing aid demands, such as low latency and on-line processing.
It can further be well integrated with other algorithms in an existing HA
signal processing chain. We can show on a database of real world noise signals
that our algorithm is able to outperform a state of the art baseline approach,
using both objective metrics and subjective tests.
Comment: Submitted to IWAENC 201
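The network's training target, a per-bin Wiener filtering gain, can be computed from the noisy and noise power spectra. The spectral-subtraction form and the floor value below are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.1):
    """Per-frequency-bin Wiener-style gain G = clip(1 - N/Y, floor, 1),
    where Y is the noisy power spectrum and N the noise estimate.
    A denoising network is trained to predict a gain of this kind,
    which is then multiplied onto the noisy spectrum."""
    gain = 1.0 - noise_psd / np.maximum(noisy_psd, 1e-12)
    return np.clip(gain, floor, 1.0)

# Bins dominated by noise are attenuated down to the floor,
# noise-free bins pass through unchanged:
noisy_psd = np.array([4.0, 2.0, 1.0])
noise_psd = np.array([1.0, 2.0, 0.0])
gain = wiener_gain(noisy_psd, noise_psd)  # -> [0.75, 0.1, 1.0]
```

The gain floor limits the maximum attenuation, which trades residual noise for fewer audible artifacts, a typical hearing-aid constraint.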
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with sensor variability, a difficult and challenging source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in matched conditions; (2) the benefit of combining them with mel-frequency cepstral coefficients (MFCC) to exploit their discrimination power under uncontrolled, mismatched conditions. Consequently, the proposed invariant features yield a performance improvement, demonstrated by reductions in the equal error rate and the minimum decision cost function relative to GMM-UBM speaker recognition systems based on MFCC features.
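The equal error rate reported above can be computed by sweeping a decision threshold over the verification scores until the false-accept and false-reject rates cross; a minimal sketch:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep a threshold over all observed scores and return the
    operating point where the false-accept rate (FAR) and the
    false-reject rate (FRR) are closest: the equal error rate."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best = (1.0, 0.0)
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine speakers wrongly rejected
        if abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2.0

# Fully separated score distributions give an EER of zero:
eer = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])  # -> 0.0
```

The minimum decision cost function mentioned in the abstract is a weighted variant of the same sweep, with application-specific costs on the two error types.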