14,213 research outputs found
Enhancement of adaptive de-correlation filtering separation model for robust speech recognition
Thesis (Ph.D.), University of Missouri-Columbia, 2007.
The development of automatic speech recognition (ASR) technology has enabled an increasing number of applications. However, the robustness of ASR in real acoustic environments remains a challenge for practical deployment. Interfering speech and background noise severely degrade ASR performance. Speech source separation extracts target speech from interfering speech, but its performance suffers under adverse conditions of acoustic reverberation and background noise. This dissertation enhances a speech source separation technique, adaptive decorrelation filtering (ADF), for robust ASR applications. To overcome these difficulties and develop practical ADF separation algorithms for robust ASR, improvements are introduced in several respects. From the perspective of speech spectral characteristics, prewhitening procedures are applied to flatten the long-term speech spectrum, improving adaptation robustness and decreasing ADF estimation error. To speed up convergence, block-iterative implementation and variable step-size (VSS) methods are proposed. To exploit scenarios where multiple pairs of sensors are available, multi-ADF post-processing is developed. To overcome the limitations of the ADF separation model under background noise, noise-compensation (NC) and adaptive speech enhancement procedures are proposed to achieve improved robustness in diffuse noise. Speech separation simulations and speech recognition experiments are carried out on the TIMIT database and the ATR acoustic measurement database.
Evaluations of the methods presented in this dissertation demonstrate significant performance improvements over the baseline ADF algorithm in both speech separation and recognition. Includes bibliographical references.
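The prewhitening step mentioned above can be illustrated with a standard first-order pre-emphasis filter, a common way to flatten the long-term spectral tilt of speech; the dissertation's exact prewhitening procedure may differ, so this is only a minimal sketch of the general idea (the coefficient 0.97 is a conventional choice, not taken from the work):

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    Boosts high frequencies to flatten the long-term speech spectrum,
    a typical prewhitening step before adaptive filtering.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                    # first sample has no predecessor
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# A low-frequency-dominated signal becomes spectrally flatter:
x = np.sin(2 * np.pi * 0.01 * np.arange(1000))
y = preemphasis(x)
```

The inverse filter (adding back `alpha * y[n-1]`) can restore the original spectrum after separation if needed.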
Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition
In this paper, a modification to the training process of the popular SPLICE
algorithm has been proposed for noise robust speech recognition. The
modification is based on feature correlations, and enables this stereo-based
algorithm to improve the performance in all noise conditions, especially in
unseen cases. Further, the modified framework is extended to work for
non-stereo datasets where clean and noisy training utterances, but not stereo
counterparts, are required. Finally, an MLLR-based computationally efficient
run-time noise adaptation method in SPLICE framework has been proposed. The
modified SPLICE shows 8.6% absolute improvement over SPLICE in Test C of
Aurora-2 database, and 2.93% overall. Non-stereo method shows 10.37% and 6.93%
absolute improvements over Aurora-2 and Aurora-4 baseline models respectively.
Run-time adaptation shows 9.89% absolute improvement in modified framework as
compared to SPLICE for Test C, and 4.96% overall w.r.t. standard MLLR
adaptation on HMMs.
Comment: Submitted to Automatic Speech Recognition and Understanding (ASRU) 2013 Workshop
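Basic SPLICE learns one bias vector per mixture component from stereo (clean, noisy) feature pairs and applies a posterior-weighted correction at run time. The sketch below assumes hard mixture assignments are already available (e.g., from a codebook) and omits the modifications proposed in the paper:

```python
import numpy as np

def splice_train(clean, noisy, assign, n_mix):
    """Per-mixture bias r_k = mean(clean - noisy | mixture k),
    estimated from stereo (clean, noisy) feature pairs."""
    bias = np.zeros((n_mix, clean.shape[1]))
    for k in range(n_mix):
        mask = assign == k
        if mask.any():
            bias[k] = (clean[mask] - noisy[mask]).mean(axis=0)
    return bias

def splice_apply(noisy, posteriors, bias):
    """MMSE-style correction: x_hat = y + sum_k p(k | y) * r_k."""
    return noisy + posteriors @ bias
```

With one-hot posteriors this reduces to adding the bias of the assigned mixture; in practice the posteriors come from a GMM over the noisy features.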
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratio (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
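The ensemble idea can be sketched as follows: split each waveform into frequency subbands, train one classifier per band, and combine the band-level decisions by majority vote. The FFT-mask band splitting, the RBF kernel, and the two-band setup below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.svm import SVC  # any kernel classifier would do here

def to_subbands(waveforms, n_bands=2):
    """Split each waveform into n_bands frequency subbands by masking
    contiguous FFT bin ranges and transforming back to the time domain."""
    spec = np.fft.rfft(waveforms, axis=1)
    edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
    bands = []
    for b in range(n_bands):
        mask = np.zeros_like(spec)
        mask[:, edges[b]:edges[b + 1]] = spec[:, edges[b]:edges[b + 1]]
        bands.append(np.fft.irfft(mask, n=waveforms.shape[1], axis=1))
    return bands

def train_ensemble(waveforms, labels, n_bands=2):
    """Train one SVM per subband (RBF kernel as an illustrative choice)."""
    return [SVC(kernel="rbf").fit(b, labels)
            for b in to_subbands(waveforms, n_bands)]

def predict_vote(models, waveforms, n_bands=2):
    """Combine the subband classifiers by majority vote."""
    votes = np.stack([m.predict(b)
                      for m, b in zip(models, to_subbands(waveforms, n_bands))])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Because each classifier only sees one band, a narrowband noise corrupts one vote rather than the whole feature vector, which is the robustness argument behind the ensemble.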
Deep Denoising for Hearing Aid Applications
Reduction of unwanted environmental noises is an important feature of today's
hearing aids (HA), which is why noise reduction is nowadays included in almost
every commercially available device. The majority of these algorithms, however,
is restricted to the reduction of stationary noises. In this work, we propose a
denoising approach based on a three hidden layer fully connected deep learning
network that aims to predict a Wiener filtering gain with an asymmetric input
context, enabling real-time applications with high constraints on signal delay.
The approach is employing a hearing instrument-grade filter bank and complies
with typical hearing aid demands, such as low latency and on-line processing.
It can further be well integrated with other algorithms in an existing HA
signal processing chain. We can show on a database of real world noise signals
that our algorithm is able to outperform a state of the art baseline approach,
using both objective metrics and subjective tests.
Comment: Submitted to IWAENC 201
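The network's training target, a per-bin Wiener filtering gain, can be computed from the noisy and noise power spectra. The spectral-subtraction form and the floor value below are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.1):
    """Per-frequency-bin Wiener-style gain G = clip(1 - N/Y, floor, 1),
    where Y is the noisy power spectrum and N the noise estimate.
    A denoising network is trained to predict a gain of this kind,
    which is then multiplied onto the noisy spectrum."""
    gain = 1.0 - noise_psd / np.maximum(noisy_psd, 1e-12)
    return np.clip(gain, floor, 1.0)

# Bins dominated by noise are attenuated down to the floor,
# noise-free bins pass through unchanged:
noisy_psd = np.array([4.0, 2.0, 1.0])
noise_psd = np.array([1.0, 2.0, 0.0])
gain = wiener_gain(noisy_psd, noise_psd)  # -> [0.75, 0.1, 1.0]
```

The gain floor limits the maximum attenuation, which trades residual noise for fewer audible artifacts, a typical hearing-aid constraint.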
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with sensor variability, a difficult and challenging source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in matched conditions; (2) the benefit of combining them with mel-frequency cepstral coefficients (MFCC) to exploit their discrimination power under uncontrolled, mismatched conditions. Consequently, the proposed invariant features yield a performance improvement, demonstrated by reductions in the equal error rate and the minimum decision cost function relative to GMM-UBM speaker recognition systems based on MFCC features.
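The equal error rate reported above can be computed by sweeping a decision threshold over the verification scores until the false-accept and false-reject rates cross; a minimal sketch:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep a threshold over all observed scores and return the
    operating point where the false-accept rate (FAR) and the
    false-reject rate (FRR) are closest: the equal error rate."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best = (1.0, 0.0)
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine speakers wrongly rejected
        if abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2.0

# Fully separated score distributions give an EER of zero:
eer = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])  # -> 0.0
```

The minimum decision cost function mentioned in the abstract is a weighted variant of the same sweep, with application-specific costs on the two error types.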