
    Studies on noise robust automatic speech recognition

    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

    Speech Enhancement Using Modulation-Domain Kalman Filtering with Active Speech Level Normalized Log-Spectrum Global Priors

    We describe a single-channel speech enhancement algorithm based on modulation-domain Kalman filtering that tracks the inter-frame time evolution of the speech log-power spectrum in combination with the long-term average speech log-spectrum. We use offline-trained log-power spectrum global priors, incorporated in the Kalman filter prediction and update steps, to enhance noise suppression. In particular, we train and utilize Gaussian mixture model priors for speech in the log-spectral domain that are normalized with respect to the active speech level. The Kalman filter update step uses the log-power spectrum global priors together with the local priors obtained from the Kalman filter prediction step. The log-spectrum Kalman filtering algorithm, which uses the theoretical phase factor distribution and improves the modeling of the modulation features, is evaluated in terms of speech quality. Different algorithm configurations, depending on whether global priors and/or Kalman filter noise tracking are used, are compared in various noise types.
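The interplay of a local prior (from the Kalman prediction step) and a global prior can be illustrated with a heavily simplified scalar sketch. Everything here is an assumption made for illustration: a single frequency bin, a single Gaussian in place of the Gaussian mixture prior, a linear Gaussian observation in place of the phase factor distribution, and hypothetical function and parameter names (`fuse_gaussians`, `kalman_predict`, `kalman_update`, `a`, `q`). It is not the algorithm described in the abstract.

```python
def kalman_predict(mean, var, a, q):
    """Prediction step of a scalar Kalman filter with state model x_t = a * x_{t-1} + w, Var(w) = q."""
    return a * mean, a * a * var + q


def fuse_gaussians(mean_a, var_a, mean_b, var_b):
    """Combine two Gaussian priors by multiplying their densities.

    The result is Gaussian with a precision-weighted mean. Here it stands in
    for combining the local (predicted) prior with a global prior.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mean = var * (mean_a / var_a + mean_b / var_b)
    return mean, var


def kalman_update(prior_mean, prior_var, obs, obs_var):
    """Update step of a scalar Kalman filter for a direct, Gaussian-noise observation."""
    gain = prior_var / (prior_var + obs_var)
    mean = prior_mean + gain * (obs - prior_mean)
    var = (1.0 - gain) * prior_var
    return mean, var


# One illustrative frame: predict, fuse with a global prior, then update.
local_mean, local_var = kalman_predict(mean=1.0, var=0.5, a=0.9, q=0.1)
prior_mean, prior_var = fuse_gaussians(local_mean, local_var, mean_b=0.0, var_b=4.0)
post_mean, post_var = kalman_update(prior_mean, prior_var, obs=1.2, obs_var=0.3)
```

The fused prior is always at least as narrow as either input, which is why a well-matched global prior can sharpen the estimate when the local prediction is uncertain.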

    Adaptive Hidden Markov Noise Modelling for Speech Enhancement

    A robust and reliable noise estimation algorithm is required in many speech enhancement systems. The aim of this thesis is to propose and evaluate a robust noise estimation algorithm for highly non-stationary noisy environments. In this work, we model the non-stationary noise using a set of discrete states, with each state representing a distinct noise power spectrum. In this approach, the state sequence over time is conveniently represented by a Hidden Markov Model (HMM). We first present an online HMM re-estimation framework that models time-varying noise with an HMM and tracks changes in noise characteristics through a sequential model update procedure applied during the absence of speech. In addition, the algorithm will, when necessary, create new model states to represent novel noise spectra and merge existing states that have similar characteristics. We then extend our work to robust noise estimation during speech activity by incorporating a speech model into our existing noise model. The noise characteristics within each state are updated based on a speech presence probability, which is derived from a modified minima-controlled recursive averaging method. We have demonstrated the effectiveness of our noise HMM in tracking both stationary and highly non-stationary noise, and shown that it gives improved performance over other conventional noise estimation methods when it is incorporated into a standard speech enhancement algorithm.
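The idea of updating a noise estimate according to a speech presence probability can be sketched with the smoothing rule commonly used in minima-controlled recursive averaging. This is a minimal illustration for a single noise spectrum, not the thesis's modified method or its per-state HMM update; the function name `update_noise_psd` and the default `alpha` are assumptions.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha=0.95):
    """Recursively average the noise power spectrum, one value per frequency bin.

    The effective smoothing factor grows with the speech presence probability,
    so bins that probably contain speech barely change the noise estimate.
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        alpha_eff = alpha + (1.0 - alpha) * p  # p = 1 freezes the estimate
        updated.append(alpha_eff * lam + (1.0 - alpha_eff) * y2)
    return updated


# Bin 0: speech certainly present, estimate is held.
# Bin 1: speech certainly absent, estimate moves toward the observed power.
new_psd = update_noise_psd([1.0, 1.0], [3.0, 3.0], [1.0, 0.0])
```

With `alpha=0.95`, the second bin moves from 1.0 to roughly 1.1, while the first bin stays at 1.0.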

    Exploration and Optimization of Noise Reduction Algorithms for Speech Recognition in Embedded Devices

    Environmental noise present in real-life applications substantially degrades the performance of speech recognition systems. An example is an in-car scenario where a speech recognition system has to support the man-machine interface. Several sources of noise coming from the engine, wipers, wheels, etc., interact with speech. A particular challenge is the open-window scenario, where traffic noise, park noise, etc., must also be taken into account. The main goal of this thesis is to improve the performance of a speech recognition system based on a state-of-the-art hidden Markov model (HMM) using noise reduction methods. Performance is measured in terms of word error rate and mutual information. The noise reduction methods are based on weighting rules. Least-squares weighting rules in the frequency domain have been developed to enable continuous development based on the existing system and to guarantee low complexity and a small footprint for applications in embedded devices. The weighting rule parameters are optimized as a multidimensional optimization task using a Monte Carlo method followed by a compass search. Root compression and cepstral smoothing methods have also been implemented to boost the recognition performance. The additional complexity and memory requirements of the proposed system are minimal. The performance of the proposed system was compared to the European Telecommunications Standards Institute (ETSI) standardized system. The proposed system outperforms the ETSI system by up to 8.6 % relative increase in word accuracy and achieves up to 35.1 % relative increase in word accuracy compared to the existing baseline system on the ETSI Aurora 3 German task. A relative increase of up to 18 % in word accuracy over the existing baseline system is also obtained from the proposed weighting rules on large vocabulary databases.
    An entropy-based feature vector analysis method has also been developed to assess the quality of feature vectors. The entropy estimation is based on the histogram approach. The method has the advantage of objectively assessing the feature vector quality regardless of the acoustic modeling assumption used in the speech recognition system.
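A histogram-based entropy estimate of the kind mentioned above can be sketched as follows for a single feature dimension. The bin count, the equal-width binning, and the function name `histogram_entropy` are assumptions for illustration; the abstract does not specify how the thesis builds its histograms or combines dimensions.

```python
import math


def histogram_entropy(values, num_bins=10):
    """Estimate the entropy (in bits) of a 1-D sample from an equal-width histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all samples identical: a single occupied bin, zero entropy
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# Four samples spread evenly over four bins give the maximum of log2(4) = 2 bits.
spread = histogram_entropy([0.0, 1.0, 2.0, 3.0], num_bins=4)
```

Because the estimate depends only on the empirical distribution of the feature values, it makes no assumption about the acoustic model that later consumes them.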

    Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

    An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, speech distortion caused by environmental noise is seen as a spectral mask and, as a result, noisy speech features can be either reliable (speech is not masked by noise) or unreliable (speech is masked). In this paper, we present a detailed overview of this model and its applications to noise robust ASR. Firstly, using the masking model, we derive a spectral reconstruction technique aimed at enhancing the noisy speech features. Two problems must be solved in order to perform spectral reconstruction using the masking model: (1) mask estimation, i.e. determining the reliability of the noisy features, and (2) feature imputation, i.e. estimating speech for the unreliable features. Unlike missing data imputation techniques where the two problems are considered as independent, our technique jointly addresses them by exploiting a priori knowledge of the speech and noise sources in the form of a statistical model. Secondly, we propose an algorithm for estimating the noise model required by the feature enhancement technique. The proposed algorithm fits a Gaussian mixture model to the noise by iteratively maximising the likelihood of the noisy speech signal so that noise can be estimated even during speech-dominating frames. A comprehensive set of experiments carried out on the Aurora-2 and Aurora-4 databases shows that the proposed method achieves significant improvements over the baseline system and other similar missing data imputation techniques.
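The two sub-problems named above can be made concrete with a toy sketch. It assumes the masking model takes the common log-max form, in which each noisy log-spectral feature is the larger of the speech and noise values, and it treats mask estimation and imputation as separate, naive steps. That is the independent treatment the paper contrasts itself with, not its joint statistical technique. All function names and the `margin` parameter are hypothetical.

```python
def apply_masking_model(clean_log, noise_log):
    """Assumed log-max form: each noisy feature is the larger of speech and noise."""
    return [max(x, n) for x, n in zip(clean_log, noise_log)]


def estimate_mask(noisy_log, noise_log_estimate, margin=0.0):
    """Mark a feature as reliable when it exceeds the noise estimate by more than `margin`."""
    return [y > n + margin for y, n in zip(noisy_log, noise_log_estimate)]


def impute_bounded(noisy_log, mask, prior_mean_log):
    """Keep reliable features; replace unreliable ones with a prior mean capped by the observation.

    Under the masking model, masked speech cannot exceed the observed value,
    so the observation acts as an upper bound.
    """
    return [
        y if reliable else min(mu, y)
        for y, reliable, mu in zip(noisy_log, mask, prior_mean_log)
    ]


clean = [5.0, 1.0, 4.0]
noise = [2.0, 3.0, 4.0]
noisy = apply_masking_model(clean, noise)        # [5.0, 3.0, 4.0]
mask = estimate_mask(noisy, noise)               # [True, False, False]
estimate = impute_bounded(noisy, mask, [0.0, 2.0, 6.0])  # [5.0, 2.0, 4.0]
```

In this toy case the noise estimate equals the true noise, so the mask is exact; in practice both the mask and the noise model must be estimated from the noisy signal.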