10 research outputs found

    State dependent feature component selection for noise robust ASR

    No full text
    The acoustic environment in which speech is recorded has a strong influence on the statistical distributions of observed acoustic features. To make ASR insensitive to noise, it is crucial that these distributions are similar in the training and testing conditions. Most approaches attempt to compensate for the impact of noise by estimating the noise characteristics from the signal. In this paper we explore the feasibility of a new method to increase noise robustness: we try to exploit a priori knowledge stored in clean speech models. Using Mel bank log-energy features, recognition is done by ignoring the model components for features that contained little energy during training. This strategy aims at recognition results that are determined more strongly by the match in the high-energy model components than by the mismatch in the low-energy ones. Application of the new method to clean speech data confirms that discarding components below a certain energy threshold does not deteriorate recognition performance. Experiments with noisy data, however, show that performance gains are relatively small. This paper explains why that is the case and why, despite the limited success, the outcomes suggest that the method could still prove to be a valuable addition to data-driven methods like (bounded) marginalisation.
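
The component selection described in this abstract can be illustrated with a minimal sketch. It assumes a diagonal Gaussian state model over log-energy features and selects components by comparing the model mean with a threshold. The function names, the threshold value and the Gaussian form are illustrative assumptions, not details taken from the paper.

```python
import math

def select_components(means, energy_threshold):
    """Indices of model components whose mean log-energy (as seen in
    training) reaches the threshold. Hypothetical selection rule."""
    return [i for i, m in enumerate(means) if m >= energy_threshold]

def emission_cost(obs, means, variances, selected=None):
    """Negative log-likelihood under a diagonal Gaussian, summed only
    over the selected components (all components if selected is None)."""
    if selected is None:
        selected = range(len(means))
    cost = 0.0
    for i in selected:
        diff = obs[i] - means[i]
        cost += 0.5 * (math.log(2.0 * math.pi * variances[i])
                       + diff * diff / variances[i])
    return cost

# One state with a low-energy second band; noise has filled that band.
means = [10.0, 2.0, 8.0]
variances = [1.0, 1.0, 1.0]
noisy_obs = [10.0, 30.0, 8.0]

selected = select_components(means, energy_threshold=5.0)
full_cost = emission_cost(noisy_obs, means, variances)
masked_cost = emission_cost(noisy_obs, means, variances, selected)
```

In this toy case the full cost is dominated by the mismatch in the low-energy band, while the masked cost depends only on the two high-energy bands.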

    Acoustic Backing-Off In The Local Distance Computation For Robust Automatic Speech Recognition

    No full text
    In this paper we propose to introduce backing-off in the acoustic contributions of the local distance functions used during Viterbi decoding as an operationalisation of missing feature theory for increased recognition robustness. Acoustic backing-off effectively removes the detrimental influence of outlier values from the local decisions in the Viterbi algorithm. It does so without the need for prior knowledge that specific features are missing. Acoustic backing-off avoids any kind of explicit outlier detection. This paper provides a proof of concept of acoustic backing-off in the context of connected digit recognition over the telephone, using artificial distortions of the acoustic observations. It is shown that the word error rate can be maintained at the level of 2.5% obtained for undisturbed features, even in the case where a conventional local distance computation without backing-off leads to a word error rate ? 80.0%. The approach appears to be able to handle up to four independe..

    Acoustic Backing-off as an implementation of missing feature theory

    Contains fulltext: 75056.pdf (author's version) (Open Access), 19 p.

    Acoustic Backing-Off As An Implementation Of Missing Feature Theory

    No full text
    In this paper, we discuss acoustic backing-off as a method to improve automatic speech recognition robustness. Acoustic backing-off aims to achieve the same objective as the marginalization approach of Missing Feature Theory: the detrimental influence of outlier values is effectively removed from the local distance computation in the Viterbi algorithm. The proposed method is based on one of the principles of Robust Statistical Pattern Matching: during recognition, the local distance function is modeled using a mixture of the distribution observed during training and a distribution describing observations not previously seen. To assess the effectiveness of the new method, we used artificial distortions of the acoustic vectors in connected digit recognition over telephone lines. We found that acoustic backing-off is capable of restoring recognition performance almost to the level observed for the undisturbed features, even in cases where a conventional local distance function completely fails. These results show that recognition robustness can be improved using a marginalization approach in which the distinction between reliable and corrupted feature values is wired into the recognition process. In addition, the results show that application of acoustic backing-off is not limited to feature representations based on filter bank outputs.
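
The mixture idea in this abstract can be sketched as follows. The sketch assumes a Gaussian for the distribution observed during training and a uniform density over a fixed value range for observations not previously seen. The uniform choice, the mixture weight `eps` and the range `value_range` are illustrative assumptions, and the function names are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def component_cost(x, mean, var, eps=0.0, value_range=100.0):
    """Negative log of (1 - eps) * N(x; mean, var) + eps / value_range.
    eps = 0 gives the conventional Gaussian cost; requires 0 <= eps < 1."""
    lg = log_gauss(x, mean, var)
    if eps <= 0.0:
        return -lg
    a = math.log1p(-eps) + lg                      # trained distribution
    b = math.log(eps) - math.log(value_range)      # back-off distribution
    hi = max(a, b)
    return -(hi + math.log(math.exp(a - hi) + math.exp(b - hi)))

def local_distance(obs, means, variances, eps=0.0, value_range=100.0):
    """Sum of per-component costs for one feature vector."""
    return sum(component_cost(x, m, v, eps, value_range)
               for x, m, v in zip(obs, means, variances))

means = [0.0, 0.0, 0.0]
variances = [1.0, 1.0, 1.0]
clean = [0.1, -0.2, 0.3]
disturbed = [0.1, 50.0, 0.3]   # one outlier component

conventional = local_distance(disturbed, means, variances)
backed_off = local_distance(disturbed, means, variances, eps=0.01)
```

With the back-off term, the cost of any single component is bounded by `-log(eps / value_range)`, so one outlier can no longer dominate the local distance, and no explicit outlier detection is involved.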

    Acoustic Backing-Off As An Implementation Of Missing Feature Theory

    No full text
    Acoustic backing-off was recently proposed as an operationalisation of missing feature theory for increased recognition robustness. Acoustic backing-off effectively removes the detrimental influence of outlier values from the local decisions in the Viterbi algorithm without any kind of explicit outlier detection. In the context of connected digit recognition over telephone lines, it is shown that with more than 30% of the static mel-frequency cepstral coefficients disturbed, acoustic backing-off is capable of reducing the word error rate by one order of magnitude. Furthermore, our results indicate that the effectiveness of acoustic backing-off is optimal when dispersion of distortions due to acoustic feature transformations is minimal. 1. INTRODUCTION Recently, it was shown that missing feature theory can be used for improved robustness of automatic speech recognition (ASR) systems [1], [2]. According to missing feature theory, recognition performance in adverse conditions can be mai..

    Acoustic Pre-Processing For Optimal Effectivity Of Missing Feature Theory

    No full text
    In this paper we investigate acoustic backing-off as an operationalization of Missing Feature Theory with the aim to increase recognition robustness. Acoustic backing-off effectively diminishes the detrimental influence of outlier values by using a new model of the probability density function of the feature values. The technique avoids the need for explicit outlier detection. Situations that are handled best by Missing Feature Theory are those where only part of the coefficients are disturbed and the rest of the vector is unaffected. Consequently, one may predict that acoustic feature representations that smear local spectrotemporal distortions over all feature vector elements are inherently less suitable for automatic speech recognition. Our experiments seem to confirm this prediction. Using additive band-limited noise as a distortion and comparing four different types of feature representations, we found that the best recognition performance is obtained with recognizers that use acoustic backing-off and that operate on feature types that minimally smear the distortion.

    Additive background noise as a source of non-linear Mismatch In The . . .

    No full text
    The aim of this investigation is to determine to what extent automatic speech recognition may be enhanced if, in addition to the linear compensation accomplished by mean and variance normalisation, a non-linear mismatch reduction technique is applied to the cepstral and energy features, respectively. An additional goal is to determine whether the degree of mismatch between the feature distributions of the training and test data that is associated with acoustic mismatch differs for the cepstral and energy features. Towards these aims, two non-linear mismatch reduction techniques (time domain noise reduction and histogram normalisation) were evaluated on the Aurora2 digit recognition task as well as on a continuous speech recognition task with noisy test conditions similar to those in the Aurora2 experiments. The experimental results show that recognition performance is enhanced by the application of both non-linear mismatch reduction techniques. The best results are obtained when the two techniques are applied simultaneously. The results also reveal that the mismatch in the energy features is quantitatively and qualitatively much larger than the corresponding mismatch associated with the cepstral coefficients. The most substantial gains in average recognition rate are therefore accomplished by reducing training-test mismatch for the energy features.
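
Histogram normalisation, one of the two techniques named in this abstract, can be illustrated by a minimal sketch. It is implemented here as quantile mapping of test values onto a reference (training) sample, which is one common formulation and not necessarily the variant evaluated in the paper. The function name is hypothetical.

```python
import bisect

def histogram_normalise(test_values, reference_values):
    """Map each test value to the reference value at the same empirical
    quantile, so the test distribution is reshaped towards the reference."""
    ref = sorted(reference_values)
    srt = sorted(test_values)
    n = len(srt)
    out = []
    for x in test_values:
        # Mid-rank empirical CDF of x within the test data.
        lo = bisect.bisect_left(srt, x)
        hi = bisect.bisect_right(srt, x)
        q = (lo + hi) / (2.0 * n)
        idx = min(int(q * len(ref)), len(ref) - 1)
        out.append(ref[idx])
    return out

reference = [float(v) for v in range(10)]
order = [3, 0, 9, 1, 7, 2, 8, 4, 6, 5]
# A monotonic but non-linear distortion of the reference values.
distorted = [reference[i] ** 2 + 1.0 for i in order]
restored = histogram_normalise(distorted, reference)
```

Unlike mean and variance normalisation, which can only undo a shift and a scaling, this mapping undoes any monotonic distortion of a feature's distribution, which is why it counts as a non-linear mismatch reduction.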

    Analysis of Disturbed Acoustic Features

    No full text
    An analysis method was developed to study the impact of training-test mismatch due to the presence of additive noise. The contributions of individual observation vector components to the emission cost are determined in the matched and mismatched condition and histograms are computed for these contributions in each condition. Subsequently, a measure of mismatch is defined based on differences between the two histograms. By means of two illustrative experiments it is shown to what extent this emission cost mismatch measure can be used to identify the features that cause the most important mismatch and how in certain cases this type of information may be helpful to increase recognition accuracy by applying acoustic backing-off to selected features only. Some limitations of the approach are also discussed.
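
A histogram-based mismatch measure of the kind described in this abstract might be sketched as below. The abstract does not say how the two histograms are compared, so half the summed absolute difference between normalised histograms is used here purely as an assumption. The function names and the bin settings are hypothetical.

```python
def normalised_histogram(values, lo, hi, bins):
    """Relative frequencies of values over equal-width bins on [lo, hi].
    Values outside the range are clipped into the first or last bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    n = len(values)
    return [c / n for c in counts]

def histogram_mismatch(matched_costs, mismatched_costs, lo, hi, bins=20):
    """Half the L1 distance between the two normalised histograms:
    0.0 for identical histograms, 1.0 for non-overlapping ones."""
    h_matched = normalised_histogram(matched_costs, lo, hi, bins)
    h_mismatched = normalised_histogram(mismatched_costs, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(h_matched, h_mismatched))

# Emission cost contributions of one feature component in two conditions.
matched = [0.5, 1.5, 2.5]
mismatched = [7.5, 8.5, 9.5]
score = histogram_mismatch(matched, mismatched, lo=0.0, hi=10.0, bins=10)
```

Computing such a score per feature component gives a ranking of components by mismatch, which is the kind of information the abstract suggests using to apply acoustic backing-off to selected features only.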