18 research outputs found

    Acoustic feature combination for speech recognition

    In this thesis, the use of multiple acoustic features of the speech signal is considered for speech recognition. The goals of this thesis are twofold: on the one hand, new acoustic features are developed; on the other hand, feature combination methods are investigated in order to integrate the newly developed features effectively into state-of-the-art speech recognition systems. The most commonly used feature extraction methods are the Mel Frequency Cepstrum Coefficients (MFCC), Perceptual Linear Prediction (PLP), and variations of these techniques. These methods are mainly based on models of the human auditory system. A detailed review of the implementation of these features is presented in this thesis. There have also been attempts at using articulatory motivated acoustic features for speech recognition, i.e. features motivated by models of the human speech production system. This thesis focuses partially on the development of new articulatory motivated acoustic features. Voicing information is one of the most commonly used articulatory features. Three voicing extraction methods are presented in this work, followed by a systematic comparison. Besides the analysis of the voicing feature, the novel spectrum derivative feature is introduced, which aims to capture the differences between the magnitude spectra produced by obstruent and sonorant consonants. The articulatory motivated features are tested in combination with state-of-the-art acoustic features, mainly those based on auditory models. The features are combined both directly, using Linear Discriminant Analysis (LDA), and indirectly at the model level, using Discriminative Model Combination (DMC). Both methods have already been used successfully in automatic speech recognition systems. In this work, a comparative study is presented which describes and analyzes the application of these methods to feature combination. Robustness issues of the LDA-based method, induced by increasing the number of acoustic feature coefficients, are addressed. An application of DMC to feature combination is introduced, based on splitting the acoustic model into separate scalable knowledge sources. After the analysis of the individual methods, a comparison is carried out on the basis of the underlying acoustic emission models. Experimental results are presented for small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by combining auditory and articulatory motivated features. The combination of Vocal Tract Length Normalized MFCC and articulatory motivated features demonstrates that additional articulatory information can improve the performance even of speaker-adapted systems. The word error rate is reduced from 1.8% to 1.5% on SieTill, a German digit string recognition task. Consistent improvements in word error rate have been obtained on two large-vocabulary corpora: from 19.1% to 18.2% on VerbMobil II, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the European Parliament Plenary Sessions task.
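
    As a rough illustration of the direct combination scheme described above, the following Python sketch stacks an auditory feature stream with an articulatory one frame by frame and reduces the enlarged vector with LDA. It is a minimal sketch under stated assumptions: the synthetic data, the dimensions, and the use of scikit-learn's LinearDiscriminantAnalysis with frame-wise state labels stand in for the thesis setup, which applies LDA to context-windowed frames inside a full recognition system.

    # Minimal sketch, not the thesis implementation: synthetic features and
    # labels stand in for real MFCC/voicing streams and forced-alignment
    # HMM-state labels.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    n_frames = 1000
    mfcc = rng.normal(size=(n_frames, 16))       # auditory stream (e.g. MFCC)
    voicing = rng.normal(size=(n_frames, 1))     # articulatory stream (voicing)
    states = rng.integers(0, 10, size=n_frames)  # hypothetical HMM-state labels

    # Direct combination: concatenate the streams for every frame ...
    stacked = np.hstack([mfcc, voicing])

    # ... and let LDA select the most discriminative linear combinations,
    # mapping the enlarged vector back to a fixed target dimension.
    lda = LinearDiscriminantAnalysis(n_components=9)
    combined = lda.fit_transform(stacked, states)
    print(combined.shape)                        # (1000, 9)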

    Feature Combination using Linear Discriminant Analysis and its Pitfalls

    Extraction Methods of Voicing Feature for Robust Speech Recognition

    In this paper, three different voicing features are studied as additional acoustic features for continuous speech recognition. The harmonic product spectrum based feature is extracted in the frequency domain, while the autocorrelation and the average magnitude difference based methods work in the time domain. The algorithms produce a measure of voicing for each time frame. The voicing measure was combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis to choose the most relevant features. Experiments have been performed on small- and large-vocabulary tasks. The three different voicing measures combined with MFCC resulted in similar improvements in word error rate: up to 14% relative on the small-vocabulary task and up to 6% relative on the large-vocabulary task, compared to using MFCC alone with the same overall number of parameters in the system.
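
    The three measures can be sketched as a compact per-frame computation. The function below is one plausible rendering in Python, assuming an 8 kHz signal, a 50-400 Hz pitch search range, and ad-hoc score normalizations; it does not reproduce the paper's exact formulas.

    # Hedged sketch of per-frame voicing measures; window length, pitch
    # range, and normalizations are illustrative assumptions.
    import numpy as np

    def frame_voicing(frame, fs=8000, fmin=50, fmax=400):
        """Return (autocorrelation, AMDF, harmonic product spectrum) scores."""
        frame = frame - frame.mean()
        lo, hi = int(fs / fmax), int(fs / fmin)   # candidate pitch lags

        # 1) Time domain: normalized autocorrelation peak in the pitch range.
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        r_ac = ac[lo:hi].max() / (ac[0] + 1e-12)

        # 2) Time domain: average magnitude difference function; deep valleys
        #    at the pitch period indicate voicing, so invert the minimum.
        amdf = np.array([np.mean(np.abs(frame[k:] - frame[:-k]))
                         for k in range(lo, hi)])
        r_amdf = 1.0 - amdf.min() / (amdf.mean() + 1e-12)

        # 3) Frequency domain: harmonic product spectrum; a strong compressed
        #    peak indicates a harmonic, i.e. voiced, spectrum.
        spec = np.abs(np.fft.rfft(frame, n=4 * len(frame)))
        hps = spec.copy()
        for h in (2, 3):                          # multiply downsampled copies
            hps[:len(spec[::h])] *= spec[::h]
        r_hps = hps.max() / (hps.mean() + 1e-12)

        return r_ac, r_amdf, r_hps

    fs = 8000
    t = np.arange(int(0.025 * fs)) / fs           # one 25 ms frame
    print(frame_voicing(np.sin(2 * np.pi * 120 * t), fs=fs))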

    Robust Speech Recognition Using A Voiced-Unvoiced Feature

    In this paper, a voiced-unvoiced measure is used as an acoustic feature for continuous speech recognition. The voiced-unvoiced measure was combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis (LDA) to choose the most relevant features. Experiments were performed on the SieTill corpus (German digit strings recorded over telephone lines) and on the SPINE corpus (English spontaneous speech under different simulated noisy environments). The additional voiced-unvoiced measure results in improvements in word error rate (WER) of up to 11% relative, compared to using MFCC alone with the same overall number of parameters in the system.

    Acoustic Feature Combination for Robust Speech Recognition

    In this paper, we consider the use of multiple acoustic features of the speech signal for robust speech recognition. We investigate the combination of various auditory based (Mel Frequency Cepstrum Coefficients, Perceptual Linear Prediction, etc.) and articulatory based (voicedness) features. The features are combined using a Linear Discriminant Analysis based technique and a log-linear model combination based technique. We describe the two feature combination techniques and compare the experimental results. Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.
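
    The log-linear combination can be pictured as a weighted sum of per-model log-likelihoods computed for every frame and state, as in the Python sketch below. The score matrices and weights are placeholders; in approaches such as Discriminative Model Combination (described in the thesis abstract above) the weights are trained to minimize word error rate rather than set by hand.

    # Hedged sketch: combine independently trained acoustic models by
    # weighting their frame/state log-likelihoods; all numbers are synthetic.
    import numpy as np

    def log_linear_combine(log_scores, weights):
        """Weighted sum of (frames x states) log-likelihood matrices."""
        return sum(w * s for w, s in zip(weights, log_scores))

    rng = np.random.default_rng(0)
    ll_mfcc = rng.normal(size=(100, 10))    # scores from the MFCC model
    ll_voice = rng.normal(size=(100, 10))   # scores from the voicedness model
    combined = log_linear_combine([ll_mfcc, ll_voice], weights=[0.7, 0.3])
    print(combined.argmax(axis=1)[:10])     # frame-wise best states (illustration)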