Comparison of speech parameterization techniques for Slovenian language

Abstract

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In modern speech recognition systems, mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction coefficients (PLP) are the two main techniques used. The MFCC method is known to give better results when audio recordings are of high quality (no background noise, quality microphone), whereas the PLP method performs better when the audio quality is poor. In an attempt to close the gap between the two methods, some modifications to the original PLP method are presented. They are mainly based on using a modified mel filter bank with a number of filters resembling the number of spectral coefficients. In our work, the effectiveness of the proposed changes to PLP (RPLP features) was tested and compared against the MFCC and original PLP acoustic features. A number of 3-state HMM acoustic models were built using different acoustic feature setups (different filter banks, different numbers of filters) in order to assess which parameterization technique gives superior recognition accuracy. To achieve a more robust estimate of the recognition results when using various parameterizations, three databases of different audio quality were used.
