4 research outputs found

    Continuous Density Hidden Markov Model for Hindi Speech Recognition

    State-of-the-art automatic speech recognition systems use Mel frequency cepstral coefficients as the feature extractor along with a Gaussian mixture model for acoustic modeling, but there is no standard value for the number of mixture components in the speech recognition process. The current choice of mixture components is arbitrary, with little justification. Also, the standard set for European languages cannot be used in Hindi speech recognition due to the mismatch in database size between the languages. Parameter estimation with too many or too few components may estimate the mixture model inappropriately. Therefore, the number of mixtures is important for the initial estimation of the expectation maximization process. In this research work, the authors estimate the number of Gaussian mixture components for a Hindi database based upon the size of the vocabulary. Mel frequency cepstral features and perceptual linear predictive features, along with their extended variations with delta-delta-delta features, have been used to evaluate this number based on the optimal recognition score of the system. A comparative analysis of recognition performance for both feature extraction methods on a medium-size Hindi database is also presented in this paper. HLDA has been used as the feature reduction technique, and its impact on the recognition score has also been highlighted.

    ACOUSTIC-PHONETIC FEATURE BASED DIALECT IDENTIFICATION IN HINDI SPEECH


    cROVER: Context-augmented Speech Recognizer based on Multi-Decoders' Output

    The growing need for designing and implementing reliable voice-based human-machine interfaces has inspired intensive research work in the field of voice-enabled systems, and greater robustness and reliability are being sought for those systems. Speech recognition has become ubiquitous. Automated call centers, smart phones, and dictation and transcription software are among the many systems currently being designed that involve speech recognition. The need for highly accurate and optimized recognizers has never been more crucial. The research community is very actively involved in developing powerful techniques to combine the existing feature extraction methods for better and more reliable information capture from the analog signal, as well as enhancing the language and acoustic modeling procedures to better adapt to unseen or distorted speech signal patterns. Most researchers agree that one of the most promising approaches to reducing the Word Error Rate (WER) in large vocabulary speech transcription is to combine two or more speech recognizers and then generate a new output, in the expectation that it provides a lower error rate. The research work proposed here aims at further boosting the performance of the well-known Recognizer Output Voting Error Reduction (ROVER) combination technique. This is done through its integration with an error filtering approach. The proposed system is referred to as cROVER, for context-augmented ROVER. The principal idea is to flag erroneous words following the combination of the word transition networks through a scanning process at each slot of the resulting network. This step aims at eliminating some transcription errors and thus facilitating the voting process within ROVER. The error detection technique consists of spotting semantic outliers in a given decoder's transcription output.
Because most error detection techniques suffer from a high false positive rate, we propose to combine the error filtering techniques to compensate for the poor performance of each of the individual error classifiers. Experimental results have shown that the proposed cROVER approach is able to achieve a relative WER reduction of almost 10% through adequate combination of speech decoders. The approaches proposed here are generic enough to be used by any number of speech decoders and with any type of error filtering technique. A novel voting mechanism has also been proposed. The new confidence-based voting scheme has been inspired by the cROVER approach. The main idea consists of using the confidence scores collected from the contextual analysis during the scoring of each word in the transition network. The new voting scheme outperformed ROVER's original voting by up to 16% in terms of relative WER reduction.
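
The slot-by-slot voting described in the abstract can be illustrated with a minimal sketch. It is not the authors' cROVER implementation: it assumes the decoder outputs have already been aligned into slots of a word transition network, and it uses a simple score that mixes vote frequency with the highest confidence seen for a word, in the spirit of ROVER-style voting. The names `vote_slot`, `combine`, `NULL`, and the weight `alpha` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical marker for "no word" in a slot (one decoder skipped the slot).
NULL = "@"


def vote_slot(candidates, alpha=0.5):
    """Pick the winning word for one slot of the aligned network.

    candidates: list of (word, confidence) pairs, one per decoder.
    Score of a word = alpha * (its share of the votes)
                    + (1 - alpha) * (highest confidence it received).
    alpha is an assumed tuning weight; alpha=1.0 reduces to majority voting.
    """
    n = len(candidates)
    votes = defaultdict(int)
    best_conf = defaultdict(float)
    for word, conf in candidates:
        votes[word] += 1
        best_conf[word] = max(best_conf[word], conf)

    def score(word):
        return alpha * votes[word] / n + (1 - alpha) * best_conf[word]

    return max(votes, key=score)


def combine(slots, alpha=0.5):
    """Vote in every slot and drop slots where the empty word wins."""
    winners = [vote_slot(slot, alpha) for slot in slots]
    return [w for w in winners if w != NULL]


# Toy aligned output from three hypothetical decoders.
slots = [
    [("the", 0.9), ("the", 0.8), ("a", 0.4)],
    [("cat", 0.3), ("cap", 0.95), ("cat", 0.4)],
    [(NULL, 0.5), ("sat", 0.6), (NULL, 0.5)],
]
confidence_weighted = combine(slots)          # confidence can override the majority
majority_only = combine(slots, alpha=1.0)     # pure frequency voting
```

In this toy input the second slot shows the design trade-off: with pure frequency voting the majority word wins, while a confidence-weighted score lets a single high-confidence hypothesis override two low-confidence ones. How the confidence scores are obtained (for cROVER, from contextual analysis) is outside the scope of this sketch.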