14 research outputs found

    On a model-robust training method for speech recognition

    An inequality for rational functions with applications to some statistical estimation problems

    Automatic phonetic baseform determination

    Phonetic baseforms are the basic recognition units in most large vocabulary speech recognition systems. These baseforms are usually determined by hand once a vocabulary is chosen and not modified thereafter. However, many applications of speech recognition, such as dictation transcription, are hampered by a fixed vocabulary and require the user be able to add new words to the vocabulary. At least one phonetic baseform must be assigned to each new word to properly integrate the word into the recognition system. Dictionary lookup is often unsuccessful in determining a phonetic baseform because new words are often names or task-specific jargon; also, talkers tend to have idiosyncratic pronunciations for a substantial fraction of words. This paper describes a series of experiments in which the phonetic baseform is deduced automatically for new words by utilizing actual utterances of the new word in conjunction with a set of automatically derived spelling-to-sound rules. We evaluated recognition performance on new words spoken by two different talkers when the phonetic baseforms were extracted via the above approach. The error rates on these new words were found to be comparable to or better than when the phonetic baseforms were derived by hand, thus validating the basic approach.

    Large Vocabulary Natural Language Continuous Speech Recognition

    The present paper describes our current research on automatic speech recognition of continuously read sentences from a naturally-occurring corpus: office correspondence. The recognition system combines features from our current isolated-word recognition system and from our previously developed continuous speech recognition systems. It consists of an acoustic processor, an acoustic channel model, a language model, and a linguistic decoder. Some new features in the recognizer relative to our isolated-word speech recognition system include the use of a fast match to rapidly prune the candidates considered by the detailed match to a manageable number, multiple pronunciations of all function words, and modelling of interphone coarticulatory behavior. To date, we have recorded training and test data from a set of 10 male talkers. The test data consist of 50 sentences drawn from spontaneously generated memos covered by a 5000 word vocabulary. The perplexity of the test sentences was found to be 93; none of the sentences were part of the data used to generate the language model. Preliminary (speaker-dependent) recognition results on these talkers yielded an average word error rate of 11.0%.