A syllable based approach for improved recognition of spoken names

By Abhinav Sethy and Shrikanth Narayanan


Recognition of spoken names has traditionally been a difficult task for speech recognition systems because of the large variations in speaking styles, linguistic origins, and pronunciations found in names. The linguistic nature of names makes it difficult to generate pronunciation variations automatically. For many applications the list of names is on the order of several thousand entries, making spoken name recognition a high-perplexity task. Using multiple pronunciations to account for the variations in names increases the perplexity of the recognition system substantially further. In this paper we propose the use of the syllable as the acoustic unit for spoken name recognition and show how modeling pronunciation variation at the syllable level can improve recognition performance and reduce system perplexity. We present results comparing systems that use context-dependent phones with syllable-based systems, and demonstrate that a significant increase in recognition accuracy and speed can be achieved by using the syllable as the acoustic unit for spoken name recognition. With a finite state grammar network for spoken name recognition, the observed recognition error rate for the syllable-based system was 30% lower than that of the phoneme-based system. For phone-level and syllable-level bigram-based recognition networks, the observed recognition error rate for the syllable-based system was about 40% lower than that of the phoneme-based system.

Year: 2002
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX