
Automatic Extraction of Phonetically Rich Sentences from Large Text Corpus of Indian Languages

By Karunesh Arora, Sunita Arora, Kapil Verma and S S Agrawal


A set of phonetically rich sentences is needed to represent the different speech units used in developing Automatic Speech Recognition and Speech Synthesis systems. Selecting such a set from a large text corpus without altering the characteristics of the corpus remains a difficult task. A major concern is deciding on what basis sentences should be chosen so that they cover all phonetic aspects of the language under study in the smallest possible set. This paper describes a simple process for automatically extracting such a set of sentences from a large text corpus of a given Indian language and presents an algorithm for the process. The process is language independent and works for most Indian languages. The extent of success achieved, in terms of the phonetic richness of the selected sentences, is also discussed.
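The abstract does not spell out the selection algorithm, so the following is only an illustrative sketch of one common way to approach the problem: a greedy coverage heuristic that repeatedly picks the sentence contributing the most not-yet-covered units. It is not the authors' method. The function names (`char_bigrams`, `select_rich_sentences`) are hypothetical, and character bigrams are used purely as a stand-in for real phonetic units, which would normally come from a grapheme-to-phoneme step for the language in question.

```python
from typing import Callable, Iterable, List, Set


def char_bigrams(sentence: str) -> Set[str]:
    """Stand-in unit extractor (assumption): adjacent character pairs in each word."""
    units: Set[str] = set()
    for word in sentence.split():
        units.update(word[i:i + 2] for i in range(len(word) - 1))
    return units


def select_rich_sentences(
    sentences: Iterable[str],
    extract_units: Callable[[str], Set[str]] = char_bigrams,
) -> List[str]:
    """Greedily pick sentences until no remaining sentence adds an uncovered unit."""
    candidates = [(s, extract_units(s)) for s in sentences]
    covered: Set[str] = set()
    selected: List[str] = []
    while True:
        best, best_gain = None, 0
        for sentence, units in candidates:
            gain = len(units - covered)  # units this sentence would newly cover
            if gain > best_gain:
                best, best_gain = (sentence, units), gain
        if best is None:  # every unit in the corpus is already covered
            break
        selected.append(best[0])
        covered |= best[1]
    return selected


if __name__ == "__main__":
    corpus = ["ab", "abc", "bc", "xy"]
    print(select_rich_sentences(corpus))
```

Passing a different `extract_units` callable (for example, one that returns phonemes or syllables) changes what "coverage" means without touching the selection loop. A greedy pass like this gives a small covering set but does not guarantee the minimum possible size.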

Year: 2009
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX