Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi
Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one
exception: whether a schwa represented in the orthography is pronounced or
unpronounced (deleted). Previous work has attempted to predict schwa deletion
in a rule-based fashion using prosodic or phonetic analysis. We present the
first statistical schwa deletion classifier for Hindi, which relies solely on
the orthography as the input and outperforms previous approaches. We trained
our model on a newly compiled pronunciation lexicon extracted from various
online dictionaries. Our best Hindi model achieves state-of-the-art
performance, and also achieves good performance on a closely related language,
Punjabi, without modification.

Comment: 4 pages, 1 figure. To be published in the 2020 Annual Conference of
the Association for Computational Linguistics (https://acl2020.org/)
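
As an illustration of what "relies solely on the orthography as the input" can mean in practice, the sketch below locates the orthographic schwas in a Devanagari word and builds a character-window feature dictionary for each one. It is a minimal sketch under stated assumptions, not the paper's feature set or model: the function names `schwa_sites` and `window_features`, the window width, and the padding symbol `#` are all hypothetical choices made for this example. Feature dictionaries of this shape could be passed to any standard classifier.

```python
# Illustrative only: orthography-based features for schwa deletion.
# The abstract does not specify the paper's features; everything here is assumed.

# Devanagari consonants (U+0915..U+0939) plus precomposed nukta forms (U+0958..U+095F).
CONSONANTS = {chr(c) for c in range(0x0915, 0x093A)} | {chr(c) for c in range(0x0958, 0x0960)}
# Dependent vowel signs (matras), U+093E..U+094C.
VOWEL_SIGNS = {chr(c) for c in range(0x093E, 0x094D)}
VIRAMA = "\u094D"  # explicitly suppresses the inherent vowel
NUKTA = "\u093C"   # diacritic that may follow a consonant


def schwa_sites(word):
    """Return indices of consonants that carry an orthographic (inherent) schwa.

    A consonant carries an inherent schwa when it is not followed by a
    vowel sign or a virama. The index returned is that of the last code
    point of the consonant (the nukta, if one is present).
    """
    sites = []
    i, n = 0, len(word)
    while i < n:
        if word[i] in CONSONANTS:
            j = i + 1
            if j < n and word[j] == NUKTA:
                j += 1
            if j >= n or (word[j] not in VOWEL_SIGNS and word[j] != VIRAMA):
                sites.append(j - 1)
            i = j
        else:
            i += 1
    return sites


def window_features(word, site, width=2):
    """Build a dict of the characters within `width` positions of `site`."""
    padded = "#" * width + word + "#" * width
    center = site + width
    return {f"c{off:+d}": padded[center + off] for off in range(-width, width + 1)}


if __name__ == "__main__":
    word = "नमकीन"  # five code points: na, ma, ka, vowel sign ii, na
    for site in schwa_sites(word):
        print(site, window_features(word, site))
```

Each schwa site yields one training instance; the label (pronounced or deleted) would come from a pronunciation lexicon such as the one the abstract describes.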