Extracting pronunciation rules for phonemic variants


Various automated techniques can be used to generalise from phonemic lexicons through the extraction of grapheme-to-phoneme rule sets. These techniques are particularly useful when developing pronunciation models for previously unmodelled languages: a frequent requirement when developing multilingual speech processing systems. However, many of the learning algorithms (such as Dynamically Expanding Context or Default&Refine) experience difficulty in accommodating alternate pronunciations that occur in the training lexicon. In this paper we propose an approach for the incorporation of phonemic variants in a typical instance-based learning algorithm, Default&Refine. We investigate the use of a combined 'pseudo-phoneme' associated with a set of 'generation restriction rules' to model those phonemes that are consistently realised as two or more variants in the training lexicon. We evaluate the effectiveness of this approach using the Oxford Advanced Learners Dictionary, a publicly available English pronunciation lexicon. We find that phonemic variation exhibits sufficient regularity to be modelled through extracted rules, and that acceptable variants may be underrepresented in the studied lexicon. The proposed method is applicable to many approaches besides the Default&Refine algorithm, and provides a simple but effective technique for including phonemic variants in grapheme-to-phoneme rule extraction frameworks.
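
The pseudo-phoneme idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `merge_variants` and `expand`, the `|` separator, and the example phoneme symbols are all assumptions made for illustration, and the generation restriction rules mentioned in the abstract are not modelled here. The sketch assumes each variant of a word is already aligned to the same length, collapses the variants into a single sequence in which differing positions become a combined symbol, and then expands that sequence back into concrete pronunciations.

```python
from itertools import product


def merge_variants(variants):
    """Collapse equal-length, aligned pronunciation variants into one sequence.

    Positions where the variants agree keep the shared phoneme; positions
    where they differ get a combined pseudo-phoneme such as "e|i".
    The "|" separator is a hypothetical notation chosen for this sketch.
    """
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned to the same length")
    merged = []
    for column in zip(*variants):
        distinct = sorted(set(column))
        merged.append("|".join(distinct))
    return merged


def expand(merged):
    """Expand a pseudo-phoneme sequence into every concrete pronunciation.

    Without restriction rules this generates all combinations, so it can
    over-generate when a word varies at more than one position.
    """
    options = [symbol.split("|") for symbol in merged]
    return [list(combo) for combo in product(*options)]


# Hypothetical aligned variants for one word (symbols are illustrative only).
variants = [["e", "k", "@"], ["i", "k", "@"]]
merged = merge_variants(variants)
pronunciations = expand(merged)
```

A rule learner trained on the merged sequence sees one target per grapheme position, which is the property that single-output algorithms of this kind rely on; how the expansion is constrained in practice is left to the restriction rules described in the paper.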

This paper was published in CiteSeerX.
