This paper describes methods for annotating recorded speech with information hypothesised to be important for the pronunciation of words in discourse context. Annotation is structured into six hierarchically ordered tiers, each tier corresponding to a segmentally defined linguistic unit. Automatic methods are used to segment and annotate each tier. Decision tree models trained on annotation from elicited monologue showed a phoneme error rate of 9.91%, corresponding to a 55.25% error reduction compared with estimating the phonetic realisation from a canonical pronunciation representation in a lexicon.
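The two reported figures can be related by simple arithmetic. The sketch below assumes that the 55.25% figure is a relative reduction in phoneme error rate, i.e. (baseline - model) / baseline; under that assumption it recovers the baseline error rate implied for the canonical lexicon pronunciation, a value the abstract does not state. The function name `implied_baseline_error` is hypothetical and chosen for illustration only.

```python
def implied_baseline_error(model_error: float, relative_reduction: float) -> float:
    """Baseline error rate implied by a model error rate and a relative reduction.

    Assumes: relative_reduction = (baseline - model_error) / baseline,
    so baseline = model_error / (1 - relative_reduction).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must lie in [0, 1)")
    return model_error / (1.0 - relative_reduction)


# Figures reported in the abstract (percent, and a fraction of the baseline).
model_per = 9.91
reduction = 0.5525

# Implied, not reported: the error rate of the canonical lexicon baseline.
baseline_per = implied_baseline_error(model_per, reduction)

# Round trip: recompute the relative reduction from the two error rates.
recomputed_reduction = (baseline_per - model_per) / baseline_per

print(f"implied baseline PER: {baseline_per:.2f}%")
print(f"recomputed reduction: {recomputed_reduction:.4f}")
```

If the reduction were instead measured in absolute percentage points, this derivation would not apply, so the implied baseline should be read as an estimate under the stated assumption.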