An Independent Assessment of Phonetic Distinctive Feature Sets used to Model Pronunciation Variation

Abstract

Thesis (Master's)--University of Washington, 2014

It has been consistently shown that Automatic Speech Recognition (ASR) performs much worse on casual, spontaneous speech than on carefully planned or read speech, with word error rates as much as double, and that variation in pronunciation is the main reason for this degradation. Thus far, attempts to mitigate this have fallen well below expectations. Phonetic distinctive features show promise from a theoretical standpoint, but have not yet been fully incorporated into an end-to-end ASR system. Work incorporating distinctive features into ASR is widespread and varied, but each project uses a unique set of features based on the authors' linguistic intuitions, so the results of these experiments cannot be fully and fairly compared. In this work, I attempt to determine which style of distinctive feature set is best suited to modeling pronunciation variation in ASR, based on measures of surface phone prediction accuracy and the efficiency of the decision tree model. Using a non-exhaustive but representative collection of phonetic distinctive feature sets, decision trees were trained, one per canonical base form phone, under two experimental conditions: words in isolation and words in sequence. These models were tested against a comparable held-out test set and against an additional data set of canonical pronunciations used to simulate formal speech. A multi-valued, articulatory-based feature set provided a far more compact model with comparable accuracy. In a comparison of binary feature sets, the model with feature redundancy was far more robust: its accuracy was slightly higher and, where it predicted an incorrect phone, that phone was closer to the gold standard phone than the other feature sets' predictions were.
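The abstract does not say how closeness between a mispredicted phone and the gold standard phone was measured. As a minimal sketch of one plausible measure, assuming binary feature vectors and a simple count of disagreeing features (Hamming distance), the idea could look like the following. The feature inventory, the four phones, and the function names are illustrative assumptions, not taken from the thesis.

```python
# Illustrative binary feature values for four phones.
# ASSUMPTION: this tiny inventory is for demonstration only; it is not
# one of the feature sets evaluated in the thesis.
FEATURES = ("voiced", "nasal", "continuant", "sonorant")

PHONES = {
    "t": (0, 0, 0, 0),
    "d": (1, 0, 0, 0),
    "s": (0, 0, 1, 0),
    "n": (1, 1, 0, 1),
}


def feature_distance(predicted, gold):
    """Count the features on which two phones disagree (Hamming distance)."""
    return sum(p != g for p, g in zip(PHONES[predicted], PHONES[gold]))


def mean_error_distance(pairs):
    """Average feature distance over the incorrect predictions only.

    `pairs` is an iterable of (predicted_phone, gold_phone) tuples.
    Returns 0.0 when every prediction is correct.
    """
    errors = [feature_distance(p, g) for p, g in pairs if p != g]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    # Hypothetical predictions for a canonical /t/: one correct, two wrong.
    predictions = [("t", "t"), ("d", "t"), ("n", "t")]
    print(mean_error_distance(predictions))
```

Under a measure like this, two models with the same accuracy can still differ: one whose errors differ from the gold phone by a single feature would score lower (better) than one whose errors differ by several.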
