A deep learning approach to assessing non-native pronunciation of English using phone distances

Gales, MJF; Knill, KM; Kyriakopoulos, K

Publication date: 1 January 2018
Publisher: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

The way a non-native speaker pronounces the phones of a language is an important predictor of their proficiency. In grading spontaneous speech, the pairwise distances between generative statistical models trained on each phone have been shown to be powerful features. This paper presents a deep learning alternative to these model-based phone distances: a tunable Siamese network feature extractor that extracts distance metrics directly from the audio frame sequence. Features are extracted at the phone instance level and combined into phone-level representations using an attention mechanism. Pairwise distances between the phone features are then projected through a feed-forward layer to predict the score. The extraction stage is initialised either on a binary phone instance-pair classification task or to mimic the model-based features; the whole system is then fine-tuned end-to-end, optimising the learned distance metric for the score prediction task. The method is therefore more adaptable and more sensitive to phenomena at the phone instance level. Its performance is compared against …
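A minimal forward-pass sketch of the pipeline described in the abstract may help make the data flow concrete: shared-weight (Siamese) instance embeddings, attention pooling to one vector per phone, pairwise phone distances, and a linear projection to a score. Everything concrete here is an assumption, not the authors' method: the mean-pool plus tanh encoder, the single-query dot-product attention, Euclidean distance, the sigmoid-of-distance pre-training head, and the dimensions. The function names (`embed_instance`, `attend`, `same_phone_prob`, `predict_score`) are hypothetical. No training loop is shown; in the described system these weights would be learned, first on the pre-training task and then end-to-end on score prediction.

```python
import math
import random
from itertools import combinations

# Hypothetical sizes; the abstract does not give the real ones.
FRAME_DIM = 4   # features per audio frame
EMBED_DIM = 3   # size of each phone-instance embedding


def make_params(seed=0):
    """Random weights shared by every branch of the (hypothetical) Siamese extractor."""
    rng = random.Random(seed)
    return {
        "W": [[rng.uniform(-0.5, 0.5) for _ in range(FRAME_DIM)] for _ in range(EMBED_DIM)],
        "v": [rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)],  # attention query vector
    }


def embed_instance(frames, params):
    """Map one phone instance (list of frame vectors) to a fixed-size embedding.

    Assumed encoder: mean-pool the frames, then one shared linear layer + tanh.
    """
    mean = [sum(col) / len(frames) for col in zip(*frames)]
    return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in params["W"]]


def attend(embeddings, params):
    """Softmax attention over instance embeddings -> one phone-level vector."""
    logits = [sum(v * e for v, e in zip(params["v"], emb)) for emb in embeddings]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * emb[k] for w, emb in zip(weights, embeddings)) for k in range(EMBED_DIM)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_phone_prob(frames_a, frames_b, params, bias=1.0):
    """Assumed pre-training head: P(two instances are the same phone) = sigmoid(bias - d)."""
    d = distance(embed_instance(frames_a, params), embed_instance(frames_b, params))
    return 1.0 / (1.0 + math.exp(d - bias))


def predict_score(phone_instances, params, u, b):
    """phone_instances maps a phone label to its instances (each a list of frames).

    Returns (score, distances), where distances are ordered by sorted phone pairs.
    """
    phones = sorted(phone_instances)
    reps = {p: attend([embed_instance(f, params) for f in phone_instances[p]], params)
            for p in phones}
    dists = [distance(reps[p], reps[q]) for p, q in combinations(phones, 2)]
    # Single linear layer from the distance vector to a score.
    return b + sum(w * d for w, d in zip(u, dists)), dists


if __name__ == "__main__":
    rng = random.Random(1)

    def fake_instance(n_frames):
        # Random stand-in for real acoustic frames.
        return [[rng.gauss(0, 1) for _ in range(FRAME_DIM)] for _ in range(n_frames)]

    speaker = {p: [fake_instance(5), fake_instance(7)] for p in ["aa", "iy", "s"]}
    params = make_params()
    score, dists = predict_score(speaker, params, u=[0.5, -0.2, 0.1], b=3.0)
    print(len(dists), round(score, 3))
```

Because `embed_instance` applies one parameter set to every instance, all instances land in the same embedding space, which is what lets distances between them be compared; that weight sharing is the Siamese element. With N phones the score layer sees N(N-1)/2 distances, so its input size grows quadratically with the phone inventory.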

Available versions: Apollo (Cambridge), oai:www.repository.cam.ac.uk:1...

Last updated on 12/01/2019