Optimized Prediction of Fluency of L2 English Based on Interpretable Network Using Quantity of Phonation and Quality of Pronunciation

Abstract

This paper presents the results of a joint project between an engineering team at one university and an educational team at another to develop an online fluency assessment system for Japanese learners of English. A picture-description corpus of English spoken by 90 learners and 10 native speakers was used, in which the fluency of each speaker was rated manually by 10 other native raters. The assessment system was built to predict the averaged manual scores. System development focused on two separate goals. The system was trained analytically, so that teachers can see and discuss which speech features contribute more to fluency prediction, and technically, so that teachers' knowledge can be incorporated into training the system, which can then be further optimized using an interpretable network. Experiments showed that quality-of-pronunciation features are much more helpful than quantity-of-phonation features, and that the optimized system reached an extremely high correlation of 0.956 with the averaged manual scores, which is higher than the maximum of the inter-rater correlations (0.910).
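The evaluation described in the abstract compares two quantities: the correlation between system predictions and the rater-averaged scores, and the highest correlation found between raters. The sketch below illustrates that comparison under stated assumptions: it uses Pearson correlation, treats inter-rater correlation as pairwise between individual raters, and runs on made-up toy ratings. The abstract does not specify any of these details, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def averaged_scores(ratings):
    """Average over raters; ratings[r][s] is rater r's score for speaker s."""
    n_raters = len(ratings)
    return [sum(column) / n_raters for column in zip(*ratings)]


def max_inter_rater_correlation(ratings):
    """Highest pairwise correlation between any two raters (an assumption)."""
    return max(pearson(a, b) for a, b in combinations(ratings, 2))


# Toy data, invented for illustration only: 3 raters, 5 speakers.
ratings = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
# Hypothetical system outputs for the same 5 speakers.
predicted = [1.2, 2.4, 3.1, 4.2, 4.8]

targets = averaged_scores(ratings)
system_r = pearson(predicted, targets)
ceiling_r = max_inter_rater_correlation(ratings)

print(f"system vs. averaged scores: {system_r:.3f}")
print(f"max inter-rater correlation: {ceiling_r:.3f}")
```

A system can exceed the maximum inter-rater correlation because the averaged score is less noisy than any single rater's scores, so it is an easier target to correlate with than one rater is for another.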
