Synthetic voices in the foreign language context

Abstract

This study evaluated the voice of a modern English text-to-speech (TTS) system in an English as a foreign language (EFL) context in terms of its speech quality, its ability to be understood by L2 users, and its potential for focusing learners' attention on specific language forms. Twenty-nine Brazilian EFL learners listened to stories and sentences produced by a TTS voice and a human voice, and rated them on a 6-point Likert scale according to holistic criteria for evaluating pronunciation: comprehensibility, naturalness, and accuracy. In addition, they were asked to answer a set of comprehension questions to assess understanding, to complete a dictation/transcription task to measure intelligibility, and to identify whether the target past -ed form was present in decontextualized sentences. Results indicate that the TTS and human voices were perceived similarly in terms of comprehensibility, while ratings for naturalness were unfavorable for the synthesized voice. For the text comprehension, dictation, and aural identification tasks, participants performed similarly in response to both voices. These findings suggest that TTS systems have the potential to be used as pedagogical tools for L2 learning, particularly in EFL settings, where natural occurrence of the target language is limited or non-existent.
