Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept

By Raheel Qader, Gwénolé Lecorvé, Damien Lolive and Pascale Sébillot


This paper presents exploratory work on automatically inserting disfluencies in text-to-speech (TTS) systems. The objective is to make TTS more spontaneous and expressive. To achieve this, we propose to focus on the linguistic level of speech through the insertion of pauses, repetitions and revisions. We formalize the problem as a theoretical process in which transformations are iteratively composed. This is a novel contribution, since most previous work either focuses on the detection or cleaning of linguistic disfluencies in speech transcripts, or concentrates solely on acoustic phenomena in TTS, especially pauses. We present a first implementation of the proposed process using conditional random fields and language models. The objective and perceptual evaluations conducted on an English corpus of spontaneous speech show that our proposal is effective at generating disfluencies, and highlight perspectives for future improvements.

Topics: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Publisher: 'Springer Science and Business Media LLC'
Year: 2018
DOI identifier: 10.1007/978-3-030-00810-9_4
OAI identifier: oai:HAL:hal-01840798v1
