Disfluent speech synthesis is necessary in some applications, such as automatic film dubbing or spoken translation. This paper presents a model for generating synthetic disfluent speech based on inserting each element of a disfluency into a context where it can be considered fluent. The prosody obtained by applying standard techniques to these new sentences is then used to synthesize the disfluent sentence. In addition, local modifications are applied to the segmental units adjacent to disfluency elements. Experiments show that duration follows this behavior, which supports the feasibility of the model.
Index Terms: speech synthesis, disfluent speech, prosody, disfluencies.
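To make the pipeline concrete, the sketch below shows one possible way to organize the duration part of the model. It is an illustration under stated assumptions, not the authors' implementation. The `Element` structure, its labels, the left/right fluent-context fields, the pluggable `predict_fluent` duration model and the single multiplicative `boundary_factor` are all hypothetical. Each element is embedded in a fluent context, durations are predicted for that fluent sentence, the element's own segments are sliced back out, and segments adjacent to element boundaries are then locally lengthened.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Element:
    """One element of a disfluency (hypothetical representation)."""
    label: str                      # e.g. "reparandum", "editing_term", "repair" (assumed labels)
    phones: List[str]               # segmental units belonging to this element
    left_context: List[str] = field(default_factory=list)   # assumed fluent context before it
    right_context: List[str] = field(default_factory=list)  # assumed fluent context after it


def synthesize_durations(
    elements: List[Element],
    predict_fluent: Callable[[List[str]], List[float]],
    boundary_factor: float = 1.2,   # hypothetical local-modification factor
) -> List[float]:
    """Predict each element's durations inside a fluent context, then
    apply a local modification to segments adjacent to element boundaries."""
    durations: List[float] = []
    boundary: Set[int] = set()      # a set avoids scaling a one-phone element twice
    for i, el in enumerate(elements):
        if not el.phones:
            raise ValueError(f"element {el.label!r} has no segments")
        # Insert the element into a context where it reads as fluent and
        # run the (standard) fluent prosody model on the whole sentence.
        sentence = el.left_context + el.phones + el.right_context
        predicted = predict_fluent(sentence)
        start = len(el.left_context)
        own = predicted[start:start + len(el.phones)]
        if i > 0:
            boundary.add(len(durations))              # first segment after a boundary
        durations.extend(own)
        if i < len(elements) - 1:
            boundary.add(len(durations) - 1)          # last segment before a boundary
    for j in boundary:
        durations[j] *= boundary_factor
    return durations


def toy_predictor(phones: List[str]) -> List[float]:
    """Stand-in for a real duration model: vowels 80 ms, others 50 ms."""
    return [0.08 if p in "aeiou" else 0.05 for p in phones]


if __name__ == "__main__":
    utterance = [
        Element("reparandum", ["t", "o"]),
        Element("editing_term", ["e", "h"]),
        Element("repair", ["t", "u"]),
    ]
    print(synthesize_durations(utterance, toy_predictor))
```

With a real system, `toy_predictor` would be replaced by whatever standard prosody model is available; the context fields matter whenever that model is position-dependent (for example, phrase-final lengthening), because the element's segments are taken from a sentence in which they are not disfluent.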