2 research outputs found

    A review of state-of-the-art speech modelling methods for the parameterisation of expressive synthetic speech

    Get PDF
    This document will review a sample of available voice modelling and transformation techniques, in view of an application in expressive unit-selection based speech synthesis in the framework of the PAVOQUE project. The underlying idea is to introduce some parametric modification capabilities at the level of the synthesis system, in order to compensate for the sparsity and rigidity, in terms of available emotional speaking styles, of the databases used to define speech synthesis voices. For this work, emotion-related parametric modifications will be restricted to the domains of voice quality and prosody, as suggested by several reviews addressing the vocal correlates of emotions (Schröder, 2001; Schröder, 2004; Roehling et al., 2006). The present report will start with a review of some techniques related to voice quality modelling and modification. First, it will explore the techniques related to glottal flow modelling. Then, it will review the domain of cross-speaker voice transformations, in view of a transposition to the domain of cross-emotion voice transformations. This topic will be exposed from the perspective of the parametric spectral modelling of speech and then from the perspective of available spectral transformation techniques. Then, the domain of prosodic parameterisation and modification will be reviewed

    An Enhanced ABS/OLA Sinusoidal Model For Waveform Synthesis In TTS

    No full text
    This paper describes a method for text-to-speech waveform synthesis based on the Analysis-by-Synthesis/Overlap-Add (ABS/OLA) sinusoidal model. This model has been shown in previous work to be a useful framework for pitch and time-scale modification of both speech and music signals. This paper explores some extensions of the original ABS/OLA formulation that attempt to overcome specific artifacts, including a phase dithering approach for unvoiced speech synthesis and an improved pitch modification method that compensates for undesirable energy modulation effects. The implementation of the model within a text-to-speech synthesis (TTS) system is described, and the results of a listener evaluation of the method are discussed
    corecore