31 research outputs found

    Statistical parametric speech synthesis based on sinusoidal models

    This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role during the parametrisation and reconstruction process, so we first conduct an experimental comparison of a broad range of the leading vocoder types. Although our study shows that, for analysis/synthesis, sinusoidal models with complex amplitudes can generate higher-quality speech than source-filter ones, the component sinusoids are correlated with each other, and the number of parameters is high and varies from frame to frame, which constrains their application to statistical speech synthesis. Therefore, we propose a perceptually based dynamic sinusoidal model (PDM) to decrease and fix the number of components typically used in the standard sinusoidal model. Then, in order to apply the proposed vocoder in an HMM-based speech synthesis system (HTS), two strategies for modelling sinusoidal parameters are compared. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both the static amplitude and the dynamic slope from all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate quality comparable to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation.
    For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as the primary and secondary tasks, respectively. We conclude from our results that sinusoidal models are indeed highly suited for statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve the voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm using a logarithmic minimisation criterion that includes both amplitude and phase errors is used as the learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase, and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real- and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The use of the proposed vocoders with various acoustic models (HMM / DNN / CVNN) clearly demonstrates that they are compelling for statistical parametric speech synthesis.

    Colloquium Signaalanalyse en Spraak: 22 en 23 oktober 1990: reader

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Interactions in Virtual Worlds: Proceedings Twente Workshop on Language Technology 15

    Proceedings of the 9th international conference on disability, virtual reality and associated technologies (ICDVRAT 2012)

    The proceedings of the conference

    Methods in Contemporary Linguistics

    The present volume is a broad overview of methods and methodologies in linguistics, illustrated with examples from concrete research. It collects insights gained from a wide range of linguistic sub-disciplines, from core disciplines to topics in cross-linguistic and language-internal diversity and contributions on language, space and society. Given its critical and innovative nature, the volume is a valuable source for students and researchers with a broad range of linguistic interests.
