31 research outputs found

Statistical parametric speech synthesis based on sinusoidal models

Author: Hu Qiong
Publication venue: The University of Edinburgh
Publication date: 07/07/2017

This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role in parametrisation and reconstruction, so we first conduct an experimental comparison of a broad range of the leading vocoder types. Our study shows that, for analysis/synthesis, sinusoidal models with complex amplitudes can generate high-quality speech compared with source-filter models. However, the component sinusoids are correlated with each other, and the number of parameters is high and varies from frame to frame, which constrains their application to statistical speech synthesis. Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to decrease and fix the number of components typically used in the standard sinusoidal model. Then, in order to apply the proposed vocoder in an HMM-based speech synthesis system (HTS), we compare two strategies for modelling sinusoidal parameters. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both the static amplitude and the dynamic slope of all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate quality comparable to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation.
For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We conclude from our results that sinusoidal models are indeed highly suited to statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm, using a logarithmic minimisation criterion that includes both amplitude and phase errors, serves as the learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase, and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The use of the proposed vocoders with various acoustic models (HMM / DNN / CVNN) clearly demonstrates that they are compelling candidates for statistical parametric speech synthesis.

Edinburgh Research Archive

Colloquium Signaalanalyse en Spraak: 22 en 23 oktober 1990: reader

Publication venue: Instituut voor Perceptie Onderzoek (IPO)
Publication date: 03/10/1990

Pure OAI Repository

Proceedings of the 7th Sound and Music Computing Conference

Authors: Emilia Gómez, Perfecto Herrera, Rafael Ramirez
Publication venue: SMC Network
Publication date: 25/07/2010

Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

ZENODO

Interactions in Virtual Worlds: Proceedings Twente Workshop on Language Technology 15

Publication venue: University Library/University of Twente
Publication date: 19/05/1999

University of Twente Research Information

International Conference on Human-Informed Translation and Interpreting Technology (HiT-IT 2023) Proceedings

Authors: Corpas Pastor G., Mitkov R., Monti J., Orăsan C.
Publication venue: Incoma Ltd.
Publication date: 01/01/2023

Università degli Studi di Napoli L'Orientale: CINECA IRIS

Fine-structure processing, frequency selectivity and speech perception in hearing-impaired listeners

Authors: Dau Torsten, Strelcyk Olaf
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/2008

Online Research Database In Technology

Proceedings of the 9th international conference on disability, virtual reality and associated technologies (ICDVRAT 2012)

Publication venue: The University of Reading
Publication date: 01/01/2012

The proceedings of the conference

Central Archive at the University of Reading

Methods in Contemporary Linguistics

Publication venue: Walter de Gruyter GmbH
Publication date: 21/11/2022

The present volume is a broad overview of methods and methodologies in linguistics, illustrated with examples from concrete research. It collects insights gained from a broad range of linguistic sub-disciplines, ranging from core disciplines to topics in cross-linguistic and language-internal diversity and to contributions on language, space and society. Given its critical and innovative nature, the volume is a valuable source for students and researchers with a broad range of linguistic interests.

Directory of Open Access Books (DOAB)

Methods in Contemporary Linguistics

Publication venue: Walter de Gruyter GmbH

OAPEN Library

Methods in Contemporary Linguistics


OAPEN Library