Search CORE

8 research outputs found

Gesture Control of HMM-Based Singing Voice Synthesis

Author: Astrinaki Maria; Clark Robert; Oura K.; Veaux Christophe; Yamagishi Junichi
Publication date: 01/08/2013

Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion

Author: Chng E. S.; Kinnunen T.; Li Haizhou; Wu Zhizheng
Publication date: 01/01/2010

In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation; it is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 in Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion.
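As an illustration of the frame-level mean and variance normalization that the abstract names as the usual baseline, here is a minimal sketch. It assumes the normalization is applied to log-F0 over voiced frames, which is a common convention but is not stated in the abstract; the function names and the toy contours are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_hz):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_hz if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0_src, src_stats, tgt_stats):
    """Frame-level mean/variance normalization in the log-F0 domain.

    Each voiced source frame is z-scored with the source statistics and
    rescaled with the target statistics. Unvoiced frames (0) stay 0.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = []
    for f in f0_src:
        if f <= 0:
            out.append(0.0)  # unvoiced frame, nothing to convert
        else:
            z = (math.log(f) - mu_s) / sd_s
            out.append(math.exp(z * sd_t + mu_t))
    return out


# Toy, made-up contours in Hz; 0 marks unvoiced frames.
source_f0 = [110.0, 0.0, 120.0, 130.0, 0.0, 125.0]
target_f0 = [210.0, 220.0, 0.0, 240.0, 230.0]

converted = convert_f0(
    source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0)
)
```

Because each frame is converted independently using only global statistics, no alignment between source and target utterances is needed, which is why this baseline works with non-parallel data.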

CiteSeerX

Edinburgh Research Explorer

Creation of HMM-based Speech Model for Estonian Text-to-Speech Synthesis

Author: Nurk Tõnis
Publication venue: Tartu Ülikool
Publication date: 01/01/2012

The main purpose of this thesis is to create hidden Markov model (HMM) based speech models, for both a male and a female voice, for Estonian text-to-speech synthesis. First, a brief overview of the text-to-speech synthesis process is given, along with a description of the components of a typical speech synthesis system and the techniques in common use. The thesis then focuses on statistical parametric speech synthesis in particular. It describes HMM-based speech synthesis as implemented in the HTS system (HMM-based Speech Synthesis System), which is used to generate the voice models for this bachelor's thesis, discusses the advantages and drawbacks of HTS, and describes solutions to some of its problems. The practical part uses a speech corpus compiled and recorded at the Institute of the Estonian Language. The thesis presents the guidelines for creating the corpus, its connection with the text analysis module, and the resulting constraints. It describes the changes to the phonetic system that followed from the development of the text analysis modules, covers aspects of recording the speech corpus and the guidelines for evaluating the recorded material, and analyzes the unforeseen findings that affected the quality of the corpus, from which new guidelines for corpus construction were derived. It then describes the adaptation of HTS to Estonian, which in essence means compiling a phonetic and phonological specification and preparing the training data. The linguistic specification is made compatible with the text analysis module so that the trained voice models can be used in Estonian speech synthesis applications. Experiments are carried out on data from different speakers, subcorpora and linguistic specifications. Examples of speech generated with male and female voice models trained with HTS are presented and used to assess model quality. The evaluation gave the expected findings: the most important factors affecting voice model quality are the size and quality of the training corpus, followed by the ability of the text analysis module to generate accurate pronunciation text, and then the optimization of phonetic and phonological contextual factors. Finally, two possible courses of action are proposed to improve the quality of the trained HMM-based speech models: implementing the STRAIGHT vocoder to reduce the buzziness of synthesized speech, and optimizing the phonetic and phonological contextual factors.

DSpace at Tartu University Library

HMM Based Text-to-Speech Synthesis for Telugu

Author: Gugulothu Narendhar
Publication date: 01/01/2016

This thesis describes a novel approach to building a general-purpose, working Telugu text-to-speech synthesis (TTS) system based on hidden Markov models (HMMs) which is reasonably intelligible, natural sounding and flexible. Several attempts have been made to use HMMs for constructing TTS systems. Most such systems are based on waveform concatenation techniques. To fully convey the information present in speech signals, text-to-speech synthesis systems are required to be able to generate natural-sounding speech with arbitrary speaker individualities and emotions (e.g., anger, sadness, joy). To represent all these factors, the Mel-cepstral coefficients are extracted as spectral parameters. Excitation parameters are extracted using fundamental frequency (F0).

Research Archive of Indian Institute of Technology Hyderabad

Voice characteristics conversion for HMM-based speech synthesis system

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1997

Crossref

Voice characteristics conversion for HMM-based speech synthesis system

Author: MASUKO TAKASHI (益子貴史)
Publication date: 30/11/2006

Institutional Repositories DataBase (IRDB)