2,583 research outputs found

Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

Author: Byrne William
Gibson Matthew
Publication venue: IEEE Transactions on Audio, Speech, and Language Processing
Publication date: 01/01/2010

Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. First, this paper presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Second, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Third, this paper demonstrates how this technique lends itself to the task of unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation.

Apollo (Cambridge)

Speech Synthesis Based on Hidden Markov Models

Author: Nankaku Y.
Oura K.
Toda T.
Tokuda K.
Yamagishi J.
Zen H.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2013

Edinburgh Research Explorer

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Author: Fahimeh Bahmaninezhad
Hossein Sameti
Simon King
Soheil Khorram
Thomas Drugman
Publication venue: Springer Nature
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer

Recent development of the HMM-based speech synthesis system (HTS)

Author: Black Alan W
Masuko Takashi
Nose Takashi
Oura Keiichiro
Sako Shinji
Toda Tomoki
Tokuda Keiichi
Yamagishi Junichi
Zen Heiga
Publication date: 01/01/2009

A statistical parametric approach to speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based speech synthesis system (HTS)” to provide a research and development toolkit for statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

CiteSeerX

NAIST Academic Repository

Edinburgh Research Archive

Edinburgh Research Explorer

Hokkaido University Collection of Scholarly and Academic Papers

An integrated approach to speech recognition using phrase-based units

Author: Watkins Christopher James
Publication venue: University of East Anglia
Publication date: 01/01/2010

University of East Anglia digital repository

PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

Author: Matějka Pavel
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2009

This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part deals with phonotactic language recognition based on co-occurrences of phone sequences in speech. Phone recognition is studied thoroughly as a tokenization technique for LRE, with a focus on the amount of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Model, PPRLM). The thesis also deals with the novel technique of anti-models in PPRLM and investigates the use of phone lattices instead of strings. The work on the phonotactic approach concludes with a comparison of classical n-gram modeling techniques and binary decision trees. Acoustic LRE is addressed too, with the main focus on discriminative techniques for training target-language acoustic models and on initial (but successful) experiments with removing channel dependencies. The fusion of the phonotactic and acoustic approaches is also investigated. All experiments were performed on standard data from the NIST 2003, 2005 and 2007 evaluations, so the results are directly comparable to those of other laboratories in the LRE community. With the techniques described above, the fused systems defined the state of the art in the LRE field and reached excellent results in two NIST evaluations.

Digital library of Brno University of Technology

National Repository of Grey Literature