
    Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

    Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional effort from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and to the countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online dictionary (prior information that already exists in natural language). Specifically, we design a semantics-to-pronunciation attention (S2PA) module to match the semantic patterns of the input text sequence against the prior semantics in the dictionary and retrieve the corresponding pronunciations; the S2PA module can be trained end-to-end with the TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baseline models in pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design choice in Dict-TTS is effective. The code is available at \url{https://github.com/Zain-Jiang/Dict-TTS}.
    Comment: Accepted by NeurIPS 202
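The core idea in the abstract (attend over a dictionary's sense entries with the token's contextual semantics, then take the pronunciation of the best-matching sense) can be sketched in a few lines. This is a toy illustration under invented names, shapes, and embeddings, not the paper's actual S2PA implementation:

```python
import numpy as np

def s2pa_lookup(token_vec, sense_vecs, sense_prons, temperature=1.0):
    """Toy semantics-to-pronunciation attention (S2PA) lookup.

    Scores each dictionary sense of a polyphonic character against the
    token's contextual semantic vector, then returns the pronunciation
    of the best-matching sense plus the attention weights.
    All names and shapes here are illustrative, not the paper's API.
    """
    scores = sense_vecs @ token_vec / temperature   # (n_senses,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax attention
    return sense_prons[int(weights.argmax())], weights

# Hypothetical polyphone: Chinese "行", pronounced xing2 ("to walk")
# or hang2 ("row"), with senses embedded in a toy 3-d semantic space.
sense_vecs = np.array([[1.0, 0.0, 0.0],    # sense: "to walk"
                       [0.0, 1.0, 0.0]])   # sense: "row"
sense_prons = ["xing2", "hang2"]
context = np.array([0.9, 0.1, 0.0])        # context resembles "to walk"
pron, w = s2pa_lookup(context, sense_vecs, sense_prons)
```

In the real model the sense scores would come from learned encoders over the dictionary glosses, and the soft attention weights (not just the argmax) let the module train end-to-end without phoneme labels.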

    Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

    Large-scale pre-trained language models have been shown to improve the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and are jointly trained with phonemes, making them inefficient for the downstream TTS task, where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder significantly improves the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.
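The training objective described above combines two label streams per example: masked-phoneme targets at masked positions and grapheme targets at every position. A minimal sketch of how such an example might be constructed, with invented helper names and a simplified one-phoneme-per-grapheme alignment (real phoneme-to-grapheme alignment is subword-level):

```python
import random

MASK = "<mask>"

def make_plbert_example(phonemes, graphemes, mask_prob=0.15, seed=0):
    """Build one PL-BERT-style training example (illustrative sketch).

    Inputs are phoneme tokens aligned one-to-one with source graphemes.
    Returns the masked phoneme inputs plus two label sequences: the
    masked-phoneme targets (None where no loss is taken) and the
    grapheme targets predicted at every position, i.e. the paper's
    grapheme-prediction pretext task.
    """
    rng = random.Random(seed)
    inputs, phoneme_labels = [], []
    for p in phonemes:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            phoneme_labels.append(p)      # predict only at masked slots
        else:
            inputs.append(p)
            phoneme_labels.append(None)   # position ignored by MLM loss
    return inputs, phoneme_labels, list(graphemes)

# Toy ARPAbet-style transcription of "the cat"
phonemes  = ["DH", "AH", "K", "AE", "T"]
graphemes = ["th", "e",  "c", "a",  "t"]
masked, p_labels, g_labels = make_plbert_example(
    phonemes, graphemes, mask_prob=0.4)
```

Because the grapheme head is only a pretext task, it can be dropped after pre-training, leaving a phoneme-only encoder for the downstream TTS model.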

    Lexical representations of Chinese single characters tested through reformed and standard phonetic compounds

    A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), University of Hong Kong, April 30, 1992. Thesis (B.Sc.)--University of Hong Kong, 1992. Also available in print; published or final version. Speech and Hearing Sciences; Bachelor of Science in Speech and Hearing Science.

    Word-reading and word-spelling styles of French beginners: Do all children learn to read and to spell in the same way?

    This article explores the styles of word reading and word spelling used by beginning readers of French. The aim of the study was to find out whether the “sub-lexical” and “lexical” styles of reliance, which have been observed in children learning to read and spell in English, exist in French, a language with a more transparent orthography. A sample of 159 subjects was assessed on their reading and spelling of regular words, irregular words and nonwords. Cluster analyses of reading/spelling performance led us to identify various profiles, among which sub-lexical and lexical styles could be discerned. These profiles were then compared across a set of linguistic tasks in order to look for factors that might be related to individual differences in reading/spelling styles. Overall, our findings suggest that quantitative level differences explain most individual variation in literacy. These results are discussed in relation to developmental models of reading and spelling in different orthographic systems.
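The profile-discovery step in the abstract is a standard cluster analysis over per-child accuracy scores. As a hedged sketch with invented toy data (not the study's data or its exact clustering method), a minimal k-means over (regular-word, irregular-word, nonword) accuracies might look like:

```python
import numpy as np

def kmeans(X, k=2, iters=20):
    """Minimal k-means, sketching the profile clustering in the study.

    Rows of X are children; columns are accuracy on (regular words,
    irregular words, nonwords). Initialization takes k evenly spaced
    rows, which is deterministic and adequate for this toy data.
    """
    centers = X[::max(1, len(X) // k)][:k].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

# Invented scores: a "sub-lexical" profile (good nonwords, weak
# irregulars) vs a "lexical" profile (good irregulars, weak nonwords).
X = np.array([[0.8, 0.3, 0.9], [0.7, 0.2, 0.8],   # sub-lexical-like
              [0.8, 0.9, 0.3], [0.7, 0.8, 0.2]])  # lexical-like
labels, centers = kmeans(X, k=2)
```

With well-separated profiles like these, the two recovered clusters correspond to the sub-lexical-like and lexical-like rows.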