
    Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

    Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional effort from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and to the countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online dictionary (prior information that already exists in natural language). Specifically, we design a semantics-to-pronunciation attention (S2PA) module to match the semantic patterns of the input text sequence against the prior semantics in the dictionary and retrieve the corresponding pronunciations; the S2PA module can be trained end-to-end with the TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baseline models in pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design choice in Dict-TTS is effective. The code is available at \url{https://github.com/Zain-Jiang/Dict-TTS}.
    Comment: Accepted by NeurIPS 202
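The core idea in the abstract (attend over a dictionary's sense entries with the token's contextual semantics, then take the pronunciation of the best-matching sense) can be sketched in a few lines. This is a toy illustration under invented names, shapes, and embeddings, not the paper's actual S2PA implementation:

```python
import numpy as np

def s2pa_lookup(token_vec, sense_vecs, sense_prons, temperature=1.0):
    """Toy semantics-to-pronunciation attention (S2PA) lookup.

    Scores each dictionary sense of a polyphonic character against the
    token's contextual semantic vector, then returns the pronunciation
    of the best-matching sense plus the attention weights.
    All names and shapes here are illustrative, not the paper's API.
    """
    scores = sense_vecs @ token_vec / temperature   # (n_senses,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax attention
    return sense_prons[int(weights.argmax())], weights

# Hypothetical polyphone: Chinese "行", pronounced xing2 ("to walk")
# or hang2 ("row"), with senses embedded in a toy 3-d semantic space.
sense_vecs = np.array([[1.0, 0.0, 0.0],    # sense: "to walk"
                       [0.0, 1.0, 0.0]])   # sense: "row"
sense_prons = ["xing2", "hang2"]
context = np.array([0.9, 0.1, 0.0])        # context resembles "to walk"
pron, w = s2pa_lookup(context, sense_vecs, sense_prons)
```

In the real model the sense scores would come from learned encoders over the dictionary glosses, and the soft attention weights (not just the argmax) let the module train end-to-end without phoneme labels.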

    Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

    Large-scale pre-trained language models have been shown to improve the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and are jointly trained with phonemes, making them inefficient for the downstream TTS task, where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder significantly improves the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.
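The training objective described above combines two label streams per example: masked-phoneme targets at masked positions and grapheme targets at every position. A minimal sketch of how such an example might be constructed, with invented helper names and a simplified one-phoneme-per-grapheme alignment (real phoneme-to-grapheme alignment is subword-level):

```python
import random

MASK = "<mask>"

def make_plbert_example(phonemes, graphemes, mask_prob=0.15, seed=0):
    """Build one PL-BERT-style training example (illustrative sketch).

    Inputs are phoneme tokens aligned one-to-one with source graphemes.
    Returns the masked phoneme inputs plus two label sequences: the
    masked-phoneme targets (None where no loss is taken) and the
    grapheme targets predicted at every position, i.e. the paper's
    grapheme-prediction pretext task.
    """
    rng = random.Random(seed)
    inputs, phoneme_labels = [], []
    for p in phonemes:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            phoneme_labels.append(p)      # predict only at masked slots
        else:
            inputs.append(p)
            phoneme_labels.append(None)   # position ignored by MLM loss
    return inputs, phoneme_labels, list(graphemes)

# Toy ARPAbet-style transcription of "the cat"
phonemes  = ["DH", "AH", "K", "AE", "T"]
graphemes = ["th", "e",  "c", "a",  "t"]
masked, p_labels, g_labels = make_plbert_example(
    phonemes, graphemes, mask_prob=0.4)
```

Because the grapheme head is only a pretext task, it can be dropped after pre-training, leaving a phoneme-only encoder for the downstream TTS model.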

    Lexical representations of Chinese single characters tested through reformed and standard phonetic compounds

    A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), University of Hong Kong, April 30, 1992. Thesis (B.Sc.)--University of Hong Kong, 1992. Also available in print; published or final version. Speech and Hearing Sciences; Bachelor of Science in Speech and Hearing Science.

    Word-reading and word-spelling styles of French beginners: Do all children learn to read and to spell in the same way?

    This article explores the styles of word reading and word spelling used by beginning readers of French. The aim of the study was to find out whether the “sub-lexical” and “lexical” styles of reliance, which have been observed in children learning to read and spell in English, exist in French, a language with a more transparent orthography. A sample of 159 subjects was assessed on their reading and spelling of regular words, irregular words and nonwords. Cluster analyses of reading/spelling performance led us to identify various profiles, among which sub-lexical and lexical styles could be discerned. These profiles were then compared across a set of linguistic tasks in order to look for factors that might be related to individual differences in reading/spelling styles. Overall, our findings suggest that quantitative level differences explain most individual variation in literacy. These results are discussed in relation to developmental models of reading and spelling in different orthographic systems.
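The profile-discovery step in the abstract is a standard cluster analysis over per-child accuracy scores. As a hedged sketch with invented toy data (not the study's data or its exact clustering method), a minimal k-means over (regular-word, irregular-word, nonword) accuracies might look like:

```python
import numpy as np

def kmeans(X, k=2, iters=20):
    """Minimal k-means, sketching the profile clustering in the study.

    Rows of X are children; columns are accuracy on (regular words,
    irregular words, nonwords). Initialization takes k evenly spaced
    rows, which is deterministic and adequate for this toy data.
    """
    centers = X[::max(1, len(X) // k)][:k].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

# Invented scores: a "sub-lexical" profile (good nonwords, weak
# irregulars) vs a "lexical" profile (good irregulars, weak nonwords).
X = np.array([[0.8, 0.3, 0.9], [0.7, 0.2, 0.8],   # sub-lexical-like
              [0.8, 0.9, 0.3], [0.7, 0.8, 0.2]])  # lexical-like
labels, centers = kmeans(X, k=2)
```

With well-separated profiles like these, the two recovered clusters correspond to the sub-lexical-like and lexical-like rows.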