521 research outputs found

    Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

    Full text link
    For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance, previous works on PSP mainly focus on utilizing intrautterance linguistic information of the current utterance only. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intrautterance linguistic information, is extracted by a hierarchical encoder from character level, utterance level and discourse level of the input text. Then a multi-task learning (MTL) decoder predicts prosodic boundaries from multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH). It demonstrates the effectiveness of using multi-level contextual information for PSP. Subjective preference tests also indicate the naturalness of synthesized speeches are improved.Comment: Accepted by Interspeech202

    Using Phonological Phrase Segmentation to Improve Automatic Keyword Spotting for the Highly Agglutinating Hungarian Language

    Get PDF
    This paper investigates the usage of prosody for the improvement of keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be effectively performed using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore, the applied keyword spotting system is based on confidence scores computed as a ratio of acoustic scores obtained in two ways: firstly, by decoding with an universal background model; and secondly, by decoding with a keyword model embedded into filler models. Prosody is used to perform an automatic phonological phrase alignment for speech, proven to be useful for automatic partial word boundary detection in fixed stress languages. Several features deduced from the phonological phrase alignment are investigated to rescore baseline confidence scores both in a rule-based and in a data-driven manner. Results show that in relevant operating points of the system, a false alarm reduction of 10% - 40% can be reached by the same miss probability rates

    Generation of prosody and speech for Mandarin Chinese

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Extending AuToBI to prominence detection in European Portuguese

    Get PDF
    This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however, is almost impractical for extensive data sets. For that reason, automatic systems such as AuToBi stand as an alternate solution. We have started by applying the AuToBI prosodic event detection system using the existing English models to the prediction of prominent prosodic events (accents) in European Portuguese. This approach achieved an overall accuracy of 74% for prominence detection, similar to state-of-the-art results for other languages. Later, we have trained new models using prepared and spontaneous Portuguese data, achieving a considerable improvement of about 6% accuracy (absolute) over the existing English models. The achieved results are quite encouraging and provide a starting point for automatically predicting prominent events in European Portuguese.info:eu-repo/semantics/publishedVersio

    A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn

    Get PDF
    The contributions in this Festschrift were written by Ocke’s current and former PhD-students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry to the highest levels of description - all of which have formed a part of Ocke’s career, in connection with his teaching and/or his academic productions: “Segments”, “Perception of Accent”, “Between Sounds and Graphemes”, “Prosody”, “Morphology and Syntax” and “Second Language Acquisition”. Each one of these illustrates a sound approach to language matters
    corecore