521 research outputs found
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays
an important role in producing natural and intelligible speech. Although
inter-utterance linguistic information can influence the speech interpretation
of the target utterance, previous works on PSP mainly focus on utilizing
intra-utterance linguistic information of the current utterance only. This work
proposes to use inter-utterance linguistic information to improve the
performance of PSP. Multi-level contextual information, which includes both
inter-utterance and intra-utterance linguistic information, is extracted by a
hierarchical encoder from character level, utterance level and discourse level
of the input text. Then a multi-task learning (MTL) decoder predicts prosodic
boundaries from multi-level contextual information. Objective evaluation
results on two datasets show that our method achieves better F1 scores in
predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase
(IPH). This demonstrates the effectiveness of using multi-level contextual
information for PSP. Subjective preference tests also indicate that the naturalness
of synthesized speech is improved. Comment: Accepted by Interspeech202
Using Phonological Phrase Segmentation to Improve Automatic Keyword Spotting for the Highly Agglutinating Hungarian Language
This paper investigates the use of prosody to improve keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be performed effectively using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore, the applied keyword spotting system is based on confidence scores computed as a ratio of acoustic scores obtained in two ways: firstly, by decoding with a universal background model; and secondly, by decoding with a keyword model embedded into filler models. Prosody is used to perform an automatic phonological phrase alignment for speech, which has proven useful for automatic partial word boundary detection in fixed-stress languages. Several features derived from the phonological phrase alignment are investigated to rescore baseline confidence scores in both a rule-based and a data-driven manner. Results show that at relevant operating points of the system, a false alarm reduction of 10% - 40% can be reached at the same miss probability rates.
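The ratio-based confidence score described in this abstract can be illustrated with a minimal sketch. It assumes the two decoding passes return log-domain acoustic scores, so the ratio becomes a difference, and it normalizes by the number of frames; the normalization, the function names and the threshold are illustrative assumptions, not details taken from the paper.

```python
def confidence_score(keyword_log_score, background_log_score, num_frames):
    """Per-frame log-likelihood ratio between two decoding passes.

    keyword_log_score: log acoustic score from decoding with the keyword
        model embedded into filler models.
    background_log_score: log acoustic score from decoding the same
        segment with the universal background model.
    num_frames: segment length in frames (normalization is an assumption).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # A ratio of likelihoods is a difference of log-likelihoods.
    return (keyword_log_score - background_log_score) / num_frames


def is_keyword_hit(score, threshold):
    """Accept a putative hit when its confidence reaches the threshold.

    Raising the threshold lowers false alarms but increases misses;
    the threshold value itself is a hypothetical tuning parameter.
    """
    return score >= threshold
```

Under these assumptions, a rescoring step such as the prosody-based one in the abstract would adjust the value returned by `confidence_score` before it is compared with the threshold.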
Extending AuToBI to prominence detection in European Portuguese
This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however, is almost impractical for extensive data sets. For that reason, automatic systems such as AuToBI stand as an alternative solution. We started by applying the AuToBI prosodic event detection system, using the existing English models, to the prediction of prominent prosodic events (accents) in European Portuguese. This approach achieved an overall accuracy of 74% for prominence detection, similar to state-of-the-art results for other languages. We then trained new models using prepared and spontaneous Portuguese data, achieving a considerable improvement of about 6% accuracy (absolute) over the existing English models. The results are quite encouraging and provide a starting point for automatically predicting prominent events in European Portuguese.
A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn
The contributions in this Festschrift were written by Ocke’s current and former PhD students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry, to the highest levels of description, all of which have formed a part of Ocke’s career, in connection with his teaching and/or his academic output: “Segments”, “Perception of Accent”, “Between Sounds and Graphemes”, “Prosody”, “Morphology and Syntax” and “Second Language Acquisition”. Each one of these illustrates a sound approach to language matters.