6 research outputs found

    The SP2 SCOPES Project on Speech Prosody

    This is an overview of a Joint Research Project within the Scientific co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNSF) and the Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, over the following two years, the four partners aim to collaborate on the subject of speech prosody and advance the extraction, processing, modeling and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. Through the four intertwined research plans, synergies are expected to emerge that will build a foundation for submitting strong joint proposals for EU funding.

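    Phrasing, prominence, and morphotonology: how utterances are divided into tone groups in Yongning Na (短语切分、重点突出与形态声调学:永宁摩梭话的语句如何形成声调组)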

    Yongning Na is a Sino-Tibetan language spoken in an area straddling the border between Yunnan and Sichuan. The Yongning Na tone system is based on three levels: L, M, and H. It comprises a host of rules that are specific to certain morphosyntactic contexts. These rules represent the bulk of what language learners must acquire to master the tone system. Different rules apply in the association of a verb with a subject or an object, the association of two nouns into a compound, that of a numeral and classifier, and that of a word and its affixes, for instance. The domain of tonal computation is referred to here as the tone group; tonal processes never apply across tone-group junctures. The present study investigates how utterances are divided into tone groups in Yongning Na, building on examples from narratives and elicited combinations. There is no hard-and-fast correspondence between syntactic structure and tone group divisions: several options are generally open for the division of an utterance into tone groups. The choice among these options depends on considerations of information structure. This study is intended as a stepping-stone towards the long-term goal of modelling the Na tonal system (its morpho-phonology and its phonetics), and placing the findings in a typological perspective.

    Chinese abstract (translated): Yongning Mosuo (Na) is a language of the Naish group of Sino-Tibetan, spoken in the Yongning plain and Lugu Lake area on the border between Yunnan and Sichuan. The tone system of Yongning Na has three levels: high, mid, and low. This paper describes and analyses how utterances in Na are divided into "tone groups". The choice of tone-group division often reflects the information structure of the utterance. Because there is no hard-and-fast correspondence between syntactic structure and tone-group division, speakers can either integrate a large chunk into a single tone group, yielding strong integration, or divide the utterance into several tone groups, which heightens the stylistic effect of the individual constituents. Detailed examples show what motivates speakers to choose a given tonal pattern when dividing utterances into tone groups. The more tightly structured an utterance is, the fewer tone groups it can be divided into. When an expressive or vivid word occurs, the sentence is divided by means of tone.

    Intonation modelling using a muscle model and perceptually weighted matching pursuit

    We propose a physiologically based intonation model using perceptual relevance. Motivated by speech synthesis from a speech-to-speech translation (S2ST) point of view, we aim at a language-independent way of modelling intonation. The model presented in this paper can be seen as a generalisation of the command response (CR) model, albeit with the same modelling power. It is an additive model which decomposes intonation contours into a sum of critically damped system impulse responses. To decompose the intonation contour, we use a weighted correlation-based atom decomposition algorithm (WCAD) built around a matching pursuit framework. The algorithm allows for an arbitrary precision to be reached using an iterative procedure that adds more elementary atoms to the model. Experiments are presented demonstrating that this generalised CR (GCR) model is able to model intonation as would be expected. Experiments also show that the model produces a similar number of parameters or elements as the CR model. We conclude that the GCR model is appropriate as an engineering solution for modelling prosody, and hope that it is a contribution to a deeper scientific understanding of the neurobiological process of intonation.
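
The decomposition idea in this abstract can be illustrated with a minimal sketch of plain matching pursuit. This is not the authors' WCAD algorithm: the perceptual weighting is omitted, and the atom shape `t * exp(-alpha * t)` (the impulse response of a critically damped second-order system), the sampling step `dt`, and the names `damped_atom` and `matching_pursuit` are all assumptions made for illustration.

```python
import math

def damped_atom(alpha, length, dt=0.01):
    """Unit-norm samples of t * exp(-alpha * t), an assumed atom shape."""
    raw = [(k * dt) * math.exp(-alpha * k * dt) for k in range(length)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def matching_pursuit(contour, alphas, atom_len, n_atoms, dt=0.01):
    """Greedy decomposition: pick the best-correlated shifted atom,
    subtract its projection from the residual, and repeat."""
    residual = list(contour)
    dictionary = {a: damped_atom(a, atom_len, dt) for a in alphas}
    atoms = []
    for _ in range(n_atoms):
        best = (0.0, None, 0)  # (correlation, alpha, onset)
        for alpha, g in dictionary.items():
            # Only onsets where the whole atom fits, so atoms stay unit-norm.
            for onset in range(len(residual) - atom_len + 1):
                corr = sum(residual[onset + k] * g[k] for k in range(atom_len))
                if abs(corr) > abs(best[0]):
                    best = (corr, alpha, onset)
        corr, alpha, onset = best
        if alpha is None:  # residual is orthogonal to every atom
            break
        g = dictionary[alpha]
        for k in range(atom_len):
            residual[onset + k] -= corr * g[k]
        atoms.append((alpha, onset, corr))
    return atoms, residual

# Toy contour (made-up data): one atom with amplitude 3.0 placed at sample 20.
contour = [0.0] * 200
for k, v in enumerate(damped_atom(8.0, 100)):
    contour[20 + k] += 3.0 * v
atoms, residual = matching_pursuit(contour, [4.0, 8.0, 16.0], atom_len=100, n_atoms=1)
```

Each iteration removes the squared correlation from the residual energy, so adding atoms can only reduce the error, which mirrors the "arbitrary precision" property the abstract describes.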

    Intonation & Prosodic Structure in Beaver (Athabaskan) - Explorations on the language of the Danezaa

    This dissertation reports on qualitative and quantitative investigations of the intonation and the prosodic structure of Beaver, an endangered Athabaskan language of Northwest Canada. The focus of the study is on the Northern Alberta dialect of Beaver, which has lexical tone and is a high-marking Athabaskan language. The theoretical framework of the analysis is the Autosegmental-Metrical (AM) theory. Following some background on intonation and prosody as well as the theoretical modelling, we summarize contributions dealing with intonation in languages that share certain features with Beaver, i.e. tone languages, polysynthetic languages and finally the related Athabaskan languages. After a brief introduction to the grammatical structure and the sociolinguistic situation of Northern Alberta Beaver, the database of the present study is introduced. It consists of narratives and task-oriented dialogues as well as recordings elicited with stimulus sets. In the domain of intonation and prosody, three topics are investigated in detail. First, domain-initial prosodic strengthening is analyzed. We show that a boundary-initial position at higher constituents of the prosodic hierarchy has a lengthening effect on the VOT of both aspirated and unaspirated plosives, while nasals are shortened in this context. Additionally, effects of morphological category (stem vs. prefix) and intervocalic position, two mechanisms that have been described for other Athabaskan languages, are also attested for Beaver to some degree. Second, the intonational tones that have been found in the corpus are analyzed within the AM theory. In Northern Alberta Beaver, boundary tones and phrase accents make up the intonational inventory. Most notably, an initial phrase accent is used to mark contrast, a device that has not been reported for the marking of information structure in other languages. Lastly, the interaction of information structure with pitch range in complex noun phrases is tested in a controlled experiment. Here, we find that pitch range is significantly wider for new information than for given information, which is due to a raising of the top line, while the baseline is not affected to the same extent.

    Prosody analysis and modeling for Cantonese text-to-speech.

    Li Yu Jia. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references. Abstracts in English and Chinese.

    Chapter 1: Introduction
        1.1 TTS Technology
        1.2 Prosody
        1.2.1 What is Prosody
        1.2.2 Prosody from Different Perspectives
        1.2.3 Acoustical Parameters of Prosody
        1.2.4 Prosody in TTS
        1.2.4.1 Analysis
        1.2.4.2 Modeling
        1.2.4.3 Evaluation
        1.3 Thesis Objectives
        1.4 Thesis Outline
        Reference
    Chapter 2: Cantonese
        2.1 The Cantonese Dialect
        2.1.1 Phonology
        2.1.1.1 Initial
        2.1.1.2 Final
        2.1.1.3 Tone
        2.1.2 Phonological Constraints
        2.2 Tones in Cantonese
        2.2.1 Tone System
        2.2.2 Linguistic Significance
        2.2.3 Acoustical Realization
        2.3 Prosodic Variation in Continuous Cantonese Speech
        2.4 Cantonese Speech Corpus - CUProsody
        Reference
    Chapter 3: F0 Normalization
        3.1 F0 in Speech Production
        3.2 F0 Extraction
        3.3 Duration-normalized Tone Contour
        3.4 F0 Normalization
        3.4.1 Necessity and Motivation
        3.4.2 F0 Normalization
        3.4.2.1 Methodology
        3.4.2.2 Assumptions
        3.4.2.3 Estimation of Relative Tone Ratios
        3.4.2.4 Derivation of Phrase Curve
        3.4.2.5 Normalization of Absolute F0 Values
        3.4.3 Experiments and Discussion
        3.5 Conclusions
        Reference
    Chapter 4: Acoustical F0 Analysis
        4.1 Methodology of F0 Analysis
        4.1.1 Analysis-by-Synthesis
        4.1.2 Acoustical Analysis
        4.2 Acoustical F0 Analysis for Cantonese
        4.2.1 Analysis of Phrase Curves
        4.2.2 Analysis of Tone Contours
        4.2.2.1 Context-independent Single-tone Contours
        4.2.2.2 Contextual Variation
        4.2.2.3 Co-articulated Tone Contours of Disyllabic Word
        4.2.2.4 Cross-word Contours
        4.2.2.5 Phrase-initial Tone Contours
        4.3 Summary
        Reference
    Chapter 5: Prosody Modeling for Cantonese Text-to-Speech
        5.1 Parametric Model and Non-parametric Model
        5.2 Cantonese Text-to-Speech: Baseline System
        5.2.1 Sub-syllable Unit
        5.2.2 Text Analysis Module
        5.2.3 Acoustical Synthesis
        5.2.4 Prosody Module
        5.3 Enhanced Prosody Model
        5.3.1 Modeling Tone Contours
        5.3.1.1 Word-level F0 Contours
        5.3.1.2 Phrase-initial Tone Contours
        5.3.1.3 Tone Contours at Word Boundary
        5.3.2 Modeling Phrase Curves
        5.3.3 Generation of Continuous F0 Contours
        5.4 Summary
        Reference
    Chapter 6: Performance Evaluation
        6.1 Introduction to Perceptual Test
        6.1.1 Aspects of Evaluation
        6.1.2 Methods of Judgment Test
        6.1.3 Problems in Perceptual Test
        6.2 Perceptual Tests for Cantonese TTS
        6.2.1 Intelligibility Tests
        6.2.1.1 Method
        6.2.1.2 Results
        6.2.1.3 Analysis
        6.2.2 Naturalness Tests
        6.2.2.1 Word-level
        6.2.2.1.1 Method
        6.2.2.1.2 Results
        6.2.2.1.3 Analysis
        6.2.2.2 Sentence-level
        6.2.2.2.1 Method
        6.2.2.2.2 Results
        6.2.2.2.3 Analysis
        6.3 Conclusions
        6.4 Summary
        Reference
    Chapter 7: Conclusions and Future Work
        7.1 Conclusions
        7.2 Suggested Future Work
    Appendix
        Appendix 1: Linear Regression
        Appendix 2: 36 Templates of Cross-word Contours
        Appendix 3: Word List for Word-level Tests
        Appendix 4: Syllable Occurrence in Word List of Intelligibility Test
        Appendix 5: Wrongly Identified Word List
        Appendix 6: Confusion Matrix
        Appendix 7: Unintelligible Word List
        Appendix 8: Noisy Word List
        Appendix 9: Sentence List for Naturalness Test

    Quantitative measurement of prosodic strength in Mandarin

    We describe models of Mandarin prosody that allow us to make quantitative measurements of prosodic strengths. These models use Stem-ML, which is a phenomenological model of the muscle dynamics and planning process that controls the tension of the vocal folds, and therefore the pitch of speech. Because Stem-ML describes the interactions between nearby tones, we were able to capture surface tonal variations using a highly constrained model with only one template for each lexical tone category, and a single prosodic strength per word. The model accurately reproduces the intonation of the speaker, capturing 87% of the variance of f0 with these strength parameters. The result reveals alternating metrical patterns in words, and shows that the speaker marks a hierarchy of boundaries by controlling the prosodic strength of words. The strengths we obtain are also correlated with syllable duration, mutual information and part-of-speech.
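
A figure such as "87% of the variance of f0" is commonly computed as one minus the ratio of residual to total sum of squares. The abstract does not say how the authors computed it, so the sketch below shows only that common definition; the function name `variance_explained` and the f0 values are made up for illustration.

```python
def variance_explained(observed, predicted):
    """Fraction of the variance of `observed` captured by `predicted`:
    1 - SS_res / SS_tot (a common definition, assumed here)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot

# Toy f0 track in Hz (made-up values, not data from the paper).
observed_f0 = [100.0, 120.0, 140.0, 160.0]
model_f0 = [104.0, 118.0, 141.0, 157.0]
score = variance_explained(observed_f0, model_f0)
```

A perfect fit gives 1.0, and a model that always predicts the mean gives 0.0, so the reported value would sit close to the upper end of that scale.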