    Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

    For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence how the target utterance is interpreted, previous work on PSP has mainly used only the intra-utterance linguistic information of the current utterance. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intra-utterance linguistic information, is extracted by a hierarchical encoder from the character level, utterance level and discourse level of the input text. A multi-task learning (MTL) decoder then predicts prosodic boundaries from this multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH) boundaries. This demonstrates the effectiveness of using multi-level contextual information for PSP. Subjective preference tests also indicate that the naturalness of the synthesized speech is improved. Comment: Accepted by Interspeech202

    Chinese compounds : the role of morphosyntactic structure in stress assignment in Shanghai Chinese and tone sandhi in Mandarin Chinese

    At the interface of morphosyntax and phonology, some phonological behaviors in Chinese languages (stress assignment, stress resolution and tone sandhi) are sensitive to the word domain. In this thesis, we focus on how morphosyntactic structures contribute to phonological behaviors that remain puzzles in the Chinese languages. Additionally, a highly functional morphosyntax-based framework is shown to support a simplified and consistent model of domain construction for T3 tone sandhi in Mandarin Chinese, which has been considered challenging in the literature. Following “Little x heads” theory (Marantz 1995; Marantz 2001) and syntactic incorporated compounding structures (Harley 2009), we use a syntactic multiple-root incorporated structure for Chinese compounds to account for stress assignment and stress resolution (stress clash avoidance) in Shanghai Chinese with the revised Phase Impenetrability for Phonology (rPIP) (Embick 2013). Meanwhile, a tentative Concatenation rule (Pak 2008; Chen 2018), applying after Linearization of Morphological words, is proposed to account for domain construction in T3 tone sandhi in Mandarin Chinese; the rule refers to specific morphosyntactic information (morphosyntactic locality characteristics and c-command relations). Unlike previous work, we add the syntactic multiple-root incorporated structure of Chinese compounds to the algorithm of the Concatenation rule. This proves essential for constructing a unified framework of T3 tone sandhi in Mandarin Chinese both above and below the classical word domain, and the framework is able to handle the exceptional cases in Chen (2009), e.g., syntactic words, phonological words and complex predicates.
    This project supports the view that morphosyntax-based analysis under syntactic word formation, e.g., Concatenation rules in Distributed Morphology, is a powerful tool for revealing the processing logic of some controversial phonological rules that sit ambiguously between the classical lexical and postlexical rules in the literature, e.g., sandhi behaviours. Under the current framework, unlike multimorphemic structures, monomorphemic structures seem to be opaque to the application of specific non-cyclic phonological rules. Such opaque monomorphemic structures can be postulated to be a product of the processing economy and efficiency of certain phonological rules, rather than a true grammatical identity.

    Information structure and the referential status of linguistic expression : workshop as part of the 23th annual meetings of the Deutsche Gesellschaft für Sprachwissenschaft in Leipzig, Leipzig, February 28 - March 2, 2001

    This volume comprises papers that were given at the workshop Information Structure and the Referential Status of Linguistic Expressions, which we organized during the Deutsche Gesellschaft für Sprachwissenschaft (DGfS) Conference in Leipzig in February 2001. At this workshop we discussed the connection between information structure and the referential interpretation of linguistic expressions, a topic mostly neglected in current linguistics research. One common aim of the papers is to find out to what extent the focus-background structuring and the topic-comment structuring determine the referential interpretation of simple arguments like definite and indefinite NPs on the one hand and of sentences on the other.

    Generation of prosody and speech for Mandarin Chinese

    Ph.D (Doctor of Philosophy)

    Universal and language-specific processing : the case of prosody

    A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a cross-language approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered to be one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and very few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments also revealed both cross-language similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.

    Master of Arts

    This thesis investigates predicate cleft (PC) constructions in Mandarin Chinese. Cheng & Vicente conclude that the topicalized verb and the lower verb in bare PC form a long head movement relation, discarding a remnant movement analysis based on vP-external scrambling. However, to be complete, the argument also needs to consider the vP-internal scrambling observed by Soh and a selective deletion analysis. I show that vP-internal scrambling cannot serve to derive a plausible remnant movement analysis; nor can a selective deletion analysis be accomplished. Long head movement is necessary to account for Mandarin bare PC. However, although this conclusion converges with the cross-linguistic treatment of predicate clefts, I point out the unreliability of idiom interpretation as a diagnostic for long head movement, which has been used in several studies. Moreover, I present the puzzling restriction on the types of categories that can undergo pied-piping with the fronted verb. Last, I show that the verb doubling effect, an unresolved issue in Cheng & Vicente, can be accounted for if the proposal on parallel chains is adopted. The necessity of a long head movement analysis supports bare phrase structure, whereby head-to-spec movement is expected. In addition, it constitutes an empirical argument against eliminating syntactic head movement. The compositionality of idiom interpretation and the restriction on full PC are worth further study.