18 research outputs found

    Tone sandhi, prosodic phrasing, and focus marking in Wenzhou Chinese

    In most languages, focus (i.e. highlighting information) is marked by modifying the melody of the sentence. But how is focus marked in a Chinese dialect with eight different citation tones and a complex tonal phonology? This thesis investigates the connection between tonal realization and tone change (tone sandhi) in Wenzhou Chinese, and whether and how such a connection is conditioned by prosodic structure and focus marking. Experiments were conducted with young speakers of Wenzhou Chinese, whose speech was acoustically analyzed so as to investigate the application domain of tone sandhi and the influence of focus thereon, the tonal realization on the word and phrase level and its interaction with focus, the pre-planning of sentential pitch, as well as the realization of referents with different information statuses. The experimental findings suggest that the application, but not the implementation, of tone sandhi is independent of focus, and that focus and prosodic structure have similar but independent effects on the realization of lexical tones. It is also shown that pitch scaling is sensitive to syntactic structure and complexity, and that the marking of givenness, broad focus, and narrow focus leads to discrete levels along the same acoustic parameters. These findings are of interest to researchers working on lexical tone, prosodic structure, and how information structure categories such as focus affect tonal realization and prosodic phrasing. LEI Universiteit Leiden. NWO VIDI grant 061084338 to dr. Y. Chen. Language Use in Past and Presen

    A study on form and function of prosody based on acoustics, interpretation, and modelling - with evidence from the analysis by synthesis of Mandarin speech prosody

    An analysis-by-synthesis study on Mandarin speech prosody is conducted in the present dissertation. The features of Mandarin speech prosody are discussed by focusing on two salient aspects: the function of prosody and the form of prosody. The study attempts to find a plausible way in which the two aspects can be mapped onto each other through the functional analysis of prosody and the multi-level formal representation. The form of Mandarin speech prosody is a complex F0 picture because pitch contours are used simultaneously by lexical tones and sentential intonation. The phenomenon of tone sandhi in speech context raises further puzzles for researchers confronted with the acoustic form of Mandarin prosody. The functional use of prosody in Mandarin speech operates at two levels: at the lexical level, for word identity (Tone1, Tone2, Tone3, Tone4, and Tone0), and at the sentential level, for prominence marking (sentence accents) and the indication of prosodic boundaries (intonation boundary tones). In the present study, the analysis of prosodic function at the two levels provides a basic framework for coding the surface melodic form of Mandarin prosody, which consists of pitch contours in tonal units and boundary tones at the beginning and end of the intonation unit. For the formal representation of Mandarin speech prosody, the surface F0 contour of each utterance is coded into a sequence of INTSINT symbols and passed to the Prozed tool for speech synthesis. It is shown that the synthesized stimuli derived from the symbolic coding can closely follow the melodic features and correctly express the prosodic function of the original Mandarin utterances. The present study employs acoustic data, symbolic coding, and speech synthesis to derive the mapping between prosodic function and form, with the aim of interpreting this complex prosodic phenomenon and providing insight for the annotation and analysis of Mandarin speech prosody.

    Toward invariant functional representations of variable surface fundamental frequency contours: Synthesizing speech melody via model-based stochastic learning

    Variability has been one of the major challenges for both theoretical understanding and computer synthesis of speech prosody. In this paper we show that economical representation of variability is the key to effective modeling of prosody. Specifically, we report the development of PENTAtrainer—A trainable yet deterministic prosody synthesizer based on an articulatory–functional view of speech. We show with testing results on Thai, Mandarin and English that it is possible to achieve high-accuracy predictive synthesis of fundamental frequency contours with very small sets of parameters obtained through stochastic learning from real speech data. The first key component of this system is syllable-synchronized sequential target approximation—implemented as the qTA model, which is designed to simulate, for each tonal unit, a wide range of contextual variability with a single invariant target. The second key component is the automatic learning of function-specific targets through stochastic global optimization, guided by a layered pseudo-hierarchical functional annotation scheme, which requires the manual labeling of only the temporal domains of the functional units. The results in terms of synthesis accuracy demonstrate that effective modeling of the contextual variability is the key also to effective modeling of function-related variability. Additionally, we show that, being both theory-based and trainable (hence data-driven), computational systems like PENTAtrainer can serve as an effective modeling tool in basic research, with which the level of falsifiability in theory testing can be raised, and also a closer link between basic and applied research in speech science can be developed

    The effect of explicit instruction and auditory/audio-visual training on Chinese EFL learner's perception of intonation

    Ph.D. Thesis (Integrated). Intonation plays a large part in speech intelligibility and is notoriously difficult for L2 learners to acquire. The bulk of research on L2 intonation has focussed on the examination of learners’ intonational performance at the phonetic and phonological levels using perceptual and/or production tasks; however, empirical studies on whether and how intonation training can help improve learners’ performance are surprisingly scarce. This study fills this gap by devising instruction and training materials which were meticulously tailored for Chinese learners of English, the largest population of English learners in the world. The participants were 60 students from Newcastle University majoring in English-related subjects, most of whom wanted to become English teachers following their studies. They were pseudorandomly assigned to three groups according to their overall English proficiency. Two of the groups were taught explicitly about the forms and functions of English intonation, but one group self-practiced auditorily with Audacity whereas the other self-practiced audio-visually with Praat. The third group, which served as control, did not get any intonation training. Learners’ competence in intonation was assessed by a comprehension task before, immediately after, and two months after the three-week training course. Ten native speakers of Southern British English were recruited for the pre- and post-test to set a baseline for the analysis of learners’ performance. The results are: 1. Chinese EFL learners did significantly worse than native speakers in terms of understanding intonation meanings contrasted by accentuation, phrasing, and tone. 2. Learners’ comprehension ability was improved immediately after the training for all three aspects. 3. The training effect remained in the delayed post-test. 4. The audio-visual group did not perform significantly better than the auditory group.
The results indicate that certain aspects of intonation are teachable and learnable, and that tailor-made instruction and materials are effective and applicable in use. This study provides English teachers in China with novel ways to equip Chinese EFL learners with greater intonational competence.

    The suprasegmental signaling of attitude in German and Chinese : a phonetically oriented contribution to intercultural communication

    The last 15 years have seen a growing interest in the concerns and achievements of intercultural communication research, prompting a steady increase of scholarly writings on topics like intercultural management, cross-cultural business communication and intercultural communication at work. Thus, researchers in anthropology, sociology and psychology are taking a growing interest in the problems arising from culturally-patterned differences in mentality, way of life and norms of interaction. Linguists, on the other hand, especially those working in the sociologically and/or anthropologically-oriented disciplines, such as interactional sociolinguistics, are examining the linguistic problems of intercultural communication - those originating in differences in the use of language. In the latter field in particular, attention has come to focus strongly on the use of the suprasegmental features of intonation, pitch, loudness, tempo and voice quality, as differences in the use of these features have been found to cause frequent and serious misunderstandings of speaker meaning and intent. In this work the intercultural approach of interactional sociolinguistics is combined with the working methods of experimental phonetics and applied to German and Chinese (Mandarin). Following an introductory examination of the Chinese norms of interaction, which are compared with those of the western world, this work focuses on the communicative functions of the suprasegmental features in the two languages, in particular in the signaling of attitude.
To this aim, dialogs with German and Chinese speakers are organized, in the course of which different speaker attitudes are elicited. These attitudes are determined in listening tests with native speakers of German and Chinese and described in terms of patterns of speech behavior, referred to as interaction strategies. This is followed by the determination of the phonetic exponency of these interaction strategies in the two languages, achieved with the help of conscientious phonetic speech analyses. The phonetic exponency of the interaction strategies in German and Chinese is then compared to reveal the areas which can cause problems in suprasegmental communication between speakers of these two languages. Special emphasis is placed on the role of intonation in Chinese, a field of research which is virtually untouched.

    The dynamics of Japanese prosody

    This dissertation explores aspects of Tokyo Japanese (Japanese henceforth) prosody through acoustic analysis and analysis-by-synthesis. It 1) revisits existing issues in Japanese prosody with the minimal use of abstract notions and 2) tests if the Parallel Encoding and Target Approximation (Xu, 2005) framework is suitable for Japanese, a pitch accent language. The first part of the dissertation considers the nature of lexical pitch accent through examining factors that affect the surface F0 realisation of an accent peak (Chapter 2) and establishing the articulatory domain that hosts a tonal target in Japanese (Chapter 3). Next, pitch accent interactions with other communicative functions are considered, specifically in terms of focus (Chapter 4) and sentence type (Chapter 5). Hypotheses using acoustic analyses from the previous Chapters are then verified through analysis-by-synthesis with articulatory synthesisers AMtrainer, PENTAtrainer1, and PENTAtrainer2 (Chapter 6). Chapter 2 provides conclusive evidence of Japanese as a two-tone language as opposed to bearing three underlying tones in its phonology, previously unresolved in existing literature. Proponents of the two-tone hypothesis gather evidence from perception: when stimuli are played in isolation, native listeners can only distinguish two tone levels (High and Low). On the other hand, production evidence reveals robustly three distinct surface F0 levels. Using a series of linear regression analyses, I show the third tone level could be interpreted as a result of pre-low raising, a common articulatory phenomenon. The F0 of an accent peak is inversely correlated with the F0 of the following low target, being an enhanced peak in preparation for the upcoming L. Interpreted together with native listeners’ inability to hear three tones when said in isolation, as repeatedly reported in previous studies, I establish Japanese has only H and L in its tonal inventory. 
Chapter 3 establishes the syllable as the tone-bearing unit in Japanese tonal articulation. Because Japanese is often described as a mora-timed language, it has previously been unclear whether articulatory tonal targets are hosted in a mora or a syllable. When comparing accented words of various syllable structures I found that the F0 accent peak of CVCV words occurs consistently earlier than that of CVn/CVV words. CVCV words are longer in total duration, so their earlier F0 peak is a result of a shorter tone-bearing unit (i.e. two consecutive short morae/syllables). CVn/CVV words on the other hand have a later peak F0 due to hosting an articulatory target as a long syllable, rather than two short morae. I further verified the syllable hypothesis using two articulatory synthesisers, PENTAtrainer1 and PENTAtrainer2. The syllable as a tone-bearing unit incurs fewer predictors but provides better learning accuracy. Chapter 4 explores focus prosody in declarative sentences. Using a newly collected corpus of 6251 sentences that controls for accent condition, focus condition, sentence type, and sentence length, I challenge the widely held idea that post-focus compression of F0 range is accent-independent. Currently it is generally accepted that regardless of the accent condition of the focused word, the excursion size of the ‘initial rise’ that marks the beginning of the first word after focus is shrunken. However, confining the notion of post-focus compression to the initial rise (usually extending across only two morae) sets Japanese apart from other languages like English or Mandarin, where such compression is robust across the entire post-focus domain. I show that when F0 range is measured across a wider domain, compression is absent. Where post-focus compression is absent, the F0 trajectory appears to be a result of articulatory carryover effects.
This will be interpreted as a result of weak articulatory strength on the post-focus domain, explaining the difference in F0 trajectories in long and short utterances. Chapter 5 builds on the previous chapter to consider, in addition, focus prosody in yes/no questions. I investigate what marks a yes/no question, and how focus prosody differs in declarative and interrogative utterances. Acoustic analyses show that questions are marked by a final rise, but the exact shape of such a rise depends on the accent condition of the sentence-final word. When compared to declarative sentences, the key differences in yes/no questions include: a higher F0 level; the absence of post-focus compression even in contexts otherwise observed in statements; and on-focus F0 raising as the only robust focus marker. These findings point to the fact that interrogative focus prosody is not an amalgamation of focus markers and question markers, and bear implications for the representation of Japanese intonation. Chapter 6 verifies observations established thus far through analysis-by-synthesis. I demonstrate comparative modeling as a means to adjudicate between competing theories using PENTAtrainer2, PENTAtrainer1 and AMtrainer. In terms of local fitting accuracy, AMtrainer yielded comparable synthesis accuracy to the PENTAtrainers. Finally, I further demonstrate the compatibility of PENTA with Japanese prosody, showing highly accurate F0 predictive analysis (when trained with Chapter 2 production data), and highly satisfactory speaker-dependent synthesis accuracy (when trained with Chapter 4 and 5 sentential data). Naturalness judgment ratings show that the natural stimuli sound as natural as the synthetic stimuli, though questions generally sound less natural than statements. Reasons for this discrepancy are discussed with reference to the design of the stimuli.

    Three-dimensional point-cloud room model in room acoustics simulations


    The perceptual flow of phonetic feature processing
