5 research outputs found

    Improving speech synthesis with discourse relations

    Analyzing Prosody with Legendre Polynomial Coefficients

    This investigation demonstrates the effectiveness of Legendre polynomial coefficients for representing prosodic contours in two different tasks: nativeness classification and sarcasm detection. By using accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research on analyzing prosody in linguistics and on modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about the prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, achieving an accuracy of 72.3% for nativeness classification and 81.57% for sarcasm detection. We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial coefficient modeling; the accuracy and quality of the resulting prosodic contour representations make them highly interpretable for linguistic analysis.
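As an illustration of the general idea (not the authors' actual pipeline), the sketch below fits low-order Legendre coefficients to a pitch contour using NumPy. The function names, the polynomial degree, and the synthetic contour are assumptions made for this example only.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_contour_coeffs(f0, degree=3):
    """Fit Legendre coefficients to a contour sampled at equal time steps.

    Time is rescaled to [-1, 1], the interval on which Legendre
    polynomials are orthogonal. The degree is an illustrative choice.
    """
    f0 = np.asarray(f0, dtype=float)
    x = np.linspace(-1.0, 1.0, len(f0))
    return legendre.legfit(x, f0, degree)


def reconstruct_contour(coeffs, n_points):
    """Evaluate the fitted Legendre series back on n_points time steps."""
    x = np.linspace(-1.0, 1.0, n_points)
    return legendre.legval(x, coeffs)


# Synthetic, made-up contour: a linear rise from 120 Hz to 150 Hz.
f0 = 120.0 + 30.0 * np.linspace(0.0, 1.0, 50)
coeffs = legendre_contour_coeffs(f0)
approx = reconstruct_contour(coeffs, len(f0))
```

In this kind of representation the zeroth coefficient tracks the mean level of the contour, the first its overall slope, and higher orders its curvature, which is one reason such coefficients are easy to interpret.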

    Prosodic and discourse function variations in lexical bundles in university lectures

    Multiword sequences are important components of language because they are building blocks that can be used to create long stretches of discourse. These word combinations are particularly important because their co-occurrence and function in discourse suggest that they are stored and retrieved from memory as a whole rather than as separate word units. The functions that they perform in discourse can vary according to register. In spoken academic discourse, one of the essential functions of multi-word sequences is a discourse-organizing function that includes introducing a topic and elaborating on an existing topic. These varied discourse functions have two main roles in the information structure of discourse: as a major rhetorical organizer or a minor rhetorical organizer (Chaudron & Richards, 1986). However, studies that have examined the discourse-organizing role of spoken linguistic devices, including multi-word sequences, either have examined limited data or have analyzed them from written transcripts only, overlooking prosody, an aspect of speech with an important communicative role. This study focuses on one type of multi-word sequence, lexical bundles, which are frequently used recurrent word combinations identified computationally in a corpus, to understand how their prosodic variations are linked to their discourse function(s). Lexical bundles in spoken academic discourse have been found to have a discourse-organizing function through analysis of orthographic transcriptions of spoken text. What remains to be explored, however, is their prosodic features, which have the potential to identify discourse-organizing functions more precisely. Therefore, this study focuses on understanding the relationship between the prosodic variation(s) and discourse function(s) of frequently occurring lexical bundles in a corpus.
This study used a corpus-driven framework to analyze the prosodic and discourse function variations of lexical bundles in a spoken academic corpus compiled from YALE open courses. The discourse function of the lexical bundles was analyzed using transcripts and audio files to find emerging patterns in their rhetorical function in information structure. In other words, lexical bundles were classified according to their relationship to the preceding and following discourse, i.e., whether they introduced a new topic or expanded, contrasted, or emphasized specific details related to the main topic. Prosodic analysis involved examining pitch movement and prominence within the lexical bundle. Then, the emerging prosodic patterns and their corresponding discourse functions were cross-tabulated to understand the relationship between them. Findings indicate that some lexical bundles had multiple prosodic variations related to discourse function variations, while others had minimal prosodic variation related to one discourse function. The discourse functions were categorized as having a major rhetorical organization role (introducing the main topic for discussion, connecting topics, major contrast) or a minor rhetorical organization role (expanding on a topic through specific details, providing background information, exemplification, or rephrasing, contrasting ideas, emphasizing important information). The variation in discourse function and prosody of lexical bundles in university lectures may indicate that some lexical bundles are more formulaic than others.
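The cross-tabulation step described above could be sketched roughly as follows. The labels ("rising", "falling", "major", "minor") and the observations are hypothetical placeholders invented for this example, not data or categories taken from the study.

```python
from collections import Counter


def cross_tabulate(observations):
    """Count how often each prosodic pattern co-occurs with each function.

    `observations` is an iterable of (prosodic_pattern, discourse_function)
    pairs, one pair per annotated lexical bundle token.
    """
    table = {}
    for pattern, function in observations:
        table.setdefault(pattern, Counter())[function] += 1
    return table


# Hypothetical annotations, made up purely for illustration.
observations = [
    ("rising", "major"),
    ("rising", "major"),
    ("falling", "minor"),
    ("rising", "minor"),
]
table = cross_tabulate(observations)

for pattern, counts in sorted(table.items()):
    print(pattern, dict(counts))
```

A bundle whose counts spread over several cells would correspond to the "multiple prosodic variations" case, while one concentrated in a single cell would correspond to the more formulaic case.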

    A computational memory and processing model for prosody

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts & Sciences, 1999. Includes bibliographical references (p. 209-226). This thesis links processing in working memory to prosody in speech, and links different working memory capacities to different prosodic styles. It provides a causal account of prosodic differences and an architecture for reproducing them in synthesized speech. The implemented system mediates text-based information through a model of attention and working memory. The main simulation parameter of the memory model quantifies recall. Changing its value changes what counts as given and new information in a text, and therefore determines the intonation with which the text is uttered. Other aspects of search and storage in the memory model are mapped to the remaining continuous and categorical features of pitch and timing, producing prosody in three different styles: for small recall values, the exaggerated and sing-song melodies of children's speech; for mid-range values, an adult expressive style; for the largest values, the prosody of a speaker who is familiar with the text and at times sounds bored or irritated. In addition, because the storage procedure is stochastic, the prosody varies from simulation to simulation, even for identical control parameters. As with human speech, no two renditions are alike. Informal feedback indicates that the stylistic differences are recognizable and that the prosody is improved over current offerings. A comparison with natural data shows clear and predictable trends, although they do not reach significance. However, a comparison within the natural data also did not produce significant results. One practical contribution of this work is a text mark-up schema consisting of relational annotations to grammatical structures. Another is the product itself: varied and plausible prosody in synthesized speech.
The main theoretical contribution is to show that resource-bound cognitive activity has prosodic correlates, thus providing a rationale for the individual and stylistic differences in melody and rhythm that are ubiquitous in human speech. by Janet Elizabeth Cahn. Ph.D.

    The Second Language Acquisition of Mandarin Chinese Tones by English, Japanese and Korean Speakers

    This dissertation explores the second language acquisition of Mandarin Chinese tones by speakers of non-tonal languages within the framework of Optimality Theory. The effects of three L1s are analyzed: American English, a stress-accent language; Tokyo Japanese, a lexical pitch accent language; and Seoul Korean, a non-stress and non-pitch accent language. The study tests for three possible sources of L2 tonal errors: 1) universal phonological constraints (i.e., the Tonal Markedness Scale (TMS), the Obligatory Contour Principle (OCP), and Tone-Position Constraints (TPC)); 2) the transfer of L1 pitch patterns; and 3) a pedagogical problem of Tone 3. The data show that these three factors jointly shape the properties of interlanguage grammars. This study finds that the TMS, the OCP, and TPC constrain L2 tone acquisition, but do so to varying degrees. Evidence is found that the TMS applies to both word- and sentence-level L2 productions. Some effects of the OCP are found to interact with the TMS and with L1 transfer effects. For example, patterns regarding tone pairs (more T1-T1 productions than T4-T4, and in turn more T4-T4 than T2-T2) can be attributed either to an emergence-of-the-unmarked effect arising from the interaction of the TMS and the OCP, or to local conjunction of the TMS. Learners are better at maintaining Rising (T2) at word-initial positions, but Falling (T4) at word-final positions. L2 learners often substitute other tones for target tones, and the substitution patterns provide evidence for L1 transfer. For example, English speakers often use a high falling tone, while Japanese speakers tend to lengthen low tones, to express monosyllabic narrow focus in sentences. This study finds conflicting error and substitution patterns pertaining to Tone 3, as well as greater accuracy in processing Pre-T3 sandhi than sandhi occurring elsewhere. This effect is attributed to the T3 [214]-First teaching method.
In light of the three factors affecting L2 tone acquisition, this study proposes a constraint re-ranking model to provide a new way of viewing positive and negative transfer. It is demonstrated that some markedness constraints are promoted while some are demoted in the acquisition of tones. Doctor of Philosoph
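To make the notion of constraint re-ranking concrete, here is a minimal sketch of Optimality Theory evaluation with strictly ranked constraints. It is a toy, not the dissertation's model: the constraint names (`*RISE`, `IDENT-TONE`), the candidates, and the violation counts are all assumptions chosen for illustration.

```python
def evaluate(candidates, ranking):
    """Return the optimal candidate under a strict constraint ranking.

    `candidates` maps each candidate to its violation counts per
    constraint; `ranking` lists constraints from highest to lowest.
    Comparing violation profiles as tuples gives strict domination:
    a higher-ranked constraint always outweighs lower-ranked ones.
    """
    def profile(candidate):
        return tuple(candidates[candidate].get(c, 0) for c in ranking)

    return min(candidates, key=profile)


# Hypothetical tableau for a target rising tone (T2).
candidates = {
    "T2": {"*RISE": 1, "IDENT-TONE": 0},  # faithful but marked
    "T1": {"*RISE": 0, "IDENT-TONE": 1},  # unmarked but unfaithful
}

# Markedness ranked above faithfulness: the unmarked level tone wins.
learner_output = evaluate(candidates, ["*RISE", "IDENT-TONE"])

# After demoting the markedness constraint, the faithful target wins.
target_output = evaluate(candidates, ["IDENT-TONE", "*RISE"])
```

Under these assumptions, changing only the order of the two constraints switches the winner from the substituted tone to the target tone, which is the basic mechanism a re-ranking account relies on.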