
    Universal and language-specific processing: the case of prosody

    A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a crosslanguage approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and few comparative studies have explored this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception: English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments revealed both crosslanguage similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.

    Generating Tailored, Comparative Descriptions with Contextually Appropriate Intonation

    Generating responses that take user preferences into account requires adaptation at all levels of the generation process. This article describes a multi-level approach to presenting user-tailored information in spoken dialogues which brings together for the first time multi-attribute decision models, strategic content planning, surface realization that incorporates prosody prediction, and unit selection synthesis that takes the resulting prosodic structure into account. The system selects the most important options to mention and the attributes that are most relevant to choosing between them, based on the user model. Multiple options are selected when each offers a compelling trade-off. To convey these trade-offs, the system employs a novel presentation strategy which straightforwardly lends itself to the determination of information structure, as well as the contents of referring expressions. During surface realization, the prosodic structure is derived from the information structure using Combinatory Categorial Grammar in a way that allows phrase boundaries to be determined in a flexible, data-driven fashion. This approach to choosing pitch accents and edge tones is shown to yield prosodic structures with significantly higher acceptability than baseline prosody prediction models in an expert evaluation. These prosodic structures are then shown to enable perceptibly more natural synthesis using a unit selection voice that aims to produce the target tunes, in comparison to two baseline synthetic voices. An expert evaluation and f0 analysis confirm the superiority of the generator-driven intonation and its contribution to listeners' ratings.
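
The abstract describes option and attribute selection from a multi-attribute decision model only at a high level. As a rough illustration, the following is a minimal sketch assuming a simple additive (weighted-sum) utility model over attribute scores normalized to [0, 1]. The `margin` threshold used to decide what counts as a "compelling trade-off", the spread-based attribute ranking, and all names (`Option`, `utility`, `select_options`, `relevant_attributes`, the flight-style attributes) are hypothetical and are not taken from the article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Option:
    """A candidate option (hypothetical) with attribute scores normalized to [0, 1]."""
    name: str
    values: dict[str, float] = field(default_factory=dict)


def utility(option: Option, weights: dict[str, float]) -> float:
    """Additive multi-attribute utility: the weighted sum of attribute scores."""
    return sum(w * option.values.get(attr, 0.0) for attr, w in weights.items())


def select_options(options: list[Option], weights: dict[str, float],
                   margin: float = 0.2, max_options: int = 3) -> list[Option]:
    """Keep the top-ranked option, plus lower-ranked options that offer a trade-off.

    Hypothetical criterion: a candidate is added only if, against every option
    already chosen, it is better by at least `margin` on some attribute.
    """
    ranked = sorted(options, key=lambda o: utility(o, weights), reverse=True)
    chosen = ranked[:1]
    for cand in ranked[1:]:
        if len(chosen) >= max_options:
            break
        if all(any(cand.values.get(a, 0.0) - c.values.get(a, 0.0) >= margin
                   for a in weights)
               for c in chosen):
            chosen.append(cand)
    return chosen


def relevant_attributes(chosen: list[Option], weights: dict[str, float],
                        k: int = 2) -> list[str]:
    """Rank attributes by weight times the spread of their scores across chosen options."""
    if not chosen:
        return []

    def score(attr: str) -> float:
        vals = [o.values.get(attr, 0.0) for o in chosen]
        # With a single option there is no spread, so fall back to its own score.
        spread = max(vals) - min(vals) if len(vals) > 1 else vals[0]
        return weights[attr] * spread

    return sorted(weights, key=score, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical user model: this user cares most about price.
    weights = {"price": 0.5, "travel_time": 0.3, "airline": 0.2}
    options = [
        Option("A", {"price": 0.9, "travel_time": 0.4, "airline": 0.5}),
        Option("B", {"price": 0.5, "travel_time": 0.9, "airline": 0.6}),
        Option("C", {"price": 0.8, "travel_time": 0.35, "airline": 0.4}),
    ]
    chosen = select_options(options, weights)
    print([o.name for o in chosen])              # ['A', 'B']
    print(relevant_attributes(chosen, weights))  # ['price', 'travel_time']
```

In this toy run, B is mentioned alongside A because it is much faster, while C is dropped: it ranks below both and is not clearly better than A on any attribute. Price and travel time then emerge as the attributes worth contrasting. A weighted sum is only one of many possible decision models; it is used here because it keeps the trade-off reasoning easy to inspect.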

    Investigating variation in Arabic intonation: the case for a multi-level corpus approach

    This paper provides a first description of the intonational patterns of San‘aani Arabic (SA, the dialect of Arabic spoken in the capital of Yemen) and a comparison of these patterns with those observed in Cairene Arabic (CA), revealing differences between the two varieties which mirror cross-linguistic prosodic variation. The SA analysis is based on qualitative transcription of portions of a multi-level corpus, including read speech sentences, a narrative retold from memory and a sociolinguistic data collection tool which yields free conversation data in the desired variety as well as information that can be used to confirm which variety is being used. The corpus design and methodology serve as a prototype for larger data collection to document intonational variation in Arabic.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised (at least for some listeners) by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Intonation, word order and focus projection in Serbo-Croatian

    LoC Class: PG1224.7, LoC Subject Headings: Serbo-Croatian language--Intonation, Serbo-Croatian language--Word orde

    English prosodic marking of Information Structure by L1-Japanese second language learners

    Ph.D. thesis, University of Hawaiʻi at Mānoa, 2018.

    Rapid neural processing of grammatical tone in second language learners

    The present dissertation investigates how beginner learners process grammatical tone in a second language and whether their processing is influenced by phonological transfer. Paper I focuses on the acquisition of Swedish grammatical tone by beginner learners from a non-tonal language, German. Results show that non-tonal beginner learners do not process the grammatical regularities of the tones but rather treat them akin to piano tones. A rightward spread of activity in response to pitch differences in Swedish tones possibly indicates a process of tone sensitisation. Papers II to IV investigate how artificial grammatical tone, taught in a word-picture association paradigm, is acquired by German and Swedish learners. The results of Paper II show that interspersed mismatches between grammatical tone and picture referents evoke an N400 only for the Swedish learners. Both learner groups produce N400 responses to picture mismatches related to grammatically meaningful vowel changes. While mismatch detection quickly reaches high accuracy rates, tone mismatches are detected least accurately and most slowly in both learner groups. For processing of the grammatical L2 words outside of mismatch contexts, the results of Paper III reveal both early, preconscious processing and late, conscious processing in the Swedish learner group within 20 minutes of acquisition (word recognition component, ELAN, LAN, P600). German learners produce only late responses: a P600 within 20 minutes and a LAN after sleep consolidation. The surprisingly rapid emergence of early grammatical ERP components (ELAN, LAN) is attributed to less resource-heavy processing outside of violation contexts. Finally, the results of Paper IV indicate that memory trace formation, as visible in the word recognition component at ~50 ms, is only possible at the highest level of formal and functional similarity, that is, for words with falling tone in Swedish participants.
    Together, the findings emphasise the importance of phonological transfer in the initial stages of second language acquisition and suggest that the earlier the processing, the greater the impact of phonological transfer.