568 research outputs found

    Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese

    Full text link
    Mandarin Chinese is characterized by being a tonal language; the pitch (or F0F_0) of its utterances carries considerable linguistic information. However, speech samples from different individuals are subject to changes in amplitude and phase which must be accounted for in any analysis which attempts to provide a linguistically meaningful description of the language. A joint model for amplitude, phase and duration is presented which combines elements from Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects Models. By decomposing functions via a functional principal component analysis, and connecting registration functions to compositional data analysis, a joint multivariate mixed effect model can be formulated which gives insights into the relationship between the different modes of variation as well as their dependence on linguistic and non-linguistic covariates. The model is applied to the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin, containing approximately 50 thousand phonetically diverse sample F0F_0 contours (syllables), and reveals that phonetic information is jointly carried by both amplitude and phase variation.Comment: 49 pages, 13 figures, small changes to discussio

    Explaining the PENTA model: a reply to Arvaniti and Ladd

    Get PDF
    This paper presents an overview of the Parallel Encoding and Target Approximation (PENTA) model of speech prosody, in response to an extensive critique by Arvaniti & Ladd (2009). PENTA is a framework for conceptually and computationally linking communicative meanings to fine-grained prosodic details, based on an articulatory-functional view of speech. Target Approximation simulates the articulatory realisation of underlying pitch targets – the prosodic primitives in the framework. Parallel Encoding provides an operational scheme that enables simultaneous encoding of multiple communicative functions. We also outline how PENTA can be computationally tested with a set of software tools. With the help of one of the tools, we offer a PENTA-based hypothetical account of the Greek intonational patterns reported by Arvaniti & Ladd, showing how it is possible to predict the prosodic shapes of an utterance based on the lexical and postlexical meanings it conveys

    Modelling Japanese intonation using PENTAtrainer2

    Get PDF
    This paper presents results from Japanese intonation modelling using PENTAtrainer2, an articulatory synthesiser. Our first aim is to show that PENTA, on which PENTAtrainer2 is based, can achieve high accuracy in predictive synthesis of varying intonation contours. We trained the synthesiser on a 6251-sentence functionally annotated corpus and generated F0 contours for each communicative condition. The accuracy of speaker-dependent and independent synthesis, together with naturalness ratings, show that PENTA is effective in modelling Japanese intonation. This suggests that once contextual variability is incorporated into a model, multi-functional targets alone would suffice as the prosodic representation even in a sizeable corpus.postprin

    Multiple prosodic meanings are conveyed through separate pitch ranges: Evidence from perception of focus and surprise in Mandarin Chinese

    Get PDF
    F0 variation is a crucial feature in speech prosody, which can convey linguistic information such as focus and paralinguistic meanings such as surprise. How can multiple layers of information be represented with F0 in speech: are they divided into discrete layers of pitch or overlapped without clear divisions? We investigated this question by assessing pitch perception of focus and surprise in Mandarin Chinese. Seventeen native Mandarin listeners rated the strength of focus and surprise conveyed by the same set of synthetically manipulated sentences. An fMRI experiment was conducted to assess neural correlates of the listeners’ perceptual response to the stimuli. The results showed that behaviourally, the perceptual threshold for focus was 3 semitones and that for surprise was 5 semitones above the baseline. Moreover, the pitch range of 5-12 semitones above the baseline signalled both focus and surprise, suggesting a considerable overlap between the two types of prosodic information within this range. The neuroimaging data positively correlated with the variations in behavioural data. Also, a ceiling effect was found as no significant behavioural differences or neural activities were shown after reaching a certain pitch level for the perception of focus and surprise respectively. Together, the results suggest that different layers of prosodic information are represented in F0 through different pitch ranges: paralinguistic information is represented at a pitch range beyond that used by linguistic information. Meanwhile, the representation of paralinguistic information is achieved without obscuring linguistic prosody, thus allowing F0 to represent the two layers of information in parallel

    From communicative functions to prosodic forms

    Get PDF
    This is a proposal in favour of proceeding from communicative function to linguistic form, rather than the reverse, for an insightful account of how humans communicate by speech in languages. A functional framework is developed that encompasses argumentation structures, declarative and interrogative functions, and expressive intensification. Such a function orientation can become a powerful tool in comparative prosodic research across the world's languages. The potential of this approach is shown by comparing the prosodic form of Mandarin Chinese data collected in functionally contextualized scenarios with corresponding data from English and German

    Pre-Low Raising in Japanese Pitch Accent

    Get PDF
    Japanese has been observed to have 2 versions of the H tone, the higher of which is associated with an accented mora. However, the distinction of these 2 versions only surfaces in context but not in isolation, leading to a long-standing debate over whether there is 1 H tone or 2. This article reports evidence that the higher version may result from a pre-low raising mechanism rather than being inherently higher. The evidence is based on an analysis of F0 of words that varied in length, accent condition and syllable structure, produced by native speakers of Japanese at 2 speech rates. The data indicate a clear separation between effects that are due to mora-level preplanning and those that are mechanical. These results are discussed in terms of mechanisms of laryngeal control during tone production, and highlight the importance of articulation as a link between phonology and surface acoustics.postprin

    Toward invariant functional representations of variable surface fundamental frequency contours: Synthesizing speech melody via model-based stochastic learning

    Get PDF
    Variability has been one of the major challenges for both theoretical understanding and computer synthesis of speech prosody. In this paper we show that economical representation of variability is the key to effective modeling of prosody. Specifically, we report the development of PENTAtrainer—A trainable yet deterministic prosody synthesizer based on an articulatory–functional view of speech. We show with testing results on Thai, Mandarin and English that it is possible to achieve high-accuracy predictive synthesis of fundamental frequency contours with very small sets of parameters obtained through stochastic learning from real speech data. The first key component of this system is syllable-synchronized sequential target approximation—implemented as the qTA model, which is designed to simulate, for each tonal unit, a wide range of contextual variability with a single invariant target. The second key component is the automatic learning of function-specific targets through stochastic global optimization, guided by a layered pseudo-hierarchical functional annotation scheme, which requires the manual labeling of only the temporal domains of the functional units. The results in terms of synthesis accuracy demonstrate that effective modeling of the contextual variability is the key also to effective modeling of function-related variability. Additionally, we show that, being both theory-based and trainable (hence data-driven), computational systems like PENTAtrainer can serve as an effective modeling tool in basic research, with which the level of falsifiability in theory testing can be raised, and also a closer link between basic and applied research in speech science can be developed

    Computational modelling of double focus in American English

    Get PDF
    This study investigated how double focus in English statements and questions can be computationally modelled. PENTAtrainer2 was used to learn syllable-sized multi-functional targets from a corpus of 1960 English utterances, with controlled variations in lexical stress, focus, modality, and sentence length. The results showed that the learned targets could generate F0 contours close to the original. In particular, the asymmetry in the interaction between focus and modality was effectively simulated
    • …
    corecore