568 research outputs found
Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese
Mandarin Chinese is characterized by being a tonal language; the pitch (or
) of its utterances carries considerable linguistic information. However,
speech samples from different individuals are subject to changes in amplitude
and phase which must be accounted for in any analysis which attempts to provide
a linguistically meaningful description of the language. A joint model for
amplitude, phase and duration is presented which combines elements from
Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects
Models. By decomposing functions via a functional principal component analysis,
and connecting registration functions to compositional data analysis, a joint
multivariate mixed effect model can be formulated which gives insights into the
relationship between the different modes of variation as well as their
dependence on linguistic and non-linguistic covariates. The model is applied to
the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin,
containing approximately 50 thousand phonetically diverse sample contours
(syllables), and reveals that phonetic information is jointly carried by both
amplitude and phase variation.Comment: 49 pages, 13 figures, small changes to discussio
Explaining the PENTA model: a reply to Arvaniti and Ladd
This paper presents an overview of the Parallel Encoding and Target Approximation (PENTA) model of speech prosody, in response to an extensive critique by Arvaniti & Ladd (2009). PENTA is a framework for conceptually and computationally linking communicative meanings to fine-grained prosodic details, based on an articulatory-functional view of speech. Target Approximation simulates the articulatory realisation of underlying pitch targets – the prosodic primitives in the framework. Parallel Encoding provides an operational scheme that enables simultaneous encoding of multiple communicative functions. We also outline how PENTA can be computationally tested with a set of software tools. With the help of one of the tools, we offer a PENTA-based hypothetical account of the Greek intonational patterns reported by Arvaniti & Ladd, showing how it is possible to predict the prosodic shapes of an utterance based on the lexical and postlexical meanings it conveys
Modelling Japanese intonation using PENTAtrainer2
This paper presents results from Japanese intonation modelling using PENTAtrainer2, an articulatory synthesiser. Our first aim is to show that PENTA, on which PENTAtrainer2 is based, can achieve high accuracy in predictive synthesis of varying intonation contours. We trained the synthesiser on a 6251-sentence functionally annotated corpus and generated F0 contours for each communicative condition. The accuracy of speaker-dependent and independent synthesis, together with naturalness ratings, show that PENTA is effective in modelling Japanese intonation. This suggests that once contextual variability is incorporated into a model, multi-functional targets alone would suffice as the prosodic representation even in a sizeable corpus.postprin
Multiple prosodic meanings are conveyed through separate pitch ranges: Evidence from perception of focus and surprise in Mandarin Chinese
F0 variation is a crucial feature in speech prosody, which can convey linguistic information such as focus and paralinguistic meanings such as surprise. How can multiple layers of information be represented with F0 in speech: are they divided into discrete layers of pitch or overlapped without clear divisions? We investigated this question by assessing pitch perception of focus and surprise in Mandarin Chinese. Seventeen native Mandarin listeners rated the strength of focus and surprise conveyed by the same set of synthetically manipulated sentences. An fMRI experiment was conducted to assess neural correlates of the listeners’ perceptual response to the stimuli. The results showed that behaviourally, the perceptual threshold for focus was 3 semitones and that for surprise was 5 semitones above the baseline. Moreover, the pitch range of 5-12 semitones above the baseline signalled both focus and surprise, suggesting a considerable overlap between the two types of prosodic information within this range. The neuroimaging data positively correlated with the variations in behavioural data. Also, a ceiling effect was found as no significant behavioural differences or neural activities were shown after reaching a certain pitch level for the perception of focus and surprise respectively. Together, the results suggest that different layers of prosodic information are represented in F0 through different pitch ranges: paralinguistic information is represented at a pitch range beyond that used by linguistic information. Meanwhile, the representation of paralinguistic information is achieved without obscuring linguistic prosody, thus allowing F0 to represent the two layers of information in parallel
From communicative functions to prosodic forms
This is a proposal in favour of proceeding from communicative function to linguistic form, rather than the reverse, for an insightful account of how humans communicate by speech in languages. A functional framework is developed that encompasses argumentation structures, declarative and interrogative functions, and expressive intensification. Such a function orientation can become a powerful tool in comparative prosodic research across the world's languages. The potential of this approach is shown by comparing the prosodic form of Mandarin Chinese data collected in functionally contextualized scenarios with corresponding data from English and German
Pre-Low Raising in Japanese Pitch Accent
Japanese has been observed to have 2 versions of the H tone, the higher of which is associated with an accented mora. However, the distinction of these 2 versions only surfaces in context but not in isolation, leading to a long-standing debate over whether there is 1 H tone or 2. This article reports evidence that the higher version may result from a pre-low raising mechanism rather than being inherently higher. The evidence is based on an analysis of F0 of words that varied in length, accent condition and syllable structure, produced by native speakers of Japanese at 2 speech rates. The data indicate a clear separation between effects that are due to mora-level preplanning and those that are mechanical. These results are discussed in terms of mechanisms of laryngeal control during tone production, and highlight the importance of articulation as a link between phonology and surface acoustics.postprin
Toward invariant functional representations of variable surface fundamental frequency contours: Synthesizing speech melody via model-based stochastic learning
Variability has been one of the major challenges for both theoretical understanding and computer synthesis of speech prosody. In this paper we show that economical representation of variability is the key to effective modeling of prosody. Specifically, we report the development of PENTAtrainer—A trainable yet deterministic prosody synthesizer based on an articulatory–functional view of speech. We show with testing results on Thai, Mandarin and English that it is possible to achieve high-accuracy predictive synthesis of fundamental frequency contours with very small sets of parameters obtained through stochastic learning from real speech data. The first key component of this system is syllable-synchronized sequential target approximation—implemented as the qTA model, which is designed to simulate, for each tonal unit, a wide range of contextual variability with a single invariant target. The second key component is the automatic learning of function-specific targets through stochastic global optimization, guided by a layered pseudo-hierarchical functional annotation scheme, which requires the manual labeling of only the temporal domains of the functional units. The results in terms of synthesis accuracy demonstrate that effective modeling of the contextual variability is the key also to effective modeling of function-related variability. Additionally, we show that, being both theory-based and trainable (hence data-driven), computational systems like PENTAtrainer can serve as an effective modeling tool in basic research, with which the level of falsifiability in theory testing can be raised, and also a closer link between basic and applied research in speech science can be developed
Computational modelling of double focus in American English
This study investigated how double focus in English
statements and questions can be computationally
modelled. PENTAtrainer2 was used to learn
syllable-sized multi-functional targets from a corpus
of 1960 English utterances, with controlled
variations in lexical stress, focus, modality, and
sentence length. The results showed that the learned
targets could generate F0 contours close to the
original. In particular, the asymmetry in the
interaction between focus and modality was
effectively simulated
- …