73 research outputs found
Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch)
Speech researchers often rely on human annotation of prosody to generate data to test hypotheses and generate models. We present an overview of two prosodic annotation systems: ToBI (Tones and Break Indices) (Silverman et al., 1992), and RaP (Rhythm and Pitch) (Dilley & Brown, 2005), which was designed to address several limitations of ToBI. The paper reports two large-scale studies of inter-transcriber reliability for ToBI and RaP. Comparable reliability for both systems was obtained for a variety of prominence- and boundary-related agreement categories. These results help to establish RaP as an alternative to ToBI for research and technology applications. National Science Foundation (U.S.) (NSF grant BCS 0847653)
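The abstract above does not say which statistic was used to measure inter-transcriber reliability. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two annotators; the function name `cohen_kappa` and the toy label sequences are assumptions for this example, not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical word-level prominence labels from two transcribers.
transcriber_1 = ["accent", "none", "accent", "none"]
transcriber_2 = ["accent", "none", "none", "none"]
print(cohen_kappa(transcriber_1, transcriber_2))
```

Raw percentage agreement overstates reliability when one label dominates, which is why a chance-corrected measure is often reported alongside it.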
Applying a fuzzy classifier to generate Sp ToBI annotation : preliminar results
One of the goals of the Glissando research project is to enrich a radio news corpus [1] with Sp ToBI labels. In this paper we present the application of the automatic predictions of a fuzzy classifier to speed up the labeling process. The strategy is proposed after completing the following steps: a) manual annotation of a part of the Glissando corpus with Sp ToBI labels and checking of the coherence of the labels; b) training of the automatic system; c) validation or correction of the automatic system's predictions by a human expert. The automatic judgments of the classifier are enriched with confidence measures that are useful to represent uncertain situations concerning the label to be assigned. The main aim of the paper is to show that there exists a correspondence between the uncertain situations that are identified during an inter-transcriber experiment and the uncertain situations that the fuzzy classifier detects. Labeling time reduction encourages the use of this strategy.
Consistency in transcription and labelling of German intonation with GToBI
A diverse set of speech data was labelled in three sites by 13 transcribers with differing levels of expertise, using GToBI, a consensus transcription system for German intonation. Overall inter-transcriber consistency suggests that, with training, labellers can acquire sufficient skill with GToBI for large-scale database labelling.
Chapter 2: The Original ToBI System and the Evolution of the ToBI Framework
In this chapter, the authors will try to identify the essential properties of a ToBI framework annotation system by describing the development and design of the original ToBI conventions. In this description, the authors will overview the general phonological theory and the specific theory of Mainstream American English intonation and prosody that the authors decided to incorporate in the original ToBI tags. The authors will also state the practical principles that led them to make the decisions that they did. The chapter is organised as follows. Section 2.2 briefly chronicles how the MAE_ToBI system came into being. Section 2.3 briefly describes the consensus account of English intonation and prosody on which the MAE_ToBI system is based. Section 2.4 catalogues the different components of a MAE_ToBI transcription and lists the salient rules which constrain the relationships between different components. This section also expands upon the theoretical foundations and practical consequences of adopting the general structure of multiple labelling tiers, and particularly the separation of the labels for tones from the labels for indexing prosodic boundary strength. Section 2.5 then describes some of the extensions of the basic ToBI tiers that have been adopted by some sites. This section also compares the authors' decisions about the number of tiers and about inter-tier constraints with the analogous decisions for some of the other ToBI systems described in this book. Section 2.6 discusses the status of the symbolic labels relative to the continuous phonetic records that are also an obligatory component of the MAE_ToBI transcription. Section 2.7 then closes by listing several open research questions that the authors would like to see addressed by MAE_ToBI users and the larger ToBI community.
Short-term periodicity of prosodic phrasing: Corpus-based evidence
Speech is perceived as a sequence of meaningful units of various lengths, from phones to phrases. Prosody is one of the means by which these are segmented: Prosodic boundaries subdivide utterances into prosodic phrases. In this corpus study, we study prosodic boundaries from a neurolinguistic perspective. To be perceived correctly, prosodic phrases must obey neurobiological constraints. In particular, electrophysiological processing has been argued to operate periodically, with one electrophysiological processing cycle being devoted to the processing of exactly one prosodic phrase. We thus hypothesized that prosodic phrases as such should show periodicity. We assess the DIRNDL corpus of German radio news, which has been annotated for intonational and intermediate phrases. We find that sequences of 2–5 intermediate phrases are periodic at 0.8–1.6 Hertz within their superordinate intonation phrase. Across utterances, the duration of intermediate phrases alternates with the duration of superordinate intonation phrases, indicating a dependence of prosodic time scales. While the determinants of periodicity are unknown, the results are compatible with an association between periodic electrophysiological processing mechanisms and the rhythm of prosody. This contributes to closing the gap between the neurobiology of language and linguistic description.
Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech
This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken language corpora. The annotation schema are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of the particular tagging schema in each domain is based in large part on the insights they provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is interesting to compare to English, because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation which corpus studies can help to resolve. Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research, to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti.
The identification and function of English prosodic features
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2007. Includes bibliographical references (leaves 98-102). This thesis contains three sets of studies designed to explore the identification and function of prosodic features in English. The first set of studies explores the identification of prosodic features using prosodic annotation. We compared inter-rater agreement for two current prosodic annotation schemes, ToBI (Silverman et al., 1992) and RaP (Dilley & Brown, 2005), which provide guidelines for the identification of English prosodic features. The studies described here survey inter-rater agreement for both novice and expert raters in both systems, and for both spontaneous and read speech. The results indicate high agreement for both systems on binary classification, but only moderate agreement for categories with more than two levels. The second section explores an aspect of the function of prosody in determining the propositional content of a sentence by investigating the relationship between syntactic structure and intonational phrasing. The first study tests and refines a model designed to predict the intonational phrasing of a sentence given the syntactic structure. In further analysis, we demonstrate that specific acoustic cues, namely word duration and the presence of silence after a word, can give rise to the perception of intonational boundaries. The final set of experiments explores the relationship between prosody and information structure, and how this relationship is realized acoustically. In a series of four experiments, we manipulated the information status of elements of declarative sentences by varying the questions that preceded those sentences. We found that all of the acoustic features we tested (duration, f0, and intensity) were utilized by speakers to indicate the location of an accented element. However, speakers did not consistently indicate differences in information status type (wide focus, new information, contrastive information) with the acoustic features we investigated. By Mara E. Breen. Ph.D.
Production of English Prominence by Native Mandarin Chinese Speakers
Native-like production of intonational prominence is important for spoken language competency. Non-native speakers may have trouble producing prosodic variation in a second language (L2) and thus, problems in being understood. By identifying common sources of production error, we will be able to aid in the instruction of L2 speakers. In this paper we present results of a production study designed to test the ability of Mandarin L1 speakers to produce prominence in English. Our results show that there are some consistent differences between the L1 and L2 speakers in the use of pitch to indicate prominence, as well as in the accenting of phrase-initial tokens. We also find that we can automatically detect prominence on Mandarin L1 English with 87.23% accuracy and an f-measure of 0.866 if we train a classifier with annotated Mandarin L1 English data. Models trained on native English speech can detect prominence in Mandarin L1 English with an accuracy of 74.77% and f-measure of 0.824.
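For readers comparing the two figures reported in the abstract above, the following sketch shows how accuracy and f-measure are conventionally derived from a binary confusion matrix. The function name `accuracy_and_f1` and the counts are hypothetical and chosen for illustration; they are not the paper's data, and the paper may weight precision and recall differently.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for a binary prominence detector (illustrative sketch)."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Accuracy counts both correctly detected and correctly rejected tokens.
    accuracy = (tp + tn) / total
    # Precision and recall only concern the "prominent" class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 hits, 10 false alarms, 10 misses, 40 correct rejections.
acc, f1 = accuracy_and_f1(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Because f-measure ignores true negatives, the two numbers can diverge when prominent and non-prominent tokens are unevenly distributed.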
From text to prosody without ToBI
A new method for predicting prosodic parameters, i.e. phone durations and F0 targets, from preprocessed text is presented. The prosody model comprises a set of CARTs, which are learned from a large database of labeled speech. This database need not be annotated with Tone and Break Indices (ToBI labels). Instead, a simpler symbolic prosodic description is created by a bootstrapping method. The method has been applied to one Spanish and two German speakers. For the German voices, two listening tests showed a significant preference for the new method over a more traditional approach to prosody prediction based on hand-crafted rules.