16 research outputs found

    Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech

    This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken language corpora. The annotation schemas are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of the particular tagging schema in each domain is based in large part on the insights they provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is interesting to compare to English, because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation which corpus studies can help to resolve. Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research, to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti.

    DIMA - Annotation Guidelines for German Intonation

    Kügler F, Smolibocki B, Arnold D, et al. DIMA - Annotation Guidelines for German Intonation. In: Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, Scotland; 2015: 317. This paper presents newly developed guidelines for prosodic annotation of German as a consensus system agreed upon by German intonologists. The DIMA system is rooted in the framework of autosegmental-metrical phonology. One important goal of the consensus is to make exchanging data between groups easier, since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated by running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue). The overall high κ between 0.76 and 0.89 (depending on the speech style) shows that the DIMA conventions can be applied successfully.
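
    The κ values reported above are chance-corrected agreement scores. As a rough illustration only, the sketch below computes Cohen's κ for two annotators labelling the same items. The abstract does not say which κ variant the study used, so the choice of Cohen's κ, the function name `cohen_kappa`, and the toy accent labels are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; not taken from the DIMA paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (invented): pitch-accent labels from two annotators for eight words.
annotator_1 = ["H*", "L*", "H*", "none", "H*", "none", "L*", "none"]
annotator_2 = ["H*", "L*", "none", "none", "H*", "none", "H*", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

    With more than two annotators, a multi-rater statistic such as Fleiss' κ would be the usual substitute; the two-rater case is shown here only because it is the simplest to follow.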

    Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch)

    Speech researchers often rely on human annotation of prosody to generate data to test hypotheses and generate models. We present an overview of two prosodic annotation systems: ToBI (Tones and Break Indices) (Silverman et al., 1992), and RaP (Rhythm and Pitch) (Dilley & Brown, 2005), which was designed to address several limitations of ToBI. The paper reports two large-scale studies of inter-transcriber reliability for ToBI and RaP. Comparable reliability for both systems was obtained for a variety of prominence- and boundary-related agreement categories. These results help to establish RaP as an alternative to ToBI for research and technology applications. National Science Foundation (U.S.) (NSF grant BCS 0847653)

    On the phrasing properties of Hindi relative clauses

    This paper presents results from a production experiment in Hindi, showing that differences in the attachment site of object relative clauses result in prosodic differences when the antecedent of the relative clause (RC) is part of a complex NP with the structure N1 of N2. In particular, based on duration and F0 data, we argue that the phrasing in a matrix sentence encodes the attachment site of the object RC. When the RC attaches high, i.e. modifies the head N1 of the complex NP, N2 and N1 together form a phonological phrase, while the verb of the matrix clause forms a phonological phrase on its own. In the case of low attachment, i.e. when the RC modifies the genitive N2, N2 forms its own phonological phrase, while N1 forms a phonological phrase with the verb of the matrix clause. Theoretical and Experimental Linguistic

    Acoustic Correlates of Information Structure.

    This paper reports three studies aimed at addressing three questions about the acoustic correlates of information structure in English: (1) do speakers mark information structure prosodically, and, to the extent they do, (2) what are the acoustic features associated with different aspects of information structure, and (3) how well can listeners retrieve this information from the signal? The information structure of subject-verb-object sentences was manipulated via the questions preceding those sentences: elements in the target sentences were either focused (i.e., the answer to a wh-question) or given (i.e., mentioned in prior discourse); furthermore, focused elements had either an implicit or an explicit contrast set in the discourse; finally, either only the object was focused (narrow object focus) or the entire event was focused (wide focus). The results across all three experiments demonstrated that people reliably mark (1) focus location (subject, verb, or object) using greater intensity, longer duration, and higher mean and maximum F0, and (2) focus breadth, such that narrow object focus is marked with greater intensity, longer duration, and higher mean and maximum F0 on the object than wide focus. Furthermore, when participants are made aware of prosodic ambiguity present across different information structures, they reliably mark focus type, so that contrastively focused elements are produced with greater intensity, longer duration, and lower mean and maximum F0 than noncontrastively focused elements. In addition to having important theoretical consequences for accounts of semantics and prosody, these experiments demonstrate that linear residualisation successfully removes individual differences in people's productions, thereby revealing cross-speaker generalisations. Furthermore, discriminant modelling allows us to objectively determine the acoustic features that underlie meaning differences.

    Cross-language differences in fundamental frequency range: a comparison of English and German

    This paper presents a systematic comparison of various measures of f0 range in female speakers of English and German. F0 range was analysed along two dimensions, level (i.e. overall f0 height) and span (extent of f0 modulation within a given speech sample). These were examined using two types of measures, one based on 'long-term distributional' (LTD) methods, and the other based on specific landmarks in speech that are linguistic in nature ('linguistic' measures). The various methods were used to identify whether and on what basis or bases speakers of these two languages differ in f0 range. Findings yielded significant cross-language differences in both dimensions of f0 range, but effect sizes were found to be larger for span than for level, and for linguistic than for LTD measures. The linguistic measures also uncovered some differences between the two languages in how f0 range varies through an intonation contour. This helps shed light on the relation between intonational structure and f0 range.
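
    The level/span distinction in this abstract can be made concrete with a small sketch of long-term distributional (LTD) measures. The abstract does not name the specific statistics used, so the choices here are assumptions for illustration: level is taken as the median f0 in Hz, and span as the distance between the 10th and 90th percentiles in semitones. The function names and the sample f0 track are invented.

```python
import math
import statistics


def voiced(f0_hz):
    """Keep only voiced frames; many pitch trackers report 0 for unvoiced frames."""
    return [f for f in f0_hz if f > 0]


def f0_level_hz(f0_hz):
    """Level: overall f0 height, here approximated by the median (assumption)."""
    return statistics.median(voiced(f0_hz))


def f0_span_semitones(f0_hz):
    """Span: extent of f0 modulation, here the 10th-90th percentile range
    expressed in semitones (assumption)."""
    # n=10 returns nine cut points; the first is the 10th percentile and the
    # last is the 90th percentile.
    cuts = statistics.quantiles(voiced(f0_hz), n=10, method="inclusive")
    low, high = cuts[0], cuts[-1]
    # 12 semitones per octave, i.e. per doubling of frequency.
    return 12.0 * math.log2(high / low)


# Invented f0 track in Hz; zeros stand for unvoiced frames.
track = [0, 180, 200, 220, 240, 260, 0]
level = f0_level_hz(track)
span = f0_span_semitones(track)
print(f"level = {level:.1f} Hz, span = {span:.2f} st")
```

    Expressing span on a logarithmic (semitone) scale makes it independent of level: doubling every f0 value doubles the level but leaves the span unchanged, which is one reason the two dimensions can be compared separately across speakers.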

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in recent decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by bringing together original contributions in which experts in the field assess, reflect on, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study.

    The identification and function of English prosodic features

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2007. Includes bibliographical references (leaves 98-102). This thesis contains three sets of studies designed to explore the identification and function of prosodic features in English. The first set of studies explores the identification of prosodic features using prosodic annotation. We compared inter-rater agreement for two current prosodic annotation schemes, ToBI (Silverman, et al., 1992) and RaP (Dilley & Brown, 2005), which provide guidelines for the identification of English prosodic features. The studies described here survey inter-rater agreement for both novice and expert raters in both systems, and for both spontaneous and read speech. The results indicate high agreement for both systems on binary classification, but only moderate agreement for categories with more than two levels. The second section explores an aspect of the function of prosody in determining the propositional content of a sentence by investigating the relationship between syntactic structure and intonational phrasing. The first study tests and refines a model designed to predict the intonational phrasing of a sentence given the syntactic structure. In further analysis, we demonstrate that specific acoustic cues (word duration and the presence of silence after a word) can give rise to the perception of intonational boundaries. The final set of experiments explores the relationship between prosody and information structure, and how this relationship is realized acoustically. In a series of four experiments, we manipulated the information status of elements of declarative sentences by varying the questions that preceded those sentences. We found that all of the acoustic features we tested (duration, f0, and intensity) were utilized by speakers to indicate the location of an accented element. However, speakers did not consistently indicate differences in information status type (wide focus, new information, contrastive information) with the acoustic features we investigated. By Mara E. Breen. Ph.D.