830 research outputs found

Prosodic Event Recognition using Convolutional Neural Networks with Context Information

Author: Stehwien Sabrina
Vu Ngoc Thang
Publication date: 02/06/2017

This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also in speaker-independent cases. Comment: Interspeech 2017, 4 pages, 1 figure
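The position-feature idea in this abstract can be illustrated with a minimal sketch. It assumes the input is a list of per-frame feature vectors covering the current word plus its context, and that the position feature is a single 0/1 indicator appended to each frame. The function name `add_position_feature` and the exact encoding are assumptions made for illustration, not the authors' implementation.

```python
def add_position_feature(frames, word_start, word_end):
    """Append a 0/1 indicator to every frame.

    frames     : list of per-frame feature vectors (current word plus context)
    word_start : index of the first frame of the current word
    word_end   : index one past the last frame of the current word
    The indicator is 1.0 for frames inside the current word and 0.0 for
    context frames (hypothetical encoding, chosen for illustration).
    """
    if not 0 <= word_start <= word_end <= len(frames):
        raise ValueError("word span must lie within the frame sequence")
    return [
        list(frame) + [1.0 if word_start <= t < word_end else 0.0]
        for t, frame in enumerate(frames)
    ]


# Five frames with two acoustic features each; frames 1 and 2 are the current word.
toy_frames = [[0.1, 0.2] for _ in range(5)]
marked = add_position_feature(toy_frames, 1, 3)
```

The extra column lets a convolutional model distinguish the target word from its neighbours even though all frames share one input matrix.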

arXiv.org e-Print Archive

Crossref

Exploiting Contextual Information for Prosodic Event Detection Using Auto-Context

Author: Johnson Michael T
Liu Jia
Xia Shanhong
Yang Hua
Zhang Wei-Qiang
Zhao Junhong
Publication venue: e-Publications@Marquette
Publication date: 01/12/2013

Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning the long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and its ability to capture contextual relationships. When using the auto-context algorithm based on support vector machines, we can improve the detection accuracy by about 3% and the F-score by more than 7% on both two-way and four-way pitch accent detection in combination with the acoustic context. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection.
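The iterative loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: a tiny logistic regression stands in for the support vector machine, the data are treated as one utterance of consecutive words, the context is the predicted probability of each neighbouring word within a fixed window, and positions beyond the utterance edge are padded with 0.5. All function names and parameter values are hypothetical.

```python
import math


def predict_proba(w, b, x):
    """Logistic probability of the positive class for one feature vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Batch gradient descent for logistic regression (stand-in for the SVM)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def context_features(probs, i, window):
    """Probabilities of the neighbours of word i; 0.5 pads the utterance edges."""
    feats = []
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats.append(probs[j] if 0 <= j < len(probs) else 0.5)
    return feats


def auto_context_train(X, y, iterations=3, window=2):
    """Train one local classifier, then `iterations` context-augmented ones."""
    w, b = train_logreg(X, y)  # step 1: local acoustic features only
    models = [(w, b)]
    probs = [predict_proba(w, b, x) for x in X]
    for _ in range(iterations):
        # step 2: local features plus the neighbours' current probabilities
        X_aug = [x + context_features(probs, i, window) for i, x in enumerate(X)]
        w, b = train_logreg(X_aug, y)
        models.append((w, b))
        probs = [predict_proba(w, b, x) for x in X_aug]  # updated context
    return models, probs


# Toy utterance: one local feature per word, alternating accented/unaccented.
X_toy = [[0.9], [0.1], [0.8], [0.2], [0.7], [0.3]]
y_toy = [1, 0, 1, 0, 1, 0]
models, probs = auto_context_train(X_toy, y_toy)
```

At prediction time the stored models would be applied in the same order, each one consuming the probabilities produced by the previous one.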

epublications@Marquette

Crossref

Springer - Publisher Connector

Structuring information through gesture and intonation

Author: Jannedy Stefanie
Mendoza-Denton Norma
Publication date: 01/01/2005

Face-to-face communication is multimodal. In unscripted spoken discourse we can observe the interaction of several “semiotic layers”, modalities of information such as syntax, discourse structure, gesture, and intonation. We explore the role of gesture and intonation in structuring and aligning information in spoken discourse through a study of the co-occurrence of pitch accents and gestural apices. Metaphorical spatialization through gesture also plays a role in conveying the contextual relationships between the speaker, the government and other external forces in a naturally occurring political speech setting.

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Annotation graphs as a framework for multidimensional linguistic data analysis

Author: Bird Steven
Liberman Mark
Publication date: 01/01/1999

In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs. These 'annotation graphs' offer a simple yet powerful method for representing complex annotation structures incorporating hierarchy and overlap. Here, we motivate and illustrate our approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain specialists, we have constructed a hybrid multi-level annotation for a fragment of the Boston University Radio Speech Corpus which includes the following levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named entity. We show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats. We also show how the approach facilitates substantive comparison of multiple annotations of a single signal based on different theoretical models. The discussion shows how annotation graphs open the door to wide-ranging integration of tools, formats and corpora. Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse Tagging, Proceedings of the Workshop, pp. 1-10. Association for Computational Linguistics
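To make the "labeled acyclic digraph" idea concrete, here is a minimal sketch of such a structure. It assumes nodes carry an optional time offset and arcs carry a tier name and a label. The class names, the `tier` accessor, and the toy labels are hypothetical and are not taken from the authors' toolkit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arc:
    src: int    # start node id
    dst: int    # end node id
    tier: str   # annotation level, e.g. "word" or "phrase"
    label: str  # the annotation itself


@dataclass
class AnnotationGraph:
    times: dict = field(default_factory=dict)  # node id -> time in seconds, or None
    arcs: list = field(default_factory=list)

    def add_node(self, node_id, time=None):
        """Register a node; the time anchor is optional."""
        self.times[node_id] = time

    def _reachable(self, start, goal):
        """Depth-first search: is `goal` reachable from `start` along arcs?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(a.dst for a in self.arcs if a.src == node)
        return False

    def add_arc(self, src, dst, tier, label):
        """Add a labeled arc, refusing anything that would close a cycle."""
        if src not in self.times or dst not in self.times:
            raise KeyError("both end nodes must be added first")
        if self._reachable(dst, src):
            raise ValueError("arc would create a cycle")
        self.arcs.append(Arc(src, dst, tier, label))

    def tier(self, name):
        """All arcs belonging to one annotation level, in insertion order."""
        return [a for a in self.arcs if a.tier == name]


# Two words and one phrase spanning both: hierarchy expressed by shared nodes.
g = AnnotationGraph()
for node_id, t in [(0, 0.00), (1, 0.21), (2, 0.58)]:
    g.add_node(node_id, t)
g.add_arc(0, 1, "word", "the")
g.add_arc(1, 2, "word", "news")
g.add_arc(0, 2, "phrase", "NP")
```

Because several tiers share the same nodes, arcs from different annotation levels can nest or overlap without any tier needing to know about the others.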

arXiv.org e-Print Archive

CiteSeerX

Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech

Author: Beckman Mary E.
Venditti Jennifer J.
Publication venue: Ohio State University. Department of Linguistics
Publication date: 01/01/2000

This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken language corpora. The annotation schemata are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of the particular tagging schema in each domain is based in large part on the insights they provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is interesting to compare to English, because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation which corpus studies can help to resolve. Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research, to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti.

KnowledgeBank at OSU

Consistency of prosodic transcriptions: labelling experiments with trained and untrained transcribers

Author: Reyelt Matthias
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1996

Universaar

Using the ToBI transcription to record the intonation of Slovene

Author: Volk Jana
Publication venue: Znanstvena založba Filozofske fakultete
Publication date: 30/12/2015

The paper presents ToBI, a transcription method for prosodic annotation. ToBI is an acronym for Tones and Break Indices, which first denoted an intonation system developed in the 1990s for annotating intonation and prosody in the database of spoken Mainstream American English. The MAE_ToBI transcription originally consists of six parts: the audio recording of the utterance, the fundamental frequency contour, and four parallel tiers for the transcription of the tone sequence, the orthographic transcription, the indication of break indices between words, and additional observations. The core of the transcription, i.e. of the phonological analysis of the intonation pattern, is represented by the tone tier, where tonal variation is transcribed using labels for high tone and low tone, and where a tone can appear as a pitch accent, phrase accent or boundary tone. Due to its simplicity and flexibility, the system soon began to be used for the prosodic annotation of other variants of English and many other languages, as well as in different non-linguistic fields, leading to the creation of many new ToBI systems adapted to individual languages and dialects. The author is the first to use this method for Slovene, more precisely, for the intonational transcription and analysis of the corpus of spontaneous speech of Slovene Istria, in order to investigate whether the ToBI system is useful for the annotation of Slovene and its regional variants.

Repository of the University of Ljubljana

Repository of University of Primorska

Chapter 2: The Original ToBI System and the Evolution of the ToBI Framework

Author: Beckman Mary E.
Hirschberg Julia Bell
Shattuck-Hufnagel Stefanie
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2004

In this chapter, the authors try to identify the essential properties of a ToBI framework annotation system by describing the development and design of the original ToBI conventions. In this description, the authors give an overview of the general phonological theory and the specific theory of Mainstream American English intonation and prosody that they decided to incorporate in the original ToBI tags. The authors also state the practical principles that led them to make the decisions they did. The chapter is organised as follows. Section 2.2 briefly chronicles how the MAE_ToBI system came into being. Section 2.3 briefly describes the consensus account of English intonation and prosody on which the MAE_ToBI system is based. Section 2.4 catalogues the different components of a MAE_ToBI transcription and lists the salient rules which constrain the relationships between different components. This section also expands upon the theoretical foundations and practical consequences of adopting the general structure of multiple labelling tiers, and particularly the separation of the labels for tones from the labels for indexing prosodic boundary strength. Section 2.5 then describes some of the extensions of the basic ToBI tiers that have been adopted by some sites. This section also compares the authors' decisions about the number of tiers and about inter-tier constraints with the analogous decisions for some of the other ToBI systems described in this book. Section 2.6 discusses the status of the symbolic labels relative to the continuous phonetic records that are also an obligatory component of the MAE_ToBI transcription. Section 2.7 then closes by listing several open research questions that the authors would like to see addressed by MAE_ToBI users and the larger ToBI community.

Columbia University Academic Commons