879 research outputs found
BEA – A multifunctional Hungarian spoken language database
In diverse areas of linguistics, the demand for studying actual language use is on
the increase. The aim of developing a phonetically-based multi-purpose database of
Hungarian spontaneous speech, dubbed BEA2, is to accumulate a large amount of
spontaneous speech of various types together with sentence repetition and reading.
Presently, the recorded material of BEA amounts to 260 hours produced by 280
present-day Budapest speakers (ages between 20 and 90, 168 females and 112
males), providing also annotated materials for various types of research and practical
applications
Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies
This paper discusses the challenges that arise when large speech corpora
receive an ever-broadening range of diverse and distinct annotations. Two case
studies of this process are presented: the Switchboard Corpus of telephone
conversations and the TDT2 corpus of broadcast news. Switchboard has undergone
two independent transcriptions and various types of additional annotation, all
carried out as separate projects that were dispersed both geographically and
chronologically. The TDT2 corpus has also received a variety of annotations,
but all directly created or managed by a core group. In both cases, issues
arise involving the propagation of repairs, consistency of references, and the
ability to integrate annotations having different formats and levels of detail.
We describe a general framework whereby these issues can be addressed
successfully.Comment: 7 pages, 2 figure
Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech
This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken language corpora. The annotation schema are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of the particular tagging schema in each domain is based in large part on the insights they provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is interesting to compare to English, because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation which corpus studies can help to resolve.Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research, to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti
Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch)
Speech researchers often rely on human annotation of prosody to generate data to test hypotheses and generate models. We present an overview of two prosodic annotation systems: ToBI (Tones and Break Indices) (Silverman et al., 1992), and RaP (Rhythm and Pitch) (Dilley & Brown, 2005), which was designed to address several limitations of ToBI. The paper reports two large-scale studies of inter-transcriber reliability for ToBI and RaP. Comparable reliability for both systems was obtained for a variety of prominence- and boundary-related agreement categories. These results help to establish RaP as an alternative to ToBI for research and technology applicationsNational Science Foundation (U.S.) (NSF grant BCS 0847653
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
A Formal Framework for Linguistic Annotation
`Linguistic annotation' covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions --
audio, video and/or physiological recordings -- or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
`named entity' identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats.Comment: 49 page
ProGmatica: a prosodic and pragmatic database for european portuguese
In this work, a spontaneous speech corpus of broadcasted television material in European Portuguese (EP) is presented. We decided to name it ProGmatica as it is meant to combine prosody information under a pragmatic framework. Our purpose is to analyse, describe and predict the prosodic patterns that are involved in speech acts and discourse events. It is also our goal to relate both prosody and pragmatics to emotion, style and attitude. In future developments, we intend, by this way, to provide EP TTS systems with pragmatic and emotional dimensions.
From the whole recorded material we selected, extracted and saved prototypical speech acts with the help of speech analysis tools. We have a multi-speaker corpus, where linguistic, paralinguistic and extra linguistic information are labelled and related to each other.
The paper is organized as follows. In section one, a brief state-of-the-art for the available EP corpora containing prosodic information is presented. In section two, we explain the pragmatic criteria used to structure this database. Then, we describe how the speech signal was labelled and which information layers were considered. In section three, we propose a prosodic prediction model to be applied to each speech act in future. In section four, some of the main problems we went through are discussed and future work is presented
- …