22,308 research outputs found
A Formal Framework for Linguistic Annotation
`Linguistic annotation' covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions --
audio, video and/or physiological recordings -- or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
`named entity' identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats.Comment: 49 page
Annotation graphs as a framework for multidimensional linguistic data analysis
In recent work we have presented a formal framework for linguistic annotation
based on labeled acyclic digraphs. These `annotation graphs' offer a simple yet
powerful method for representing complex annotation structures incorporating
hierarchy and overlap. Here, we motivate and illustrate our approach using
discourse-level annotations of text and speech data drawn from the CALLHOME,
COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain
specialists, we have constructed a hybrid multi-level annotation for a fragment
of the Boston University Radio Speech Corpus which includes the following
levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named
entity. We show how annotation graphs can represent hybrid multi-level
structures which derive from a diverse set of file formats. We also show how
the approach facilitates substantive comparison of multiple annotations of a
single signal based on different theoretical models. The discussion shows how
annotation graphs open the door to wide-ranging integration of tools, formats
and corpora.Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse
Tagging, Proceedings of the Workshop. pp. 1-10. Association for Computational
Linguistic
Modelling Discourse-related terminology in OntoLingAnnotâs ontologies
Recently, computational linguists have shown great interest in discourse annotation in an attempt to capture the internal relations in texts. With this aim, we have formalized the linguistic knowledge associated to discourse into different linguistic ontologies. In this paper, we present the most prominent discourse-related terms and concepts included in the ontologies of the OntoLingAnnot annotation model. They show the different units, values, attributes, relations, layers and strata included in the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers.Comment: 8 pages, 6 figure
ELAN as flexible annotation framework for sound and image processing detectors
Annotation of digital recordings in humanities research still is, to a largeextend, a process that is performed manually. This paper describes the firstpattern recognition based software components developed in the AVATecH projectand their integration in the annotation tool ELAN. AVATecH (AdvancingVideo/Audio Technology in Humanities Research) is a project that involves twoMax Planck Institutes (Max Planck Institute for Psycholinguistics, Nijmegen,Max Planck Institute for Social Anthropology, Halle) and two FraunhoferInstitutes (Fraunhofer-Institut fĂŒr Intelligente Analyse- undInformationssysteme IAIS, Sankt Augustin, Fraunhofer Heinrich-Hertz-Institute,Berlin) and that aims to develop and implement audio and video technology forsemi-automatic annotation of heterogeneous media collections as they occur inmultimedia based research. The highly diverse nature of the digital recordingsstored in the archives of both Max Planck Institutes, poses a huge challenge tomost of the existing pattern recognition solutions and is a motivation to makesuch technology available to researchers in the humanities
- âŠ