18,153 research outputs found
Annotation graphs as a framework for multidimensional linguistic data analysis
In recent work we have presented a formal framework for linguistic annotation
based on labeled acyclic digraphs. These `annotation graphs' offer a simple yet
powerful method for representing complex annotation structures incorporating
hierarchy and overlap. Here, we motivate and illustrate our approach using
discourse-level annotations of text and speech data drawn from the CALLHOME,
COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain
specialists, we have constructed a hybrid multi-level annotation for a fragment
of the Boston University Radio Speech Corpus which includes the following
levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named
entity. We show how annotation graphs can represent hybrid multi-level
structures which derive from a diverse set of file formats. We also show how
the approach facilitates substantive comparison of multiple annotations of a
single signal based on different theoretical models. The discussion shows how
annotation graphs open the door to wide-ranging integration of tools, formats
and corpora.Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse
Tagging, Proceedings of the Workshop. pp. 1-10. Association for Computational
Linguistic
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers.Comment: 8 pages, 6 figure
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
A growing interest in tasks involving language understanding by the NLP
community has led to the need for effective semantic parsing and inference.
Modern NLP systems use semantic representations that do not quite fulfill the
nuanced needs for language understanding: adequately modeling language
semantics, enabling general inferences, and being accurately recoverable. This
document describes underspecified logical forms (ULF) for Episodic Logic (EL),
which is an initial form for a semantic representation that balances these
needs. ULFs fully resolve the semantic type structure while leaving issues such
as quantifier scope, word sense, and anaphora unresolved; they provide a
starting point for further resolution into EL, and enable certain structural
inferences without further resolution. This document also presents preliminary
results of creating a hand-annotated corpus of ULFs for the purpose of training
a precise ULF parser, showing a three-person pairwise interannotator agreement
of 0.88 on confident annotations. We hypothesize that a divide-and-conquer
approach to semantic parsing starting with derivation of ULFs will lead to
semantic analyses that do justice to subtle aspects of linguistic meaning, and
will enable construction of more accurate semantic parsers.Comment: Accepted for publication at The 13th International Conference on
Computational Semantics (IWCS 2019
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on ``Annotation Graphs,'' a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic ``signals,'' including both naturally occuring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture.Comment: 8 pages, 9 figure
Modelling Discourse-related terminology in OntoLingAnnot’s ontologies
Recently, computational linguists have shown great interest in discourse annotation in an attempt to capture the internal relations in texts. With this aim, we have formalized the linguistic knowledge associated to discourse into different linguistic ontologies. In this paper, we present the most prominent discourse-related terms and concepts included in the ontologies of the OntoLingAnnot annotation model. They show the different units, values, attributes, relations, layers and strata included in the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated
- …