18,153 research outputs found

    Annotation graphs as a framework for multidimensional linguistic data analysis

    Full text link
    In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs. These `annotation graphs' offer a simple yet powerful method for representing complex annotation structures incorporating hierarchy and overlap. Here, we motivate and illustrate our approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain specialists, we have constructed a hybrid multi-level annotation for a fragment of the Boston University Radio Speech Corpus which includes the following levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named entity. We show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats. We also show how the approach facilitates substantive comparison of multiple annotations of a single signal based on different theoretical models. The discussion shows how annotation graphs open the door to wide-ranging integration of tools, formats and corpora.Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse Tagging, Proceedings of the Workshop. pp. 1-10. Association for Computational Linguistic

    Methodological considerations concerning manual annotation of musical audio in function of algorithm development

    Get PDF
    In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1

    Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

    Full text link
    Annotation graphs and annotation servers offer infrastructure to support the analysis of human language resources in the form of time-series data such as text, audio and video. This paper outlines areas of common need among empirical linguists and computational linguists. After reviewing examples of data and tools used or under development for each of several areas, it proposes a common framework for future tool development, data annotation and resource sharing based upon annotation graphs and servers.Comment: 8 pages, 6 figure

    A Type-coherent, Expressive Representation as an Initial Step to Language Understanding

    Full text link
    A growing interest in tasks involving language understanding by the NLP community has led to the need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers.Comment: Accepted for publication at The 13th International Conference on Computational Semantics (IWCS 2019

    ATLAS: A flexible and extensible architecture for linguistic annotation

    Full text link
    We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on ``Annotation Graphs,'' a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic ``signals,'' including both naturally occuring phenomena (as recorded in images, video, multi-modal interactions, etc.), as well as the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture.Comment: 8 pages, 9 figure

    Modelling Discourse-related terminology in OntoLingAnnot’s ontologies

    Get PDF
    Recently, computational linguists have shown great interest in discourse annotation in an attempt to capture the internal relations in texts. With this aim, we have formalized the linguistic knowledge associated to discourse into different linguistic ontologies. In this paper, we present the most prominent discourse-related terms and concepts included in the ontologies of the OntoLingAnnot annotation model. They show the different units, values, attributes, relations, layers and strata included in the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated
    corecore