Modelling Discourse-related terminology in OntoLingAnnot’s ontologies
Recently, computational linguists have shown great interest in discourse annotation as a way to capture the internal relations in texts. With this aim, we have formalized the linguistic knowledge associated with discourse into several linguistic ontologies. In this paper, we present the most prominent discourse-related terms and concepts included in the ontologies of the OntoLingAnnot annotation model. These terms describe the different units, values, attributes, relations, layers and strata of the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated.
A Formal Framework for Linguistic Annotation
'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions (audio, video and/or physiological recordings) or may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
Comment: 49 pages
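The abstract names the annotation graph but does not spell out its structure. The Python sketch below is a minimal illustration, assuming the common reading of an annotation graph as a directed graph whose nodes may carry an optional time offset into the signal and whose arcs carry a type (tier) and a label. The class and method names (`AnnotationGraph`, `add_arc`, `labels_in`) and the example offsets are hypothetical, not the paper's API or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A graph node; `offset` is a time point in seconds, or None if unanchored."""
    id: str
    offset: Optional[float] = None


@dataclass(frozen=True)
class Arc:
    """A labeled arc between two node IDs; `kind` names the annotation tier (assumed)."""
    src: str
    dst: str
    kind: str
    label: str


@dataclass
class AnnotationGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, offset)

    def add_arc(self, src: str, dst: str, kind: str, label: str) -> None:
        # Both endpoints must already exist in the graph.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        # When both endpoints are anchored, the arc may not run backwards in time.
        start, end = self.nodes[src].offset, self.nodes[dst].offset
        if start is not None and end is not None and start > end:
            raise ValueError("arc would run backwards in time")
        self.arcs.append(Arc(src, dst, kind, label))

    def span(self, arc: Arc) -> Tuple[Optional[float], Optional[float]]:
        # Arcs store node IDs, so their times always come from the current nodes.
        return self.nodes[arc.src].offset, self.nodes[arc.dst].offset

    def labels_in(self, kind: str, start: float, end: float) -> List[str]:
        """Labels of `kind` arcs whose anchored span lies inside [start, end]."""
        found = []
        for arc in self.arcs:
            if arc.kind != kind:
                continue
            s, e = self.span(arc)
            if s is not None and e is not None and start <= s and e <= end:
                found.append(arc.label)
        return found


# Hypothetical example: two word arcs and one utterance arc over shared nodes.
g = AnnotationGraph()
g.add_node("n0", 0.0)
g.add_node("n1", 0.42)
g.add_node("n2", 0.91)
g.add_arc("n0", "n1", "word", "hello")
g.add_arc("n1", "n2", "word", "world")
g.add_arc("n0", "n2", "utterance", "hello world")
print(g.labels_in("word", 0.0, 0.5))  # ['hello']
```

In this sketch, tiers share nodes instead of copying time stamps, so re-anchoring a node (replacing `g.nodes["n1"]`) moves every arc that starts or ends there.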
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
In research on musical audio-mining, annotated music databases are needed to develop computational tools that extract, from the musical audio stream, the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is however ill-defined, in both the syntactic and the semantic sense. As a consequence, annotation has been approached from a variety of perspectives (though mainly linguistic-symbolic ones), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in support of a computational approach to musical audio-mining based on algorithms that learn from annotated data.
Refining Implicit Argument Annotation for UCCA
Predicate-argument structure analysis is a central component in meaning representations of text. Because some arguments are not explicitly mentioned in a sentence, language understanding becomes ambiguous, and it is difficult for machines to interpret text correctly. However, few resources represent implicit roles for NLU, and existing studies in NLP make only coarse distinctions between categories of arguments omitted from linguistic form. This paper proposes a typology for fine-grained implicit argument annotation on top of Universal Conceptual Cognitive Annotation's foundational layer. The proposed implicit argument categorisation is driven by theories of implicit role interpretation and consists of six types: Deictic, Generic, Genre-based, Type-identifiable, Non-specific, and Iterated-set. We exemplify our design by revisiting part of the UCCA EWT corpus, providing a new dataset annotated with the refinement layer, and making a comparative analysis with other schemes.
Comment: DMR 202
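The six-way typology can be pictured as a small data model. The sketch below assumes a refinement label attached to an implicit argument of a UCCA scene; the fields `scene_id` and `role`, the class names, the `parse_category` and `category_counts` helpers, and the sample annotations are all hypothetical and are not the paper's released data format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class ImplicitType(Enum):
    """The six implicit argument types named in the abstract."""
    DEICTIC = "Deictic"
    GENERIC = "Generic"
    GENRE_BASED = "Genre-based"
    TYPE_IDENTIFIABLE = "Type-identifiable"
    NON_SPECIFIC = "Non-specific"
    ITERATED_SET = "Iterated-set"


@dataclass(frozen=True)
class ImplicitArgument:
    scene_id: str            # hypothetical ID of the UCCA scene the argument belongs to
    role: str                # hypothetical role name in the foundational layer
    category: ImplicitType   # refinement-layer label


def parse_category(label: str) -> ImplicitType:
    """Map a label such as 'genre-based' to its type, ignoring case and surrounding spaces."""
    wanted = label.strip().lower()
    for t in ImplicitType:
        if t.value.lower() == wanted:
            return t
    raise ValueError(f"unknown implicit argument type: {label!r}")


def category_counts(args: Iterable[ImplicitArgument]) -> Dict[ImplicitType, int]:
    """Count how often each fine-grained type occurs in a set of annotations."""
    return dict(Counter(a.category for a in args))


# Hypothetical annotations, not drawn from the UCCA EWT corpus.
sample = [
    ImplicitArgument("s1", "Participant", parse_category("generic")),
    ImplicitArgument("s2", "Participant", parse_category("Deictic")),
    ImplicitArgument("s3", "Participant", parse_category(" Generic ")),
]
counts = category_counts(sample)
print(counts[ImplicitType.GENERIC])  # 2
```

Keeping the categories in an `Enum` makes misspelled labels fail at parse time rather than silently creating a seventh category, which helps when a fine-grained layer is counted against coarser schemes.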