116 research outputs found

    Towards interoperable discourse annotation: discourse features in the Ontologies of Linguistic Annotation

    This paper describes the extension of the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in the most important corpora available to the community, including OntoNotes, the RST Discourse Treebank and the Penn Discourse Treebank. Along with selected schemes for information structure and coreference, discourse relations are discussed with special emphasis on the Penn Discourse Treebank and the RST Discourse Treebank. For an example contained in the intersection of both corpora, I show how ontologies can be employed to generalize over divergent annotation schemes.
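    The generalization step described above can be sketched as a mapping from scheme-specific relation labels to shared superclasses, so that a query for the shared class retrieves annotations from either corpus. In this minimal sketch, the class names and the label subset are illustrative stand-ins, not the actual OLiA reference concepts:

    ```python
    # Hedged sketch: generalizing over divergent discourse annotation schemes.
    # PDTB sense tags and RST-DT relation labels (illustrative subsets only)
    # are mapped to hypothetical shared superclasses.
    SHARED_CLASS = {
        # PDTB-style sense tags
        "Contingency.Cause.Reason": "CausalRelation",
        "Comparison.Contrast": "ContrastRelation",
        # RST-DT-style relation labels
        "cause": "CausalRelation",
        "contrast": "ContrastRelation",
    }

    def generalize(label):
        """Map a scheme-specific relation label to its shared superclass."""
        return SHARED_CLASS.get(label, "DiscourseRelation")

    # Annotations from both schemes now answer the same query:
    print(generalize("Contingency.Cause.Reason"))  # CausalRelation
    print(generalize("cause"))                     # CausalRelation
    ```

    In the actual OLiA setting this mapping is expressed as subclass axioms in OWL rather than a lookup table, but the retrieval effect is the same.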

    Interchanging lexical resources on the Semantic Web

    Lexica and terminology databases play a vital role in many NLP applications, but currently most such resources are published in application-specific formats or with custom access interfaces, leaving much of this data in "data silos" that are difficult to access. The Semantic Web, and in particular the Linked Data initiative, provides effective solutions to this problem, as well as possibilities for data reuse through inter-lexicon linking and the incorporation of data categories by dereferenceable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but there is currently no standard for providing complex lexical information for such ontologies, or for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address this gap.
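    The lexicon–ontology relationship that lemon models can be illustrated with a minimal Turtle serialization: a lexical entry carries its form, while its sense points at an ontology concept. This is a hedged sketch; the entry, the `ex:` namespace and the DBpedia reference IRI are illustrative, and a real lemon lexicon would carry considerably more structure:

    ```python
    # Hedged sketch: emitting a minimal lemon-style lexical entry as Turtle.
    # The entry "cat" and its sense reference are hypothetical examples;
    # only the lemon namespace follows the published model.
    ENTRY = {
        "id": "cat",
        "written_form": "cat",
        "reference": "http://dbpedia.org/resource/Cat",  # ontology concept the sense denotes
    }

    def to_turtle(entry):
        """Serialize one lexical entry as lemon-style Turtle."""
        return (
            "@prefix lemon: <http://lemon-model.net/lemon#> .\n"
            "@prefix ex: <http://example.org/lexicon#> .\n\n"
            f"ex:{entry['id']} a lemon:LexicalEntry ;\n"
            f"    lemon:canonicalForm [ lemon:writtenRep \"{entry['written_form']}\"@en ] ;\n"
            f"    lemon:sense [ lemon:reference <{entry['reference']}> ] .\n"
        )

    print(to_turtle(ENTRY))
    ```

    The key design point the sketch reflects is the separation of concerns: linguistic information (forms, representations) lives in the lexicon, while the meaning is delegated to the ontology via `lemon:reference`.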

    Annotation interoperability for the post-ISOCat era

    With this paper, we provide an overview of ISOCat successor solutions and annotation standardization efforts since 2010, and we describe the low-cost harmonization of post-ISOCat vocabularies by means of modular, linked ontologies: the CLARIN Concept Registry, LexInfo, Universal Parts of Speech, Universal Dependencies and UniMorph are linked with the Ontologies of Linguistic Annotation and, through it, with ISOCat, the GOLD ontology, the Typological Database Systems ontology and a large number of annotation schemes.

    Linking Discourse Marker Inventories

    The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks and, moreover, make it possible to explore techniques for translation inference for this particular group of lexical resources, which was previously largely neglected in the context of Linguistic Linked (Open) Data.

    Automatic Detection of Language and Annotation Model Information in CoNLL Corpora

    We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de-facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV formats exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from the Universal Dependencies, v2.3.
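    The core idea of detecting annotation models in CoNLL-style TSV can be sketched as follows: collect the value inventory of each column and compare it against known tagsets. This is a hedged illustration of the general technique, not AnnoHub's implementation; the tagset inventories below are small illustrative fragments:

    ```python
    # Hedged sketch of tagset detection over CoNLL-style TSV:
    # build per-column value inventories, then pick the known tagset
    # with the largest overlap. Tagset fragments are illustrative only.
    import csv
    import io

    KNOWN_TAGSETS = {
        "UPOS": {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "PUNCT"},
        "PTB":  {"NN", "NNS", "VB", "VBD", "JJ", "RB", "DT", "IN"},
    }

    def column_inventories(tsv_text):
        """Return the set of distinct values per column, skipping comment lines."""
        cols = {}
        for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            for i, cell in enumerate(row):
                cols.setdefault(i, set()).add(cell)
        return cols

    def guess_tagsets(tsv_text):
        """Map column index -> best-matching known tagset by value overlap."""
        guesses = {}
        for i, values in column_inventories(tsv_text).items():
            best = max(KNOWN_TAGSETS, key=lambda t: len(values & KNOWN_TAGSETS[t]))
            if values & KNOWN_TAGSETS[best]:
                guesses[i] = best
        return guesses

    sample = "# sent_id = 1\n1\tThe\tDET\n2\tcat\tNOUN\n3\tsleeps\tVERB\n"
    print(guess_tagsets(sample))  # {2: 'UPOS'}
    ```

    In practice such overlap scores would be normalized and thresholded, since real tagsets share labels; the sketch only shows why column-wise inventories suffice as a signal.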

    Towards LLOD-based language contact studies: a case study in interoperability

    We describe a methodological and technical framework for conducting qualitative and quantitative studies of linguistic research questions over diverse and heterogeneous data sources such as corpora and elicitations. We demonstrate how LLOD formalisms can be employed to develop extraction pipelines for features and linguistic examples from corpora and collections of interlinear glossed text, and furthermore, how SPARQL UPDATE can be employed (1) to normalize diverse data against a reference data model (here, POWLA), (2) to harmonize annotation vocabularies by reference to terminology repositories (here, OLiA), (3) to extract examples from these normalized data structures regardless of their origin, and (4) to implement this extraction routine in a tool-independent manner for different languages with different annotation schemes. We demonstrate our approach with language contact studies for genetically unrelated but neighboring languages from the Caucasus area, Eastern Armenian and Georgian.
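    Step (2) above, harmonizing annotation vocabularies via SPARQL UPDATE, can be sketched by generating an update per tag that replaces a scheme-specific string value with a link to a terminology-repository class. The `ex:` properties are hypothetical and the OLiA class IRIs illustrative; the real pipeline works against the OLiA annotation models:

    ```python
    # Hedged sketch: generating SPARQL UPDATE requests that rewrite
    # string-valued, scheme-specific POS tags into links to OLiA classes.
    # Property IRIs (ex:pos, ex:posClass) are hypothetical placeholders.
    TAG_TO_OLIA = {
        "NN": "http://purl.org/olia/olia.owl#CommonNoun",
        "VB": "http://purl.org/olia/olia.owl#Verb",
    }

    def harmonization_update(tag, olia_class):
        """Build one SPARQL UPDATE rewriting a literal tag to an OLiA link."""
        return f"""\
    PREFIX ex: <http://example.org/>
    DELETE {{ ?w ex:pos "{tag}" }}
    INSERT {{ ?w ex:posClass <{olia_class}> }}
    WHERE  {{ ?w ex:pos "{tag}" }}"""

    for tag, cls in TAG_TO_OLIA.items():
        print(harmonization_update(tag, cls))
    ```

    Because each update is self-contained, the same template can be driven by any tag-to-class mapping, which is what makes the harmonization step tool- and scheme-independent.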

    Lin|gu|is|tik: building the linguist's pathway to bibliographies, libraries, language resources and linked open data

    This paper introduces a novel research tool for the field of linguistics: the Lin|gu|is|tik web portal provides a virtual library which offers scientific information on every linguistic subject. It comprises selected internet sources and databases as well as catalogues for linguistic literature, and addresses an interdisciplinary audience. The virtual library is the most recent outcome of the Special Subject Collection Linguistics of the German Research Foundation (DFG), and also integrates the knowledge accumulated in the Bibliography of Linguistic Literature. In addition to the portal, we describe long-term goals and prospects, with a special focus on ongoing efforts to extend it towards integrating language resources and Linguistic Linked Open Data.