Automatic Detection of Language and Annotation Model Information in CoNLL Corpora

Abromeit, Frank; Chiarcos, Christian

Authors: Frank Abromeit, Christian Chiarcos
Publication date: 1 January 2019
Publisher: OASIcs - OpenAccess Series in Informatics. 2nd Conference on Language, Data and Knowledge (LDK 2019)

Abstract

We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from the Universal Dependencies, v.2.3.
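
To make the idea of detecting a tagset in a TSV column concrete, the following is a minimal sketch and not AnnoHub's actual implementation, which the abstract does not specify. It assumes a simple coverage heuristic: the column whose values most often match a known tagset is taken to carry that annotation. The function names (`read_columns`, `tagset_coverage`, `guess_upos_column`), the 0.9 threshold, and the sample data are all hypothetical; the tag list is the set of 17 universal POS tags from Universal Dependencies v2.

```python
# Universal POS tags as defined by Universal Dependencies v2.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def read_columns(text):
    """Group CoNLL-style TSV values by column index.

    Blank lines (sentence breaks) and '#' comment lines are skipped.
    """
    columns = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        for idx, value in enumerate(line.split("\t")):
            columns.setdefault(idx, []).append(value)
    return columns


def tagset_coverage(values, tagset):
    """Return the fraction of column values that belong to the tagset."""
    if not values:
        return 0.0
    return sum(1 for v in values if v in tagset) / len(values)


def guess_upos_column(text, threshold=0.9):
    """Return the 0-based index of the column best matching UPOS.

    Returns None if no column reaches the (assumed) coverage threshold.
    """
    scores = {
        idx: tagset_coverage(vals, UPOS)
        for idx, vals in read_columns(text).items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < threshold:
        return None
    return best


# Hypothetical four-column sample: ID, FORM, LEMMA, UPOS.
SAMPLE = (
    "# text = Dogs bark .\n"
    "1\tDogs\tdog\tNOUN\n"
    "2\tbark\tbark\tVERB\n"
    "3\t.\t.\tPUNCT\n"
)

print(guess_upos_column(SAMPLE))
```

On the sample, only the last column consists entirely of UPOS tags, so it is the one selected. A realistic detector would need to compare many tagsets and handle columns that mix several annotation layers, which this sketch does not attempt.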

Available Versions

Dagstuhl Research Online Publication Server

oai:drops-oai.dagstuhl.de:1038...

Last time updated on 22/05/2019

ZENODO

oai:zenodo.org:3555141

Last time updated on 08/08/2023