7 research outputs found

Editorial Introduction to the Third Issue

Author: Bański Piotr
Litta Modignani Picozzi Eleonora
Witt Andreas
Publication date: 01/01/2012

Publikationsserver des Instituts für Deutsche Sprache

A Generic Formalism for Encoding Stand-off annotations in TEI

Author: Lopez Patrice
Pose Javier
Romary Laurent
Publication venue: HAL CCSD
Publication date: 08/09/2014

This article outlines a proposal for a consistent encoding of stand-off annotations within the framework of the TEI standard. The proposed encoding requires extending the current TEI schema with three additional elements directly related to stand-off annotation. Together, these elements provide a generic and flexible structure for encoding stand-off annotations in multiple layers or levels.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

<tiger2/> - Serialising the ISO SynAF Syntactic Object Model

Author: Romary Laurent
Zeldes Amir
Zipser Florian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/11/2014

This paper introduces <tiger2/>, an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework (SynAF). Based on widespread best practices, we adapt a popular XML format for syntactic annotation, TigerXML, with additional features to support a variety of syntactic phenomena, including constituent and dependency structures, binding, and different node types such as compounds or empty elements. We also define interfaces to other formats and standards, including the Morpho-syntactic Annotation Framework (MAF) and the ISOCat Data Category Registry. Finally, a case study of the German treebank TueBa-D/Z is presented, showcasing the handling of constituent structures, topological fields and coreference annotation in tandem.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

SusTEInability of linguistic resources through feature structures

Author: Hinrichs Erhard
Lehmberg Timm
Rehm Georg
Stegmann Jens
Witt Andreas
Publication venue: Oxford : Oxford University Press
Publication date: 16/12/2015

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora are annotated using markup languages that are based on the Annotation Graph framework, on the upcoming Linguistic Annotation Format ISO standard, or on tag sets defined by or derived from the TEI guidelines. A unified representation separates the conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for representing these annotations is a multi-rooted tree, which in turn can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed, as well as the advantages and drawbacks of using the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.

Publikationsserver des Instituts für Deutsche Sprache

SusTEInability of linguistic resources through feature structures

Author: A. Witt
E. Hinrichs
G. Rehm
J. Stegmann
T. Lehmberg
Publication venue: 'Oxford University Press (OUP)'

Crossref