Representing and Querying Standoff XML

Abstract

This paper discusses the representation and exploitation of multi-level annotated linguistic data. We first present a standoff XML representation, which distributes information over separate standoff layers and allows us to represent annotations of various kinds in a uniform, generic way. This format serves as our interchange format. We further introduce an inline XML representation that is designed to allow more efficient processing of the data. This format is computed from the standoff representation and uses fragments to represent overlapping elements. We then compare the two representations by testing their performance on a test suite. Not surprisingly, the inline variant performs much better than the standoff variant, in particular on more complex queries.
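The conversion from standoff layers to an inline format with fragments can be illustrated with a minimal sketch. It is not the paper's actual algorithm or schema. It assumes that standoff annotations are character-offset spans over a base text, that one layer (here called `primary`) is non-overlapping and kept intact, and that a second layer (`secondary`) is also non-overlapping within itself but may cross primary boundaries. The element names `text`, `chunk`, and `mark` and the attributes `ref` and `part` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def split_at(start, end, boundaries):
    """Split the span [start, end) at every boundary strictly inside it."""
    cuts = [start] + sorted(b for b in set(boundaries) if start < b < end) + [end]
    return list(zip(cuts, cuts[1:]))


def to_inline(text, primary, secondary):
    """Serialize two standoff layers as one inline XML string.

    primary, secondary: lists of (id, start, end) character spans.
    Assumption: spans are non-empty and each layer is non-overlapping
    within itself. Secondary spans are cut into fragments at primary
    boundaries so that the result nests properly.
    """
    boundaries = [b for _, s, e in primary for b in (s, e)]
    elements = []  # (start, -end, rank, open_tag, close_tag)
    for ident, s, e in primary:
        elements.append((s, -e, 0, f"<chunk id={quoteattr(ident)}>", "</chunk>"))
    for ident, s, e in secondary:
        for n, (fs, fe) in enumerate(split_at(s, e, boundaries), 1):
            tag = f'<mark ref={quoteattr(ident)} part="{n}">'
            elements.append((fs, -fe, 1, tag, "</mark>"))

    # Outer elements first: earlier start, then later end, then primary.
    elements.sort()
    out, stack, pos = ["<text>"], [], 0
    for start, neg_end, _rank, open_tag, close_tag in elements:
        # Close every open element that ends at or before this start.
        while stack and stack[-1][0] <= start:
            end, closer = stack.pop()
            out.append(escape(text[pos:end]))
            out.append(closer)
            pos = end
        out.append(escape(text[pos:start]))
        out.append(open_tag)
        pos = start
        stack.append((-neg_end, close_tag))
    while stack:
        end, closer = stack.pop()
        out.append(escape(text[pos:end]))
        out.append(closer)
        pos = end
    out.append(escape(text[pos:]))
    out.append("</text>")
    return "".join(out)


base = "the big red car"
chunks = [("c1", 0, 7), ("c2", 8, 15)]   # "the big", "red car"
marks = [("m1", 4, 11)]                   # "big red", crosses both chunks
inline = to_inline(base, chunks, marks)
```

In this sketch the span `m1` crosses both chunks, so it is emitted as three `mark` fragments that share the same `ref` value. A query over the inline form would have to rejoin fragments by that shared reference to recover the original annotation.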
