49 research outputs found

    A Description Language for Syntactically Annotated Corpora

    This paper introduces a description language for syntactically annotated corpora that can encode both the syntactic annotation of a corpus and queries against a syntactically annotated corpus.

    An XML-based representation format for syntactically annotated corpora

    This paper discusses a general approach to the description and encoding of linguistic corpora annotated with hierarchically structured syntactic information. A general format is motivated by the variety and incompatibility of existing annotation formats. By using XML as a representation format, the theoretical and technical problems encountered can be overcome.

    Introduction

    Just as there are various formats for the representation and storage of linguistic corpora, there are also a number of formats for the representation of syntactically annotated corpora, or treebanks: Tipster (Grishman, 1998), Penn Treebank (Marcus et al., 1993), Susanne (Sampson, 1995), NeGra (Skut et al., 1998), and several formats for chunked corpora. This variety of formats complicates access to syntactic data and thus works against the aim of creating standard resources only once and enabling easy exchange of data. In this paper we propose an XML-based, theory-independent exchange format for syntactically annot..