A Framework for Multi-level Linguistic Annotation

Abstract

International audienceThis article presents a 3-step model for multi- layer annotations of corpora. Each kind of an- notation for a textual corporacorresponds to a dierent view on the same document. This prin- ciple can be expressed rst with a general re- lational model dedicated to the organisation of LR. This abstract model is then implemented as an application of the XML formalism for the en- coding of large corpora. The exploitation of this kind of annotated corpora requires ecient ma- nipulation processes and reversive access. We propose to use a third step representation based on a set of optimised FSA resulting from the parsing of the XML documents. These propo- sitions have been implemented in the rst ver- sion of a workbench dedicated to the French Le Monde corpus

    Similar works