Corpora and evaluation tools for multilingual named entity grammar development

Bering, Christian; Droźdźyński, Witold; Erbach, Gregor; Guasch, Clara; Homola, Petr; Krieger, Hans-Ulrich; Lehmann, Sabine; Li, Hong; Piskorski, Jakub; Schäfer, Ulrich; Shimada, Atsuko; Siegel, Melanie; Xu, Feiyu; Ziegler-Eisele, Dorothee

Publication date: 14 December 2011

Abstract

We describe the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC-7 standard, we have developed named entity recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, and currency, time, and date expressions. Subgrammars and gazetteers are shared across the language-specific grammars as far as possible. Multilingual corpora from the business domain are used for grammar development and evaluation. We describe the annotation format, which covers named entities and other linguistic information. We also present an evaluation tool that provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
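The kind of scoring the abstract mentions (partial matching of annotations plus a user-defined mapping between label sets) can be illustrated with a minimal Python sketch. This is not the authors' tool and does not reflect its actual interface: the `Span` type, the `evaluate` function, the character-offset representation, and the label names are all assumptions made for illustration. Exact matches are counted first, then any remaining prediction that overlaps a remaining gold annotation of the same (mapped) type is counted as a partial match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: character offsets plus an entity type."""
    start: int   # inclusive offset
    end: int     # exclusive offset
    label: str   # entity type, e.g. "PERSON" (illustrative label set)


def overlaps(a, b):
    """True if the two spans share at least one character position."""
    return a.start < b.end and b.start < a.end


def evaluate(gold, predicted, label_map=None):
    """Score predictions against gold spans with exact and partial matching.

    label_map translates grammar output labels into the gold label set;
    labels missing from the map are left unchanged.
    """
    label_map = label_map or {}
    mapped = [Span(p.start, p.end, label_map.get(p.label, p.label))
              for p in predicted]

    remaining_gold = list(gold)
    leftover = []
    exact = 0
    # Pass 1: exact matches (same offsets and same mapped label).
    for p in mapped:
        if p in remaining_gold:
            remaining_gold.remove(p)
            exact += 1
        else:
            leftover.append(p)

    partial = 0
    # Pass 2: partial matches (same label, overlapping offsets).
    for p in leftover:
        hit = next((g for g in remaining_gold
                    if g.label == p.label and overlaps(g, p)), None)
        if hit is not None:
            remaining_gold.remove(hit)
            partial += 1

    matched = exact + partial
    return {
        "exact": exact,
        "partial": partial,
        "spurious": len(mapped) - matched,   # predictions with no gold match
        "missed": len(remaining_gold),       # gold spans never matched
        "precision": matched / len(mapped) if mapped else 0.0,
        "recall": matched / len(gold) if gold else 0.0,
    }


# Usage with made-up annotations and a made-up label mapping.
gold = [Span(0, 12, "PERSON"), Span(20, 30, "ORGANIZATION"), Span(40, 45, "LOCATION")]
predicted = [Span(0, 12, "person"), Span(22, 30, "org"), Span(50, 55, "loc")]
label_map = {"person": "PERSON", "org": "ORGANIZATION", "loc": "LOCATION"}
result = evaluate(gold, predicted, label_map)
```

In this sketch a partial match counts fully toward precision and recall; a real evaluation tool might instead weight partial matches or report them separately, which is why the counts are returned alongside the ratios.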

Available Versions

Hochschulschriftenserver - Universität Frankfurt am Main

oai:publikationen.ub.uni-frank...

Last time updated on 27/08/2013