The multilingual entity task (MET) overview

Abstract

The Message Understanding Conference-6 (MUC-6) evaluation of named entity identification demonstrated that systems are approaching human performance on English language texts [10]. Informal and anonymous, the MET provided a new opportunity to assess progress on the same task in Spanish, Japanese, and Chinese. Preliminary results indicate that MET systems in all three languages performed comparably to those of the MUC-6 evaluation in English. Based upon the Named Entity Task Guidelines [11], the task was to locate and tag with SGML named entity expressions (people, organizations, and locations), time expressions (time and date), and numeric expressions (percentage and money) in Spanish texts from Agence France Presse, in Japanese texts from Kyodo newswire, or in Chinese texts from Xinhua newswire. Across languages the keywords "press conference" retrieved a rich subcorpus of texts, covering a wide spectrum of topics. Frequency and types of expressions vary in the three language sets [2] [8] [9]. The original task guidelines were modified so that the core guidelines were language independent with language specific rules appended. The schedule was quite abbreviated. In the fall, Government language teams retrieved training and test texts with multilingual software for the Fast Data Finder (FDF), refined the MUC-6 guidelines, and manually tagged 100 training texts using the SRA Named Entity Tool. In January, the training texts were released along with 200 sample unannotated training texts to the participating sites. A dry run was held in late March and early April, and in late April the official test on 100 texts was performed anonymously. The language texts were supplied by the Linguistic Data Consortium (LDC) at the University of Pennsylvania. SAIC created language versions of the scoring program and provided technical support throughout. Both commercial and academic groups participated. Two groups, New Mexico State University/Com

