CONSTRUCTION OF VIETNAMESE CORPORA FOR NAMED ENTITY RECOGNITION

By Thao Pham T. X, Tri T. Q, Ai Kawazoe, Dien Dinh and Nigel Collier

Abstract

To build an automatic named entity recognition (NER) system using a machine learning approach, a large tagged corpus is widely seen as a necessary knowledge resource. Nevertheless, manual construction is time-consuming, labor-intensive and expensive. Building NER corpora for European languages has been extensively studied, while less-studied languages such as Vietnamese have not yet received much attention. This paper describes the construction of a Vietnamese corpus, Vietnamese guidelines for annotators and a tagging tool that we make publicly available. We report on a comparison with the English named entity (NE) corpus in our multilingual NER system.

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.134.6297
Provided by: CiteSeerX