To build an automatic named entity recognition (NER) system with a machine learning approach, a large annotated corpus is widely regarded as a necessary knowledge resource. However, constructing such a corpus manually is time-consuming, labor-intensive, and expensive. Building NER corpora for European languages has been studied extensively, whereas less-studied languages such as Vietnamese have so far received little attention. This paper describes the construction of a Vietnamese corpus, annotation guidelines for Vietnamese, and a tagging tool, which we make publicly available. We also compare the corpus with the English named entity (NE) corpus used in our multilingual NER system.