24 research outputs found

Building a Large Syntactically-Annotated Corpus of Vietnamese

Author: Le Hong Phuong
Nguyen Phuong Thai
Nguyen Thi Minh Huyen
Nguyen Van Hiep
Vu Xuan Luong
Publication venue: HAL CCSD
Publication date: 06/08/2009

Held in conjunction with ACL-IJCNLP 2009. A treebank is an important resource for both research and applications in natural language processing. For Vietnamese, such corpora are still lacking. This paper presents up-to-date results of a project for Vietnamese treebank construction. Since Vietnamese is an isolating language and has no word delimiter, there are many ambiguities in sentence analysis. We systematically applied a range of linguistic techniques to handle such ambiguities. Annotators are supported by automatic labeling tools and a tree-editor tool. Raw texts are extracted from Tuoi Tre (Youth), an online Vietnamese daily newspaper. The current annotation agreement is around 90 percent.

INRIA a CCSD electronic archive server

VnCoreNLP: A Vietnamese Natural Language Processing Toolkit

Author: Dras Mark
Johnson Mark
Nguyen Dai Quoc
Nguyen Dat Quoc
Vu Thanh
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

We present VnCoreNLP, an easy-to-use and fast Java NLP annotation pipeline for Vietnamese. VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP. VnCoreNLP is open-source and available at: https://github.com/vncorenlp/VnCoreNLP

Comment: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, NAACL 2018, to appea

arXiv.org e-Print Archive

Crossref