24 research outputs found
Building a Large Syntactically-Annotated Corpus of Vietnamese
Held in conjunction with ACL-IJCNLP 2009International audienceTreebank is an important resource for both research and application of natural language processing. For Vietnamese, we still lack such kind of corpora. This paper presents up-to-date results of a project for Vietnamese treebank construction. Since Vietnamese is an isolating language and has no word delimiter, there are many ambiguities in sentence analysis. We systematically applied a lot of linguistic techniques to handle such ambiguities. Annotators are supported by automatic labeling tools and a tree-editor tool. Raw texts are extracted from Tuoi Tre (Youth), an online Vietnamese daily newspaper. The current annotation agreement is around 90 percent
VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP
annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language
processing (NLP) tasks including word segmentation, part-of-speech (POS)
tagging, named entity recognition (NER) and dependency parsing, and obtains
state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to
provide rich linguistic annotations to facilitate research work on Vietnamese
NLP. Our VnCoreNLP is open-source and available at:
https://github.com/vncorenlp/VnCoreNLPComment: Proceedings of the 2018 Conference of the North American Chapter of
the Association for Computational Linguistics: Demonstrations, NAACL 2018, to
appea