The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition
This paper presents a state-of-the-art system for Vietnamese Named Entity
Recognition (NER). By incorporating automatic syntactic features with word
embeddings as input for bidirectional Long Short-Term Memory (Bi-LSTM), our
system, although simpler than some deep learning architectures, achieves a much
better result for Vietnamese NER. The proposed method achieves an overall F1
score of 92.05% on the test set of an evaluation campaign, organized in late
2016 by the Vietnamese Language and Speech Processing (VLSP) community. Our
named entity recognition system outperforms the best previous systems for
Vietnamese NER by a large margin.
Comment: 7 pages, 9 tables, 3 figures, accepted to PACLIC 201
ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing
English and Chinese, known as resource-rich languages, have witnessed the
strong development of transformer-based language models for natural language
processing tasks. For Vietnamese, which has approximately 100M speakers,
several pre-trained models, e.g., PhoBERT, ViBERT, and vELECTRA, perform
well on general Vietnamese NLP tasks, including POS tagging and named
entity recognition. However, these pre-trained language models are still
limited on Vietnamese social media tasks. In this paper, we present the first monolingual
pre-trained language model for Vietnamese social media texts, ViSoBERT, which
is pre-trained on a large-scale corpus of high-quality and diverse Vietnamese
social media texts using the XLM-R architecture. Moreover, we explored our
pre-trained model on five important natural language downstream tasks on
Vietnamese social media texts: emotion recognition, hate speech detection,
sentiment analysis, spam reviews detection, and hate speech spans detection.
Our experiments demonstrate that ViSoBERT, with far fewer parameters, surpasses
the previous state-of-the-art models on multiple Vietnamese social media tasks.
Our ViSoBERT model is available only for research purposes.
Comment: Accepted at EMNLP'2023 Main Conference
VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
We present an easy-to-use and fast toolkit, namely VnCoreNLP, a Java NLP
annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language
processing (NLP) tasks including word segmentation, part-of-speech (POS)
tagging, named entity recognition (NER) and dependency parsing, and obtains
state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to
provide rich linguistic annotations to facilitate research work on Vietnamese
NLP. Our VnCoreNLP is open-source and available at:
https://github.com/vncorenlp/VnCoreNLP
Comment: Proceedings of the 2018 Conference of the North American Chapter of
the Association for Computational Linguistics: Demonstrations, NAACL 2018, to
appear