Revisiting Pre-Trained Models for Chinese Natural Language Processing
Bidirectional Encoder Representations from Transformers (BERT) has shown
remarkable improvements across various NLP tasks, and successive variants have
been proposed to further improve the performance of pre-trained language
models. In this paper, we revisit Chinese pre-trained language models to
examine their effectiveness in a non-English language and release a series of
Chinese pre-trained language models to the community. We also propose a
simple but effective model called MacBERT, which improves upon RoBERTa in
several ways, especially the masking strategy, which adopts MLM as correction
(Mac). We carried out extensive experiments on eight Chinese NLP tasks to
revisit the existing pre-trained language models as well as the proposed
MacBERT. Experimental results show that MacBERT achieves state-of-the-art
performance on many NLP tasks, and we also provide detailed ablations with
several findings that may help future research. Resources available:
https://github.com/ymcui/MacBERT
Comment: 12 pages, to appear at Findings of EMNLP 2020
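The abstract only names the Mac strategy; as a rough, non-authoritative sketch
of the idea, the Python snippet below replaces tokens slated for masking with
similar words instead of a [MASK] placeholder, so the model learns to correct
plausible substitutions back to the originals. The similar_word lookup is a
hypothetical stand-in for a synonym tool, and the rate is illustrative, not
MacBERT's released configuration.

    import random

    def mac_masking(tokens, similar_word, mask_rate=0.15):
        """Toy sketch of "MLM as correction" (Mac): masked positions are
        filled with similar words rather than [MASK], and the model is
        trained to restore the original token. similar_word is a
        hypothetical synonym lookup (e.g. a nearest neighbour in a word
        embedding space); this is not the authors' implementation."""
        inputs, labels = [], []
        for tok in tokens:
            if random.random() < mask_rate:
                inputs.append(similar_word(tok))  # plausible substitution to correct
                labels.append(tok)                # supervise with the original token
            else:
                inputs.append(tok)
                labels.append(None)               # excluded from the MLM loss
        return inputs, labels

A real setup would substitute whole words or N-grams rather than single
tokens and mix in random-word and keep-original cases, but the correction
objective is the same.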
DSTEA: Improving Dialogue State Tracking via Entity Adaptive Pre-training
Dialogue State Tracking (DST) is critical for comprehensively interpreting
user and system utterances, thereby forming the cornerstone of efficient
dialogue systems. Although past research has focused on enhancing DST
performance by altering the model structure or integrating additional
features such as graph relations, these approaches often require additional
pre-training with external dialogue corpora. In this study, we propose DSTEA,
improving Dialogue State Tracking via Entity Adaptive pre-training, which
enhances the encoder by intensively training it on key entities in dialogue
utterances. DSTEA identifies these pivotal entities from input dialogues
using four different methods: ontology information, named-entity
recognition, spaCy, and flair. Subsequently, it employs
selective knowledge masking to train the model effectively. Remarkably, DSTEA
only requires pre-training without the direct infusion of extra knowledge into
the DST model. This approach resulted in substantial performance improvements
for four robust DST models on MultiWOZ 2.0, 2.1, and 2.2, with joint goal
accuracy improving by up to 2.69 percentage points (from 52.41% to 55.10%). We
further validated DSTEA's efficacy through comparative experiments over
various entity types and entity adaptive pre-training configurations, such
as masking strategy and masking rate.
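As a minimal sketch of the selective knowledge masking idea, assuming spaCy as
the entity source (one of the paper's four), the snippet below preferentially
masks tokens inside NER spans so pre-training concentrates on key entities.
The masking rate is illustrative, not the paper's configuration.

    import random
    import spacy

    nlp = spacy.load("en_core_web_sm")  # spaCy NER, one of DSTEA's four entity sources

    def entity_selective_mask(utterance, mask_token="[MASK]", entity_rate=0.8):
        """Toy sketch of selective knowledge masking: tokens inside entity
        spans are masked with high probability so the MLM objective focuses
        on key entities; this is not the authors' implementation."""
        doc = nlp(utterance)
        entity_positions = {tok.i for ent in doc.ents for tok in ent}
        inputs, labels = [], []
        for tok in doc:
            if tok.i in entity_positions and random.random() < entity_rate:
                inputs.append(mask_token)
                labels.append(tok.text)   # model must reconstruct the entity token
            else:
                inputs.append(tok.text)
                labels.append(None)       # position excluded from the loss
        return inputs, labels

    print(entity_selective_mask("I need a cheap hotel in Cambridge on Friday"))

Here "Cambridge" and "Friday" fall inside entity spans, so they are masked
far more often than the surrounding function words.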