Spelling Error Correction with Soft-Masked BERT
Spelling error correction is an important yet challenging task, because a satisfactory solution essentially requires human-level language understanding. Without loss of generality, we consider Chinese spelling error correction (CSC) in this paper. A state-of-the-art method for the task uses BERT, the language representation model, to select a character from a list of candidates (including non-correction) at each position of the sentence. The accuracy of this method can be sub-optimal, however, because BERT does not have sufficient capability to detect whether an error is present at each position, apparently owing to the way it is pre-trained with masked language modeling. In this work, we propose a novel neural architecture to address this issue, consisting of a network for error detection and a BERT-based network for error correction, with the former connected to the latter by what we call the soft-masking technique. Our method of using 'Soft-Masked BERT' is general, and it may be employed in other language detection-correction problems. Experimental results on two datasets demonstrate that our proposed method performs significantly better than the baselines, including the one based solely on BERT.
Comment: To be published at ACL 202
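For intuition, here is a minimal PyTorch sketch of the soft-masking step described in the abstract (variable names are illustrative, not the authors' code): each input embedding is blended with the [MASK] embedding, weighted by the per-position error probability produced by the detection network.

    import torch

    def soft_mask(input_embeddings, mask_embedding, error_prob):
        # input_embeddings: (batch, seq_len, hidden) token embeddings of the input
        # mask_embedding:   (hidden,) embedding of the [MASK] token
        # error_prob:       (batch, seq_len, 1) detection network's probability
        #                   that the character at each position is erroneous
        # Soft-masked embedding e_i' = p_i * e_mask + (1 - p_i) * e_i, which is
        # then fed to the BERT-based correction network.
        return error_prob * mask_embedding + (1.0 - error_prob) * input_embeddings

A position the detector judges likely erroneous is thus almost fully masked, pushing the correction network to re-predict it, while confident positions pass through nearly unchanged.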
Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction
We investigate the problem of Chinese Grammatical Error Correction (CGEC) and present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction to address the deep issues hidden in CGEC. Since most tokens are correct and can be conveyed directly from source to target, and since error positions can be estimated and corrected from bidirectional context information, we employ a BERT-initialized Transformer encoder as the backbone model for information modeling and conveying. Because same-position substitution alone cannot handle variable-length corrections, operations such as substitution, deletion, insertion, and local paraphrasing must be applied jointly. Therefore, a Conditional Random Fields (CRF) layer is stacked on the up tail to conduct non-autoregressive sequence prediction by modeling the token dependencies.
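As a rough illustration of this encoder-plus-CRF arrangement, the following sketch assumes the transformers and pytorch-crf packages; the class and parameter names are hypothetical, and the paper's actual CRF relies on approximations to stay tractable over a full vocabulary, which are omitted here.

    import torch.nn as nn
    from torchcrf import CRF            # pip install pytorch-crf
    from transformers import BertModel

    class TtTSketch(nn.Module):
        # Hypothetical skeleton: a BERT-initialized encoder with a CRF head
        # that emits the corrected sequence in one non-autoregressive pass.
        def __init__(self, vocab_size, bert_name="bert-base-chinese"):
            super().__init__()
            self.encoder = BertModel.from_pretrained(bert_name)
            self.proj = nn.Linear(self.encoder.config.hidden_size, vocab_size)
            # NOTE: a CRF over an entire vocabulary is expensive; the paper
            # approximates the transition computation, omitted here.
            self.crf = CRF(vocab_size, batch_first=True)

        def forward(self, input_ids, attention_mask, target_ids=None):
            hidden = self.encoder(input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            emissions = self.proj(hidden)
            mask = attention_mask.bool()
            if target_ids is not None:
                return -self.crf(emissions, target_ids, mask=mask)  # training NLL
            return self.crf.decode(emissions, mask=mask)            # best sequence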
Since most tokens are correct and easy to predict or convey to the target, the model may suffer from a severe class imbalance issue. To alleviate this problem, focal loss penalty strategies are integrated into the loss functions. Moreover, besides the typical fixed-length error correction datasets, we also construct a variable-length corpus for our experiments. Experimental results on standard datasets, especially the variable-length ones, demonstrate the effectiveness of TtT in terms of sentence-level accuracy, precision, recall, and F1-measure on the error detection and correction tasks.
Comment: ACL 2021. Code: https://github.com/lipiji/TtT. Fixed the results of SpellGCN on Oct. 26, 202
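To make the class-imbalance remedy concrete, here is a minimal sketch of the standard focal loss (Lin et al., 2017) applied per token; the paper's exact penalty strategies may differ from this form.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        # logits:  (num_tokens, vocab_size) raw scores at each position
        # targets: (num_tokens,) gold token ids
        # Down-weighting easy (high-probability) tokens by (1 - p_t)^gamma keeps
        # the many trivially-copied tokens from dominating the loss.
        log_pt = F.log_softmax(logits, dim=-1)
        log_pt = log_pt.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return -((1.0 - pt) ** gamma * log_pt).mean()

With gamma = 0 this reduces to ordinary cross-entropy; larger gamma focuses training on the rare, hard-to-correct positions.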