155 research outputs found
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
Grammatical error correction aims to correct ungrammatical sentences
automatically. Recently, some work has demonstrated the excellent capabilities
of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical
error correction. However, the potential of open-source LLMs remains
unexplored. In this paper, we introduced GrammarGPT, an open-source LLM, to
preliminarily explore its potential for native Chinese grammatical error
correction. The core recipe of GrammarGPT is to leverage a hybrid dataset of
ChatGPT-generated and human-annotated data. For grammatical errors with clues, we
proposed a heuristic method to guide ChatGPT to generate ungrammatical
sentences by providing those clues. For grammatical errors without clues, we
collected ungrammatical sentences from publicly available websites and manually
corrected them. In addition, we employed an error-invariant augmentation method
to enhance the ability of the model to correct native Chinese grammatical
errors. We ultimately constructed about 1k parallel samples and used these
data to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese
University of Hong Kong, Shenzhen) with instruction tuning. The experimental
results show that GrammarGPT outperforms the existing SOTA system
significantly. Although the model has 20x more parameters than the SOTA baseline,
the amount of data required for instruction tuning is 1200x smaller,
illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT
ranks on NLPCC2023 SharedTask1, demonstrating our approach's
effectiveness. The code and data are available at
https://github.com/FreedomIntelligence/GrammarGPT
Layer-wise Representation Fusion for Compositional Generalization
Despite successes across a broad range of applications, the solutions that
sequence-to-sequence models construct are argued to be less compositional than
human-like generalization. There is mounting evidence that one factor
hindering compositional generalization is that the representations in the
uppermost encoder and decoder layers are entangled. In other words, the syntactic and
semantic representations of sequences are twisted inappropriately. However,
most previous studies mainly concentrate on enhancing token-level semantic
information to alleviate the representation entanglement problem, rather than
composing and using the syntactic and semantic representations of sequences
appropriately as humans do. In addition, we explain why the entanglement
problem exists from the perspective of recent studies on training deeper
Transformers: it is mainly owing to the "shallow" residual connections and their
simple, one-step operations, which fail to fuse previous layers' information
effectively. Starting from this finding and inspired by humans' strategies, we
propose FuSion (Fusing Syntactic and Semantic Representations), an extension to
sequence-to-sequence models to learn to fuse previous layers' information back
into the encoding and decoding process appropriately through introducing a
fuse-attention module at each encoder and decoder layer. FuSion
achieves competitive and even state-of-the-art results on two
realistic benchmarks, which empirically demonstrates the effectiveness of our
proposal.
Comment: work in progress. arXiv admin note: substantial text overlap with
arXiv:2305.1216
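To make the idea of fusing previous layers' information concrete, here is a minimal sketch in plain Python. It is not the FuSion implementation: the abstract does not specify the module's internals, so the function name `fuse_attention`, the absence of learned projections, and the residual addition are all assumptions made for illustration. Each position in the current layer attends over the same position in every earlier layer using scaled dot-product attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention(current, previous_layers):
    """Illustrative fuse step (hypothetical, no learned weights).

    current:         list of T vectors (the current layer's output)
    previous_layers: list of earlier layers, each a list of T vectors
    Returns a list of T vectors: current plus an attention-weighted
    mix of the earlier layers' vectors at the same position.
    """
    dim = len(current[0])
    fused = []
    for t, query in enumerate(current):
        # Keys and values are the earlier layers' vectors at position t.
        keys = [layer[t] for layer in previous_layers]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        context = [
            sum(w * key[i] for w, key in zip(weights, keys))
            for i in range(dim)
        ]
        # Residual-style addition (an assumption of this sketch).
        fused.append([q + c for q, c in zip(query, context)])
    return fused
```

With a single earlier layer the attention weight is 1, so the output is simply the sum of the two layers' vectors; with several earlier layers, the layers most similar to the current representation contribute most.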
Multilingual Multi-Figurative Language Detection
Figures of speech help people express abstract concepts and evoke stronger
emotions than literal expressions, thereby making texts more creative and
engaging. Due to its pervasive and fundamental character, figurative language
understanding has been addressed in Natural Language Processing, but it's
highly understudied in a multilingual setting and when considering more than
one figure of speech at the same time. To bridge this gap, we introduce
multilingual multi-figurative language modelling, and provide a benchmark for
sentence-level figurative language detection, covering three common figures of
speech and seven languages. Specifically, we develop a framework for figurative
language detection based on template-based prompt learning. In so doing, we
unify multiple detection tasks that are interrelated across multiple figures of
speech and languages, without requiring task- or language-specific modules.
Experimental results show that our framework outperforms several strong
baselines and may serve as a blueprint for the joint modelling of other
interrelated tasks.
Comment: Accepted to ACL 2023 (Findings)
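As a rough illustration of how one template can cover several figures of speech and languages without task-specific modules, the sketch below builds one prompt per figure from a shared template. The template wording, the function names, and the example figure names are hypothetical; the abstract gives neither the actual prompts nor the list of figures.

```python
# Hypothetical template: the abstract does not give the real prompt wording.
TEMPLATE = (
    "Language: {language}. Figure of speech: {figure}. "
    "Text: {text} Is the figure present?"
)


def build_prompt(text, language, figure):
    """Fill the shared template for one (language, figure) pair."""
    return TEMPLATE.format(language=language, figure=figure, text=text)


def build_all_prompts(text, language, figures):
    """Return one prompt per figure, all built from the same template."""
    return {figure: build_prompt(text, language, figure) for figure in figures}


# Example figure names are placeholders, not taken from the paper.
prompts = build_all_prompts("Time is a thief.", "English", ["metaphor", "hyperbole"])
```

Because the task and language are expressed inside the prompt text, a single model can in principle be trained on all combinations at once, which is the unification the abstract describes.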
Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting
Recent studies have revealed that grammatical error correction methods in the
sequence-to-sequence paradigm are vulnerable to adversarial attack, and simply
utilizing adversarial examples in the pre-training or post-training process can
significantly enhance the robustness of GEC models to certain types of attack
without suffering too much performance loss on clean data. In this paper, we
further conduct a thorough robustness evaluation of cutting-edge GEC methods
for four different types of adversarial attacks and propose a simple yet very
effective Cycle Self-Augmenting (CSA) method accordingly. By leveraging the
augmenting data from the GEC models themselves in the post-training process and
introducing regularization data for cycle training, our proposed method can
effectively improve the model robustness of well-trained GEC models with only a
few more training epochs as an extra cost. More concretely, further training on
the regularization data can prevent the GEC models from over-fitting on
easy-to-learn samples and thus can improve the generalization capability and
robustness towards unseen data (adversarial noise/samples). Meanwhile, the
self-augmented data can provide more high-quality pseudo pairs to improve model
performance on the original testing data. Experiments on four benchmark
datasets and seven strong models indicate that our proposed training method can
significantly enhance robustness against four types of attacks without using
purposely built adversarial examples in training. Evaluation results on clean
data further confirm that our proposed CSA method significantly improves the
performance of four baselines and yields nearly comparable results with other
state-of-the-art models. Our code is available at
https://github.com/ZetangForward/CSA-GEC