Pushed and Non-pushed Speaking Tasks in an EAP Context: What Are the Benefits for Linguistic Processing and Accuracy?
This article reports on a mixed-methods study investigating the effectiveness of pushed and non-pushed speaking tasks with upper-intermediate students in a UK university setting. Specifically, the study addressed a) whether a pushed speaking task produced more language-related episodes (LREs) than a non-pushed speaking task, b) how the types of LREs produced differed between the two tasks, and c) whether a pushed speaking task resulted in more accurate use of past narrative forms. Results showed that the pushed storytelling task produced significantly more LREs than the non-pushed task, and that the most common LRE type for both pushed and non-pushed learners involved some form of output correction. The pushed group achieved greater accuracy gains from pretest to posttest, but these gain scores were not statistically significant. The study concludes that creating a push during spoken output activities can increase opportunities for linguistic processing and, subsequently, interlanguage development.
Improving Seq2Seq Grammatical Error Correction via Decoding Interventions
The sequence-to-sequence (Seq2Seq) approach has recently been widely used in
grammatical error correction (GEC) and shows promising performance. However,
the Seq2Seq GEC approach still suffers from two issues. First, a Seq2Seq GEC
model can only be trained on parallel data, which, in GEC task, is often noisy
and limited in quantity. Second, the decoder of a Seq2Seq GEC model lacks an
explicit awareness of the correctness of the token being generated. In this
paper, we propose a unified decoding intervention framework that employs an
external critic to assess the appropriateness of the token to be generated
incrementally, and then dynamically influence the choice of the next token. We
discover and investigate two types of critics: a pre-trained left-to-right
language model critic and an incremental target-side grammatical error detector
critic. Through extensive experiments on English and Chinese datasets, our
framework consistently outperforms strong baselines and achieves results
competitive with state-of-the-art methods. Comment: Accepted to Findings of EMNLP 202
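The core idea of a decoding intervention, as the abstract describes it, is to let an external critic score each candidate next token and dynamically influence the decoder's choice. The sketch below is a minimal, hypothetical illustration of one such scoring rule (a weighted sum of decoder and critic log-probabilities); the actual interpolation used in the paper may differ, and all names here are assumptions.

```python
import math

def intervened_next_token(decoder_logprobs, critic_logprobs, alpha=0.5):
    """Pick the next token by mixing the GEC decoder's score with an
    external critic's score, so tokens the critic deems inappropriate
    are down-weighted. `alpha` controls the critic's influence."""
    floor = math.log(1e-9)  # score for tokens the critic never proposes
    combined = {
        tok: (1 - alpha) * decoder_logprobs[tok]
             + alpha * critic_logprobs.get(tok, floor)
        for tok in decoder_logprobs
    }
    return max(combined, key=combined.get)

# Toy example: the decoder slightly prefers an incorrect verb form, but a
# language-model critic strongly prefers the grammatical one.
decoder = {"go": math.log(0.45), "goes": math.log(0.40), "went": math.log(0.15)}
critic = {"go": math.log(0.05), "goes": math.log(0.90), "went": math.log(0.05)}
print(intervened_next_token(decoder, critic, alpha=0.5))  # prints "goes"
```

With `alpha=0`, the rule reduces to ordinary greedy decoding; raising `alpha` lets the critic veto tokens the decoder marginally prefers.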
MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction
Data Augmentation through generating pseudo data has been proven effective in
mitigating the challenge of data scarcity in the field of Grammatical Error
Correction (GEC). Various augmentation strategies have been widely explored,
most of which are motivated by two heuristics, i.e., increasing the
distribution similarity and diversity of pseudo data. However, the underlying
mechanism responsible for the effectiveness of these strategies remains poorly
understood. In this paper, we aim to clarify how data augmentation improves GEC
models. To this end, we introduce two interpretable and computationally
efficient measures: Affinity and Diversity. Our findings indicate that an
excellent GEC data augmentation strategy characterized by high Affinity and
appropriate Diversity can better improve the performance of GEC models. Based
on this observation, we propose MixEdit, a data augmentation approach that
strategically and dynamically augments realistic data, without requiring extra
monolingual corpora. To verify the correctness of our findings and the
effectiveness of the proposed MixEdit, we conduct experiments on mainstream
English and Chinese GEC datasets. The results show that MixEdit substantially
improves GEC models and is complementary to traditional data augmentation
methods. Comment: Accepted to Findings of EMNLP 202
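The abstract says MixEdit augments realistic data without extra monolingual corpora, which suggests reusing edit patterns already present in the parallel training data. The sketch below is a loose, hypothetical illustration of that idea (word-level patterns extracted with `difflib` and re-applied to clean sentences); the paper's actual edit-pattern extraction is more sophisticated, and every function name here is an assumption.

```python
import difflib

def extract_patterns(parallel_pairs):
    """Collect word-level replacement patterns (correct -> erroneous)
    observed in real parallel GEC data. Pairs are (erroneous, correct)."""
    patterns = {}
    for src, tgt in parallel_pairs:
        src_words, tgt_words = src.split(), tgt.split()
        sm = difflib.SequenceMatcher(a=tgt_words, b=src_words)
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            if op == "replace":
                patterns[" ".join(tgt_words[i1:i2])] = " ".join(src_words[j1:j2])
    return patterns

def mix_edit(sentence, patterns):
    """Corrupt a correct sentence with observed error patterns, yielding a
    realistic pseudo (erroneous, correct) training pair."""
    corrupted = [patterns.get(w, w) for w in sentence.split()]
    return " ".join(corrupted), sentence

pairs = [("She go to school", "She goes to school")]
patterns = extract_patterns(pairs)
print(mix_edit("He goes home", patterns))  # ('He go home', 'He goes home')
```

Because the error patterns come from real learner data, the pseudo pairs keep a high distribution similarity (Affinity) to the target domain, which is the property the paper argues matters most.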
Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting
Recent studies have revealed that grammatical error correction methods in the
sequence-to-sequence paradigm are vulnerable to adversarial attack, and simply
utilizing adversarial examples in the pre-training or post-training process can
significantly enhance the robustness of GEC models to certain types of attack
without suffering too much performance loss on clean data. In this paper, we
further conduct a thorough robustness evaluation of cutting-edge GEC methods
against four different types of adversarial attacks and propose a simple yet very
effective Cycle Self-Augmenting (CSA) method accordingly. By leveraging the
augmenting data from the GEC models themselves in the post-training process and
introducing regularization data for cycle training, our proposed method can
effectively improve the model robustness of well-trained GEC models with only a
few more training epochs as an extra cost. More concretely, further training on
the regularization data can prevent the GEC models from over-fitting on
easy-to-learn samples and thus can improve the generalization capability and
robustness towards unseen data (adversarial noise/samples). Meanwhile, the
self-augmented data can provide more high-quality pseudo pairs to improve model
performance on the original testing data. Experiments on four benchmark
datasets and seven strong models indicate that our proposed training method can
significantly enhance robustness against four types of attacks without using
purposely built adversarial examples in training. Evaluation results on clean
data further confirm that our proposed CSA method significantly improves the
performance of four baselines and yields results nearly comparable to those of other
state-of-the-art models. Our code is available at
https://github.com/ZetangForward/CSA-GEC
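The post-training loop the abstract describes, in which the model's own outputs supply new pseudo pairs while already-mastered samples are set aside as regularization data, can be caricatured as follows. This is a speculative sketch of the cycle, not the repository's implementation; the partitioning rule and all names are assumptions.

```python
def cycle_self_augment(model_correct, train_pairs, rounds=2):
    """Sketch of a CSA-style post-training loop (hypothetical details).

    Each round, the current model re-corrects the erroneous side of the
    data. Pairs it already handles ('easy-to-learn') move to a
    regularization set to curb over-fitting; its imperfect outputs form
    self-augmented pseudo pairs for the next round."""
    regularization, augmented = [], list(train_pairs)
    for _ in range(rounds):
        next_round = []
        for src, tgt in augmented:
            hyp = model_correct(src)
            if hyp == tgt:
                regularization.append((src, tgt))   # already mastered
            else:
                next_round.append((hyp, tgt))       # self-augmented pair
        augmented = next_round
    return augmented, regularization

# Toy "model" that only knows how to fix one error pattern.
fixes = {"She go home": "She goes home"}
model = lambda s: fixes.get(s, s)
aug, reg = cycle_self_augment(model, [("She go home", "She goes home"),
                                      ("He eat lunch", "He eats lunch")])
print(aug)  # [('He eat lunch', 'He eats lunch')]
print(reg)  # [('She go home', 'She goes home')]
```

In a real setting, both sets would then drive a few additional training epochs, which is the "extra cost" the abstract mentions.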