Improving Seq2Seq Grammatical Error Correction via Decoding Interventions

Author: Huang Fei
Li Chen
Li Zhenghua
Liu Yumeng
Zhang Bo
Zhang Ji
Zhang Min
Zhou Houquan
Publication date: 22/10/2023

The sequence-to-sequence (Seq2Seq) approach has recently been widely used in grammatical error correction (GEC) and shows promising performance. However, the Seq2Seq GEC approach still suffers from two issues. First, a Seq2Seq GEC model can only be trained on parallel data, which, in the GEC task, is often noisy and limited in quantity. Second, the decoder of a Seq2Seq GEC model lacks an explicit awareness of the correctness of the token being generated. In this paper, we propose a unified decoding intervention framework that employs an external critic to assess the appropriateness of the token to be generated incrementally, and then dynamically influence the choice of the next token. We discover and investigate two types of critics: a pre-trained left-to-right language model critic and an incremental target-side grammatical error detector critic. Through extensive experiments on English and Chinese datasets, our framework consistently outperforms strong baselines and achieves results competitive with state-of-the-art methods.

Comment: Accepted to Findings of EMNLP 202

arXiv.org e-Print Archive

Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data

Author: Grundkiewicz Roman
Heafield Kenneth
Junczys-Dowmunt Marcin
Publication date: 01/01/2019

Crossref

Edinburgh Research Explorer

MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction

Author: Li Yangning
Li Yinghui
Ye Jingheng
Zheng Hai-Tao
Publication date: 17/10/2023

Data augmentation by generating pseudo data has proven effective in mitigating the challenge of data scarcity in the field of Grammatical Error Correction (GEC). Various augmentation strategies have been widely explored, most of which are motivated by two heuristics, i.e., increasing the distribution similarity and diversity of pseudo data. However, the underlying mechanism responsible for the effectiveness of these strategies remains poorly understood. In this paper, we aim to clarify how data augmentation improves GEC models. To this end, we introduce two interpretable and computationally efficient measures: Affinity and Diversity. Our findings indicate that an excellent GEC data augmentation strategy characterized by high Affinity and appropriate Diversity can better improve the performance of GEC models. Based on this observation, we propose MixEdit, a data augmentation approach that strategically and dynamically augments realistic data, without requiring extra monolingual corpora. To verify the correctness of our findings and the effectiveness of the proposed MixEdit, we conduct experiments on mainstream English and Chinese GEC datasets. The results show that MixEdit substantially improves GEC models and is complementary to traditional data augmentation methods.

Comment: Accepted to Findings of EMNLP 202

arXiv.org e-Print Archive

Argument mining: A machine learning perspective

Author: A Peldszus
CI Chesñevar
D Walton
E Black
E Cabrio
GR Simari
H Mercier
JB Freeman
JL Pollock
L Getoor
M Mäs
P Besnard
P Saint-Dizier
PM Dung
R Mochales
S Gabbriellini
SE Toulmin
SJ Pan
T Goudas
TJM Bench-Capon
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Argument mining has recently become a hot topic, attracting the interest of several diverse research communities, ranging from artificial intelligence to computational linguistics, natural language processing, and the social and philosophical sciences. In this paper, we attempt to describe the problems and challenges of argument mining from a machine learning angle. In particular, we advocate that machine learning techniques so far have been under-exploited, and that a more proper standardization of the problem, also with regard to the underlying argument model, could provide a crucial element to develop better systems.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Read & Improve: A Novel Reading Tutoring System

Author: Kochmar Ekaterina
Watson Rebecca
Publication date: 26/09/2021

We introduce a new readability tutoring system, Read & Improve, a freely available online resource aimed at supporting learners of English and English Language Teaching (ELT) professionals by improving English learners’ reading proficiency. Using a combination of machine learning approaches and natural language processing techniques, Read & Improve detects the learning needs of every student and makes sure no learner is left behind by identifying reading content at an appropriate level of readability and helping learners acquire new words through accessible dictionary definitions and content exploration functionality.

OPUS