Improving Seq2Seq Grammatical Error Correction via Decoding Interventions
The sequence-to-sequence (Seq2Seq) approach has recently been widely used in
grammatical error correction (GEC) and shows promising performance. However,
the Seq2Seq GEC approach still suffers from two issues. First, a Seq2Seq GEC
model can only be trained on parallel data, which, in the GEC task, is often noisy
and limited in quantity. Second, the decoder of a Seq2Seq GEC model lacks an
explicit awareness of the correctness of the token being generated. In this
paper, we propose a unified decoding intervention framework that employs an
external critic to assess the appropriateness of the token to be generated
incrementally, and then dynamically influence the choice of the next token. We
discover and investigate two types of critics: a pre-trained left-to-right
language model critic and an incremental target-side grammatical error detector
critic. Through extensive experiments on English and Chinese datasets, our
framework consistently outperforms strong baselines and achieves results
competitive with state-of-the-art methods.
Comment: Accepted to Findings of EMNLP 202
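The decoding intervention idea in this abstract can be illustrated with a minimal sketch. The abstract does not state how the critic's judgment is combined with the model's scores, so the additive rule below (model log-probability plus a weighted critic score) is an assumption, and `toy_model`, `toy_critic`, `pick_next_token`, `greedy_decode`, and `alpha` are hypothetical names invented for illustration, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List

Scores = Dict[str, float]


def pick_next_token(model_logprobs: Scores, critic_scores: Scores, alpha: float) -> str:
    """Choose the candidate with the best combined score.

    Assumed rule (not from the paper): log p_model(token) + alpha * critic(token).
    """
    combined = {
        tok: lp + alpha * critic_scores.get(tok, 0.0)
        for tok, lp in model_logprobs.items()
    }
    return max(combined, key=combined.get)


def greedy_decode(
    model: Callable[[List[str]], Scores],
    critic: Callable[[List[str], List[str]], Scores],
    alpha: float,
    max_len: int = 10,
    eos: str = "</s>",
) -> List[str]:
    """Greedy decoding in which the critic is consulted at every step."""
    prefix: List[str] = []
    for _ in range(max_len):
        logprobs = model(prefix)
        tok = pick_next_token(logprobs, critic(prefix, list(logprobs)), alpha)
        if tok == eos:
            break
        prefix.append(tok)
    return prefix


def toy_model(prefix: List[str]) -> Scores:
    """Hypothetical stand-in for a Seq2Seq decoder; slightly prefers the wrong verb."""
    if len(prefix) == 0:
        return {"She": math.log(0.9), "</s>": math.log(0.1)}
    if len(prefix) == 1:
        return {"go": math.log(0.5), "goes": math.log(0.4), "</s>": math.log(0.1)}
    if len(prefix) == 2:
        return {"home": math.log(0.8), "</s>": math.log(0.2)}
    return {"</s>": 0.0}


def toy_critic(prefix: List[str], candidates: List[str]) -> Scores:
    """Hypothetical critic: penalizes the agreement error "She go"."""
    after_she = prefix[-1:] == ["She"]
    return {tok: (-1.0 if after_she and tok == "go" else 0.0) for tok in candidates}


without_critic = greedy_decode(toy_model, toy_critic, alpha=0.0)
with_critic = greedy_decode(toy_model, toy_critic, alpha=1.0)
print(without_critic)
print(with_critic)
```

With `alpha=0.0` the critic is ignored and the toy decoder keeps its own preference; with `alpha=1.0` the penalty is large enough to change the chosen verb. In a real system the critic would be a language model or an error detector rather than a hand-written rule.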
MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction
Data Augmentation through generating pseudo data has been proven effective in
mitigating the challenge of data scarcity in the field of Grammatical Error
Correction (GEC). Various augmentation strategies have been widely explored,
most of which are motivated by two heuristics, i.e., increasing the
distribution similarity and diversity of pseudo data. However, the underlying
mechanism responsible for the effectiveness of these strategies remains poorly
understood. In this paper, we aim to clarify how data augmentation improves GEC
models. To this end, we introduce two interpretable and computationally
efficient measures: Affinity and Diversity. Our findings indicate that an
excellent GEC data augmentation strategy characterized by high Affinity and
appropriate Diversity can better improve the performance of GEC models. Based
on this observation, we propose MixEdit, a data augmentation approach that
strategically and dynamically augments realistic data, without requiring extra
monolingual corpora. To verify the correctness of our findings and the
effectiveness of the proposed MixEdit, we conduct experiments on mainstream
English and Chinese GEC datasets. The results show that MixEdit substantially
improves GEC models and is complementary to traditional data augmentation
methods.
Comment: Accepted to Findings of EMNLP 202
Argument mining: A machine learning perspective
Argument mining has recently become a hot topic, attracting the interest of several diverse research communities, ranging from artificial intelligence to computational linguistics, natural language processing, and the social and philosophical sciences. In this paper, we attempt to describe the problems and challenges of argument mining from a machine learning angle. In particular, we argue that machine learning techniques have so far been under-exploited, and that a more proper standardization of the problem, also with regard to the underlying argument model, could provide a crucial element for developing better systems.
Read & Improve: A Novel Reading Tutoring System
We introduce a new readability tutoring system, Read & Improve, a freely available online resource that supports learners of English and English Language Teaching (ELT) professionals by improving English learners’ reading proficiency. Using a combination of machine learning approaches and natural language processing techniques, Read & Improve detects the learning needs of every student and ensures that no learner is left behind by identifying reading content at an appropriate level of readability and by helping learners acquire new words through accessible dictionary definitions and content exploration functionality.