Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting
Recent studies have revealed that grammatical error correction methods in the
sequence-to-sequence paradigm are vulnerable to adversarial attack, and simply
utilizing adversarial examples in the pre-training or post-training process can
significantly enhance the robustness of GEC models to certain types of attack
without suffering too much performance loss on clean data. In this paper, we
further conduct a thorough robustness evaluation of cutting-edge GEC methods
against four different types of adversarial attacks and propose a simple yet very
effective Cycle Self-Augmenting (CSA) method accordingly. By leveraging the
augmenting data from the GEC models themselves in the post-training process and
introducing regularization data for cycle training, our proposed method can
effectively improve the model robustness of well-trained GEC models with only a
few more training epochs as an extra cost. More concretely, further training on
the regularization data can prevent the GEC models from over-fitting on
easy-to-learn samples and thus can improve the generalization capability and
robustness towards unseen data (adversarial noise/samples). Meanwhile, the
self-augmented data can provide more high-quality pseudo pairs to improve model
performance on the original testing data. Experiments on four benchmark
datasets and seven strong models indicate that our proposed training method can
significantly enhance robustness against the four types of attacks without using
purposely built adversarial examples in training. Evaluation results on clean
data further confirm that our proposed CSA method significantly improves the
performance of four baselines and yields results nearly comparable to other
state-of-the-art models. Our code is available at
https://github.com/ZetangForward/CSA-GEC
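As a rough illustration of the cycle described above, the sketch below post-trains a GEC model in repeated rounds, mixing pseudo pairs produced by the model itself with a regularization subset of the original training pairs. The correct and fine_tune wrappers and the way regularization pairs are selected are assumptions made for the sake of a compact example; the released code at the URL above defines the actual procedure.

    # A minimal sketch, assuming a seq2seq GEC model usable as a plain callable
    # and simple list-based data; hypothetical helpers, not the released code.
    from typing import Callable, List, Tuple

    Pair = Tuple[str, str]  # (erroneous source, corrected target)

    def correct(model: Callable[[str], str], sources: List[str]) -> List[str]:
        # Hypothetical wrapper: run the current GEC model over raw sentences.
        return [model(s) for s in sources]

    def fine_tune(model, pairs: List[Pair], epochs: int = 1):
        # Hypothetical wrapper: one short post-training pass on `pairs`.
        # In practice this would be the usual seq2seq training loop.
        return model

    def csa_post_train(model, train_pairs: List[Pair], unlabeled: List[str],
                       cycles: int = 3):
        for _ in range(cycles):
            # 1) Self-augmentation: the model's own outputs become pseudo targets,
            #    giving extra pseudo pairs for the original data distribution.
            augmented = list(zip(unlabeled, correct(model, unlabeled)))

            # 2) Regularization data: keep the training pairs the model does not
            #    yet reproduce verbatim (an assumed proxy for samples that are not
            #    merely memorized), so post-training does not over-fit the
            #    easy-to-learn ones.
            hyps = correct(model, [src for src, _ in train_pairs])
            regularization = [pair for pair, hyp in zip(train_pairs, hyps)
                              if hyp != pair[1]]

            # 3) A few extra epochs on the combined data.
            model = fine_tune(model, augmented + regularization, epochs=1)
        return model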
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were much more tailored to the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1; EPSRC Tier-2 capital grant EP/P020259/
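One concrete way an external language model can be brought into a sequence-to-sequence system, in the spirit of the abstract above though not necessarily the thesis's exact formulation, is shallow (log-linear) fusion at decoding time. The two scoring callables below are hypothetical stand-ins for the actual models.

    # A minimal sketch of shallow language-model fusion; both scoring functions
    # are assumed interfaces, not part of the thesis or any specific library.
    from typing import Callable, List

    def fused_score(tokens: List[str],
                    s2s_logprob: Callable[[List[str]], float],  # log P(tokens | source)
                    lm_logprob: Callable[[List[str]], float],   # log P(tokens)
                    lm_weight: float = 0.3) -> float:
        # Rank a candidate output by a weighted sum of the two log-probabilities.
        return s2s_logprob(tokens) + lm_weight * lm_logprob(tokens)

    # During beam search, each partial hypothesis would be ranked by fused_score
    # instead of the seq2seq log-probability alone.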
Neural Combinatory Constituency Parsing
Tokyo Metropolitan University, doctoral thesis (Doctor of Information Science)
Decoding linguistic information from EEG signals
For many years, the fields of the cognitive neuroscience of language and natural language processing (NLP) have been relatively distinct and non-overlapping. Recent breakthrough research is starting to show that these two fields, in their common goal of understanding and modelling language, have a lot to offer each other. As developments in machine learning continue to break new ground, due in large part to the successful development of novel classifiers that can be efficiently trained to model highly nonlinear dynamic systems such as language, the open question is how well these models perform on human neural signals during language processing. Recent results are beginning to show that various types of human signals (eye-tracking, fMRI, MEG) can successfully model various linguistic aspects of what is being concurrently processed by the brain. EEG is a cheap and relatively accessible way to record neural signals, and this thesis explores the extent to which EEG data can be decoded, using state-of-the-art models common in NLP, to carry out this task. Critically, an important foundation needs to be in place that can fully explore the types of linguistic signal that are decodable with EEG. This thesis attempts to answer this question, setting the stage for joint modelling of text and neural signals to advance the field of NLP. This research is also of interest to cognitive neuroscientists, as the data collected for this thesis will be openly accessible to all, with accompanying linguistic annotation, which can help to answer various questions about the spatiotemporal dynamics during the reading of naturalistic texts.
In Chapter 1, I provide an overview of the major literature that has investigated the status of linguistic processing from neural signals, setting the research question in its historical context. This literature review serves as the basis for the two experimental chapters which follow and is thus subdivided into two main sections. Chapter 2 explores the various aspects of linguistic processing which are decodable from the novel EEG dataset collected for this thesis, with a strong emphasis on controlling for potential confounds as much as possible. Using a novel machine learning classifier, I show that with specialised training methods, generalisation to novel data relating to part-of-speech decoding is possible. In Chapter 3, the preprocessing steps involved in preparing the data are examined, in which I show that, depending on the modelling goal, some steps are particularly useful for boosting the performance of linguistic decoding from EEG. Finally, in Chapter 4, a broad review of the results, their implications, and limitations is presented.
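To make the decoding task concrete, the sketch below shows the shape of a typical EEG decoding experiment: one epoched channel-by-time feature vector per word, a binary linguistic label such as noun vs. verb, and cross-validated classification. The array sizes, the synthetic data, and the plain scikit-learn classifier are illustrative assumptions; the thesis relies on its own, more specialised classifier and training scheme.

    # A minimal sketch of word-level linguistic decoding from epoched EEG,
    # using synthetic data and a generic linear classifier for illustration.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_trials, n_channels, n_times = 400, 64, 100

    # X: one flattened channels-x-time epoch per word; y: its linguistic label.
    X = rng.standard_normal((n_trials, n_channels * n_times))
    y = rng.integers(0, 2, size=n_trials)          # e.g. noun vs. verb

    decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(decoder, X, y, cv=5)  # chance level is 0.5 here
    print(f"mean decoding accuracy: {scores.mean():.2f}")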