Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting
Recent studies have revealed that grammatical error correction methods in the
sequence-to-sequence paradigm are vulnerable to adversarial attack, and simply
utilizing adversarial examples in the pre-training or post-training process can
significantly enhance the robustness of GEC models to certain types of attack
without suffering too much performance loss on clean data. In this paper, we
further conduct a thorough robustness evaluation of cutting-edge GEC methods
against four different types of adversarial attacks and propose a simple yet very
effective Cycle Self-Augmenting (CSA) method accordingly. By leveraging the
augmenting data from the GEC models themselves in the post-training process and
introducing regularization data for cycle training, our proposed method can
effectively improve the model robustness of well-trained GEC models with only a
few more training epochs as an extra cost. More concretely, further training on
the regularization data can prevent the GEC models from over-fitting on
easy-to-learn samples and thus can improve the generalization capability and
robustness towards unseen data (adversarial noise/samples). Meanwhile, the
self-augmented data can provide more high-quality pseudo pairs to improve model
performance on the original testing data. Experiments on four benchmark
datasets and seven strong models indicate that our proposed training method can
significantly enhance robustness against the four types of attacks without using
purposely built adversarial examples in training. Evaluation results on clean
data further confirm that our proposed CSA method significantly improves the
performance of four baselines and yields results nearly comparable to other
state-of-the-art models. Our code is available at
https://github.com/ZetangForward/CSA-GEC
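As a rough illustration of the cycle described above, the sketch below post-trains a GEC model in repeated rounds, mixing pseudo pairs produced by the model itself with a regularization subset of the original training pairs. The correct and fine_tune wrappers and the way regularization pairs are selected are assumptions made for the sake of a compact example; the released code at the URL above defines the actual procedure.

    # A minimal sketch, assuming a seq2seq GEC model usable as a plain callable
    # and simple list-based data; hypothetical helpers, not the released code.
    from typing import Callable, List, Tuple

    Pair = Tuple[str, str]  # (erroneous source, corrected target)

    def correct(model: Callable[[str], str], sources: List[str]) -> List[str]:
        # Hypothetical wrapper: run the current GEC model over raw sentences.
        return [model(s) for s in sources]

    def fine_tune(model, pairs: List[Pair], epochs: int = 1):
        # Hypothetical wrapper: one short post-training pass on `pairs`.
        # In practice this would be the usual seq2seq training loop.
        return model

    def csa_post_train(model, train_pairs: List[Pair], unlabeled: List[str],
                       cycles: int = 3):
        for _ in range(cycles):
            # 1) Self-augmentation: the model's own outputs become pseudo targets,
            #    giving extra pseudo pairs for the original data distribution.
            augmented = list(zip(unlabeled, correct(model, unlabeled)))

            # 2) Regularization data: keep the training pairs the model does not
            #    yet reproduce verbatim (an assumed proxy for samples that are not
            #    merely memorized), so post-training does not over-fit the
            #    easy-to-learn ones.
            hyps = correct(model, [src for src, _ in train_pairs])
            regularization = [pair for pair, hyp in zip(train_pairs, hyps)
                              if hyp != pair[1]]

            # 3) A few extra epochs on the combined data.
            model = fine_tune(model, augmented + regularization, epochs=1)
        return model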
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were much more tailored to the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1; EPSRC Tier-2 capital grant EP/P020259/
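One concrete way an external language model can be brought into a sequence-to-sequence system, in the spirit of the abstract above though not necessarily the thesis's exact formulation, is shallow (log-linear) fusion at decoding time. The two scoring callables below are hypothetical stand-ins for the actual models.

    # A minimal sketch of shallow language-model fusion; both scoring functions
    # are assumed interfaces, not part of the thesis or any specific library.
    from typing import Callable, List

    def fused_score(tokens: List[str],
                    s2s_logprob: Callable[[List[str]], float],  # log P(tokens | source)
                    lm_logprob: Callable[[List[str]], float],   # log P(tokens)
                    lm_weight: float = 0.3) -> float:
        # Rank a candidate output by a weighted sum of the two log-probabilities.
        return s2s_logprob(tokens) + lm_weight * lm_logprob(tokens)

    # During beam search, each partial hypothesis would be ranked by fused_score
    # instead of the seq2seq log-probability alone.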
Neural Combinatory Constituency Parsing
Tokyo Metropolitan University, doctoral thesis (Doctor of Information Science)
Decoding linguistic information from EEG signals
For many years, the fields of the cognitive neuroscience of language and natural language processing (NLP) have been relatively distinct and non-overlapping. Recent breakthrough research is starting to show that these two fields, in their common goal of understanding and modelling language, have a lot to offer each other. As developments in machine learning continue to break new ground, due in large part to the successful development of novel classifiers that can be efficiently trained to model highly nonlinear dynamic systems such as language, the open question is how well these models perform on human neural signals during language processing. Recent results are beginning to show that various types of human signals (eye-tracking, fMRI, MEG) can successfully model various linguistic aspects of what is being concurrently processed by the brain. EEG is a cheap and relatively accessible way to record neural signals, and this thesis explores the extent to which EEG data can be decoded, using state-of-the-art models common in NLP, to carry out this task. Critically, an important foundation needs to be in place that can fully explore the types of linguistic signal that are decodable with EEG. This thesis attempts to answer this question, setting the stage for joint modelling of text and neural signals to advance the field of NLP. This research is also of interest to cognitive neuroscientists, as the data collected for this thesis will be openly accessible to all, with accompanying linguistic annotation, which can help to answer various questions about the spatiotemporal dynamics during the reading of naturalistic texts.
In Chapter 1, I provide an overview of the major literature that has investigated the status of linguistic processing from neural signals, setting the research question in its historical context. This literature review serves as the basis for the two experimental chapters which follow and is thus subdivided into two main sections. Chapter 2 explores the various aspects of linguistic processing which are decodable from the novel EEG dataset collected for this thesis, with a strong emphasis on controlling for potential confounds as much as possible. Using a novel machine learning classifier, I show that with specialised training methods, generalisation to novel data relating to part-of-speech decoding is possible. In Chapter 3, the preprocessing steps involved in preparing the data are examined, in which I show that, depending on the modelling goal, some steps are particularly useful for boosting the performance of linguistic decoding from EEG. Finally, in Chapter 4, a broad review of the results, their implications, and limitations is presented.
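To make the decoding task concrete, the sketch below shows the shape of a typical EEG decoding experiment: one epoched channel-by-time feature vector per word, a binary linguistic label such as noun vs. verb, and cross-validated classification. The array sizes, the synthetic data, and the plain scikit-learn classifier are illustrative assumptions; the thesis relies on its own, more specialised classifier and training scheme.

    # A minimal sketch of word-level linguistic decoding from epoched EEG,
    # using synthetic data and a generic linear classifier for illustration.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_trials, n_channels, n_times = 400, 64, 100

    # X: one flattened channels-x-time epoch per word; y: its linguistic label.
    X = rng.standard_normal((n_trials, n_channels * n_times))
    y = rng.integers(0, 2, size=n_trials)          # e.g. noun vs. verb

    decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(decoder, X, y, cv=5)  # chance level is 0.5 here
    print(f"mean decoding accuracy: {scores.mean():.2f}")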