1,892 research outputs found
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
Recent studies have revealed a number of pathologies of neural machine
translation (NMT) systems. Hypotheses explaining these mostly suggest that
there is something fundamentally wrong with NMT as a model or its training
algorithm, maximum likelihood estimation (MLE). Most of this evidence was
gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at
identifying the highest-scoring translation, i.e. the mode, under the model
distribution. We argue that the evidence corroborates the inadequacy of MAP
decoding more than casts doubt on the model and its training algorithm. In this
work, we criticise NMT models probabilistically showing that stochastic samples
following the model's own generative story do reproduce various statistics of
the training data well, but that it is beam search that strays from such
statistics. We show that some of the known pathologies of NMT are due to MAP
decoding and not to NMT's statistical assumptions nor MLE. In particular, we
show that the most likely translations under the model accumulate so little
probability mass that the mode can be considered essentially arbitrary. We
therefore advocate for the use of decision rules that take into account
statistics gathered from the model distribution holistically. As a proof of
concept we show that a straightforward implementation of minimum Bayes risk
decoding gives good results outperforming beam search using as little as 30
samples, confirming that MLE-trained NMT models do capture important aspects of
translation well in expectation
Aligning Neural Machine Translation Models: Human Feedback in Training and Inference
Reinforcement learning from human feedback (RLHF) is a recent technique to
improve the quality of the text generated by a language model, making it closer
to what humans would generate. A core ingredient in RLHF's success in aligning
and improving large language models (LLMs) is its reward model, trained using
human feedback on model outputs. In machine translation (MT), where metrics
trained from human annotations can readily be used as reward models, recent
methods using minimum Bayes risk decoding and reranking have succeeded in
improving the final quality of translation. In this study, we comprehensively
explore and compare techniques for integrating quality metrics as reward models
into the MT pipeline. This includes using the reward model for data filtering,
during the training phase through RL, and at inference time by employing
reranking techniques, and we assess the effects of combining these in a unified
approach. Our experimental results, conducted across multiple translation
tasks, underscore the crucial role of effective data filtering, based on
estimated quality, in harnessing the full potential of RL in enhancing MT
quality. Furthermore, our findings demonstrate the effectiveness of combining
RL training with reranking techniques, showcasing substantial improvements in
translation quality.Comment: 14 pages, work-in-progres
Recommended from our members
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast with achieving state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break of deep learning methods with previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk for common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models that often do not play a role in vanilla end-to-end approaches and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as derivation in a formal grammar.EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/
- …