Regularization techniques for fine-tuning in neural machine translation
We investigate techniques for supervised domain adaptation for neural machine
translation where an existing model trained on a large out-of-domain dataset is
adapted to a small in-domain dataset. In this scenario, overfitting is a major
challenge. We investigate a number of techniques to reduce overfitting and
improve transfer learning, including regularization techniques such as dropout
and L2-regularization towards an out-of-domain prior. In addition, we introduce
tuneout, a novel regularization technique inspired by dropout. We apply these
techniques, alone and in combination, to neural machine translation, obtaining
improvements on IWSLT datasets for English->German and English->Russian. We
also investigate the amounts of in-domain training data needed for domain
adaptation in NMT, and find a logarithmic relationship between the amount of
training data and gain in BLEU score. (EMNLP 2017 short paper; for bibtex, see
http://homepages.inf.ed.ac.uk/rsennric/bib.html#micelibarone2017)
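The core of L2-regularization towards an out-of-domain prior can be sketched in a few lines. This is an illustrative NumPy version over a generic parameter vector, not the authors' NMT code; the function name, learning rate, and penalty weight are assumptions for the example:

```python
import numpy as np

def finetune_step(theta, theta_prior, grad_in_domain, lr=0.1, lam=0.01):
    """One SGD step on the in-domain loss plus the penalty
    lam * ||theta - theta_prior||^2, which pulls the adapted
    parameters back toward the out-of-domain prior."""
    # Gradient of the penalty term: 2 * lam * (theta - theta_prior)
    penalty_grad = 2.0 * lam * (theta - theta_prior)
    return theta - lr * (grad_in_domain + penalty_grad)

# Toy usage: with a zero in-domain gradient, the step shrinks theta
# toward the prior (here the zero vector) instead of leaving it fixed.
theta_prior = np.zeros(3)
theta = np.array([1.0, -2.0, 0.5])
theta_new = finetune_step(theta, theta_prior, grad_in_domain=np.zeros(3))
```

With the values above the update multiplies theta by (1 - lr * 2 * lam) = 0.998 per step, so the fine-tuned model drifts toward the prior only as fast as the in-domain gradient fails to oppose it.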
Improving Machine Translation of Educational Content via Crowdsourcing
The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation
models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of
using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations may be of
lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain
by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine
translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected
with proper quality controls consistently yields performance gains over both general-domain baseline systems and
systems fine-tuned with pre-existing in-domain corpora.
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
In inductive transfer learning, fine-tuning pre-trained convolutional
networks substantially outperforms training from scratch. When using
fine-tuning, the underlying assumption is that the pre-trained model extracts
generic features, which are at least partially relevant for solving the target
task, but would be difficult to extract from the limited amount of data
available on the target task. However, besides the initialization with the
pre-trained model and the early stopping, there is no mechanism in fine-tuning
for retaining the features learned on the source task. In this paper, we
investigate several regularization schemes that explicitly promote the
similarity of the final solution with the initial model. We show the benefit of
having an explicit inductive bias towards the initial model, and we eventually
recommend a simple L2 penalty, with the pre-trained model as the reference,
as the baseline penalty for transfer learning tasks. (Accepted at ICML 2018)
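The recommended penalty differs from ordinary weight decay only in its reference point: decay pulls weights toward zero, while the explicit inductive bias pulls them toward the pre-trained model. A minimal NumPy sketch (an illustration of the idea, not the paper's code; the function name and lam value are assumptions):

```python
import numpy as np

def l2_penalty(theta, reference, lam=0.01):
    """lam * ||theta - reference||^2. With reference = 0 this is
    standard weight decay; with reference = the pre-trained weights
    it is an explicit inductive bias toward the initial model."""
    return lam * np.sum((theta - reference) ** 2)

# Fine-tuned weights that have moved only slightly from initialization.
theta_pretrained = np.array([0.5, -1.0, 2.0])
theta = np.array([0.6, -1.1, 1.8])

decay_to_zero = l2_penalty(theta, np.zeros_like(theta))  # penalizes distance to 0
decay_to_init = l2_penalty(theta, theta_pretrained)      # penalizes drift from the source model
```

For weights that stay close to the pre-trained solution, the second penalty is far smaller, so it retains source-task features instead of shrinking all weights indiscriminately.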
Overcoming Data Challenges in Machine Translation
Data-driven machine translation paradigms—which use machine learning to create translation models that can automatically translate from one language to another—have the potential to enable seamless communication across language barriers, and improve global information access. For this to become a reality, machine translation must be available for all languages and styles of text. However, the translation quality of these models is sensitive to the quality and quantity of the data the models are trained on. In this dissertation we address and analyze challenges arising from this sensitivity; we present methods that improve translation quality in difficult data settings, and analyze the effect of data quality on machine translation quality.
Machine translation models are typically trained on parallel corpora, but limited quantities of such data are available for most language pairs, leading to a low resource problem. We present a method for transfer learning from a paraphraser to overcome data sparsity in low resource settings. Even when training data is available in the desired language pair, it is frequently of a different style or genre than we would like to translate—leading to a domain mismatch. We present a method for improving domain adaptation translation quality.
A seemingly obvious approach when faced with a lack of data is to acquire more data. However, it is not always feasible to produce additional human translations. In such a case, an option may be to crawl the web for additional training data. However, as we demonstrate, such data can be very noisy and harm machine translation quality. Our analysis motivated subsequent work on data filtering and cleaning by the broader community.
The contributions in this dissertation not only improve translation quality in difficult data settings, but also serve as a reminder to carefully consider the impact of the data when training machine learning models.