Regularization techniques for fine-tuning in neural machine translation
We investigate techniques for supervised domain adaptation for neural machine
translation where an existing model trained on a large out-of-domain dataset is
adapted to a small in-domain dataset. In this scenario, overfitting is a major
challenge. We investigate a number of techniques to reduce overfitting and
improve transfer learning, including regularization techniques such as dropout
and L2-regularization towards an out-of-domain prior. In addition, we introduce
tuneout, a novel regularization technique inspired by dropout. We apply these
techniques, alone and in combination, to neural machine translation, obtaining
improvements on IWSLT datasets for English->German and English->Russian. We
also investigate the amounts of in-domain training data needed for domain
adaptation in NMT, and find a logarithmic relationship between the amount of
training data and gain in BLEU score. (EMNLP 2017 short paper; for bibtex, see
http://homepages.inf.ed.ac.uk/rsennric/bib.html#micelibarone2017)
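The core of L2-regularization towards an out-of-domain prior can be sketched in a few lines. This is an illustrative NumPy version over a generic parameter vector, not the authors' NMT code; the function name, learning rate, and penalty weight are assumptions for the example:

```python
import numpy as np

def finetune_step(theta, theta_prior, grad_in_domain, lr=0.1, lam=0.01):
    """One SGD step on the in-domain loss plus the penalty
    lam * ||theta - theta_prior||^2, which pulls the adapted
    parameters back toward the out-of-domain prior."""
    # Gradient of the penalty term: 2 * lam * (theta - theta_prior)
    penalty_grad = 2.0 * lam * (theta - theta_prior)
    return theta - lr * (grad_in_domain + penalty_grad)

# Toy usage: with a zero in-domain gradient, the step shrinks theta
# toward the prior (here the zero vector) instead of leaving it fixed.
theta_prior = np.zeros(3)
theta = np.array([1.0, -2.0, 0.5])
theta_new = finetune_step(theta, theta_prior, grad_in_domain=np.zeros(3))
```

With the values above the update multiplies theta by (1 - lr * 2 * lam) = 0.998 per step, so the fine-tuned model drifts toward the prior only as fast as the in-domain gradient fails to oppose it.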
Improving Machine Translation of Educational Content via Crowdsourcing
The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation
models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of
using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations may be of
lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain
by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine
translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected
with proper quality controls consistently yields performance gains over both general-domain baseline systems and
systems fine-tuned with pre-existing in-domain corpora.
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
In inductive transfer learning, fine-tuning pre-trained convolutional
networks substantially outperforms training from scratch. When using
fine-tuning, the underlying assumption is that the pre-trained model extracts
generic features, which are at least partially relevant for solving the target
task, but would be difficult to extract from the limited amount of data
available on the target task. However, besides the initialization with the
pre-trained model and the early stopping, there is no mechanism in fine-tuning
for retaining the features learned on the source task. In this paper, we
investigate several regularization schemes that explicitly promote the
similarity of the final solution with the initial model. We show the benefit of
having an explicit inductive bias towards the initial model, and we eventually
recommend a simple L2 penalty, with the pre-trained model as the reference,
as the baseline penalty for transfer learning tasks. (Accepted at ICML 2018)
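The recommended penalty differs from ordinary weight decay only in its reference point: decay pulls weights toward zero, while the explicit inductive bias pulls them toward the pre-trained model. A minimal NumPy sketch (an illustration of the idea, not the paper's code; the function name and lam value are assumptions):

```python
import numpy as np

def l2_penalty(theta, reference, lam=0.01):
    """lam * ||theta - reference||^2. With reference = 0 this is
    standard weight decay; with reference = the pre-trained weights
    it is an explicit inductive bias toward the initial model."""
    return lam * np.sum((theta - reference) ** 2)

# Fine-tuned weights that have moved only slightly from initialization.
theta_pretrained = np.array([0.5, -1.0, 2.0])
theta = np.array([0.6, -1.1, 1.8])

decay_to_zero = l2_penalty(theta, np.zeros_like(theta))  # penalizes distance to 0
decay_to_init = l2_penalty(theta, theta_pretrained)      # penalizes drift from the source model
```

For weights that stay close to the pre-trained solution, the second penalty is far smaller, so it retains source-task features instead of shrinking all weights indiscriminately.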
Overcoming Data Challenges in Machine Translation
Data-driven machine translation paradigms—which use machine learning to create translation models that can automatically translate from one language to another—have the potential to enable seamless communication across language barriers, and improve global information access. For this to become a reality, machine translation must be available for all languages and styles of text. However, the translation quality of these models is sensitive to the quality and quantity of the data the models are trained on. In this dissertation we address and analyze challenges arising from this sensitivity; we present methods that improve translation quality in difficult data settings, and analyze the effect of data quality on machine translation quality.
Machine translation models are typically trained on parallel corpora, but limited quantities of such data are available for most language pairs, leading to a low resource problem. We present a method for transfer learning from a paraphraser to overcome data sparsity in low resource settings. Even when training data is available in the desired language pair, it is frequently of a different style or genre than we would like to translate—leading to a domain mismatch. We present a method for improving domain adaptation translation quality.
A seemingly obvious approach when faced with a lack of data is to acquire more data. However, it is not always feasible to produce additional human translations. In such a case, an option may be to crawl the web for additional training data. However, as we demonstrate, such data can be very noisy and harm machine translation quality. Our analysis motivated subsequent work on data filtering and cleaning by the broader community.
The contributions in this dissertation not only improve translation quality in difficult data settings, but also serve as a reminder to carefully consider the impact of the data when training machine learning models.