Competence-based Curriculum Learning for Neural Machine Translation
Current state-of-the-art NMT systems use large neural networks that are not
only slow to train, but also often require many heuristics and optimization
tricks, such as specialized learning rate schedules and large batch sizes. This
is undesirable as it requires extensive hyperparameter tuning. In this paper,
we propose a curriculum learning framework for NMT that reduces training time,
reduces the need for specialized heuristics or large batch sizes, and results
in overall better performance. Our framework consists of a principled way of
deciding which training samples are shown to the model at different times
during training, based on the estimated difficulty of a sample and the current
competence of the model. Filtering training samples in this manner prevents the
model from getting stuck in bad local optima, making it converge faster and
reach a better solution than the common approach of uniformly sampling training
examples. Furthermore, the proposed method can be easily applied to existing
NMT models by simply modifying their input data pipelines. We show that our
framework can reduce the training time and improve the performance of both
recurrent neural network models and Transformers, achieving up to a 70%
decrease in training time while obtaining BLEU improvements of up to 2.2 points.
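As a rough illustration of the sampling rule, here is a minimal sketch in Python, assuming the square-root competence schedule described in the paper and a precomputed per-sentence difficulty score (e.g. sentence length or word rarity); the function names and interface are hypothetical, not the authors' implementation:

```python
import math
import random

def competence(t, T, c0=0.01):
    # Square-root competence schedule: the fraction of the
    # difficulty-sorted training data visible to the model at step t.
    return min(1.0, math.sqrt(t * (1 - c0 ** 2) / T + c0 ** 2))

def curriculum_batches(samples, difficulty, total_steps, batch_size):
    # Sort once by estimated difficulty, then at each step draw a batch
    # only from the prefix that the current competence allows.
    ranked = sorted(samples, key=difficulty)
    for t in range(total_steps):
        cutoff = max(1, int(competence(t, total_steps) * len(ranked)))
        yield random.sample(ranked[:cutoff], min(batch_size, cutoff))
```

Since competence reaches 1 partway through training, this reduces to uniform sampling later on; the curriculum only reshapes the early phase.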
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Transfer learning has proven to be an effective technique for neural
machine translation under low-resource conditions. Existing methods require a
common target language, language relatedness, or specific training tricks and
regimes. We present a simple transfer learning method, where we first train a
"parent" model for a high-resource language pair and then continue the training
on a low-resource pair only by replacing the training corpus. This "child" model
performs significantly better than a baseline trained on the low-resource pair
alone. We are the first to show this when the target languages differ, and we
observe improvements even for unrelated languages with different alphabets.
Comment: Accepted as a WMT18 research paper, Proceedings of the 3rd Conference on Machine Translation 2018.
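The recipe itself is close to trivial, so a sketch is short. The trainer interface below is hypothetical; the one substantive assumption, taken from the paper, is a shared vocabulary (e.g. joint BPE) built to cover both language pairs, so the child can reuse the parent's parameters unchanged:

```python
def trivial_transfer(trainer, parent_corpus, child_corpus, shared_vocab,
                     parent_steps, child_steps):
    # One vocabulary covering both pairs, so no embedding surgery is needed.
    model = trainer.init_model(vocab=shared_vocab)
    # Train the "parent" on the high-resource pair.
    trainer.train(model, parent_corpus, steps=parent_steps)
    # No reset, no freezing, no special schedule: just swap the corpus
    # and continue training the "child" on the low-resource pair.
    trainer.train(model, child_corpus, steps=child_steps)
    return model
```

The point of the sketch is what is absent: no architecture change, no parameter freezing, no joint training; only the training data changes between the two phases.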
Reinforcement Learning based Curriculum Optimization for Neural Machine Translation
We consider the problem of making efficient use of heterogeneous training
data in neural machine translation (NMT). Specifically, given a training
dataset with a sentence-level feature such as noise, we seek an optimal
curriculum, or order for presenting examples to the system during training. Our
curriculum framework allows examples to appear an arbitrary number of times,
and thus generalizes data weighting, filtering, and fine-tuning schemes. Rather
than relying on prior knowledge to design a curriculum, we use reinforcement
learning to learn one automatically, jointly with the NMT system, in the course
of a single training run. We show that this approach can beat uniform and
filtering baselines on Paracrawl and WMT English-to-French datasets by up to
+3.4 BLEU, and match the performance of a hand-designed, state-of-the-art
curriculum.
Comment: NAACL 2019 short paper. Reviewer comments not yet addressed.
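As a stand-in for the paper's RL agent, here is a minimal bandit-style sketch: each arm is a bin of training examples (e.g. grouped by noise level), and the reward is the change in dev BLEU after training on a batch drawn from the chosen bin. The EXP3 learner and all names here are illustrative assumptions, not the paper's exact algorithm:

```python
import math
import random

class Exp3Curriculum:
    """Bandit stand-in for a learned curriculum over bins of examples."""

    def __init__(self, n_bins, gamma=0.1):
        self.w = [1.0] * n_bins
        self.gamma = gamma

    def probs(self):
        total = sum(self.w)
        k = len(self.w)
        return [(1 - self.gamma) * wi / total + self.gamma / k for wi in self.w]

    def choose(self):
        # Sample a bin to draw the next training batch from.
        return random.choices(range(len(self.w)), weights=self.probs())[0]

    def update(self, arm, reward):
        # EXP3 expects rewards in [0, 1]; rescale the dev-BLEU delta first.
        p = self.probs()[arm]
        self.w[arm] *= math.exp(self.gamma * reward / (p * len(self.w)))
```

A training loop would then alternate: pick a bin with choose(), train on a batch from it, and periodically call update() with the rescaled dev-BLEU change, so the curriculum is learned jointly with the NMT system in a single run.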
Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
Traditional neural machine translation (NMT) uses a fixed training
procedure in which each sentence is sampled once per epoch. In practice,
some sentences are well learned within the first few epochs; under this
scheme, however, they continue to be trained alongside the sentences that
are not yet well learned for 10-30 epochs, which wastes training time.
Here, we propose an efficient method to dynamically sample the sentences
in order to accelerate NMT training. In this approach, a weight is assigned
to each sentence based on the measured difference between the training costs
of two iterations. Then, in each epoch, a certain percentage of sentences are
dynamically sampled according to their weights. Empirical results on the NIST
Chinese-to-English and the WMT English-to-German tasks show that the proposed
method can significantly accelerate NMT training and improve NMT performance.
Comment: Revised version of ACL-2018.
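A minimal sketch of the per-epoch sampling step follows; the exact weight formula and all names are assumptions (the paper derives weights from how much each sentence's training cost dropped between two epochs):

```python
import random

def resample(sentence_ids, prev_cost, curr_cost, keep_ratio=0.8):
    # Weight each sentence by how much its training cost is still
    # dropping; sentences that are already well learned get tiny weight.
    weights = [max(prev_cost[s] - curr_cost[s], 1e-8) for s in sentence_ids]
    # Draw a fixed fraction of the corpus for the next epoch,
    # proportionally to the weights (with replacement, for simplicity).
    k = int(keep_ratio * len(sentence_ids))
    return random.choices(sentence_ids, weights=weights, k=k)
```

The effect is that compute shifts away from sentences whose cost has plateaued and toward those the model is still learning, which is where the reported speedup comes from.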
Results of the WMT17 Neural MT Training Task
This paper presents the results of the WMT17 Neural MT Training Task.
The objective of this task is to explore methods of training a fixed neural architecture, aiming primarily at the best translation quality and, as a secondary goal, at shorter training time.
Task participants were provided with a complete neural machine translation system, fixed training data and the configuration of the network.
The translation was performed in the English-to-Czech direction, and the task was divided into two subtasks with different configurations: one scaled to fit on a 4GB GPU card and another on an 8GB card.
We received 3 submissions for the 4GB variant and 1 submission for the 8GB variant; we also provided our own run for each of the sizes, along with two baselines.
We translated the test set with the trained models and evaluated the outputs using several automatic metrics.
We also report the results of the human evaluation of the submitted systems.