Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)
We implement a Tensor Train layer in the TensorFlow Neural Machine
Translation (NMT) model using the t3f library. We perform training runs on the
IWSLT English-Vietnamese '15 and WMT German-English '16 datasets over a range
of learning rates, maximum ranks, and core dimensions. We compare against a
target BLEU test score of 24.0,
obtained by our benchmark run. For the IWSLT English-Vietnamese training, we
obtain BLEU test/dev scores of 24.0/21.9 and 24.2/21.9 using two
configurations of core dimensions and rank distributions, both with learning
rate 0.0012. These runs use 113% and 397% of the FLOPs of the benchmark run,
respectively. We find that, of the
parameters surveyed, a higher learning rate and more 'rectangular' core
dimensions generally produce higher BLEU scores. For the WMT German-English
dataset, we obtain BLEU test/dev scores of 24.0/23.8 using a single
configuration of core dimensions and rank distribution with learning rate
0.0012. We discuss the potential for future optimization and application
of Tensor Train decomposition to other NMT models.

Comment: 10 pages, 2 tables
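The abstract itself contains no code, but the core idea it rests on, representing a large dense weight matrix as a chain of small Tensor Train (TT) cores, can be sketched in plain NumPy. This is an illustrative reconstruction under the standard TT-matrix convention, not the paper's t3f-based implementation; the function names, mode shapes, and ranks below are hypothetical choices for the example.

```python
import numpy as np

def tt_full_matrix(cores):
    """Reconstruct the dense matrix encoded by TT-matrix cores.

    Each core has shape (r_prev, m_k, n_k, r_next); the full matrix
    has shape (prod(m_k), prod(n_k)). Boundary ranks must be 1.
    """
    res = cores[0]                                  # (1, m1, n1, r1)
    for core in cores[1:]:
        # Contract the shared rank index, then merge row/column modes.
        res = np.einsum('amnr,rpqb->ampnqb', res, core)
        a, m1, m2, n1, n2, b = res.shape
        res = res.reshape(a, m1 * m2, n1 * n2, b)
    return res[0, :, :, 0]

def tt_num_params(cores):
    """Total number of parameters stored in the TT cores."""
    return sum(c.size for c in cores)

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    # Hypothetical 24x12 weight matrix with modes (2,3,4) x (3,2,2)
    # and TT ranks (1, 2, 2, 1).
    shapes = [(1, 2, 3, 2), (2, 3, 2, 2), (2, 4, 2, 1)]
    cores = [rng.standard_normal(s) for s in shapes]
    W = tt_full_matrix(cores)
    print(W.shape)                                # (24, 12)
    print(tt_num_params(cores), '<', W.size)      # 52 < 288
```

In an actual TT layer the matrix-vector product is computed core by core without ever materializing `W`, which is where the parameter and FLOP savings discussed in the abstract come from; the maximum rank and the choice of mode ("core dimension") shapes control that trade-off.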