4 research outputs found
Unfolding and Shrinking Neural Machine Translation Ensembles
Ensembling is a well-known technique in neural machine translation (NMT) to
improve system performance. Instead of a single neural net, multiple neural
nets with the same topology are trained separately, and the decoder generates
predictions by averaging over the individual models. Ensembling often improves
the quality of the generated translations drastically. However, it is not
suitable for production systems because it is cumbersome and slow. This work
aims to reduce the runtime to be on par with a single system without
compromising the translation quality. First, we show that the ensemble can be
unfolded into a single large neural network which imitates the output of the
ensemble system. We show that unfolding can already improve the runtime in
practice since more work can be done on the GPU. We proceed by describing a set
of techniques to shrink the unfolded network by reducing the dimensionality of
layers. On Japanese-English we report that the resulting network has the size
and decoding speed of a single NMT network but performs on the level of a
3-ensemble system.Comment: Accepted at EMNLP 201
Speed-constrained tuning for statistical machine translation using Bayesian optimization
We address the problem of automatically finding the parameters of a statistical machine translation system that maximize BLEU scores while ensuring that decoding speed exceeds a minimum value. We propose the use of Bayesian Optimization to efficiently tune the speed-related decoding parameters by easily incorporating speed as a noisy constraint function. The obtained parameter values are guaranteed to satisfy the speed constraint with an associated confidence margin. Across three language pairs and two speed constraint values, we report overall optimization time reduction compared to grid and random search. We also show that Bayesian Optimization can decouple speed and BLEU measurements, resulting in a further reduction of overall optimization time as speed is measured over a small subset of sentences