Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
Most Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) encoder-decoder framework equipped with an attention mechanism. However, the conventional attention mechanism treats every decoding time step identically, applying the same matrix, which is problematic because the softness of the attention should differ for different types of words (e.g. content words versus function words). We therefore propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT), which controls the softness of attention by means of an attention temperature. Experimental results on Chinese-English and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and case study show that it attends to the most relevant elements of the source-side context and generates high-quality translations.
Comment: To appear in EMNLP 2018
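The temperature idea is easy to see in code. Below is a minimal sketch, assuming only that alignment scores are rescaled by a temperature before the softmax so the model can sharpen or flatten its attention; the function names, toy scores, and fixed temperature values are illustrative and not the paper's exact parameterization (in the paper the temperature is predicted from the decoder state at each step).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def temperature_attention(scores, temperature):
    # Lower temperature -> sharper weights ("concentrate");
    # higher temperature -> flatter weights ("divert").
    return softmax(scores / temperature)

# Toy alignment scores between one decoder step and four source positions.
scores = np.array([2.0, 1.0, 0.5, -1.0])

print(temperature_attention(scores, 0.5))  # concentrated on the top-scoring position
print(temperature_attention(scores, 2.0))  # spread more evenly across positions
```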
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks in an encoder-decoder configuration. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer, based
solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to be
superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014
English-to-German translation task, improving over the existing best results,
including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French
translation task, our model establishes a new single-model state-of-the-art
BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction
of the training costs of the best models from the literature. We show that the
Transformer generalizes well to other tasks by applying it successfully to
English constituency parsing, both with large and limited training data.
Comment: 15 pages, 5 figures
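The Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, which the paper stacks in multi-head form. The sketch below shows just that single operation on random toy matrices, with no masking, multiple heads, or learned projections; shapes and names are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise over the queries."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, d_k = 8
K = rng.normal(size=(5, 8))   # 5 key/value positions
V = rng.normal(size=(5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```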
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a
doubly-attentive decoder naturally incorporates spatial visual features
obtained using pre-trained convolutional neural networks, bridging the gap
between image description and translation. Our decoder learns to attend to
source-language words and parts of an image independently by means of two
separate attention mechanisms as it generates words in the target language. We
find that our model can efficiently exploit not just back-translated in-domain
multi-modal data but also large general-domain text-only MT corpora. We also
report state-of-the-art results on the Multi30k data set.
Comment: 8 pages (11 including references), 2 figures
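A rough sketch of the "doubly-attentive" idea follows, assuming one dot-product attention over textual encoder states and an independent one over CNN spatial features, with both contexts made available to the decoder step. The shapes, the scoring function, and the concatenation of contexts are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    # Dot-product attention of one decoder query over a memory matrix.
    weights = softmax(memory @ query)
    return weights @ memory

rng = np.random.default_rng(1)
src_annotations = rng.normal(size=(6, 16))    # hypothetical: 6 encoded source words
img_annotations = rng.normal(size=(49, 16))   # hypothetical: 7x7 CNN spatial features
decoder_state = rng.normal(size=16)

# Two independent attention mechanisms, one per modality.
src_context = attend(decoder_state, src_annotations)
img_context = attend(decoder_state, img_annotations)

# The decoder step would consume both contexts when predicting the next target word.
combined = np.concatenate([src_context, img_context])
print(combined.shape)  # (32,)
```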