Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Recently, non-recurrent architectures (convolutional, self-attentional) have
outperformed RNNs in neural machine translation. CNNs and self-attentional
networks can connect distant words via shorter network paths than RNNs, and it
has been speculated that this improves their ability to model long-range
dependencies. However, this theoretical argument has not been tested
empirically, nor have alternative explanations for their strong performance
been explored in depth. We hypothesize that the strong performance of CNNs and
self-attentional networks could also be due to their ability to extract
semantic features from the source text, and we evaluate RNNs, CNNs, and
self-attentional networks on two tasks: subject-verb agreement (where capturing
long-range dependencies is required) and word sense disambiguation (where
semantic feature extraction is required). Our experimental results show that:
1) self-attentional networks and CNNs do not outperform RNNs in modeling
subject-verb agreement over long distances; 2) self-attentional networks
perform distinctly better than RNNs and CNNs on word sense disambiguation.

Comment: 11 pages, 5 figures, accepted by EMNLP 2018 (v2: corrected author
names; v3: fix to CNN context-window size, and new post-publication
experiments in section 6)
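
The path-length argument summarized in the abstract can be made concrete with a small back-of-the-envelope sketch. The Python snippet below is illustrative only and is not taken from the paper: the kernel width k and the specific counting formulas are assumptions, used to show how many network steps separate two source positions (for example a subject and a distant verb) under each architecture.

import math

# Illustrative sketch (not from the paper): rough maximum path lengths between
# two source positions i and j under the three architectures the abstract
# compares. The CNN kernel width k is an assumed parameter.

def rnn_path(i: int, j: int) -> int:
    # An RNN must pass information through every intermediate position.
    return abs(i - j)

def cnn_path(i: int, j: int, k: int = 3) -> int:
    # A stack of convolutions widens its receptive field by (k - 1) per layer,
    # so bridging the distance takes roughly |i - j| / (k - 1) layers.
    return math.ceil(abs(i - j) / (k - 1))

def attn_path(i: int, j: int) -> int:
    # Self-attention connects any two positions within a single layer.
    return 1

if __name__ == "__main__":
    i, j = 3, 28  # e.g. a subject and its verb 25 tokens apart
    print(rnn_path(i, j), cnn_path(i, j), attn_path(i, j))  # -> 25 13 1

Under these assumptions, self-attention links the two positions in a single hop, the CNN in a number of layers that grows with the distance divided by the kernel width, and the RNN in a number of steps equal to the distance itself; whether these shorter paths actually translate into better long-range agreement is exactly what the paper tests.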