

    Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

    Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attentional networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
    Comment: 11 pages, 5 figures, accepted by EMNLP 2018 (v2: corrected author names; v3: fix to CNN context-window size, and new post-publication experiments in section 6).
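
    The path-length argument summarized above can be made concrete with a small sketch: in scaled dot-product self-attention, every pair of positions interacts directly within a single layer, no matter how far apart the words are. The following Python/NumPy snippet is an illustrative sketch only; the function name, dimensions, and toy data are assumptions for exposition, not the paper's code.

# Minimal sketch (not the paper's implementation): scaled dot-product
# self-attention over a toy sentence, showing that any two positions,
# however distant, are connected by a path of length one.
import numpy as np

def self_attention(x):
    """x: (seq_len, d_model) token representations; returns attended output."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # pairwise similarities between all positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over source positions
    return weights @ x                               # each output mixes information from every position

rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 16))                   # e.g. a 12-token sentence, 16-dim embeddings
out = self_attention(tokens)
print(out.shape)                                     # (12, 16): distant words interact in one hop

    By contrast, an RNN must propagate information through every intermediate state between two distant words, and a CNN needs a stack of layers whose receptive field spans the gap, which is the architectural difference the subject-verb agreement experiments probe.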