

Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning

Authors: Liu Fenglin, Ren Xuancheng, Sun Xu, Zhao Guangxiang
Publication date: 04/07/2021

In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information only from the last encoder layer, recent work has proposed using representations from different encoder layers to capture diverse levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which may lead to insufficient training of the encoder layer stack because of the hierarchy bypassing problem. In this work, we propose layer-wise cross-view decoding: for each decoder layer, the representations from the last encoder layer, which serve as a global view, are supplemented with those from other encoder layers, giving a stereoscopic view of the source sequences. Systematic experiments show that our approach successfully addresses the hierarchy bypassing problem and substantially improves the performance of sequence-to-sequence learning with deep representations on diverse tasks.
Comment: 9 pages, 6 figures

arXiv.org e-Print Archive
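The decoding scheme described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It uses single-head attention without learned projections, omits decoder self-attention and feed-forward sublayers, and assumes (hypothetically) that the decoder has as many layers as the encoder, that decoder layer i pairs with encoder layer i, and that the global and local attention contexts are simply averaged. The names `cross_attention` and `cross_view_decode` are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, memory):
    """Single-head scaled dot-product attention.

    queries: (T, d) decoder states; memory: (S, d) encoder states.
    Returns a (T, d) context matrix.
    """
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)  # (T, S)
    return softmax(scores, axis=-1) @ memory  # (T, d)


def cross_view_decode(decoder_input, encoder_layers):
    """Hypothetical layer-wise cross-view decoding sketch.

    encoder_layers: list of N arrays of shape (S, d), outputs of encoder
    layers 1..N. Each decoder layer attends to the last encoder layer
    (global view) and to the encoder layer with the same index (an assumed
    pairing); the two contexts are averaged and added residually.
    """
    global_view = encoder_layers[-1]
    h = decoder_input
    for local_view in encoder_layers:
        ctx_global = cross_attention(h, global_view)
        ctx_local = cross_attention(h, local_view)
        h = h + 0.5 * (ctx_global + ctx_local)  # residual connection
    return h
```

Under these assumptions, every encoder layer is read directly by some decoder layer rather than only through the last encoder layer, which is one plausible way to picture how cross-view decoding gives lower encoder layers a more direct training signal.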

Neuron Interaction Based Representation Composition for Neural Machine Translation

Authors: Li Jian, Lyu Michael R., Shi Shuming, Tu Zhaopeng, Wang Xing, Yang Baosong
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 22/11/2019

Recent NLP studies reveal that substantial linguistic information can be attributed to single neurons, i.e., individual dimensions of the representation vectors. We hypothesize that modeling strong interactions among neurons helps to better capture complex information by composing the linguistic properties embedded in individual neurons. Starting from this intuition, we propose a novel approach to composing representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors. Specifically, we leverage bilinear pooling to model pairwise multiplicative interactions among individual neurons, and a low-rank approximation to make the model computationally feasible. We further propose extended bilinear pooling to incorporate first-order representations. Experiments on the WMT14 English⇒German and English⇒French translation tasks show that our model consistently improves performance over the state-of-the-art Transformer baseline. Further analyses demonstrate that our approach indeed captures more syntactic and semantic information, as expected.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
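The bilinear-pooling composition described in the abstract above can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' code: it uses a common Hadamard-product low-rank factorization, z = Pᵀ((Uᵀx) ⊙ (Vᵀy)); it assumes the layer representations of one token are concatenated and pooled with themselves; and it models "extended" pooling by appending a constant 1 so that first-order (linear) terms appear alongside the pairwise products. The matrices `U`, `V`, `P` and the helper names are hypothetical.

```python
import numpy as np


def low_rank_bilinear(x, y, U, V, P):
    """Low-rank bilinear pooling: z = P^T ((U^T x) * (V^T y)).

    Equivalent to a full bilinear map z_k = x^T W_k y with
    W_k = U diag(P[:, k]) V^T, so parameters drop from
    d_x * d_y * d_out to (d_x + d_y + d_out) * r.
    U: (d_x, r), V: (d_y, r), P: (r, d_out).
    """
    return P.T @ ((U.T @ x) * (V.T @ y))


def extended(x):
    """Append a constant 1 so pooling also yields first-order terms of x."""
    return np.concatenate([x, [1.0]])


def compose_layers(layer_outputs, U, V, P):
    """Hypothetical composition of per-layer vectors for one token.

    Concatenates the layer vectors (plus the constant 1) and pools the
    result with itself, so every pair of neurons, within and across
    layers, can interact multiplicatively.
    """
    h = extended(np.concatenate(layer_outputs))
    return low_rank_bilinear(h, h, U, V, P)
```

For example, with three 4-dimensional layer outputs the extended vector has d = 13; a full bilinear map to 5 outputs would need 13 × 13 × 5 = 845 parameters, while rank r = 6 needs (13 + 13 + 5) × 6 = 186.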