Attending to Future Tokens For Bidirectional Sequence Generation
Neural sequence generation is typically performed token-by-token and
left-to-right. Whenever a token is generated, only previously produced tokens
are taken into consideration. In contrast, for problems such as sequence
classification, bidirectional attention, which takes both past and future
tokens into consideration, has been shown to perform much better. We propose to
make the sequence generation process bidirectional by employing special
placeholder tokens. Treated as a node in a fully connected graph, a placeholder
token can take past and future tokens into consideration when generating the
actual output token. We verify the effectiveness of our approach experimentally
on two conversational tasks where the proposed bidirectional model outperforms
competitive baselines by a large margin.
Comment: Conference on Empirical Methods in Natural Language Processing
(EMNLP), 2019, Hong Kong, China
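The central mechanism is easy to illustrate. Below is a minimal sketch (assumed names and shapes, not the authors' code) of how unmasked self-attention lets a placeholder token condition on both past and future positions, in contrast to the causal mask used in left-to-right generation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, causal=False):
    """Scaled dot-product self-attention; without the causal mask,
    every position (including a placeholder) attends to past AND
    future tokens, matching the fully-connected-graph view."""
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (t, t) all-pairs scores
    if causal:                                    # left-to-right generation
        scores = np.where(np.tril(np.ones((t, t), bool)), scores, -np.inf)
    return softmax(scores) @ x                    # contextualized states

rng = np.random.default_rng(0)
past = rng.normal(size=(3, 8))                    # already generated tokens
placeholders = np.zeros((2, 8))                   # placeholder embeddings (assumed)
h = self_attention(np.vstack([past, placeholders]))
# h[3] and h[4] (the placeholders) now mix information from all positions;
# the actual output token for each placeholder is predicted from these states.
```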
Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation
This paper proposes a hierarchical attentional neural translation model which
focuses on enhancing source-side hierarchical representations by covering both
local and global semantic information using a bidirectional tree-based encoder.
To maximize the predictive likelihood of target words, a weighted variant of an
attention mechanism is used to balance the attentive information between
lexical and phrase vectors. Using a tree-based rare word encoding, the proposed
model is extended to sub-word level to alleviate the out-of-vocabulary (OOV)
problem. Empirical results reveal that the proposed model significantly
outperforms sequence-to-sequence attention-based and tree-based neural
translation models in English-Chinese translation tasks.
Comment: Accepted for publication at EMNLP 2017
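A hedged sketch of the weighted attention variant: two attention reads, one over lexical (word-level) annotations and one over phrase (tree-node) annotations, balanced by a learned scalar gate. All names and the sigmoid gate parameterization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, annotations):
    """Content-based attention: weight each annotation by its score."""
    weights = softmax(annotations @ query)        # (n,)
    return weights @ annotations                  # weighted context vector

def weighted_context(query, lexical_h, phrase_h, gate_w):
    """Balance attentive information between lexical and phrase vectors
    with a scalar gate (assumed sigmoid parameterization)."""
    c_lex = attend(query, lexical_h)              # word-level context
    c_phr = attend(query, phrase_h)               # tree-node context
    alpha = 1.0 / (1.0 + np.exp(-gate_w @ query)) # balance weight in (0, 1)
    return alpha * c_lex + (1.0 - alpha) * c_phr

rng = np.random.default_rng(1)
d = 6
query = rng.normal(size=d)                        # decoder state
lexical_h = rng.normal(size=(5, d))               # encoder word annotations
phrase_h = rng.normal(size=(3, d))                # encoder phrase annotations
gate_w = rng.normal(size=d)
print(weighted_context(query, lexical_h, phrase_h, gate_w).shape)  # (6,)
```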
Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder
We investigate the integration of a planning mechanism into an
encoder-decoder architecture with an explicit alignment for character-level
machine translation. We develop a model that plans ahead when it computes
alignments between the source and target sequences, constructing a matrix of
proposed future alignments and a commitment vector that governs whether to
follow or recompute the plan. This mechanism is inspired by the strategic
attentive reader and writer (STRAW) model. Our proposed model is end-to-end
trainable with fully differentiable operations. We show that it outperforms a
strong baseline on three character-level neural machine translation tasks from
the WMT'15 corpus. Our analysis demonstrates that our model can compute
qualitatively intuitive alignments and achieves superior performance with fewer
parameters.
Comment: Accepted to Rep4NLP 2017 Workshop at the ACL 2017 Conference
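A toy sketch of the plan-and-commit loop: the decoder keeps a matrix of k proposed future alignments and a commitment value that governs whether to follow the plan (advance it one row) or recompute it. The refill and decay rules below are stand-ins; the actual model learns these decisions end-to-end.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(state, src, plan, commit, threshold=0.5):
    """One decoding step with an alignment-plan matrix.

    plan:   (k, t_src) matrix of proposed future alignments
    commit: scalar governing whether to follow or recompute the plan
    """
    if commit < threshold:                        # recompute the plan
        scores = src @ state                      # content-based scores (t_src,)
        plan = np.tile(softmax(scores), (plan.shape[0], 1))
        commit = 1.0                              # freshly committed
    alignment = plan[0]                           # current row of the plan
    context = alignment @ src                     # attention read from source
    plan = np.roll(plan, -1, axis=0)              # advance the plan one step
    commit *= 0.6                                 # toy decay toward replanning
    return context, plan, commit

rng = np.random.default_rng(2)
src = rng.normal(size=(7, 4))                     # source annotations
state = rng.normal(size=4)                        # decoder state
plan = np.zeros((3, 7))                           # k = 3 planned alignments
commit = 0.0                                      # force an initial plan
for _ in range(5):
    context, plan, commit = decode_step(state, src, plan, commit)
```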