Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
In sequence-to-sequence learning, the decoder relies on the attention
mechanism to efficiently extract information from the encoder. While it is
common practice to draw information from only the last encoder layer, recent
work has proposed to use representations from different encoder layers for
diversified levels of information. Nonetheless, the decoder still obtains only
a single view of the source sequences, which might lead to insufficient
training of the encoder layer stack due to the hierarchy bypassing problem. In
this work, we propose layer-wise cross-view decoding, where for each decoder
layer, together with the representations from the last encoder layer, which
serve as a global view, those from other encoder layers are supplemented for a
stereoscopic view of the source sequences. Systematic experiments show that we
successfully address the hierarchy bypassing problem and substantially improve
the performance of sequence-to-sequence learning with deep representations on
diverse tasks.
Comment: 9 pages, 6 figures
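As a rough illustration of the idea described in the abstract above, the following minimal sketch lets one decoder layer attend to two views of the source: the last encoder layer (the global view) and one other encoder layer. It is not the authors' implementation. The function names (`attend`, `cross_view_context`), the pairing of decoder layer i with encoder layer i, the averaging of the two views, and the omission of learned projections are all assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, memory):
    """Scaled dot-product attention without learned projections (simplification)."""
    scores = queries @ memory.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ memory


def cross_view_context(dec_state, enc_layers, layer_idx):
    """Combine a global view and a layer-specific view of the source.

    dec_state:  (target_len, d_model) states of one decoder layer
    enc_layers: list of (source_len, d_model) arrays, one per encoder layer
    layer_idx:  index of the decoder layer (hypothetical pairing rule below)
    """
    # Global view: representations from the last encoder layer.
    global_view = attend(dec_state, enc_layers[-1])
    # Assumption: decoder layer i is paired with encoder layer i.
    paired = min(layer_idx, len(enc_layers) - 1)
    local_view = attend(dec_state, enc_layers[paired])
    # Assumption: the two views are merged by simple averaging.
    return 0.5 * (global_view + local_view)


rng = np.random.default_rng(0)
enc_layers = [rng.standard_normal((5, 8)) for _ in range(3)]  # 3 encoder layers
dec_state = rng.standard_normal((4, 8))                       # 4 target positions
context = cross_view_context(dec_state, enc_layers, layer_idx=0)
print(context.shape)
```

In this sketch each decoder layer always sees the last encoder layer, so the full encoder stack stays on the path the decoder reads from, while the paired lower layer supplies the additional view.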
Understanding and Enhancing the Use of Context for Machine Translation
To understand and infer meaning in language, neural models have to learn
complicated nuances. Discovering distinctive linguistic phenomena from data is
not an easy task. For instance, lexical ambiguity is a fundamental feature of
language which is challenging to learn. Even more prominently, inferring the
meaning of rare and unseen lexical units is difficult with neural networks.
Meaning is often determined from context. With context, languages allow meaning
to be conveyed even when the specific words used are not known by the reader.
To model this learning process, a system has to learn from a few instances in
context and be able to generalize well to unseen cases. The learning process is
hindered when training data is scarce for a task. Even with sufficient data,
learning patterns for the long tail of the lexical distribution is challenging.
In this thesis, we focus on understanding certain potentials of contexts in
neural models and design augmentation models to benefit from them. We focus on
machine translation as an important instance of the more general language
understanding problem. To translate from a source language to a target
language, a neural model has to understand the meaning of constituents in the
provided context and generate constituents with the same meanings in the target
language. This task accentuates the value of capturing nuances of language and
the necessity of generalization from few observations. The main problem we
study in this thesis is what neural machine translation models learn from data
and how we can devise more focused contexts to enhance this learning. Looking
more in-depth into the role of context and the impact of data on learning
models is essential to advance the NLP field. Moreover, it helps highlight the
vulnerabilities of current neural networks and provides insights into designing
more robust models.
Comment: PhD dissertation defended on November 10th, 202