Assessing the Ability of Self-Attention Networks to Learn Word Order
Self-attention networks (SAN) have attracted a lot of interest due to their
high parallelization and strong performance on a variety of NLP tasks, e.g.
machine translation. Because SAN lacks the recurrence structure of recurrent
neural networks (RNN), it is often assumed to be weak at learning the
positional information of words for sequence modeling. However, this
speculation has neither been empirically confirmed, nor has it been explained
how SAN performs so strongly on machine translation while supposedly "lacking
positional information". To this end, we propose a novel word reordering
detection task to quantify how well word order information is learned by SAN
and RNN.
Specifically, we randomly move one word to another position, and examine
whether a trained model can detect both the original and inserted positions.
Experimental results reveal that: 1) SAN trained on word reordering detection
indeed has difficulty learning the positional information even with the
position embedding; and 2) SAN trained on machine translation learns better
positional information than its RNN counterpart, in which position embedding
plays a critical role. Although the recurrence structure makes the model more
universally effective at learning word order, learning objectives matter more
in downstream tasks such as machine translation.
Comment: ACL 2019
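A minimal sketch of how the abstract's word reordering detection examples might be constructed: pick a random token, move it to a different slot, and record both the vacated and the new positions in the perturbed sentence. The function name and position bookkeeping are illustrative assumptions, not the authors' released code.

```python
import random

def make_reordering_example(tokens, rng=random):
    """Move one random token to a different slot. Returns the perturbed
    sentence, the slot the word vacated, and the index it now occupies
    (both relative to the perturbed sentence)."""
    n = len(tokens)
    assert n >= 2, "need at least two tokens to reorder"
    i = rng.randrange(n)                             # token to move
    rest = tokens[:i] + tokens[i + 1:]               # sentence without it
    j = rng.choice([k for k in range(n) if k != i])  # new slot, must differ
    perturbed = rest[:j] + [tokens[i]] + rest[j:]
    original_pos = i + 1 if j < i else i             # vacated slot shifts right
                                                     # if the insert precedes it
    return perturbed, original_pos, j

sentence = "the cat sat on the mat".split()
print(make_reordering_example(sentence, random.Random(0)))
```

A model trained on such pairs must predict both positions from the perturbed sentence alone, which is what makes the task a direct probe of learned word order.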
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
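As a toy illustration of the first category, here is a Python sketch of systematic input manipulation: a deliberately order-insensitive bag-of-words scorer is tested by shuffling word order and comparing outputs. The scorer, keywords, and sentence are invented for illustration and do not come from any workshop paper.

```python
import random

KEYWORDS = {"good", "great"}

def toy_bow_score(sentence):
    """Order-insensitive stand-in model: counts keyword occurrences."""
    return sum(tok in KEYWORDS for tok in sentence.split())

def shuffled(sentence, rng=random.Random(0)):
    """Input manipulation: destroy word order, keep the bag of words."""
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

sentence = "a good movie with a great cast"
# Identical scores under shuffling expose that this model ignores word
# order; a score drop would indicate sensitivity to the manipulated property.
print(toy_bow_score(sentence), toy_bow_score(shuffled(sentence)))
```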
What do Neural Machine Translation Models Learn about Morphology?
Neural machine translation (MT) models obtain state-of-the-art performance
while maintaining a simple, end-to-end architecture. However, little is known
about what these models learn about source and target languages during the
training process. In this work, we analyze the representations learned by
neural MT models at various levels of granularity and empirically evaluate the
quality of the representations for learning morphology through extrinsic
part-of-speech and morphological tagging tasks. We conduct a thorough
investigation along several parameters: word-based vs. character-based
representations, depth of the encoding layer, the identity of the target
language, and encoder vs. decoder representations. Our data-driven,
quantitative evaluation sheds light on important aspects of the neural MT
system and its ability to capture word structure.
Comment: Updated decoder experiment
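A hedged sketch of the extrinsic evaluation protocol the abstract describes: freeze the trained NMT model, collect per-token encoder states from each layer, and train a simple classifier to predict POS tags from them. The synthetic arrays below stand in for real extracted representations and gold annotations; the actual experiments use the authors' NMT models and tagged corpora.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_accuracy(states, tags, split=0.8):
    """Train a POS tagger on frozen representations; return held-out accuracy."""
    n = int(len(tags) * split)
    clf = LogisticRegression(max_iter=1000).fit(states[:n], tags[:n])
    return clf.score(states[n:], tags[n:])

# Stand-ins for per-token encoder states extracted from layers 1 and 2
# of a trained NMT model, aligned with gold POS tag ids.
rng = np.random.default_rng(0)
states_by_layer = {layer: rng.normal(size=(2000, 256)) for layer in (1, 2)}
gold_tags = rng.integers(0, 17, size=2000)

# Comparing accuracy across layers shows where morphological information
# is most accessible in the encoder.
for layer, states in states_by_layer.items():
    print(f"layer {layer}: POS tagging accuracy = "
          f"{probe_accuracy(states, gold_tags):.3f}")
```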