What do Neural Machine Translation Models Learn about Morphology?
Neural machine translation (MT) models obtain state-of-the-art performance
while maintaining a simple, end-to-end architecture. However, little is known
about what these models learn about source and target languages during the
training process. In this work, we analyze the representations learned by
neural MT models at various levels of granularity and empirically evaluate the
quality of the representations for learning morphology through extrinsic
part-of-speech and morphological tagging tasks. We conduct a thorough
investigation along several parameters: word-based vs. character-based
representations, depth of the encoding layer, the identity of the target
language, and encoder vs. decoder representations. Our data-driven,
quantitative evaluation sheds light on important aspects of the neural MT
system and its ability to capture word structure.

Comment: Updated decoder experiment
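The extrinsic tagging evaluation described above amounts to training a probing classifier on frozen representations. A minimal sketch of that idea, with synthetic vectors standing in for encoder hidden states (the dimensions, tag count, and stand-in data are all assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for encoder hidden states: in the paper's setting
# these would come from a trained NMT encoder; here we draw synthetic 64-d
# vectors whose per-tag means differ, so the probe has signal to recover.
num_tags, dim, n = 3, 64, 300
means = rng.normal(0, 1, (num_tags, dim))
labels = rng.integers(0, num_tags, n)
states = means[labels] + 0.5 * rng.normal(0, 1, (n, dim))

# Linear probe: softmax regression trained with plain gradient descent.
# The probe's accuracy is read as a measure of how much POS/morphology
# information the representation encodes.
W = np.zeros((dim, num_tags))
b = np.zeros(num_tags)
onehot = np.eye(num_tags)[labels]
for _ in range(200):
    logits = states @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad = (probs - onehot) / n          # mean cross-entropy gradient
    W -= 1.0 * states.T @ grad
    b -= 1.0 * grad.sum(axis=0)

accuracy = (np.argmax(states @ W + b, axis=1) == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

Keeping the probe linear matters for the interpretation: a high score then reflects information that is easily extractable from the representation, not modeling power added by the classifier itself.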
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Neural models have become ubiquitous in automatic speech recognition systems.
While neural networks are typically used as acoustic models in more complex
systems, recent studies have explored end-to-end speech recognition systems
based on neural networks, which can be trained to directly predict text from
input acoustic features. Although such systems are conceptually elegant and
simpler than traditional systems, it is less obvious how to interpret the
trained models. In this work, we analyze the speech representations learned by
a deep end-to-end model that is based on convolutional and recurrent layers,
and trained with a connectionist temporal classification (CTC) loss. We use a
pre-trained model to generate frame-level features which are given to a
classifier that is trained on frame classification into phones. We evaluate
representations from different layers of the deep model and compare their
quality for predicting phone labels. Our experiments shed light on important
aspects of the end-to-end model such as layer depth, model complexity, and
other design choices.

Comment: NIPS 201
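The layer-wise comparison above can be sketched as a loop that scores each layer's frame-level features with the same frame classifier. Here synthetic features stand in for the CTC model's activations, and the layer names, noise levels, and nearest-centroid probe are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

num_phones, dim, n = 5, 32, 500
labels = rng.integers(0, num_phones, n)
means = rng.normal(0, 1, (num_phones, dim))

# Hypothetical stand-ins for frame-level activations from three layers of
# the end-to-end model; we simulate "deeper layers separate phones better"
# by shrinking the noise around each phone's mean.
layers = {name: means[labels] + s * rng.normal(0, 1, (n, dim))
          for name, s in [("conv1", 2.0), ("rnn1", 1.0), ("rnn3", 0.4)]}

def centroid_accuracy(x, y, k):
    """Nearest-centroid probe: classify each frame by the closest class mean."""
    cent = np.stack([x[y == c].mean(axis=0) for c in range(k)])
    d = ((x[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
    return (d.argmin(axis=1) == y).mean()

# Score every layer with the same probe, so differences in accuracy can be
# attributed to the representations rather than the classifier.
scores = {name: centroid_accuracy(x, labels, num_phones)
          for name, x in layers.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Holding the probe fixed across layers is the key design choice: it turns per-layer accuracy into a comparable measure of how much phonetic information each layer exposes.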
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Despite the remarkable evolution of deep neural networks in natural language
processing (NLP), their interpretability remains a challenge. Previous work
largely focused on what these models learn at the representation level. We
break this analysis down further and study individual dimensions (neurons) in
the vector representation learned by end-to-end neural models in NLP tasks. We
propose two methods: Linguistic Correlation Analysis, based on a supervised
method to extract the most relevant neurons with respect to an extrinsic task,
and Cross-model Correlation Analysis, an unsupervised method to extract salient
neurons w.r.t. the model itself. We evaluate the effectiveness of our
techniques by ablating the identified neurons and reevaluating the network's
performance for two tasks: neural machine translation (NMT) and neural language
modeling (NLM). We further present a comprehensive analysis of neurons with the
aim to address the following questions: i) how localized or distributed are
different linguistic properties in the models? ii) are certain neurons
exclusive to some properties and not others? iii) is the information more or
less distributed in NMT vs. NLM? and iv) how important are the neurons
identified through the linguistic correlation method to the overall task? Our
code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019).

Comment: AAAI 2019, 10 pages, AAAI Conference on Artificial Intelligence (AAAI 2019)
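The ablation experiment described above can be sketched end to end: rank neurons by how strongly they correlate with a property, zero out the top-ranked ones, and re-score a probe. The synthetic data, the per-neuron correlation ranking (a crude proxy for the paper's supervised Linguistic Correlation Analysis), and the nearest-centroid probe are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic representation: only the first 5 of 50 "neurons" carry a binary
# property -- a stand-in for the salient neurons the method extracts.
dim, n, k = 50, 600, 5
y = rng.integers(0, 2, n)
x = rng.normal(0, 1, (n, dim))
x[:, :k] += 1.5 * (2 * y[:, None] - 1)   # inject the property into neurons 0..4

def probe_accuracy(feats, labels):
    """Nearest-centroid probe used to score a representation."""
    c0 = feats[labels == 0].mean(axis=0)
    c1 = feats[labels == 1].mean(axis=0)
    pred = (((feats - c1) ** 2).sum(1) < ((feats - c0) ** 2).sum(1)).astype(int)
    return (pred == labels).mean()

# Rank neurons by absolute correlation with the label.
corr = np.abs([np.corrcoef(x[:, j], y)[0, 1] for j in range(dim)])
top = np.argsort(corr)[::-1][:k]

# Ablate the identified neurons by zeroing them, then re-evaluate the probe;
# a large drop indicates the ranking found neurons the task depends on.
ablated = x.copy()
ablated[:, top] = 0.0

full_acc = probe_accuracy(x, y)
ablated_acc = probe_accuracy(ablated, y)
print(f"full: {full_acc:.2f}  ablated: {ablated_acc:.2f}")
```

A useful control is to also ablate the same number of bottom-ranked neurons: if accuracy barely moves there but collapses for the top-ranked set, the property is localized rather than distributed.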