Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
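One of the approaches listed above, testing whether interpretable knowledge can be decoded from intermediate representations, is commonly realized as a diagnostic (probing) classifier: a simple model trained on frozen hidden states to predict a linguistic property. A minimal numpy sketch, with synthetic vectors standing in for real hidden states (the data, dimensions, and binary "property" are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states: 200 vectors of dimension 16,
# where one direction linearly encodes a binary property (e.g. plural vs. singular).
n, d = 200, 16
labels = rng.integers(0, 2, size=n)
states = rng.normal(size=(n, d))
states[:, 0] += 3.0 * labels.astype(float)   # make the property linearly decodable

# Diagnostic probe: logistic regression trained on the frozen representations
# via plain gradient descent (no deep-learning library needed for the sketch).
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(states[:150] @ w + b)))
    grad = p - labels[:150]
    w -= 0.1 * states[:150].T @ grad / 150
    b -= 0.1 * grad.mean()

# High held-out accuracy suggests the property is decodable from the states.
preds = (states[150:] @ w + b) > 0
accuracy = (preds == labels[150:]).mean()
```

In practice the probe's capacity is kept deliberately small, so that high accuracy reflects information present in the representations rather than learned by the probe itself.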
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Although Transformers perform well on NLP tasks, recent studies suggest
that self-attention is theoretically limited in learning even some regular and
context-free languages. These findings motivated us to consider their
implications for modeling natural language, which is hypothesized to be mildly
context-sensitive. We test Transformers' ability to learn a variety of mildly
context-sensitive languages of varying complexity, and find that they
generalize well to unseen in-distribution data, but that their ability to
extrapolate to longer strings is worse than that of LSTMs. Our analyses show
that the learned self-attention patterns and representations modeled dependency
relations and demonstrated counting behavior, which may have helped the models
solve the languages.
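The crossing dependencies that make a language mildly context-sensitive can be illustrated with the classic pattern a^n b^m c^n d^m, in which the a–c and b–d dependencies interleave, so no pushdown automaton can check both counts. A small generator and membership checker (a hypothetical illustration of the language class, not the paper's actual test suite):

```python
import re

def generate(n, m):
    """Produce a string of the mildly context-sensitive language a^n b^m c^n d^m."""
    return "a" * n + "b" * m + "c" * n + "d" * m

def is_member(s):
    """Check membership: the a-count must match the c-count, and b must match d.
    These two dependencies cross (a..c spans over b, b..d spans over c),
    which is exactly what puts the language beyond context-free power."""
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if not match:
        return False
    a, b, c, d = (len(g) for g in match.groups())
    return a == c and b == d
```

Training data for experiments of this kind is typically sampled over a bounded range of n and m, with extrapolation then tested on longer strings than any seen during training.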
Assessing the Unitary RNN as an End-to-End Compositional Model of Syntax
We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns: context-free long-distance agreement, and mildly context-sensitive cross-serial dependencies. This work extends recent experiments on deeply nested context-free long-distance dependencies, with similar results. URNs differ from LSTMs in that they avoid non-linear activation functions, and they apply matrix multiplication to word embeddings encoded as unitary matrices. This permits them to retain all information in the processing of an input string over arbitrary distances. It also causes them to satisfy strict compositionality. URNs constitute a significant advance in the search for explainable models in deep learning applied to NLP.
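The information-retention property the abstract attributes to unitary evolution can be checked directly: a product of unitary matrices preserves vector norms and is exactly invertible, so nothing is lost over arbitrary sequence lengths. A numpy sketch using random orthogonal (real unitary) matrices as stand-ins for learned word-embedding matrices (the dimensions and sequence length are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_orthogonal(d):
    """A random orthogonal matrix via QR decomposition (real case of unitary)."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))   # fix column signs; Q stays orthogonal

d = 8
words = [random_orthogonal(d) for _ in range(50)]  # one matrix per "word"

# URN-style evolution: the state is a running product of unitary matrices,
# so its norm neither grows nor shrinks, however long the input string.
state = np.eye(d)
for w in words:
    state = w @ state
norm_drift = abs(np.linalg.norm(state[:, 0]) - 1.0)

# Unitarity also makes the computation invertible: applying the transposes
# in reverse order recovers the initial state exactly (up to float error).
recovered = state.copy()
for w in reversed(words):
    recovered = w.T @ recovered
recovery_error = np.max(np.abs(recovered - np.eye(d)))
```

This is the sense in which such a model "retains all information": unlike an LSTM, whose non-linearities and gating can discard input details, every step here is a norm-preserving rotation that can be undone.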
Can Recurrent Neural Networks Validate Usage-Based Theories of Grammar Acquisition?
It has been shown that recurrent artificial neural networks automatically acquire some grammatical knowledge in the course of performing linguistic prediction tasks. The extent to which such networks can actually learn grammar is still an object of investigation. However, being mostly data-driven, they provide a natural testbed for usage-based theories of language acquisition. This mini-review gives an overview of the state of the field, focusing on the influence of the theoretical framework on the interpretation of results.