Distinct patterns of syntactic agreement errors in recurrent networks and humans
Determining the correct form of a verb in context requires an understanding
of the syntactic structure of the sentence. Recurrent neural networks have been
shown to perform this task with an error rate comparable to humans, despite the
fact that they are not designed with explicit syntactic representations. To
examine the extent to which the syntactic representations of these networks are
similar to those used by humans when processing sentences, we compare the
detailed pattern of errors that RNNs and humans make on this task. Despite
significant similarities (attraction errors, asymmetry between singular and
plural subjects), the error patterns differed in important ways. In particular,
in complex sentences with relative clauses, error rates increased in RNNs but
decreased in humans. Furthermore, RNNs showed a cumulative effect of attractors
but humans did not. We conclude that at least in some respects the syntactic
representations acquired by RNNs are fundamentally different from those used by
humans.
Comment: Proceedings of the 40th Annual Conference of the Cognitive Science
Society
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category
Phonological (un)certainty weights lexical activation
Spoken word recognition involves at least two basic computations. First is
matching acoustic input to phonological categories (e.g. /b/, /p/, /d/). Second
is activating words consistent with those phonological categories. Here we test
the hypothesis that the listener's probability distribution over lexical items
is weighted by the outcome of both computations: uncertainty about phonological
discretisation and the frequency of the selected word(s). To test this, we
record neural responses in auditory cortex using magnetoencephalography, and
model this activity as a function of the size and relative activation of
lexical candidates. Our findings indicate that towards the beginning of a word,
the processing system indeed weights lexical candidates by both phonological
certainty and lexical frequency; however, later into the word, activation is
weighted by frequency alone.
Comment: 6 pages, 4 figures, accepted at: Cognitive Modeling and Computational
Linguistics (CMCL) 201
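The hypothesis in this abstract, that lexical candidates are weighted by both phonological certainty and word frequency, can be illustrated with a toy calculation. This is a minimal sketch only, not the model fitted in the paper: the function name, the words, the frequency counts, and the phoneme probabilities are all hypothetical, and it assumes the two weights combine multiplicatively before normalisation.

```python
def lexical_posterior(word_frequency, onset_probability, word_onset):
    """Weight each candidate word by its frequency and by the listener's
    certainty about its first phoneme, then normalise to a distribution.

    All three arguments are illustrative assumptions, not data from the paper.
    """
    scores = {
        word: freq * onset_probability.get(word_onset[word], 0.0)
        for word, freq in word_frequency.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate is consistent with the input")
    return {word: score / total for word, score in scores.items()}


# Hypothetical toy lexicon: counts and phoneme probabilities are made up.
word_frequency = {"ball": 50, "bear": 30, "pear": 20}
word_onset = {"ball": "b", "bear": "b", "pear": "p"}
onset_probability = {"b": 0.7, "p": 0.3}  # uncertain /b/ versus /p/

posterior = lexical_posterior(word_frequency, onset_probability, word_onset)
```

Setting every onset probability to the same value would leave frequency as the only weight, which loosely corresponds to the later-in-the-word pattern the abstract reports.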
RNNs Implicitly Implement Tensor Product Representations
Recurrent neural networks (RNNs) can learn continuous vector representations
of symbolic structures such as sequences and sentences; these representations
often exhibit linear regularities (analogies). Such regularities motivate our
hypothesis that RNNs that show such regularities implicitly compile symbolic
structures into tensor product representations (TPRs; Smolensky, 1990), which
additively combine tensor products of vectors representing roles (e.g.,
sequence positions) and vectors representing fillers (e.g., particular words).
To test this hypothesis, we introduce Tensor Product Decomposition Networks
(TPDNs), which use TPRs to approximate existing vector representations. We
demonstrate using synthetic data that TPDNs can successfully approximate linear
and tree-based RNN autoencoder representations, suggesting that these
representations exhibit interpretable compositional structure; we explore the
settings that lead RNNs to induce such structure-sensitive representations. By
contrast, further TPDN experiments show that the representations of four models
trained to encode naturally occurring sentences can be largely approximated
with a bag of words, with only marginal improvements from more sophisticated
structures. We conclude that TPDNs provide a powerful method for interpreting
vector representations, and that standard RNNs can induce compositional
sequence representations that are remarkably well approximated by TPRs; at the
same time, existing training tasks for sentence representation learning may not
be sufficient for inducing robust structural representations.
Comment: Accepted to ICLR 201
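The definition of a tensor product representation given in this abstract, a sum of tensor products of filler vectors and role vectors, can be sketched in a few lines of NumPy. This is an illustration of the TPR construction only, not of the TPDN architecture; the dimensions, the random filler embeddings, and the one-hot role vectors are assumptions chosen to keep the example small.

```python
import numpy as np


def tensor_product_representation(fillers, roles):
    """Additively combine the outer products filler_i (x) role_i and
    flatten the result into a single vector."""
    bound = sum(np.outer(f, r) for f, r in zip(fillers, roles))
    return bound.ravel()


# Hypothetical toy setup: 3 symbols with 4-dimensional filler vectors,
# and 3 sequence positions with one-hot (orthonormal) role vectors.
rng = np.random.default_rng(0)
filler_embeddings = rng.normal(size=(3, 4))  # one row per symbol
role_embeddings = np.eye(3)                  # one row per position

sequence = [2, 0, 1]  # symbol indices in left-to-right order
tpr = tensor_product_representation(
    [filler_embeddings[symbol] for symbol in sequence],
    [role_embeddings[position] for position in range(len(sequence))],
)

# Because the roles are orthonormal, multiplying by a role vector
# "unbinds" the filler that was stored at that position.
recovered = tpr.reshape(4, 3) @ role_embeddings[1]
```

With orthonormal roles the filler at each position is recovered exactly; with learned, non-orthogonal roles the recovery would only be approximate.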
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Deep architectures such as Transformers are sometimes criticized for having
uninterpretable "black-box" representations. We use causal intervention
analysis to show that, in fact, some linguistic features are represented in a
linear, interpretable format. Specifically, we show that BERT's ability to
conjugate verbs relies on a linear encoding of subject number that can be
manipulated with predictable effects on conjugation accuracy. This encoding is
found in the subject position at the first layer and the verb position at the
last layer, but distributed across positions at middle layers, particularly
when there are multiple cues to subject number.
Comment: To appear in Findings of the Association for Computational
Linguistics: EMNLP 202
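One simple way to picture manipulating a linear encoding is to reflect a hidden vector across the hyperplane orthogonal to the encoding direction, which flips the sign of the encoded feature while leaving everything else untouched. The sketch below assumes that form of intervention for illustration; it is not necessarily the procedure used in the paper, and the function name, the vectors, and the direction are hypothetical.

```python
import numpy as np


def reflect_along(hidden, direction):
    """Flip the component of `hidden` that lies along `direction`,
    keeping the orthogonal component unchanged."""
    unit = direction / np.linalg.norm(direction)
    return hidden - 2.0 * (hidden @ unit) * unit


# Hypothetical 3-dimensional hidden state and "subject number" direction.
hidden = np.array([1.0, 2.0, -0.5])
number_direction = np.array([0.0, 3.0, 0.0])

flipped = reflect_along(hidden, number_direction)
```

If subject number really were read off linearly from this direction, a downstream classifier would now see the opposite number, which is the kind of predictable effect the abstract describes.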