Can Recurrent Neural Networks Validate Usage-Based Theories of Grammar Acquisition?
It has been shown that Recurrent Artificial Neural Networks automatically acquire some grammatical knowledge in the course of performing linguistic prediction tasks. The extent to which such networks can actually learn grammar is still under investigation. However, being mostly data-driven, they provide a natural testbed for usage-based theories of language acquisition. This mini-review gives an overview of the state of the field, focusing on the influence of the theoretical framework on the interpretation of results.
Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?
In English, the main subject and its verb must agree in grammatical number, a phenomenon known as Subject-Verb Agreement (SVA). It has been found that a noun intervening between the main subject and the verb, whose grammatical number differs from that of the main subject, can cause speakers to produce a verb that agrees with the intervening noun rather than the main subject; the intervening noun thus acts as an agreement attractor. Such attractors have also been shown to pose a challenge for RNN models without an explicit hierarchical bias on SVA tasks. Previous work suggests that syntactic cues in the input can help such models choose hierarchical rules over linear rules for number agreement. In this work, we investigate the effects of the choice of training data, training algorithm, and architecture on hierarchical generalization. We observe that the models under consideration fail to perform well on sentences with no agreement attractor when trained solely on natural sentences with at least one attractor. Even with this biased training set, an implicit hierarchical bias in the architecture (as in the Ordered Neurons LSTM) is not enough to capture syntax-sensitive dependencies. These results suggest that current RNNs do not capture the underlying hierarchical rules of natural language, but rather rely on shallower heuristics for their predictions.
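The evaluation setup described above can be made concrete with a small harness. The sketch below (a minimal illustration, not the authors' materials) checks whether a model prefers the correct verb form on templated sentences with and without an attractor; the `score` function is a hypothetical placeholder standing in for a trained RNN language model, and the templates are illustrative assumptions.

```python
# Minimal sketch of an SVA evaluation harness (hypothetical interface).
# `score(prefix, verb)` is assumed to return the model's log-probability
# of `verb` given `prefix`; swap in any trained RNN language model.

import random

def score(prefix: str, verb: str) -> float:
    # Placeholder: a real implementation would query a trained LM here.
    return random.random()

# Templates: (prefix, correct verb form, incorrect verb form).
# The second template has a singular attractor ("the cat") between
# the plural subject and the verb.
TEMPLATES = [
    ("the dogs", "run", "runs"),               # no attractor
    ("the dogs near the cat", "run", "runs"),  # one attractor
]

def accuracy(templates) -> float:
    """Fraction of items where the model prefers the correct verb form."""
    correct = sum(
        score(prefix, good) > score(prefix, bad)
        for prefix, good, bad in templates
    )
    return correct / len(templates)

if __name__ == "__main__":
    print(f"agreement accuracy: {accuracy(TEMPLATES):.2f}")
```

Splitting the templates by attractor count would reproduce the paper's contrast between training and testing on mismatched attractor distributions.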
Semantic Tagging with Deep Residual Networks
We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets). Our tagger uses both word and character representations and includes a novel residual bypass architecture. We evaluate the tagset both intrinsically, on the new task of semantic tagging, and on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an auxiliary loss function predicting our semantic tags, significantly outperforms prior results on English Universal Dependencies POS tagging (95.71% accuracy on UD v1.2 and 95.67% accuracy on UD v1.3).
Comment: COLING 2016, camera ready version
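To make the residual bypass idea concrete, here is a minimal PyTorch sketch of a word-level tagger that re-injects the combined word-and-character representation after a convolutional block. This is not the authors' implementation: the layer sizes, the convolutional block, and the assumption that character features arrive pre-pooled (one vector per word) are all illustrative.

```python
import torch
import torch.nn as nn

class ResidualBypassTagger(nn.Module):
    """Sketch: word-level tagger with a residual bypass around a conv block."""

    def __init__(self, vocab_size: int, n_tags: int, dim: int = 64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.block = nn.Sequential(
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=3, padding=1),
        )
        self.out = nn.Linear(2 * dim, n_tags)

    def forward(self, word_ids, char_feats):
        # word_ids: (batch, seq); char_feats: (batch, seq, dim), assumed
        # already pooled from a character-level encoder (not shown).
        x = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        h = self.block(x.transpose(1, 2)).transpose(1, 2)
        h = torch.relu(h + x)   # bypass: re-inject the input representation
        return self.out(h)      # (batch, seq, n_tags) tag scores
```

The bypass lets lower-level lexical features reach the output layer directly, which is useful when a tag is predictable from the word form alone.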
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns those models learn. Moreover, models are often compared only quantitatively, while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and whether those patterns coincide with manually defined word segmentations and annotations. To that end, we extend the contextual decomposition technique (Murdoch et al. 2018) to convolutional neural networks, which allows us to compare convolutional neural networks and bidirectional long short-term memory networks. We evaluate and compare these models for the task of morphological tagging on three morphologically different languages and show that these models implicitly discover understandable linguistic rules. Our implementation can be found at https://github.com/FredericGodin/ContextualDecomposition-NLP .
Comment: Accepted at EMNLP 2018
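The core of contextual decomposition is an additive split of a layer's output into a contribution from a chosen input span and a contribution from everything else. The NumPy sketch below shows that split exactly for a single linear layer; extending it through gates and nonlinearities, as Murdoch et al. 2018 do for LSTMs and the paper above does for CNNs, additionally requires linearizing each activation function. The weights, input, and mask here are illustrative assumptions.

```python
# Sketch of the additive split behind contextual decomposition, for one
# linear layer h = W x + b. The "relevant" part is the input positions
# we want to explain (e.g., the characters of a suffix of interest).

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))   # illustrative layer weights
b = rng.normal(size=4)
x = rng.normal(size=6)        # e.g., concatenated character embeddings

mask = np.array([0, 0, 0, 1, 1, 1], dtype=bool)  # positions of interest
beta, gamma = x * mask, x * ~mask                # relevant / irrelevant split

h_beta = W @ beta             # contribution of the chosen characters
h_gamma = W @ gamma + b       # contribution of everything else (plus bias)

assert np.allclose(h_beta + h_gamma, W @ x + b)  # decomposition is exact
```

For a linear layer the split is exact, as the assertion checks; the interesting (and approximate) part of the technique is propagating `beta` and `gamma` separately through the network's nonlinear components.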
Neural Language Models Capture Some, But Not All, Agreement Attraction Effects
The number of the subject in English must match the number of the corresponding verb (dog runs but dogs run). Yet in real-time language production and comprehension, speakers often mistakenly compute agreement between the verb and a grammatically irrelevant non-subject noun phrase instead. This phenomenon, referred to as agreement attraction, is modulated by a wide range of factors; any complete computational model of grammatical planning and comprehension would be expected to derive this rich empirical picture. Recent developments in Natural Language Processing have shown that neural networks trained only on word prediction over large corpora are capable of capturing subject-verb agreement dependencies to a significant extent, but with occasional errors. In this paper, we evaluate the potential of such neural word prediction models as a foundation for a cognitive model of real-time grammatical processing. We use LSTMs, a common sequence prediction model used to model language, to simulate six experiments taken from the agreement attraction literature. The LSTMs captured the critical human behavior in three out of the six experiments, indicating that (1) some agreement attraction phenomena can be captured by a generic sequence processing model, but (2) capturing the other phenomena may require models with more language-specific mechanisms.
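A simulation of this kind can be set up by comparing the model's surprisal of an ungrammatical verb across attractor conditions. The sketch below is a minimal illustration, not the paper's exact protocol or stimuli: the `log_prob` function is a hypothetical placeholder for a trained LSTM language model, and the "key to the cabinet(s)" items are the standard textbook example of attraction.

```python
# Sketch: agreement attraction as a surprisal contrast. Attraction predicts
# that the ungrammatical plural verb "are" is less surprising (hence more
# error-prone) when a plural attractor intervenes.

import math

def log_prob(prefix: str, word: str) -> float:
    # Placeholder; a real run would query the trained LSTM here.
    return math.log(0.5)

CONDITIONS = {
    "match":    "the key to the cabinet",   # singular attractor
    "mismatch": "the key to the cabinets",  # plural attractor
}

for name, prefix in CONDITIONS.items():
    surprisal = -log_prob(prefix, "are")  # surprisal of ungrammatical verb
    print(f"{name}: surprisal of 'are' = {surprisal:.2f}")
```

Running each of the six simulated experiments reduces to building the corresponding condition-by-condition prefixes and testing whether the surprisal contrasts pattern with the human error rates.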