Can Recurrent Neural Networks Validate Usage-Based Theories of Grammar Acquisition?
It has been shown that Recurrent Artificial Neural Networks automatically acquire some grammatical knowledge in the course of performing linguistic prediction tasks. The extent to which such networks can actually learn grammar is still under investigation. However, being mostly data-driven, they provide a natural testbed for usage-based theories of language acquisition. This mini-review gives an overview of the state of the field, focusing on the influence of the theoretical framework on the interpretation of results.
Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?
In English, the main subject and its verb must agree in grammatical number, a phenomenon known as Subject-Verb Agreement (SVA). It has been found that a noun intervening between the main subject and the verb, whose grammatical number differs from that of the main subject, can cause speakers to produce a verb that agrees with the intervening noun rather than the main subject; the intervening noun thus acts as an agreement attractor. Such attractors have also been shown to pose a challenge for RNN models without an explicit hierarchical bias on SVA tasks. Previous work suggests that syntactic cues in the input can help such models choose hierarchical rules over linear rules for number agreement. In this work, we investigate the effects of the choice of training data, training algorithm, and architecture on hierarchical generalization. We observe that the models under consideration fail to perform well on sentences with no agreement attractor when trained solely on natural sentences with at least one attractor. Even with this biased training set, an implicit hierarchical bias in the architecture (as in the Ordered Neurons LSTM) is not enough to capture syntax-sensitive dependencies. These results suggest that current RNNs do not capture the underlying hierarchical rules of natural language, but rather rely on shallower heuristics for their predictions.
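The evaluation setup described above can be made concrete with a small harness. The sketch below (a minimal illustration, not the authors' materials) checks whether a model prefers the correct verb form on templated sentences with and without an attractor; the `score` function is a hypothetical placeholder standing in for a trained RNN language model, and the templates are illustrative assumptions.

```python
# Minimal sketch of an SVA evaluation harness (hypothetical interface).
# `score(prefix, verb)` is assumed to return the model's log-probability
# of `verb` given `prefix`; swap in any trained RNN language model.

import random

def score(prefix: str, verb: str) -> float:
    # Placeholder: a real implementation would query a trained LM here.
    return random.random()

# Templates: (prefix, correct verb form, incorrect verb form).
# The second template has a singular attractor ("the cat") between
# the plural subject and the verb.
TEMPLATES = [
    ("the dogs", "run", "runs"),               # no attractor
    ("the dogs near the cat", "run", "runs"),  # one attractor
]

def accuracy(templates) -> float:
    """Fraction of items where the model prefers the correct verb form."""
    correct = sum(
        score(prefix, good) > score(prefix, bad)
        for prefix, good, bad in templates
    )
    return correct / len(templates)

if __name__ == "__main__":
    print(f"agreement accuracy: {accuracy(TEMPLATES):.2f}")
```

Splitting the templates by attractor count would reproduce the paper's contrast between training and testing on mismatched attractor distributions.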
Semantic Tagging with Deep Residual Networks
We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets). Our tagger uses both word and character representations and includes a novel residual bypass architecture. We evaluate the tagset both intrinsically, on the new task of semantic tagging, and on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an auxiliary loss function predicting our semantic tags, significantly outperforms prior results on English Universal Dependencies POS tagging (95.71% accuracy on UD v1.2 and 95.67% accuracy on UD v1.3).
Comment: COLING 2016, camera ready version
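To make the residual bypass idea concrete, here is a minimal PyTorch sketch of a word-level tagger that re-injects the combined word-and-character representation after a convolutional block. This is not the authors' implementation: the layer sizes, the convolutional block, and the assumption that character features arrive pre-pooled (one vector per word) are all illustrative.

```python
import torch
import torch.nn as nn

class ResidualBypassTagger(nn.Module):
    """Sketch: word-level tagger with a residual bypass around a conv block."""

    def __init__(self, vocab_size: int, n_tags: int, dim: int = 64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.block = nn.Sequential(
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=3, padding=1),
        )
        self.out = nn.Linear(2 * dim, n_tags)

    def forward(self, word_ids, char_feats):
        # word_ids: (batch, seq); char_feats: (batch, seq, dim), assumed
        # already pooled from a character-level encoder (not shown).
        x = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        h = self.block(x.transpose(1, 2)).transpose(1, 2)
        h = torch.relu(h + x)   # bypass: re-inject the input representation
        return self.out(h)      # (batch, seq, n_tags) tag scores
```

The bypass lets lower-level lexical features reach the output layer directly, which is useful when a tag is predictable from the word form alone.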
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns those models learn. Moreover, models are often compared only quantitatively, while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and whether those patterns coincide with manually defined word segmentations and annotations. To that end, we extend the contextual decomposition technique (Murdoch et al. 2018) to convolutional neural networks, which allows us to compare convolutional neural networks and bidirectional long short-term memory networks. We evaluate and compare these models for the task of morphological tagging on three morphologically different languages and show that these models implicitly discover understandable linguistic rules. Our implementation can be found at https://github.com/FredericGodin/ContextualDecomposition-NLP .
Comment: Accepted at EMNLP 2018
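The core of contextual decomposition is an additive split of a layer's output into a contribution from a chosen input span and a contribution from everything else. The NumPy sketch below shows that split exactly for a single linear layer; extending it through gates and nonlinearities, as Murdoch et al. 2018 do for LSTMs and the paper above does for CNNs, additionally requires linearizing each activation function. The weights, input, and mask here are illustrative assumptions.

```python
# Sketch of the additive split behind contextual decomposition, for one
# linear layer h = W x + b. The "relevant" part is the input positions
# we want to explain (e.g., the characters of a suffix of interest).

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))   # illustrative layer weights
b = rng.normal(size=4)
x = rng.normal(size=6)        # e.g., concatenated character embeddings

mask = np.array([0, 0, 0, 1, 1, 1], dtype=bool)  # positions of interest
beta, gamma = x * mask, x * ~mask                # relevant / irrelevant split

h_beta = W @ beta             # contribution of the chosen characters
h_gamma = W @ gamma + b       # contribution of everything else (plus bias)

assert np.allclose(h_beta + h_gamma, W @ x + b)  # decomposition is exact
```

For a linear layer the split is exact, as the assertion checks; the interesting (and approximate) part of the technique is propagating `beta` and `gamma` separately through the network's nonlinear components.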
Neural Language Models Capture Some, But Not All, Agreement Attraction Effects
The number of the subject in English must match the number of the corresponding verb (dog runs but dogs run). Yet in real-time language production and comprehension, speakers often mistakenly compute agreement between the verb and a grammatically irrelevant non-subject noun phrase instead. This phenomenon, referred to as agreement attraction, is modulated by a wide range of factors; any complete computational model of grammatical planning and comprehension would be expected to derive this rich empirical picture. Recent developments in Natural Language Processing have shown that neural networks trained only on word prediction over large corpora are capable of capturing subject-verb agreement dependencies to a significant extent, but with occasional errors. In this paper, we evaluate the potential of such neural word prediction models as a foundation for a cognitive model of real-time grammatical processing. We use LSTMs, a common sequence prediction model used to model language, to simulate six experiments taken from the agreement attraction literature. The LSTMs captured the critical human behavior in three out of the six experiments, indicating that (1) some agreement attraction phenomena can be captured by a generic sequence processing model, but (2) capturing the other phenomena may require models with more language-specific mechanisms.
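A simulation of this kind can be set up by comparing the model's surprisal of an ungrammatical verb across attractor conditions. The sketch below is a minimal illustration, not the paper's exact protocol or stimuli: the `log_prob` function is a hypothetical placeholder for a trained LSTM language model, and the "key to the cabinet(s)" items are the standard textbook example of attraction.

```python
# Sketch: agreement attraction as a surprisal contrast. Attraction predicts
# that the ungrammatical plural verb "are" is less surprising (hence more
# error-prone) when a plural attractor intervenes.

import math

def log_prob(prefix: str, word: str) -> float:
    # Placeholder; a real run would query the trained LSTM here.
    return math.log(0.5)

CONDITIONS = {
    "match":    "the key to the cabinet",   # singular attractor
    "mismatch": "the key to the cabinets",  # plural attractor
}

for name, prefix in CONDITIONS.items():
    surprisal = -log_prob(prefix, "are")  # surprisal of ungrammatical verb
    print(f"{name}: surprisal of 'are' = {surprisal:.2f}")
```

Running each of the six simulated experiments reduces to building the corresponding condition-by-condition prefixes and testing whether the surprisal contrasts pattern with the human error rates.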