Simulating the referential properties of Dutch, German and English Root Infinitives in MOSAIC
Children learning many languages go through an Optional Infinitive stage in which they produce non-finite verb forms in contexts in which a finite verb form is required (e.g. ‘That go there’ instead of ‘That goes there’). MOSAIC (Model of Syntax Acquisition in Children) is a computational model of language learning that successfully simulates the developmental patterning of the Optional Infinitive (OI) phenomenon in English, Dutch, German and Spanish (Freudenthal, Pine, Aguado-Orea & Gobet, 2007). In the present study, MOSAIC is applied to the simulation of certain subtle but theoretically important phenomena in the cross-linguistic patterning of the OI phenomenon that are typically assumed to require a more complex formal analysis. MOSAIC is shown to successfully simulate: 1) the Modal Reference Effect: the finding that Dutch and German children tend to use Root Infinitives in modal contexts; 2) the Eventivity Constraint: the finding that Dutch and German Root Infinitives refer predominantly to actions rather than static situations; and 3) the absence or reduced size of these effects in English. These results provide strong support for input-driven explanations of the Modal Reference Effect, for MOSAIC’s mechanism for producing Root Infinitives, and for the wider claim that it is possible to explain key aspects of children’s early multi-word speech in terms of the interaction between a resource-limited distributional learning mechanism and the surface properties of the language to which children are exposed.
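The resource-limited, utterance-edge-first learning mechanism at the heart of this account can be caricatured in a few lines. The sketch below is a toy illustration of the general idea only (MOSAIC itself is a much richer chunking model); the example utterances and the fixed fragment span are invented for the demonstration. Because non-finite verb forms in the input tend to sit near the end of the utterance, short utterance-final fragments often lack the finite verb, which is one route to OI-like output.

```python
# Toy illustration (not the actual MOSAIC implementation) of how
# right-edge-first learning of utterance-final fragments can yield
# Optional Infinitive errors: with a small resource limit, the
# fragment a learner can produce omits the early finite verb and
# starts at the late non-finite form.

def edge_first_fragments(utterances, span):
    """Return the last `span` words of each utterance, mimicking a
    learner whose resource limit restricts output to utterance-final
    fragments of a given length."""
    fragments = []
    for utt in utterances:
        words = utt.split()
        fragments.append(" ".join(words[-span:]))
    return fragments

child_directed = [
    "he wants to go home",   # finite 'wants' early, infinitive 'go' late
    "shall we eat the apple",
]

# With a small span the finite verb is dropped, leaving an OI-like fragment.
print(edge_first_fragments(child_directed, 2))   # ['go home', 'the apple']
# With a larger span the finite form is recovered.
print(edge_first_fragments(child_directed, 5))
```

As training proceeds and the resource limit relaxes, the producible fragments lengthen and the finite forms reappear, mirroring the developmental decline of OI errors.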
Understanding the Developmental Dynamics of Subject Omission: The Role of Processing Limitations in Learning
P. Bloom’s (1990) data on subject omission are often taken as strong support for the view that child language can be explained in terms of full competence coupled with processing limitations in production. This paper examines whether processing limitations in learning may provide a more parsimonious explanation of the data without the need to assume full competence. We extended P. Bloom’s study by using a larger sample (12 children) and measuring subject-omission phenomena in three developmental phases. The results revealed a Verb Phrase-length effect consistent with that reported by P. Bloom. However, contrary to the predictions of the processing limitations account, the proportion of overt subjects that were pronominal increased with developmental phase. The data were simulated with MOSAIC, a computational model that learns to produce progressively longer utterances as a function of training. MOSAIC was able to capture all of the effects reported by P. Bloom through a resource-limited distributional analysis of child-directed speech. Since MOSAIC does not have any built-in linguistic knowledge, these results show that the phenomena identified by P. Bloom do not constitute evidence for underlying competence on the part of the child. They also underline the need to develop more empirically grounded models of the way that processing limitations in learning might influence the language acquisition process.
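The key developmental measure discussed above, the proportion of overt subjects that are pronominal, is straightforward to compute. The sketch below is illustrative only: the subject lists and the pronoun set are toy assumptions, not the paper's data or coding scheme.

```python
# Toy computation of the pronominal-subject measure: among subjects
# that were actually produced (overt), what fraction are pronouns?
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

def pronominal_subject_rate(subjects):
    """`subjects` holds one entry per utterance: the subject string,
    or None where the subject was omitted."""
    overt = [s for s in subjects if s]
    if not overt:
        return 0.0
    return sum(s.lower() in PRONOUNS for s in overt) / len(overt)

# Invented example: an early phase with frequent omission, a later
# phase with more overt, more often pronominal subjects.
phase1 = ["he", None, "doggy", None, "it"]
phase3 = ["he", "she", "it", "mummy", "I"]
print(pronominal_subject_rate(phase1))  # 2 of 3 overt subjects
print(pronominal_subject_rate(phase3))  # 4 of 5 overt subjects
```

A rising value of this measure across phases is the pattern the paper reports as problematic for the production-limitations account.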
Subject omission in children's language: the case for performance limitations in learning
Several theories have been put forward to explain the phenomenon whereby children who are learning to speak their native language tend to omit the subject of the sentence. According to the pro-drop hypothesis, children represent the wrong grammar. According to the performance limitations view, children represent the full grammar, but omit subjects due to performance limitations in production. This paper proposes a third explanation and presents a model which simulates the data relevant to subject omission. The model consists of a simple learning mechanism that carries out a distributional analysis of naturalistic input. It does not have any overt representation of grammatical categories, and its performance limitations reside mainly in its learning mechanism. The model clearly simulates the data at hand, without the need to assume large amounts of innate knowledge in the child, and can be considered more parsimonious on these grounds alone. Importantly, it employs a unified and objective measure of processing load, namely the length of the utterance, which interacts with frequency in the input. The standard performance limitations view assumes that processing load is dependent on a phrase’s syntactic role, but does not specify a unifying underlying principle.
Simulating the temporal reference of Dutch and English Root Infinitives.
Hoekstra & Hyams (1998) claim that the overwhelming majority of Dutch children’s Root Infinitives (RIs) are used to refer to modal (not realised) events, whereas in English-speaking children, the temporal reference of RIs is free. Hoekstra & Hyams attribute this difference to qualitative differences in how temporal reference is carried by the Dutch infinitive and the English bare form. Ingram & Thompson (1996) advocate an input-driven account of this difference and suggest that the modal reading of German (and Dutch) RIs is caused by the fact that infinitive forms are predominantly used in modal contexts. This paper investigates whether an input-driven account can explain the differential reading of RIs in Dutch and English. To this end, corpora of English and Dutch Child Directed Speech were fed through MOSAIC, a computational model that has already been used to simulate the basic Optional Infinitive phenomenon. Infinitive forms in the input were tagged for modal or non-modal reference based on the sentential context in which they appeared. The output of the model was compared to the results of corpus studies and recent experimental data which call into question the strict distinction between Dutch and English advocated by Hoekstra & Hyams.
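The tagging step described above might be sketched as follows, under the simplifying (assumed) criterion that an infinitive counts as occurring in a modal context when a modal auxiliary appears elsewhere in the same utterance. The paper's actual coding of sentential context is richer than this; the modal list and example sentences are illustrative assumptions.

```python
# Hedged sketch of tagging infinitive forms in child-directed speech as
# modal vs. non-modal, using only the presence of a modal auxiliary in
# the utterance as the (simplified) criterion.
MODALS = {"can", "could", "may", "might", "must", "shall", "should",
          "will", "would"}

def tag_infinitive(utterance, infinitive):
    """Return 'modal' or 'non-modal' for the given infinitive form,
    or None if the form does not occur in the utterance."""
    words = utterance.lower().split()
    if infinitive not in words:
        return None
    return "modal" if MODALS.intersection(words) else "non-modal"

print(tag_infinitive("you must go to bed", "go"))  # modal context
print(tag_infinitive("I see him go", "go"))        # non-modal context
```

Running such a tagger over Dutch versus English input would quantify how strongly each language's infinitive forms are skewed toward modal contexts, which is the distributional asymmetry the input-driven account relies on.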
Meter-based omission of function words in MOSAIC
MOSAIC (Model of Syntax Acquisition in Children) is augmented with a new mechanism that allows for the omission of unstressed function words based on the prosodic structure of the utterance in which they occur. The mechanism allows MOSAIC to omit elements from multiple locations in a target utterance, which it was previously unable to do. It is shown that, although the new mechanism results in Optional Infinitive errors when run on children’s input, it is insufficient to simulate the high rate of OI errors in children’s speech unless combined with MOSAIC’s edge-first learning mechanism. It is also shown that the addition of the new mechanism does not adversely affect MOSAIC’s fit to the Optional Infinitive phenomenon. The mechanism does, however, make MOSAIC’s output more child-like, both in terms of the range of utterances it can simulate, and the level and type of determiner omission that the model displays.
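One way to picture a meter-based omission mechanism is as a left-to-right pass that keeps an unstressed function word only when it can attach to a preceding stressed word as the weak half of a trochaic (strong–weak) foot, and omits it otherwise. The sketch below is a toy illustration under that assumption, with a hand-picked function-word list standing in for stress marking; it is not MOSAIC's implementation.

```python
# Toy sketch of meter-based function-word omission: an unstressed
# function word survives only directly after a stressed word (forming
# a trochee); a weak word with no available strong host is omitted.
FUNCTION_WORDS = {"the", "a", "is", "to", "of", "he", "it", "in"}

def omit_by_meter(words):
    output = []
    prev_stressed = False  # is the previous kept word a free strong host?
    for w in words:
        unstressed = w in FUNCTION_WORDS
        if unstressed and not prev_stressed:
            continue  # weak syllable with no preceding strong host: omit
        output.append(w)
        # a kept weak word uses up the host; a strong word becomes one
        prev_stressed = not unstressed
    return output

print(omit_by_meter("he wants a ball".split()))       # drops 'he'
print(omit_by_meter("put the ball in the box".split()))
```

Note that this single pass can drop words from multiple positions in one utterance, which is the capability the new mechanism adds to MOSAIC.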
All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch
Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervised machine learning. Based on readability assessments by both experts and a crowd, we implement different types of text characteristics, ranging from easy-to-compute superficial text characteristics to features requiring deep linguistic processing, resulting in ten different feature groups. Both a regression and a classification setup are investigated, reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations for readability optimization using a wrapper-based genetic algorithm optimization approach is a promising task which provides considerable insight into which feature combinations contribute to the overall readability prediction. Since we also have gold-standard information available for those features requiring deep processing, we are able to investigate the true upper bound of our Dutch system. Interestingly, we observe that the performance of our fully automatic readability prediction pipeline is on par with the pipeline using gold-standard deep syntactic and semantic information.
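The wrapper-based feature-subset search can be illustrated in miniature. The article uses a genetic algorithm over ten feature groups; for a couple of toy features, an exhaustive wrapper with a least-squares scorer shows the same principle: candidate subsets are scored by the performance of a predictor trained on them, not by per-feature correlations. All data and feature names below are invented for the example.

```python
# Toy wrapper-based feature selection for readability scoring: every
# non-empty feature subset is scored by the residual error of a simple
# least-squares fit, and the best-scoring subset is returned. (The
# paper's system uses a genetic algorithm instead of exhaustive search.)
from itertools import combinations

def fit_score(xs, ys):
    """Sum of squared residuals of the OLS fit y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

def wrapper_select(features, y):
    """features: dict name -> value list. Score each subset by fitting
    the mean of its features against y; return the best subset."""
    best = None
    names = list(features)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            xs = [sum(features[n][i] for n in subset) / len(subset)
                  for i in range(len(y))]
            score = fit_score(xs, y)
            if best is None or score < best[0]:
                best = (score, subset)
    return best[1]

# Invented data: readability tracks average word length, not the noise.
features = {"avg_word_len": [4, 5, 6, 7], "noise": [1, 9, 2, 8]}
print(wrapper_select(features, [2, 4, 6, 8]))  # ('avg_word_len',)
```

The wrapper thus discards the uninformative feature even though no explicit correlation threshold was set, which is the kind of insight into feature combinations the article reports at a much larger scale.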