26 research outputs found
Modelling syntactic development in a cross-linguistic context
Mainstream linguistic theory has traditionally assumed that children come into the world with rich innate knowledge about language and grammar. More recently, computational work using distributional algorithms has shown that the information contained in the input is much richer than proposed by the nativist approach. However, neither of these approaches has been developed to the point of providing detailed and quantitative predictions about the developmental data. In this paper, we champion a third approach, in which computational models learn from naturalistic input and produce utterances that can be directly compared with the utterances of language-learning children. We demonstrate the feasibility of this approach by showing how MOSAIC, a simple distributional analyser, simulates the optional-infinitive phenomenon in English, Dutch, and Spanish. The model accounts for young children's tendency to use both correct finites and incorrect (optional) infinitives in finite contexts, for the generality of this phenomenon across languages, and for the sparseness of other types of errors (e.g., word order errors). It thus shows how these phenomena, which have traditionally been taken as evidence for innate knowledge of Universal Grammar, can be explained in terms of a simple distributional analysis of the language to which children are exposed
The role of input size and generativity in simulating language acquisition.
This paper presents an analysis of the role of input size and generativity (ability to produce novel utterances) in simulating developmental data on a phenomenon in first language acquisition. An existing model that has already simulated the basic phenomenon is trained on input sets of varying sizes (13,000 to 40,000 utterances). The ability of the model to produce novel utterances is also manipulated. Both input size and generativity affect the fits for later stages of development. Higher generativity improves fits for later stages, but worsens them for early stages, suggesting generativity is best increased as a function of mean length of utterance (MLU). The effect of training set is variable. Results are discussed in terms of optimal training sets for simulations, and children’s developing ability to produce utterances beyond the input they have heard
Modelling children's negation errors using probabilistic learning in MOSAIC.
Cognitive models of language development have often been used to simulate the pattern of errors in children’s speech. One relatively infrequent error in English involves placing inflection to the right of a negative, rather than to the left. The pattern of negation errors in English is explained by Harris & Wexler (1996) in terms of very early knowledge of inflection on the part of the child. We present data from three children which demonstrate that although negation errors are rare, error types predicted not to occur by Harris & Wexler do occur, as well as error types that are predicted to occur. Data from MOSAIC, a model of language acquisition, are also presented. MOSAIC is able to simulate the pattern of negation errors in children’s speech. The phenomenon is modelled more accurately when a probabilistic learning algorithm is used
Resolving ambiguities in the extraction of syntactic categories through chunking.
In recent years, several authors have investigated how co-occurrence statistics in natural language can act as a cue that children may use to extract syntactic categories for the language they are learning. While some authors have reported encouraging results, it is difficult to evaluate the quality of the syntactic categories derived. It is argued in this paper that traditional measures of accuracy are inherently flawed. A valid evaluation metric needs to consider the well-formedness of utterances generated through a production end. This paper attempts to evaluate the quality of the categories derived from co-occurrence statistics through the use of MOSAIC, a computational model of syntax acquisition that has already been used to simulate several phenomena in child language. It will be shown that derived syntactic categories which may appear to be of high quality will quickly give rise to errors which are not typical of child speech. A solution to this problem is suggested in the form of a chunking mechanism which serves to differentiate between alternative grammatical functions of identical word forms. Results are evaluated in terms of the error rates in utterances produced by the system as well as the quantitative fit to the phenomenon of subject omission
On the resolution of ambiguities in the extraction of syntactic categories through chunking
In recent years, several authors have investigated how co-occurrence statistics in natural language can act as a cue that children may use to extract syntactic categories for the language they are learning. While some authors have reported encouraging results, it is difficult to evaluate the quality of the syntactic categories derived. It is argued in this paper that traditional measures of accuracy are inherently flawed. A valid evaluation metric needs to consider the well-formedness of utterances generated through a production end. This paper attempts to evaluate the quality of the categories derived from co-occurrence statistics through the use of MOSAIC, a computational model of syntax acquisition that has already been used to simulate several phenomena in child language. It is shown that derived syntactic categories that may appear to be of high quality quickly give rise to errors that are not typical of child speech. A solution to this problem is suggested in the form of a chunking mechanism that serves to differentiate between alternative grammatical functions of identical word forms. Results are evaluated in terms of the error rates in utterances produced by the system as well as the quantitative fit to the phenomenon of subject omission
Understanding the Developmental Dynamics of Subject Omission: The Role of Processing Limitations in Learning
P. Bloom’s (1990) data on subject omission are often taken as strong support for the view that child language can be explained in terms of full competence coupled with processing limitations in production. This paper examines whether processing limitations in learning may provide a more parsimonious explanation of the data without the need to assume full competence. We extended P. Bloom’s study by using a larger sample (12 children) and measuring subject-omission phenomena in three developmental phases. The results revealed a Verb Phrase-length effect consistent with that reported by P. Bloom. However, contrary to the predictions of the processing limitations account, the proportion of overt subjects that were pronominal increased with developmental phase. The data were simulated with MOSAIC, a computational model that learns to produce progressively longer utterances as a function of training. MOSAIC was able to capture all of the effects reported by P. Bloom through a resource-limited distributional analysis of child-directed speech. Since MOSAIC does not have any built-in linguistic knowledge, these results show that the phenomena identified by P. Bloom do not constitute evidence for underlying competence on the part of the child. They also underline the need to develop more empirically grounded models of the way that processing limitations in learning might influence the language acquisition process
Computer simulations of developmental change: The contributions of working memory capacity and long-term knowledge
Increasing working memory (WM) capacity is often cited as a major influence on children’s development and yet WM capacity is difficult to examine independently of long-term knowledge. A computational model of children’s nonword repetition (NWR) performance is presented that independently manipulates long-term knowledge and WM capacity to determine the relative contributions of each in explaining the developmental data. The simulations show that (1) both mechanisms independently cause the same overall developmental changes in NWR performance; (2) increase in long-term knowledge provides the better fit to the child data; and (3) varying both long-term knowledge and WM capacity adds no significant gains over varying long-term knowledge alone. Given that increases in long-term knowledge must occur during development, the results indicate that increases in WM capacity may not be required to explain developmental differences. An increase in WM capacity should only be cited as a mechanism of developmental change when there are clear empirical reasons for doing so
Investigating children’s acquisition of verb inflection in English, Swedish and Finnish: challenges for current generativist and constructivist approaches
A debate that lies at the heart of the cognitive sciences is the question of how children acquire their first language. On the one side, generativist accounts have based their explanations on innate knowledge of abstract rules, whilst, on the other, constructivist accounts explain language acquisition as a result of input-based learning. The goal of this thesis is to focus on one of the most vigorously researched areas in language acquisition, the development of inflectional verb morphology, and by doing so not only provide more insight into the acquisition of inflection in general, but also help distinguish between the two competing approaches. More specifically, the thesis will focus on three different languages – English, Swedish and Finnish – and use these languages as a testing ground for explaining how a particular aspect of language is acquired. Chapter 1 provides a general introduction to the generativist and constructivist approaches to language acquisition, as well as outlining some important linguistic terms. Chapter 2 presents the two linguistic phenomena under investigation in this thesis: Optional Infinitive (OI) and person/number marking errors. Chapter 3 presents Experiment 1, which reports the results of a cross-sectional elicited-production study investigating the possibility that at least some apparent OI errors reflect a process of defaulting to the form with the highest frequency in the input. Across 48 verbs, a significant negative correlation was observed between the proportion of ‘bare’ vs 3sg –s forms in a representative input corpus and the rate of 3sg –s production in simple finite contexts. This finding suggests that, in addition to other learning mechanisms that yield such errors cross-linguistically, at least some of the OI errors produced by English-speaking children reflect a process of defaulting to a high-frequency/phonologically-simple form.
Chapter 4 describes Experiment 2, which further investigates the pattern of OI errors, in English and Swedish. In this study, OI errors were elicited in both simple finite and modal contexts. The results support the idea put forward in Experiment 1 that children’s (apparent) OI errors have two distinct sources: truncating compound finite structures and defaulting to the most frequent/phonologically simple form. Experiment 3 in Chapter 5 focused on examining the defaulting errors and further input effects by eliciting present tense verb forms from native Finnish-speaking children. The results provide evidence for the defaulting hypothesis, and suggest that a successful account of the development of verb inflection will need to incorporate both rote-storage and retrieval of individual inflected forms as well as phonological analogy across them. Finally, Chapter 6 concludes the thesis by summarizing the findings of Experiments 1-3, and discussing the main implications of the results for the generativist and constructivist accounts of acquisition of verb morphology, as well as suggesting some possible future research directions
Testing the Extended Optional Infinitive Hypothesis in English and German
The thesis aims to explain how children learn the pattern of verb marking in their language and how this process goes wrong in children with DLD. This is fundamental to our understanding of language acquisition. One model of this process, which has been particularly influential in the DLD literature, is the (Extended) Optional Infinitive ((E)OI) Hypothesis (Wexler, 1994; Rice et al., 1995). However, recent work has shown that the cross-linguistic pattern of verb-marking error may be better explained with the Dual-factor model in which some errors reflect the omission of modal verbs (e.g. He can swim.) and others a process of defaulting to a more accessible finite form (Freudenthal et al., 2010; Räsänen, Ambridge & Pine, 2014). These two models have very different implications both for theory building and for the design of effective interventions for children with DLD, so it is important to establish which is correct. However, distinguishing between them empirically requires cross-linguistic research on both typically developing children and children with DLD. This study will therefore use an elicited production methodology to compare different accounts of the pattern of verb-marking error in typically developing children and children with DLD in English and German