Modeling German Word Order Acquisition via Bayesian Inference
Perfors et al. (2011) introduced a Bayesian model selection inference system as a child language acquisition model, and demonstrated in the case of English child language input that, without any prior bias, such a system would prefer a probabilistic context-free grammar (PCFG) using hierarchical structure over a regular grammar representing linear phrase structure. However, the system of Perfors et al. (2011) is limited as a computational model because it can only compare PCFGs that all parse the same data, which is likely to include errors, especially in large data sets. In fact, in the German child language corpus that we consider, transcription and part-of-speech (POS) tagging produce so many errors that the corpus no longer appears representative of a child's input. This illustrates that the assumption that such corpora are always appropriate for computational child language acquisition modeling is misleading. Here we propose a method of comparing syntactic hypotheses compatible with different subsets of a child-directed speech corpus by identifying and countering the effect of differing data subset sizes on the likelihood component of the Perfors-type Bayesian model selection scheme. We apply this approach in a case study of word-order acquisition in German.
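The core difficulty the abstract names, that grammars parsing different-sized subsets of a corpus have raw likelihoods on incomparable scales, can be sketched as follows. The per-token normalization and all numbers below are illustrative assumptions, not the paper's actual correction:

```python
def per_token_log_likelihood(log_likelihood, n_tokens):
    """Normalise a grammar's corpus log-likelihood by the number of
    tokens it actually parses, so hypotheses covering different-sized
    subsets of the data become comparable (illustrative assumption,
    not the method proposed in the paper)."""
    return log_likelihood / n_tokens

# Hypothetical figures: grammar A parses the full corpus, grammar B
# only a cleaner subset, so their raw log-likelihoods differ greatly
# even though both fit their data equally well per token.
grammar_a = {"log_lik": -45000.0, "tokens": 10000}
grammar_b = {"log_lik": -31500.0, "tokens": 7000}

score_a = per_token_log_likelihood(grammar_a["log_lik"], grammar_a["tokens"])
score_b = per_token_log_likelihood(grammar_b["log_lik"], grammar_b["tokens"])
print(score_a, score_b)  # -4.5 -4.5: identical per-token fit despite the raw gap
```

A raw-likelihood comparison would favor grammar B by thousands of nats purely because it parses less data; the point of the paper's correction is to remove exactly this subset-size confound.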
Quantifying cross-linguistic influence with a computational model: A study of case-marking comprehension
Cross-linguistic influence (CLI) is one of the key phenomena in bilingual and second language learning. We propose a method for quantifying CLI in the use of linguistic constructions with the help of a computational model, which acquires constructions in two languages from bilingual input. We focus on the acquisition of case-marking cues in Russian and German and simulate two experiments that employ a picture-choice task tapping into the mechanisms of sentence interpretation. Our model yields behavioral patterns similar to those of humans, and these patterns can be explained by the amount of CLI: high amounts of negative CLI lead to the misinterpretation of participant roles in Russian and German object-verb-subject sentences. Finally, we make two novel predictions about the acquisition of case-marking cues in Russian and German. Most importantly, our simulations suggest that a high degree of positive CLI may facilitate the interpretation of object-verb-subject sentences.
Alternation-Sensitive Phoneme Learning: Implications for Children's Development and Language Change
This dissertation develops a cognitive model describing when children learn to group distinct sound segments (allophones) into abstract equivalence classes (phonemes). The allophones an individual acquires are arbitrary and determined by their particular input, yet are intricately involved in language cognition once learned. The proposed acquisition model characterises the role of surface segment alternations in children's input by using the Tolerance Principle (Yang 2016) to evaluate the cognitive cost of possible phoneme inventory structures iteratively as a child's vocabulary grows. This Alternation-sensitive Phoneme Learning model therefore traces the emergence of abstract representations from concrete speech stimuli, starting from a default representation in which underlying contrasts simply mirror surface-segment contrasts (Invariant Transparency Hypothesis, Ringe & Eska 2013).
A longitudinal corpus study of four children's alveolar stop and flap productions establishes that English medial flap allophony follows a U-shaped acquisition course, which is characteristic of learning linguistic rules or generalisations. The Alternation-sensitive Phoneme Learning cognitive model is then validated by accurately predicting the timing of changes in each child's productions, which signal allophone acquisition. A second case study models the historical process of secondary split in Menominee mid and high back vowels. Here, the acquisition model serves as an independently motivated quantitative test for the occurrence of phonemic split, providing an alternative to traditional reliance on linguists' case-specific subjective judgements about when it might occur. A third case study examines the phonemic status of the velar nasal in German, showing how this acquisition model can discriminate between tolerable grammars and the subset of tolerable grammars that are learnable, with implications for the relationship between formal language description and psychological representation. This dissertation's approach synthesises insights from computational modelling, naturalistic corpus data, historical linguistics, and experimental research on child language acquisition.
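The Tolerance Principle evaluation at the heart of the model is simple to state: a rule over N relevant items remains productive only if its exceptions do not exceed N / ln N. A minimal sketch of that threshold check; the vocabulary figures are hypothetical:

```python
import math

def tolerance_threshold(n):
    """Yang's (2016) Tolerance Principle: a rule generalising over n
    items is productive iff its exceptions number at most n / ln(n)."""
    return n / math.log(n)

def is_productive(n_items, n_exceptions):
    """True when the exception count stays under the tolerance threshold."""
    return n_exceptions <= tolerance_threshold(n_items)

# Hypothetical inventory-structure evaluation: 200 relevant vocabulary
# items, of which 30 alternate in a way the candidate phoneme grouping
# cannot derive.
print(is_productive(200, 30))  # True: 30 <= 200 / ln(200) ~ 37.75
print(is_productive(200, 45))  # False: 45 exceptions exceed the threshold
```

Because the threshold grows sublinearly in N, re-running this check as the simulated vocabulary expands is what lets the model predict *when* a given phoneme inventory structure becomes (or ceases to be) tenable for a child.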
Simulating the referential properties of Dutch, German and English Root Infinitives in MOSAIC
Children learning many languages go through an Optional Infinitive stage in which they produce non-finite verb forms in contexts in which a finite verb form is required (e.g. "That go there" instead of "That goes there"). MOSAIC (Model of Syntax Acquisition in Children) is a computational model of language learning that successfully simulates the developmental patterning of the Optional Infinitive (OI) phenomenon in English, Dutch, German and Spanish (Freudenthal, Pine, Aguado-Orea & Gobet, 2007). In the present study, MOSAIC is applied to the simulation of certain subtle but theoretically important phenomena in the cross-linguistic patterning of the OI phenomenon that are typically assumed to require a more complex formal analysis. MOSAIC is shown to successfully simulate 1) the Modal Reference Effect: the finding that Dutch and German children tend to use Root Infinitives in modal contexts, 2) the Eventivity Constraint: the finding that Dutch and German Root Infinitives refer predominantly to actions rather than static situations, and 3) the absence or reduced size of these effects in English. These results provide strong support for input-driven explanations of the Modal Reference Effect as well as MOSAIC's mechanism for producing Root Infinitives, and the wider claim that it is possible to explain key aspects of children's early multi-word speech in terms of the interaction between a resource-limited distributional learning mechanism and the surface properties of the language to which children are exposed.
Treebank-based acquisition of wide-coverage, probabilistic LFG resources: project overview, results and evaluation
This paper presents an overview of a project to acquire wide-coverage, probabilistic Lexical-Functional Grammar (LFG) resources from treebanks. Our approach is based on an automatic annotation algorithm that annotates "raw" treebank trees with LFG f-structure information approximating basic predicate-argument/dependency structure. From the f-structure-annotated treebank we extract probabilistic unification grammar resources. We present the annotation algorithm, the extraction of lexical information, and the acquisition of wide-coverage and robust PCFG-based LFG approximations, including long-distance dependency resolution. We show how the methodology can be applied to multilingual, treebank-based unification grammar acquisition. Finally, we show how simple (quasi-)logical forms can be derived automatically from the f-structures generated for the treebank trees.
A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging
In this paper, we propose a new approach to constructing a system of transformation rules for the Part-of-Speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method in which rules are stored in an exception structure and new rules are added only to correct the errors of existing rules, thus allowing systematic control of the interaction between the rules. Experimental results on 13 languages show that our approach is fast in terms of both training time and tagging speed. Furthermore, our approach obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers.
Comment: to appear in AI Communications (accepted for publication on 3/12/2015).
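The exception structure the abstract describes, in which a new rule is consulted only in contexts where its parent rule fires, so it can correct the parent's errors without disturbing the rest of the system, can be sketched as a small rule tree. This is an illustrative sketch of the ripple-down-rules idea, not the authors' implementation, and the "book after to" correction is a hypothetical example:

```python
class Rule:
    """One node in a ripple-down rule tree: a condition on the tagging
    context, a tag to assign, and exception rules consulted only when
    this rule fires (illustrative sketch, not the authors' code)."""

    def __init__(self, condition, tag):
        self.condition = condition   # predicate over a context dict
        self.tag = tag               # tag assigned when the rule fires
        self.exceptions = []         # child rules added to fix errors

    def apply(self, context):
        if not self.condition(context):
            return None
        for exc in self.exceptions:
            corrected = exc.apply(context)
            if corrected is not None:
                return corrected     # a firing exception overrides its parent
        return self.tag

# Root rule: fires everywhere but assigns nothing itself, so the
# baseline tag survives unless some exception corrects it.
root = Rule(lambda ctx: True, None)

# Hypothetical correction learned from a tagging error: "book" after
# "to" should be tagged as a verb, not a noun.
root.exceptions.append(
    Rule(lambda ctx: ctx["word"] == "book" and ctx["prev"] == "to", "VB")
)

def tag(ctx):
    result = root.apply(ctx)
    return result if result is not None else ctx["baseline_tag"]

print(tag({"word": "book", "prev": "to", "baseline_tag": "NN"}))   # VB
print(tag({"word": "book", "prev": "the", "baseline_tag": "NN"}))  # NN
```

The key property is locality: adding an exception can only change the output in contexts where its parent already fired, which is what makes the interaction between rules systematically controllable.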
Automatic acquisition of LFG resources for German - as good as it gets
We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.
Optimality Theory as a Framework for Lexical Acquisition
This paper re-investigates a lexical acquisition system initially developed for French. We show that, interestingly, the architecture of the system reproduces and implements the main components of Optimality Theory. However, we formulate the hypothesis that some of its limitations are mainly due to a poor representation of the constraints used. Finally, we show how a better constraint representation would yield better results.
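The Optimality Theory evaluation scheme the abstract appeals to reduces to ranking candidates by their constraint-violation profiles. A minimal sketch of that standard evaluation; the two constraints and the /pat/ example are toy assumptions, unrelated to the paper's actual constraint set:

```python
def ot_winner(candidates, ranked_constraints):
    """Standard OT evaluation: the optimal candidate is the one whose
    violation profile is lexicographically smallest under the ranking
    (higher-ranked constraints come first in the list)."""
    return min(candidates,
               key=lambda cand: tuple(c(cand) for c in ranked_constraints))

# Toy tableau for input /pat/, with NoCoda ranked above Max-IO
# (a preference against codas outranking faithfulness to segments).
no_coda = lambda cand: sum(1 for syll in cand.split(".")
                           if syll and syll[-1] not in "aeiou")
max_io = lambda cand: len("pat") - len(cand.replace(".", ""))

print(ot_winner(["pat", "pa"], [no_coda, max_io]))  # "pa": deleting beats keeping a coda
```

Reversing the ranking to [max_io, no_coda] makes "pat" win instead, which is exactly the kind of behavior a constraint representation inside a lexical acquisition system would need to capture.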