Exploring Issues in Lexical Acquisition Using Bayesian Modeling
This thesis addresses questions about early lexical acquisition. Four case studies provide concrete examples of how Bayesian computational modeling can be used to study assumptions about inductive biases, properties of the input data and possible limitations of the learning algorithm.
The first study describes an incremental particle filter algorithm for non-parametric word segmentation models and compares its behavior to Markov chain Monte Carlo methods that operate in an offline fashion. Depending on the setting, particle filters may outperform or be outperformed by offline batch algorithms. It is argued that these results ought to be viewed as raising questions about the segmentation model itself rather than as providing evidence for any specific algorithm.
The second study explores how modeling assumptions interact with the amount of input processed by a model. The experiments indicate that non-parametric word segmentation models exhibit an overlearning effect where more input results in worse segmentation performance. It is shown that adding the ability to learn entire sequences of words in addition to individual words addresses this problem on a large corpus if linguistically plausible assumptions about possible words are made.
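As a hedged illustration of what "learning entire sequences of words in addition to individual words" can mean in this family of models, the following adaptor-grammar-style fragment (the nonterminal names are our own, not taken from the thesis) adds a cached collocation level above the word level:

\begin{align*}
\mathrm{Sentence} &\rightarrow \mathrm{Colloc}^{+}\\
\underline{\mathrm{Colloc}} &\rightarrow \mathrm{Word}^{+}\\
\underline{\mathrm{Word}} &\rightarrow \mathrm{Phoneme}^{+}
\end{align*}

Underlined nonterminals are adapted, i.e. the model caches and reuses entire subtrees, so frequent multi-word sequences can be stored as collocations instead of being mis-analysed as single long words.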
The third study explores the role of stress cues in word segmentation through Bayesian modeling. In line with developmental evidence, the results indicate that stress cues aid segmentation and interact with phonotactic cues; and that substantive constraints such as a Unique Stress Constraint can be inferred from the linguistic input and need not be built into the model.
The fourth study shows how variable phonological processes such as segmental deletion can be modeled jointly with word segmentation by a two-level architecture that uses a generative beta-binomial model to map underlying forms to surface forms. Experimental evaluation for the phenomenon of word-final /t/-deletion shows the importance of context in determining whether or not a variable rule applies, and that naturalistic data contains subtle complexities that may not be captured by summary statistics of the input, illustrating the need to pay close attention not only to the assumptions built into the model but also to those that went into preparing the input.
Linguistics, cognitive psychology, and the now-or-never bottleneck
Christiansen & Chater (CC)'s key premise is that "if linguistic information is not processed rapidly, that information is lost for good". From this "Now-or-Never Bottleneck" (NNB), CC derive "wide-reaching and fundamental implications for language processing, acquisition and change as well as for the structure of language itself". We question both the premise and the consequentiality of its purported implications.
Internationaler Ferienkurs
More than 600 participants from 70 countries on every continent come together each summer for four weeks at Heidelberg University's Internationaler Ferienkurs (international summer course). This TV report on the 2001 course shows what awaits them. The mostly younger visitors arrive from afar with widely varying levels of German and a healthy dose of curiosity about German culture. The course sees itself not only as a language course at a German university whose name carries particular weight abroad; it also aims to convey, through literature and songs, what makes the language especially appealing. Duration: 4:3
A Particle Filter algorithm for Bayesian Word Segmentation
Bayesian models are usually learned using batch algorithms that have to iterate multiple times over the full dataset. This is both computationally expensive and, from a cognitive point of view, highly implausible. We present a novel online algorithm for the word segmentation models of Goldwater et al. (2009) which is, to our knowledge, the first published Particle Filter for this kind of model. In contrast to other proposed algorithms, it also comes with a theoretical guarantee of optimality as the number of particles goes to infinity. While this is, of course, a theoretical point, a first experimental evaluation of our algorithm shows that, as predicted, its performance improves with the use of more particles, and that it performs competitively with other online learners proposed in Pearl et al. (2011).
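To make the algorithmic idea concrete, the following is a minimal sketch, not the authors' implementation: a particle filter for a unigram Dirichlet-process segmentation model in the spirit of Goldwater et al. (2009). All names and constants are illustrative assumptions. Each particle keeps its own lexicon counts, segments each incoming utterance by sampling from its local posterior, and is reweighted by the utterance's marginal probability before resampling.

import copy
import random
from collections import Counter

ALPHA = 20.0    # DP concentration parameter (assumed value)
P_STOP = 0.5    # stop probability of the geometric base distribution
N_PHONES = 50   # assumed phoneme inventory size

def base_prob(word):
    # Base distribution P0: geometric word length, uniform phonemes.
    return ((1.0 - P_STOP) / N_PHONES) ** len(word) * P_STOP

def segmentations(utt):
    # Enumerate every segmentation of a (short) utterance into words.
    if not utt:
        yield []
        return
    for i in range(1, len(utt) + 1):
        for rest in segmentations(utt[i:]):
            yield [utt[:i]] + rest

class Particle:
    def __init__(self):
        self.counts = Counter()   # word token counts seen so far
        self.total = 0

    def word_prob(self, word):
        # Chinese-restaurant-process predictive probability of one word.
        return (self.counts[word] + ALPHA * base_prob(word)) / (self.total + ALPHA)

    def seg_prob(self, seg):
        # Sequential probability of a candidate segmentation.
        p = 1.0
        for w in seg:
            p *= self.word_prob(w)
            self.counts[w] += 1
            self.total += 1
        for w in seg:               # undo the temporary updates
            self.counts[w] -= 1
            self.total -= 1
        return p

    def observe(self, utt):
        # Sample a segmentation from the particle's local posterior and
        # return the utterance's marginal probability as the importance weight.
        segs = list(segmentations(utt))
        probs = [self.seg_prob(s) for s in segs]
        choice = random.choices(segs, weights=probs)[0]
        for w in choice:
            self.counts[w] += 1
            self.total += 1
        return sum(probs)

def particle_filter(utterances, n_particles=10):
    particles = [Particle() for _ in range(n_particles)]
    for utt in utterances:
        weights = [p.observe(utt) for p in particles]
        # Multinomial resampling: copies of high-weight particles survive.
        particles = [copy.deepcopy(p)
                     for p in random.choices(particles, weights=weights, k=n_particles)]
    return particles

With the locally optimal proposal used here, the incremental weight is exactly the predictive probability of the utterance; enumerating all segmentations is exponential in utterance length, so the sketch is only practical for the short utterances typical of child-directed speech.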
Exploring the role of stress in Bayesian word segmentation using adaptor grammars
Stress has long been established as a major cue in word segmentation for English infants. We show that enabling a current state-of-the-art Bayesian word segmentation model to take advantage of stress cues noticeably improves its performance. We find that the improvements range from 10% to 4%, depending on both the use of phonotactic cues and, to a lesser extent, the amount of evidence available to the learner. We also find that, in particular early on, stress cues are much more useful for our model than phonotactic cues by themselves, consistent with the finding that children seem to use stress cues before they use phonotactic cues. Finally, we study how the model's knowledge about stress patterns evolves over time. We not only find that our model correctly acquires the most frequent patterns relatively quickly but also that the Unique Stress Constraint that is at the heart of a previously proposed model does not need to be built in but can be acquired jointly with word segmentation. 12 page(s)
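To make the Unique Stress Constraint concrete: in adaptor-grammar terms, building it in would amount to rules of roughly the following shape, where SSyl and USyl stand for stressed and unstressed syllables and the underlined nonterminal is adapted (the nonterminal names are illustrative, not taken from the paper):

\begin{align*}
\mathrm{Words} &\rightarrow \mathrm{Word}^{+}\\
\underline{\mathrm{Word}} &\rightarrow \mathrm{USyl}^{*}\ \mathrm{SSyl}\ \mathrm{USyl}^{*} \;\mid\; \mathrm{USyl}^{+}
\end{align*}

Every word then contains at most one stressed syllable by construction; the result reported above is that this restriction does not have to be stipulated in the grammar but can be inferred jointly with segmentation.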
Using rejuvenation to improve particle filtering for Bayesian word segmentation
We present a novel extension to a recently proposed incremental learning algorithm for the word segmentation problem originally introduced in Goldwater (2006). By adding rejuvenation to a particle filter, we are able to considerably improve its performance in terms of finding both higher-probability and higher-accuracy solutions. 5 page(s)
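In sequential Monte Carlo terms, rejuvenation means that after the resampling step each particle's history is passed through an MCMC kernel that leaves the target posterior invariant. A generic statement of this idea (the specific kernel, e.g. how many earlier utterances are re-segmented, is left open here) is:

\[
z_{1:t}^{(i)} \sim K_t\!\left(\cdot \mid z_{1:t}^{(i)}\right),
\qquad
\int K_t(z' \mid z)\, p(z \mid x_{1:t})\, \mathrm{d}z = p(z' \mid x_{1:t}),
\]

where z_{1:t}^{(i)} is particle i's segmentation of the utterances x_{1:t} seen so far. Such moves let a particle revise earlier segmentation decisions in the light of later data, which is what drives the improvements reported above.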
A joint model of word segmentation and phonological variation for English word-final /t/-deletion
Word-final /t/-deletion refers to a common phenomenon in spoken English where words such as /wEst/ "west" are pronounced as [wEs] "wes" in certain contexts. Phonological variation like this is common in naturally occurring speech. Current computational models of unsupervised word segmentation usually assume idealized input that is devoid of these kinds of variation. We extend a non-parametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms, producing a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model. We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s. We find that bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.
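A compact way to render the two-level architecture described here (the prior parameters and the exact context classes are assumptions for illustration, not the paper's settings) is:

\begin{align*}
\rho_c &\sim \mathrm{Beta}(\alpha_c, \beta_c) && \text{deletion probability for context } c\\
u_1, \dots, u_n &\sim \text{bigram word segmentation model} && \text{underlying forms}\\
s_i &=
\begin{cases}
u_i \text{ with its final /t/ deleted} & \text{with probability } \rho_{c(i)}\\
u_i & \text{with probability } 1 - \rho_{c(i)}
\end{cases}
\end{align*}

Here c(i) is the context of token i, for example whether the following word begins with a consonant or a vowel. Only the surface forms s_i are observed; inference recovers both the segmentation and the underlying /t/s, and the learned \rho_c make the beta-binomial mapping from underlying to surface forms context dependent.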
Studying the effect of input size for Bayesian word segmentation on the Providence Corpus
Studies of computational models of language acquisition depend to a large extent on the input available for experiments. In this paper, we study the effect that input size has on the performance of word segmentation models embodying different kinds of linguistic assumptions. Because currently available corpora for word segmentation are not suited for addressing this question, we perform our study on a novel corpus based on the Providence Corpus (Demuth et al., 2006). We find that input size can have dramatic effects on segmentation performance and that, somewhat surprisingly, models performing well on smaller amounts of data can show a marked decrease in performance when exposed to larger amounts of data. We also present the data set on which we perform our experiments, which comprises longitudinal data for six children. This corpus makes it possible to ask more specific questions about computational models of word segmentation, in particular about intra-language variability and about how the performance of different models can change over time. 16 page(s)