Modelling written language production in English: a Bayesian model of spelling

Abstract

In English, word production (i.e., spelling) is more complex than word recognition (i.e., reading) because the relationship between graphemes and phonemes is unevenly distributed. Despite this known complexity, models of spelling are largely descriptive and less sophisticated than models of reading. Furthermore, current models of spelling do not adequately explain how spelling information is learned and managed. By contrast, Bayesian models of reading have facilitated research and development in recent years. Given the complex relationship between reading and spelling, I developed a Bayesian computational model of spelling to address these limitations. Operational validity was assessed both theoretically (i.e., conceptual and predictive validity) and empirically (i.e., data, event, and predictive validity). The model was designed to behave like a human speller, based on what is currently known about spelling. It simulates a dictation task and makes spelling decisions based on 10 parameters that are analogous to human spelling decisions. The model was trained on Queensland spelling lists based on national curriculum guidelines for grades 1 to 7, which provides data validity, and was tested on words from a computerised NAPLAN dictation task completed by students in grades 3, 5, 7, and 9, which provides event validity. Predictive validity was examined by comparing the responses of the Bayesian spelling model with the NAPLAN student data. Accuracy and error data for students and for all parameters were calculated and transformed into density distributions to overcome sample size limitations and skew in the data related to letter frequencies. Independent-samples Bayesian t-tests compared the distributions of the model with the distributions of students for each testing grade.
Results for grades 3 and 5 supported my hypotheses, showing positive evidence that there was no difference between the distributions of data from students and the distributions from expected model parameters. Although results for grades 7 and 9 supported my hypotheses for error only, accuracy data were still in alignment with the predictions of current spelling models and with Australian curriculum guidelines. My findings validate the model: spelling behaviour is effectively reproduced (i.e., empirical validation) and the data are congruent with existing literature (i.e., theoretical validation). Furthermore, the progression of learning through parameter decisions aligns with known learning processes. These findings provide robust evidence that Bayesian decision making can be used to model spelling behaviour and that my model can reproduce the learning process of spelling. My model provides ample research opportunities, including investigation of early phonological learning and later morphological strategies. I suggest that further model development consider the ability to examine contractions and to differentiate between homophones. Future research with the Bayesian spelling model could feasibly provide a means of experimentally examining educational strategies and spelling disorders, and may have implications for natural language processing. Most importantly, it is hoped that future research with the Bayesian model of spelling will highlight the important role of spelling and spelling education in everyday life.
