
    A role for the developing lexicon in phonetic category acquisition

    Infants segment words from fluent speech during the same period in which they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior in artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning.
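
    The mechanism described here, word-level feedback disambiguating overlapping sound categories, can be illustrated with a toy simulation. The sketch below is not the paper's Bayesian model; the formant values, word-frame setup, and threshold rule are invented assumptions, chosen only to show why pooling vowel tokens by the word frames they occur in separates categories that overlap token by token.

```python
# Toy illustration (not the paper's model): two overlapping vowel categories
# become separable once tokens are pooled by the word frames they occur in.
# All numbers and the thresholding rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Two vowel categories with heavily overlapping first-formant distributions.
MU_A, MU_B, SD = 500.0, 550.0, 60.0
N_FRAMES, TOKENS_PER_FRAME = 40, 20

# Each hypothetical word frame consistently contains one vowel category.
frame_cat = rng.integers(0, 2, size=N_FRAMES)            # 0 -> A, 1 -> B
f1 = np.array([rng.normal(MU_A if c == 0 else MU_B, SD, TOKENS_PER_FRAME)
               for c in frame_cat])                       # shape (frames, tokens)

threshold = f1.mean()                                     # grand mean as a crude category boundary

# Acoustics-only learner: classify each token independently.
token_pred = (f1 > threshold).astype(int)
acoustic_acc = (token_pred == frame_cat[:, None]).mean()

# Lexically informed learner: pool tokens within a word frame, then classify the frame.
frame_pred = (f1.mean(axis=1) > threshold).astype(int)
lexical_acc = (frame_pred == frame_cat).mean()

print(f"token-by-token accuracy: {acoustic_acc:.2f}")
print(f"frame-pooled accuracy:   {lexical_acc:.2f}")
```

    Because the within-frame mean has much lower variance than a single token, the frame-pooled learner separates categories that are hopelessly confusable token by token, which is the intuition behind letting the developing lexicon feed back into category learning.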

    Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

    Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called the nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming the HDP-HLM as a generative model of observed time-series data and inferring the latent variables of the model, the method can analyze the latent double articulation structure of the data, i.e., hierarchically organized latent words and phonemes, in an unsupervised manner. This novel unsupervised double articulation analyzer is called the NPB-DAA. The NPB-DAA can automatically estimate the double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.
    Comment: 15 pages, 7 figures. Draft submitted to IEEE Transactions on Autonomous Mental Development (TAMD).
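
    As a rough intuition for the generative structure that the HDP-HLM formalizes, the toy sampler below draws an utterance as a sequence of words, each word as a fixed phoneme sequence, and each phoneme as a variable-duration run of acoustic frames. The inventories, probabilities, and Gaussian emissions are illustrative assumptions; the actual model places HDP priors over these quantities and infers them from data with a blocked Gibbs sampler rather than fixing them by hand.

```python
# Toy sketch of the double articulation idea behind the HDP-HLM: utterances are
# built from latent words, words from latent phonemes, and phonemes emit runs of
# acoustic frames with explicit (semi-Markov) durations. Finite, hand-picked
# inventories are used here purely for illustration.
import numpy as np

rng = np.random.default_rng(1)

# Phoneme inventory: (mean acoustic value, mean duration in frames).
phonemes = {0: (-2.0, 4), 1: (0.0, 6), 2: (2.5, 5)}

# Word inventory: each word is a sequence of phoneme IDs.
words = {"w1": [0, 1], "w2": [2, 1, 0], "w3": [1, 2]}
word_probs = {"w1": 0.5, "w2": 0.3, "w3": 0.2}          # toy unigram language model

def sample_utterance(n_words):
    """Sample acoustic frames plus the latent word/phoneme structure behind them."""
    frames, latent = [], []
    word_ids = rng.choice(list(words), size=n_words, p=list(word_probs.values()))
    for w in word_ids:
        for p in words[w]:
            mean, dur = phonemes[p]
            length = max(1, rng.poisson(dur))            # explicit phoneme duration
            frames.extend(rng.normal(mean, 0.3, length))  # Gaussian acoustic emissions
            latent.append((w, p, length))
    return np.array(frames), latent

obs, structure = sample_utterance(4)
print("frames:", np.round(obs[:10], 2), "...")
print("latent (word, phoneme, duration):", structure)
```

    Inference in the paper runs this process in reverse: given only the frames, it recovers the phoneme-like and word-like segmentation jointly, without labels.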

    Building a Multimodal Lexicon: Lessons from Infants' Learning of Body Part Words

    Human children outperform artificial learners because the former quickly acquire a multimodal, syntactically informed, and ever-growing lexicon from little evidence. Most of this lexicon is unlabelled and processed with unsupervised mechanisms, leading to robust and generalizable knowledge. In this paper, we summarize results related to 4-month-olds’ learning of body part words. In addition to providing direct experimental evidence on some of the Workshop’s assumptions, we suggest several avenues of research that may be useful to those developing and testing artificial learners. A first set of studies using a controlled laboratory learning paradigm shows that human infants learn better from tactile-speech than from visual-speech co-occurrences, suggesting that the signal/modality should be considered when designing and exploiting multimodal learning tasks. A series of observational studies documents the ways in which parents naturally structure the multimodal information they provide for infants, which probably happens in lexically specific ways. Finally, our results suggest that 4-month-olds can pick up on co-occurrences between words and specific touch locations (a prerequisite of learning an association between a body part word and its referent on the child’s own body) after very brief exposures, which we interpret as most compatible with unsupervised predictive models of learning.
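
    One way to read the "unsupervised predictive models of learning" interpretation is as an error-driven associative learner that predicts the touch location from the heard word and updates on its prediction error. The delta-rule sketch below is a hypothetical illustration, not the authors' model; the words, locations, learning rate, and exposure schedule are all assumed.

```python
# Toy error-driven (delta-rule) learner for word -> touch-location associations.
# Everything here (vocabulary, locations, learning rate, exposures) is assumed
# for illustration only.
import numpy as np

words = ["hand", "foot"]
locations = ["hand_touch", "foot_touch"]
W = np.zeros((len(words), len(locations)))      # association weights
LEARNING_RATE = 0.2

def update(word, touched):
    """Delta-rule update: move the word's predicted touch pattern toward what was observed."""
    i = words.index(word)
    target = np.array([loc == touched for loc in locations], dtype=float)
    W[i] += LEARNING_RATE * (target - W[i])     # prediction error drives learning

# A brief exposure phase in which each word reliably accompanies one touch site.
for _ in range(10):
    update("hand", "hand_touch")
    update("foot", "foot_touch")

for w in words:
    print(w, "->", locations[int(np.argmax(W[words.index(w)]))])
```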

    Computational and Robotic Models of Early Language Development: A Review

    We review computational and robotics models of early language learning and development. We first explain why and how these models are used to better understand how children learn language. We argue that they provide concrete theories of language learning as a complex dynamic system, complementing traditional methods in psychology and linguistics. We review different modeling formalisms, grounded in techniques from machine learning and artificial intelligence such as Bayesian and neural network approaches. We then discuss their role in understanding several key mechanisms of language development: cross-situational statistical learning, embodiment, situated social interaction, intrinsically motivated learning, and cultural evolution. We conclude by discussing future challenges for research, including the modeling of large-scale empirical data about language acquisition in real-world environments.
    Keywords: early language learning, computational and robotic models, machine learning, development, embodiment, social interaction, intrinsic motivation, self-organization, dynamical systems, complexity.
    Comment: to appear in the International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledge.
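
    Of the mechanisms listed, cross-situational statistical learning is easy to make concrete: no single scene disambiguates which word maps to which referent, but co-occurrence counts accumulated across scenes do. The sketch below uses a toy vocabulary and toy scenes that are assumptions for illustration, not material from any of the reviewed models.

```python
# Toy cross-situational learner: each scene pairs several words with several
# candidate referents; ambiguity within a scene is resolved by accumulating
# word-referent co-occurrence counts across scenes.
from collections import Counter
from itertools import product

cooc = Counter()                                  # (word, referent) -> co-occurrence count

scenes = [
    ({"ball", "dog"}, {"BALL", "DOG"}),           # ambiguous: two words, two referents
    ({"ball", "cup"}, {"BALL", "CUP"}),
    ({"dog", "cup"},  {"DOG", "CUP"}),
]

for spoken_words, visible_referents in scenes:
    for pair in product(spoken_words, visible_referents):
        cooc[pair] += 1                           # credit every word-referent pairing in the scene

def best_referent(word):
    """Referent most often co-present with the word across all scenes."""
    candidates = {ref: n for (w, ref), n in cooc.items() if w == word}
    return max(candidates, key=candidates.get)

for w in ("ball", "dog", "cup"):
    print(w, "->", best_referent(w))              # each word settles on its consistent referent
```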