Computational Models for Sensorimotor Integration and Word Learning

Abstract

This dissertation investigates computational models for sensorimotor integration and word learning in pre-linguistic development. In particular, models are investigated for three problems: (1) acoustic-to-articulatory mapping (speech inversion), (2) speech motor skill acquisition and speech production, and (3) cross-situational noun learning. For the first problem, we show that the simpler general regression neural network model performs on par with, if not better than, the state-of-the-art deep belief network in experiments on the MOCHA-TIMIT and MNGU0 databases. For the second problem, we propose a developmental agent with perception (audition), action (vocalization), and learning capabilities within the predictive coding framework. We show that, when exposed to an environment of linguistic sounds (the Harvard-Haskins database of regularly-timed speech) without any external reinforcement signal, the agent learns to generate speech-like sounds (acoustic babbling followed by proto-syllables and vowels) as well as the timing of motor command execution. Random goal exploration leads to the self-organization of developmental stages of vocal sequences in the agent as the complexity of vocalization increases. For the third problem, we investigate reinforcement learning models for early word learning that combine cross-situational learning with social-pragmatic theory; joint attention and prosodic cues in the caregiver's speech serve as social cues. We show that, when a reinforcement learning model is exposed to a group of speakers, it acquires an initial set of vocabulary items belonging to the language used by the group. In standard experiments with the CHILDES dataset, the attentional-prosodic deep Q-network model outperforms existing word learning models.
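The abstract only names the models, but a minimal sketch of the first one may help fix ideas. A general regression neural network (GRNN) is essentially Gaussian-kernel-weighted regression over stored training pairs, with a single bandwidth parameter and no iterative training, which is part of why such a simple model can compete with deeper networks on acoustic-to-articulatory mapping. The sketch below is illustrative only and is not taken from the dissertation: the feature dimensions, bandwidth value, and synthetic data merely stand in for MOCHA-TIMIT or MNGU0 acoustic features and articulator trajectories.

    import numpy as np

    class GRNN:
        """General regression neural network (Nadaraya-Watson kernel regression).

        Stores all training pairs and predicts a target as a Gaussian-weighted
        average of the stored targets; sigma is the kernel bandwidth, the only
        free parameter.
        """

        def __init__(self, sigma=0.5):
            self.sigma = sigma

        def fit(self, X, Y):
            # X: (n_samples, n_acoustic_features), Y: (n_samples, n_articulatory_dims)
            self.X = np.asarray(X, dtype=float)
            self.Y = np.asarray(Y, dtype=float)
            return self

        def predict(self, X_new):
            X_new = np.atleast_2d(X_new)
            # Squared Euclidean distance between each query and each stored pattern.
            d2 = ((X_new[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
            w = np.exp(-d2 / (2.0 * self.sigma ** 2))        # Gaussian kernel weights
            w /= w.sum(axis=1, keepdims=True) + 1e-12        # normalize per query
            return w @ self.Y                                # weighted average of stored targets

    if __name__ == "__main__":
        # Toy data standing in for acoustic features (X) and articulator positions (Y).
        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(200, 13))   # e.g. 13 MFCC-like coefficients per frame
        Y_train = X_train @ rng.normal(size=(13, 4)) + 0.1 * rng.normal(size=(200, 4))
        model = GRNN(sigma=1.0).fit(X_train, Y_train)
        print(model.predict(X_train[:3]).shape)  # (3, 4) predicted articulator positions

Because prediction is a weighted average over the training set, the only design choice is the bandwidth sigma, typically tuned on held-out frames; this contrasts with a deep belief network, which requires layer-wise pre-training and fine-tuning.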
