Computational and Robotic Models of Early Language Development: A Review
We review computational and robotic models of early language learning and
development. We first explain why and how these models are used to understand
better how children learn language. We argue that they provide concrete
theories of language learning as a complex dynamic system, complementing
traditional methods in psychology and linguistics. We review different modeling
formalisms, grounded in techniques from machine learning and artificial
intelligence such as Bayesian and neural network approaches. We then discuss
their role in understanding several key mechanisms of language development:
cross-situational statistical learning, embodiment, situated social
interaction, intrinsically motivated learning, and cultural evolution. We
conclude by discussing future challenges for research, including modeling of
large-scale empirical data about language acquisition in real-world
environments.
Keywords: early language learning, computational and robotic models, machine learning, development, embodiment, social interaction, intrinsic motivation, self-organization, dynamical systems, complexity.
Comment: to appear in International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledge
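Cross-situational statistical learning, one of the mechanisms the review covers, can be sketched in a few lines: a learner accumulates word-referent co-occurrence counts across individually ambiguous naming episodes, and the consistently recurring pairing eventually wins. The episode data and the winner-take-all readout below are illustrative assumptions, not any specific model from the review.

```python
from collections import defaultdict

def cross_situational_learner(episodes):
    """Map each word to its most frequently co-occurring referent.

    Each episode pairs an utterance (a list of words) with the set of
    candidate referents visible in the scene; ambiguity within any one
    episode is resolved by aggregating counts across episodes.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in episodes:
        for w in words:
            for r in referents:
                counts[w][r] += 1
    return {w: max(refs, key=refs.get) for w, refs in counts.items()}

# Hypothetical naming episodes: each scene offers two candidate referents.
episodes = [
    (["look", "a", "dog"], {"DOG", "BALL"}),
    (["the", "dog", "runs"], {"DOG", "TREE"}),
    (["a", "red", "ball"], {"BALL", "TREE"}),
    (["throw", "the", "ball"], {"BALL", "DOG"}),
]
lexicon = cross_situational_learner(episodes)
```

With these episodes, "dog" resolves to DOG and "ball" to BALL, even though no single episode is unambiguous on its own.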
Probabilistic grammar induction from sentences and structured meanings
The meanings of natural language sentences may be represented as compositional
logical-forms. Each word or lexicalised multi-word element has an associated logical-form
representing its meaning. Full sentential logical-forms are then composed from
these word logical-forms via a syntactic parse of the sentence.
This thesis develops two computational systems that learn both the word-meanings
and parsing model required to map sentences onto logical-forms from an example corpus
of (sentence, logical-form) pairs. One of these systems is designed to provide a
general purpose method of inducing semantic parsers for multiple languages and logical
meaning representations. Semantic parsers map sentences onto logical representations
of their meanings and may form an important part of any computational task that
needs to interpret the meanings of sentences. The other system is designed to model
the way in which a child learns the semantics and syntax of their first language. Here,
logical-forms are used to represent the potentially ambiguous context in which child-directed
utterances are spoken and a psycholinguistically plausible training algorithm
learns a probabilistic grammar that describes the target language. This computational
modelling task is important as it can provide evidence for or against competing theories
of how children learn their first language.
Both of the systems presented here are based upon two working hypotheses. First,
that the correct parse of any sentence in any language is contained in a set of possible
parses defined in terms of the sentence itself, the sentence’s logical-form and a small
set of combinatory rule schemata. The second working hypothesis is that, given a
corpus of (sentence, logical-form) pairs that each support a large number of possible
parses according to the schemata mentioned above, it is possible to learn a probabilistic
parsing model that accurately describes the target language.
The algorithm for semantic parser induction learns Combinatory Categorial Grammar
(CCG) lexicons and discriminative probabilistic parsing models from corpora of
(sentence, logical-form) pairs. This system is shown to achieve at or near state-of-the-art
performance across multiple languages, logical meaning representations and domains.
As the approach is not tied to any single natural or logical language, this system represents
an important step towards widely applicable black-box methods for semantic parser induction. This thesis also develops an efficient representation of the CCG lexicon
that separately stores language specific syntactic regularities and domain specific
semantic knowledge. This factorised lexical representation improves the performance
of CCG based semantic parsers in sparse domains and also provides a potential basis
for lexical expansion and domain adaptation for semantic parsers.
The algorithm for modelling child language acquisition learns a generative probabilistic
model of CCG parses from sentences paired with a context set of potential
logical-forms containing one correct entry and a number of distractors. The online
learning algorithm used is intended to be psycholinguistically plausible and to assume
as little information specific to the task of language learning as is possible. It is shown
that this algorithm learns an accurate parsing model despite making very few initial
assumptions. It is also shown that the manner in which both word-meanings and syntactic
rules are learnt is in accordance with observations of both of these learning tasks
in children, supporting a theory of language acquisition that builds upon the two working
hypotheses stated above.
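The learning setting described above, utterances paired with a context set containing one correct logical form among distractors, can be sketched as a simple online updater that distributes credit across the candidates in proportion to how well the current word-meaning weights score them. The scoring function, the toy corpus and the learning rate below are illustrative assumptions, not the thesis's generative model of CCG parses.

```python
from collections import defaultdict

def online_context_learner(corpus, lr=0.5):
    """Online learning from utterances paired with ambiguous contexts.

    Each item pairs a sentence with candidate logical forms (one correct,
    the rest distractors), here simplified to sets of predicate symbols.
    Every candidate is scored under the current word-meaning weights, and
    the weights are nudged toward each candidate in proportion to its
    share of the total score.
    """
    weights = defaultdict(lambda: defaultdict(lambda: 1e-3))

    def score(words, lf):
        # Crude compositional score: summed word-symbol associations.
        return sum(weights[w][sym] for w in words for sym in lf)

    for words, context in corpus:
        scores = [score(words, lf) for lf in context]
        total = sum(scores)
        for lf, s in zip(context, scores):
            responsibility = s / total
            for w in words:
                for sym in lf:
                    weights[w][sym] += lr * responsibility
    return weights

# Hypothetical corpus: the first logical form in each context is correct.
corpus = [
    (["john", "runs"], [{"john", "run"}, {"mary", "sleep"}]),
    (["mary", "runs"], [{"mary", "run"}, {"john", "eat"}]),
    (["john", "sleeps"], [{"john", "sleep"}, {"mary", "run"}]),
]
weights = online_context_learner(corpus)
```

After only three utterances, the association of "runs" with the symbol run already exceeds its association with the distractor symbols sleep and eat, because only the correct pairing recurs across contexts.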
Input and Intake in Language Acquisition
This dissertation presents an approach for a productive way forward in the study of language acquisition, sealing the rift between claims of an innate linguistic hypothesis space and powerful domain-general statistical inference. This approach breaks language acquisition into its component parts, distinguishing the input in the environment from the intake encoded by the learner, and looking at how a statistical inference mechanism, coupled with a well-defined linguistic hypothesis space, could lead a learner to infer the grammar of their native language. This work draws on experimental work, corpus analyses and computational models of children acquiring word meanings, word classes and syntax in Tsez, Norwegian and English to highlight the need for an appropriate encoding of the linguistic input in order to solve any given problem in language acquisition.
Learning language through pictures
We propose Imaginet, a model of learning visually grounded representations of
language from coupled textual and visual input. The model consists of two Gated
Recurrent Unit networks with shared word embeddings, and uses a multi-task
objective by receiving a textual description of a scene and trying to
concurrently predict its visual representation and the next word in the
sentence. Mimicking an important aspect of human language learning, it acquires
meaning representations for individual words from descriptions of visual
scenes. Moreover, it learns to effectively use sequential structure in semantic
interpretation of multi-word phrases.
Comment: To appear at ACL 2015
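Imaginet's multi-task objective can be sketched as a shared embedding table feeding two heads, one regressing the scene's visual feature vector and one predicting the next word. For brevity this sketch mean-pools the embeddings where Imaginet uses two Gated Recurrent Unit networks; all dimensions, weight matrices and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"a": 0, "dog": 1, "runs": 2, "<eos>": 3}
V, d_emb, d_img = len(vocab), 8, 5

# Shared word embeddings feed both pathways, as in Imaginet.
E = rng.normal(scale=0.1, size=(V, d_emb))
W_vis = rng.normal(scale=0.1, size=(d_emb, d_img))  # visual projection head
W_lm = rng.normal(scale=0.1, size=(d_emb, V))       # next-word softmax head

def multitask_loss(tokens, image_vec):
    ids = [vocab[t] for t in tokens]
    # Simplification: mean-pool embeddings where Imaginet uses a GRU state.
    h = E[ids].mean(axis=0)
    # Task 1: regress the scene's visual feature vector (squared error).
    vis_loss = np.sum((h @ W_vis - image_vec) ** 2)
    # Task 2: predict each next word from its prefix (cross-entropy).
    lm_loss = 0.0
    for t in range(len(ids) - 1):
        prefix = E[ids[:t + 1]].mean(axis=0)
        logits = prefix @ W_lm
        log_probs = logits - np.log(np.sum(np.exp(logits)))
        lm_loss -= log_probs[ids[t + 1]]
    return vis_loss + lm_loss

loss = multitask_loss(["a", "dog", "runs", "<eos>"], rng.normal(size=d_img))
```

Both terms share the gradient path through E, which is what lets the visual task shape the word representations that the language-modelling task also uses.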
Universal Grammar: Wittgenstein versus Chomsky
Daniele Moyal-Sharrock, ‘Universal Grammar: Wittgenstein versus Chomsky’, in M. A. Peters and J. Stickney, eds., A Companion to Wittgenstein on Education: Pedagogical Investigations (Singapore: Springer Verlag, 2017), ISBN: 9789811031342.
The motivations for the claim that language is innate are, for many, quite straightforward. The innateness of language is seen as the only way to solve the so-called 'logical problem of language acquisition': the mismatch between linguistic input and linguistic output. In this paper, I begin by unravelling several strands of the nativist argument, offering replies as I go along. I then give an outline of Wittgenstein's view of language acquisition, showing how it renders otiose problems posed by nativists like Chomsky – not least by means of Wittgenstein's own brand of grammar which, unlike Chomsky's, does not reside in the brain, but in our practices.
Peer reviewed