Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation
We investigate whether infant-directed speech (IDS) could facilitate word
form learning when compared to adult-directed speech (ADS). To study this, we
examine the distribution of word forms at two levels, acoustic and
phonological, using a large database of spontaneous speech in Japanese. At the
acoustic level we show that, as has been documented before for phonemes, the
realizations of words are more variable and less discriminable in IDS than in
ADS. At the phonological level, we find an effect in the opposite direction:
the IDS lexicon contains more distinctive words (such as onomatopoeias) than
the ADS counterpart. Combining the acoustic and phonological metrics together
in a global discriminability score reveals that the bigger separation of
lexical categories in the phonological space does not compensate for the
opposite effect observed at the acoustic level. As a result, IDS word forms are
still globally less discriminable than ADS word forms, even though the effect
is numerically small. We discuss the implication of these findings for the view
that the functional role of IDS is to improve language learnability.
Interactive Language Learning by Robots: The Transition from Babbling to Word Forms
The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. 
This work shows the potential of human-robot interaction systems in studies of the dynamics of early language acquisition.
Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner
Spectacular progress in the information processing sciences (machine learning, wearable sensors) promises to revolutionize the study of cognitive development. Here, we analyse the conditions under which ‘reverse engineering’ language development, i.e., building an effective system that mimics infants’ achievements, can contribute to our scientific understanding of early language development. We argue that, on the computational side, it is important to move from toy problems to the full complexity of the learning situation, and to take as input reconstructions of the sensory signals available to infants that are as faithful as possible. On the data side, accessible but privacy-preserving repositories of home data have to be set up. On the psycholinguistic side, specific tests have to be constructed to benchmark humans and machines at different linguistic levels. We discuss the feasibility of this approach and present an overview of current results.
A usage-based model for the acquisition of syntactic constructions and its application in spoken language understanding
Gaspers J. A usage-based model for the acquisition of syntactic constructions and its application in spoken language understanding. Bielefeld: Universitätsbibliothek Bielefeld; 2014
The role of chunking and analogy in early vocabulary acquisition and processing
Chunking and analogy, learning through associations and similarities respectively, are crucial cognitive processes in a usage-based theory of language development. Assessing their roles in child naturalistic word learning has posed significant challenges. In this thesis, I offer methodological solutions to examine the developmental plausibility of these processes. Chapter 2 discusses limitations in studies of early word segmentation from naturalistic speech, affecting conclusions about the processes' developmental plausibility. I present a new chunking-based model, CLASSIC Utterance Boundary (CLASSIC-UB), to study how English infants discover words from continuous naturalistic speech. Its plausibility is assessed through new metrics focusing on child production vocabularies from large-scale conversational corpora. I show the advantages of using large word production samples and how this can improve the refinement of early word segmentation and learning theories. In Chapter 3, conclusions about CLASSIC-UB’s plausibility are supported by extending this approach cross-linguistically, using Italian as a case study. Across Chapters 2 and 3, CLASSIC-UB more accurately captures child productions than other chunking and non-chunking accounts, supporting its plausibility in early word segmentation and learning. In Chapter 4, I identify methodological challenges in assessing the independent effects of chunking and analogy in child word processing. I focus on how children use sentence context to resolve ambiguous word meanings (word sense disambiguation). I present ChiSense-12, a new open-access sense-tagged corpus of child-directed speech, and describe its use in creating experimental stimuli to disentangle variables (verb-object associations and verb-event structures) that are informative about the independent role of chunking and analogy. 
Using this corpus, I show, for the first time, that 4-year-old children exploit both bottom-up verb-object associations and top-down verb-event structures to resolve lexical ambiguities. Overall, this thesis makes a significant contribution to usage-based theories of language development and improves our understanding of how children acquire language in real-life contexts.
Statistical language learning
Theoretical arguments based on the "poverty of the stimulus" have denied a
priori the possibility that abstract linguistic representations can be learned
inductively from exposure to the environment, given that the linguistic input
available to the child is both underdetermined and degenerate. I reassess such
learnability arguments by exploring a) the type and amount of statistical
information implicitly available in the input in the form of distributional and
phonological cues; b) psychologically plausible inductive mechanisms for
constraining the search space; c) the nature of linguistic representations,
algebraic or statistical. To do so I use three methodologies: experimental
procedures, linguistic analyses based on large corpora of naturally occurring
speech and text, and computational models implemented in computer
simulations.
In Chapters 1, 2, and 5, I argue that long-distance structural dependencies
- traditionally hard to explain with simple distributional analyses based on n-gram
statistics - can indeed be learned associatively provided the amount of
intervening material is highly variable or invariant (the Variability effect). In
Chapter 3, I show that simple associative mechanisms instantiated in Simple
Recurrent Networks can replicate the experimental findings under the same
conditions of variability. Chapter 4 presents successes and limits of such results
across perceptual modalities (visual vs. auditory) and perceptual presentation
(temporal vs. sequential), as well as the impact of long and short training
procedures. In Chapter 5, I show that generalisation to abstract categories from
stimuli framed in non-adjacent dependencies is also modulated by the Variability
effect. In Chapter 6, I show that the putative separation of algebraic and
statistical styles of computation based on successful speech segmentation versus
unsuccessful generalisation experiments (as published in a recent Science paper)
is premature and is the effect of a preference for phonological properties of the
input. In Chapter 7, computer simulations of learning irregular constructions
suggest that it is possible to learn from positive evidence alone, despite Gold's
celebrated arguments on the unlearnability of natural languages. Evolutionary
simulations in Chapter 8 show that irregularities in natural languages can emerge
from full regularity and remain stable across generations of simulated agents. In
Chapter 9 I conclude that the brain may be endowed with a powerful statistical
device for detecting structure, generalising, segmenting speech, and recovering
from overgeneralisations. The experimental and computational evidence gathered
here suggests that statistical language learning is more powerful than heretofore
acknowledged by the current literature.
An Examination of the Influence of Age on L2 Acquisition of English Sound-Symbolic Patterns
A number of researchers (DeKeyser, 2012; J. S. Johnson & Newport, 1989; Long, 1990) have argued that age is a critical factor in second language acquisition. This conclusion is based on extensive research over the last two decades that has demonstrated age-related effects in learners’ nonnativelike acquisition of phonology, morphosyntax, pragmatics, and discourse-level features of language. In the wake of such findings, there has recently been an increased interest in determining the precise linguistic areas that are difficult for adult learners and the cognitive mechanisms implicated in age-related effects. Because implicit learning plays a key role in first-language (L1) acquisition, particularly in the acquisition of statistical patterns in language, it has been proposed that age effects may be the result of attenuated implicit learning capabilities in late-teen and adult learners (DeKeyser, 2000; Janacsek, Fiser, & Nemeth, 2012). If this is true, age-related effects should be significant in linguistic areas that are not readily amenable to conscious learning processes and explicit instruction. To determine whether this is in fact the case, this study examined the linguistic knowledge of native speakers (NSs), early L2 learners, and learners who acquired English as adults. In particular, it examined these groups’ knowledge related to an area of English that is hypothesized to be difficult to learn explicitly, namely, English sound-symbolic (SS) patterns.
Participants comprised English NSs (n = 20) and three NNS groups with L1 Korean and L2 English. The NNS groups were divided by age of onset (AO): 3 to 9 years of age (n = 20), 10 to 16 (n = 20), and > 17 (n = 20). Three experiments tested the participants’ English magnitude SS sensitivities when forming assumptions about nonce words (Experiments 1 and 2) and their ability to utilize English SS patterns to bootstrap their learning of new vocabulary (Experiment 3). The two late L2 learner groups (AO 10-16; 17+) were found to have significantly reduced levels of SS knowledge compared to the early L2 learners (AO 3-9) and NSs in all experiments. The early L2 learners had diminished magnitude SS sensitivities compared to NSs in Experiments 1 and 2, but not in Experiment 3.
Explicit and implicit aptitudes as measured by LLAMA (Meara, 2005) were also tested for potential relationships with test scores. Explicit aptitudes (LLAMA B, E, and F) did not have a significant effect on the performance of any AO group, whereas implicit aptitude (LLAMA D) correlated moderately to strongly with test scores, but only in the two late learner groups. The early learner group was not affected by language aptitude levels during the experiments.
In sum, the study found evidence for SPE in the areas of magnitude and English phonesthemic SS patterns. Implicit language-learning aptitudes appeared to have a facilitative effect on the acquisition of these SS sensitivities for the two late L2 learner groups, but not for the early L2 learners.
From the Richness of the Signal to the Poverty of the Stimulus: Mechanisms of Early Language Acquisition
1.1 The poverty of stimulus argument and the learnability of language
1.1.1 The induction problem
- …