62 research outputs found

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.

    Interactive Language Learning by Robots: The Transition from Babbling to Word Forms

    The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. 
This work shows the potential of human-robot interaction systems in studies of the dynamics of early language acquisition.

    Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner

    Spectacular progress in the information processing sciences (machine learning, wearable sensors) promises to revolutionize the study of cognitive development. Here, we analyse the conditions under which ‘reverse engineering’ language development, i.e., building an effective system that mimics infants’ achievements, can contribute to our scientific understanding of early language development. We argue that, on the computational side, it is important to move from toy problems to the full complexity of the learning situation, and take as input as faithful reconstructions of the sensory signals available to infants as possible. On the data side, accessible but privacy-preserving repositories of home data have to be set up. On the psycholinguistic side, specific tests have to be constructed to benchmark humans and machines at different linguistic levels. We discuss the feasibility of this approach and present an overview of current results.

    A usage-based model for the acquisition of syntactic constructions and its application in spoken language understanding

    Gaspers J. A usage-based model for the acquisition of syntactic constructions and its application in spoken language understanding. Bielefeld: Universitätsbibliothek Bielefeld; 2014.

    The role of chunking and analogy in early vocabulary acquisition and processing

    Chunking and analogy, learning through associations and similarities respectively, are crucial cognitive processes in a usage-based theory of language development. Assessing their roles in child naturalistic word learning has posed significant challenges. In this thesis, I offer methodological solutions to examine the developmental plausibility of these processes. Chapter 2 discusses limitations in studies of early word segmentation from naturalistic speech, affecting conclusions about the processes' developmental plausibility. I present a new chunking-based model, CLASSIC Utterance Boundary (CLASSIC-UB), to study how English infants discover words from continuous naturalistic speech. Its plausibility is assessed through new metrics focusing on child production vocabularies from large-scale conversational corpora. I show the advantages of using large word production samples and how this can improve the refinement of early word segmentation and learning theories. In Chapter 3, conclusions about CLASSIC-UB’s plausibility are supported by extending this approach cross-linguistically, using Italian as a case study. Across Chapters 2 and 3, CLASSIC-UB more accurately captures child productions than other chunking and non-chunking accounts, supporting its plausibility in early word segmentation and learning. In Chapter 4, I identify methodological challenges in assessing the independent effects of chunking and analogy in child word processing. I focus on how children use sentence context to resolve ambiguous word meanings (word sense disambiguation). I present ChiSense-12, a new open-access sense-tagged corpus of child-directed speech, and describe its use in creating experimental stimuli to disentangle variables (verb-object associations and verb-event structures) that are informative about the independent role of chunking and analogy. 
Using this corpus, I show, for the first time, that 4-year-old children exploit both bottom-up verb-object associations and top-down verb-event structures to resolve lexical ambiguities. Overall, this thesis makes a significant contribution to usage-based theories of language development and improves our understanding of how children acquire language in real-life contexts.

    Statistical language learning

    Theoretical arguments based on the "poverty of the stimulus" have denied a priori the possibility that abstract linguistic representations can be learned inductively from exposure to the environment, given that the linguistic input available to the child is both underdetermined and degenerate. I reassess such learnability arguments by exploring a) the type and amount of statistical information implicitly available in the input in the form of distributional and phonological cues; b) psychologically plausible inductive mechanisms for constraining the search space; c) the nature of linguistic representations, algebraic or statistical. To do so I use three methodologies: experimental procedures, linguistic analyses based on large corpora of naturally occurring speech and text, and computational models implemented in computer simulations. In Chapters 1, 2, and 5, I argue that long-distance structural dependencies, traditionally hard to explain with simple distributional analyses based on n-gram statistics, can indeed be learned associatively provided the amount of intervening material is highly variable or invariant (the Variability effect). In Chapter 3, I show that simple associative mechanisms instantiated in Simple Recurrent Networks can replicate the experimental findings under the same conditions of variability. Chapter 4 presents successes and limits of such results across perceptual modalities (visual vs. auditory) and perceptual presentation (temporal vs. sequential), as well as the impact of long and short training procedures. In Chapter 5, I show that generalisation to abstract categories from stimuli framed in non-adjacent dependencies is also modulated by the Variability effect.
In Chapter 6, I show that the putative separation of algebraic and statistical styles of computation based on successful speech segmentation versus unsuccessful generalisation experiments (as published in a recent Science paper) is premature and is the effect of a preference for phonological properties of the input. In Chapter 7, computer simulations of learning irregular constructions suggest that it is possible to learn from positive evidence alone, despite Gold's celebrated arguments on the unlearnability of natural languages. Evolutionary simulations in Chapter 8 show that irregularities in natural languages can emerge from full regularity and remain stable across generations of simulated agents. In Chapter 9, I conclude that the brain may be endowed with a powerful statistical device for detecting structure, generalising, segmenting speech, and recovering from overgeneralisations. The experimental and computational evidence gathered here suggests that statistical language learning is more powerful than heretofore acknowledged by the current literature.

    An Examination of the Influence of Age on L2 Acquisition of English Sound-Symbolic Patterns

    A number of researchers (DeKeyser, 2012; J. S. Johnson & Newport, 1989; Long, 1990) have argued that age is a critical factor in second language acquisition. This conclusion is based on extensive research over the last two decades that has demonstrated age-related effects in learners’ nonnativelike acquisition of phonology, morphosyntax, pragmatics, and discourse-level features of language. In the wake of such findings, there has recently been an increased interest in determining the precise linguistic areas that are difficult for adult learners and the cognitive mechanisms implicated in age-related effects. Because implicit learning plays a key role in first-language (L1) acquisition, particularly in the acquisition of statistical patterns in language, it has been proposed that age effects may be the result of attenuated implicit learning capabilities in late-teen and adult learners (DeKeyser, 2000; Janacsek, Fiser, & Nemeth, 2012). If this is true, age-related effects should be significant in linguistic areas that are not readily amenable to conscious learning processes and explicit instruction. To determine whether this is in fact the case, this study examined the linguistic knowledge of native speakers (NSs), early L2 learners, and learners who acquired English as adults. In particular, it examined these groups’ knowledge related to an area of English that is hypothesized to be difficult to learn explicitly, namely, English sound-symbolic (SS) patterns. Participants were composed of English NSs (n = 20) and three NNS groups with L1 Korean and L2 English. The NNS participants were divided into three groups based on age of onset (AO), with an AO range from 3 to 9 years of age (n = 20), 10 to 16 (n = 20), and > 17 (n = 20).
Three experiments were performed that tested the participants’ English magnitude SS sensitivities when forming assumptions about nonce words (Experiments 1 and 2) and their ability to utilize English SS patterns to bootstrap their learning of new vocabulary (Experiment 3). The two late L2 learner groups (AO 10-16; 17+) were found to have significantly reduced levels of SS knowledge compared to the early L2 learners (AO 3-9) and NSs in all experiments. The early L2 learners had diminished magnitude SS sensitivities compared to NSs in Experiments 1 and 2, but not in Experiment 3. Explicit and implicit aptitudes as measured by LLAMA (Meara, 2005) were also tested for potential relationships with test scores. Explicit aptitudes (LLAMA B, E, and F) did not have a significant effect on the performance of any AO group, whereas implicit aptitude (LLAMA D) correlated moderately to strongly with test scores, but only in the two late learner groups. The early learner group was not affected by language aptitude levels during the experiments. In sum, the study found evidence for SPE in the areas of magnitude and English phonesthemic SS patterns. Implicit language-learning aptitudes appeared to have a facilitative effect on the acquisition of these SS sensitivities for the two late L2 learner groups, but not for the early L2 learners.

    From the Richness of the Signal to the Poverty of the Stimulus: Mechanisms of Early Language Acquisition

    1.1 The poverty of stimulus argument and the learnability of language; 1.1.1 The induction problem