Search CORE

228 research outputs found

Non-word repetition in children learning Yélî Dnye

Author: Alejandrina Cristia
Marisa Casillas
Publication venue: Carnegie Mellon University Library Publishing Service
Publication date: 01/03/2022
Field of study

In non-word repetition (NWR) studies, participants are presented auditorily with an item that is phonologically legal but lexically meaningless in their language, and asked to repeat this item as closely as possible. NWR scores are thought to reflect some aspects of phonological development, saliently a perception-production loop supporting flexible production patterns. In this study, we report on NWR results among children learning Yélî Dnye, an isolate spoken on Rossel Island in Papua New Guinea. Results make three contributions that are specific, and a fourth that is general. First, we found that non-word items containing typologically frequent sounds are repeated without changes more often that non-words containing typologically rare sounds, above and beyond any within-language frequency effects. Second, we documented rather weak effects of item length. Third, we found that age has a strong effect on NWR scores, whereas there are weak correlations with gender, maternal education, and birth order. Fourth, we weave our results with those of others to serve the general goal of reflecting on how NWR scores can be compared across participants, studies, languages, and populations,  and the extent to which they shed light on the factors universally structuring variation  in phonological development at a global and individual level

Directory of Open Access Journals

Building a Multimodal Lexicon: Lessons from Infants' Learning of Body Part Words

Author: Abu-Zhaya Rana
Cristia Alejandrina
Seidl Amanda
Tincoff Ruth
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 25/08/2017
Field of study

Human children outperform artificial learners because the former quickly acquire a multimodal, syntactically informed, and ever-growing lexicon with little evidence. Most of this lexicon is unlabelled and processed with unsupervised mechanisms, leading to robust and generalizable knowledge. In this paper, we summarize results related to 4-month-olds’ learning of body part words. In addition to providing direct experimental evidence on some of the Workshop’s assumptions, we suggest several avenues of research that may be useful to those developing and testing artificial learners. A first set of studies using a controlled laboratory learning paradigm shows that human infants learn better from tactile-speech than visual-speech co-occurrences, suggesting that the signal/modality should be considered when designing and exploiting multimodal learning tasks. A series of observational studies document the ways in which parents naturally structure the multimodal information they provide for infants, which probably happens in lexically specific ways. Finally, our results suggest that 4-month-olds can pick up on co-occurrences between words and specific touch locations (a prerequisite of learning an association between a body part word and the referent on the child’s own body) after very brief exposures, which we interpret as most compatible with unsupervised predictive models of learning

UCL Discovery

Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

Author: Cristia Alejandrina
Dupoux Emmanuel
Guevara-Rukoz Adriana
Ludusan Bogdan
Martin Andrew
Mazuka Reiko
Thiollière Roland
Publication venue
Publication date: 23/12/2017
Field of study

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.Comment: Draf

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

SCALa: A blueprint for computational models of language acquisition in social context

Author: Cristia Alejandrina
Dupoux Emmanuel
Tsuji Sho
Publication venue: 'Elsevier BV'
Publication date: 03/06/2021
Field of study

International audienceTheories and data on language acquisition suggest a range of cues are used, ranging from information on structure found in the linguistic signal itself, to information gleaned from the environmental context or through social interaction. We propose a blueprint for computational models of the early language learner (SCALa, for Socio-Computational Architecture of Language Acquisition) that makes explicit the connection between the kinds of information available to the social learner and the computational mechanisms required to extract language-relevant information and learn from it. SCALa integrates a range of views on language acquisition, further allowing us to make precise recommendations for future large-scale empirical research

INRIA a CCSD electronic archive server

An open-source voice type classifier for child-centered daylong recordings

Author: Bousbib Ruben
Bredin Hervé
Cristia Alejandrina
Dupoux Emmanuel
Lavechin Marvin
Publication venue
Publication date: 25/10/2020
Field of study

Spontaneous conversations in real-world settings such as those found in child-centered recordings have been shown to be amongst the most challenging audio files to process. Nevertheless, building speech processing models handling such a wide variety of conditions would be particularly useful for language acquisition studies in which researchers are interested in the quantity and quality of the speech that children hear and produce, as well as for early diagnosis and measuring effects of remediation. In this paper, we present our approach to designing an open-source neural network to classify audio segments into vocalizations produced by the child wearing the recording device, vocalizations produced by other children, adult male speech, and adult female speech. To this end, we gathered diverse child-centered corpora which sums up to a total of 260 hours of recordings and covers 10 languages. Our model can be used as input for downstream tasks such as estimating the number of words produced by adult speakers, or the number of linguistic units produced by children. Our architecture combines SincNet filters with a stack of recurrent layers and outperforms by a large margin the state-of-the-art system, the Language ENvironment Analysis (LENA) that has been used in numerous child language studies.Comment: accepted to Interspeech 202

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research

Author: Cristia Alejandrina
Cruz Blandon Maria
Räsänen Okko
Publication venue: COGNITIVE SCIENCE SOCIETY
Publication date: 01/07/2023
Field of study

Modelling of early language acquisition aims to understand how infants bootstrap their language skills. The modelling encompasses properties of the input data used for training the models, the cognitive hypotheses and their algorithmic implementations being tested, and the evaluation methodologies to compare models to human data. Recent developments have enabled the use of more naturalistic training data for computational models. This also motivates development of more naturalistic tests of model behaviour. A crucial step towards such an aim is to develop representative speech datasets consisting of speech heard by infants in their natural environments. However, a major drawback of such recordings is that they are typically noisy, and it is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data. In this paper, we explore this aspect for the case of infant-directed speech (IDS) and adult-directed speech (ADS) analysis. First, we manually and automatically annotated audio quality of utterances extracted from two corpora of child-centred long-form recordings (in English and French). We then compared acoustic features of IDS and ADS in an in-lab dataset and across different audio quality subsets of naturalistic data. Finally, we assessed how the audio quality and recording environment may change the conclusions of a modelling analysis using a recent self-supervised learning model. Our results show that the use of modest and high audio quality naturalistic speech data result in largely similar conclusions on IDS and ADS in terms of acoustic analyses and modelling experiments. We also found that an automatic sound quality assessment tool can be used to screen out useful parts of long-form recordings for a closer analysis with comparable results to that of manual quality annotation.Peer reviewe

Trepo - Institutional Repository of Tampere University

Motif discovery in infant-and adult-directed speech

Author: Alejandrina Cristia
Amanda Seidl
Bogdan Ludusan
Emmanuel Dupoux
Publication venue
Publication date: 03/04/2020
Field of study

Abstract Infant-directed speech (IDS) is thought to play a key role in determining infant language acquisition. It is thus important to describe how computational models of infant language acquisition behave when given an input of IDS, as compared to adult-directed speech (ADS). In this paper, we explore how an acoustic motif discovery algorithm fares when presented with speech from both registers. Results show small but significant differences in performance, with lower recall and lower cluster collocation in IDS than ADS, but a higher cluster purity in IDS. Overall, these results are inconsistent with a view suggesting that IDS is acoustically clearer than ADS in a way that systematically facilitates lexical recognition. Similarities and differences with human infants' word segmentation are discussed

CiteSeerX

Introducing Meta-analysis in the Evaluation of Computational Models of Infant Language Development

Author: Cristia Alejandrina
Cruz Blandón María Andrea
Räsänen Okko
Publication venue
Publication date: 01/07/2023
Field of study

Computational models of child language development can help us understand the cognitive underpinnings of the language learning process, which occurs along several linguistic levels at once (e.g., prosodic and phonological). However, in light of the replication crisis, modelers face the challenge of selecting representative and consolidated infant data. Thus, it is desirable to have evaluation methodologies that could account for robust empirical reference data, across multiple infant capabilities. Moreover, there is a need for practices that can compare developmental trajectories of infants to those of models as a function of language experience and development. The present study aims to take concrete steps to address these needs by introducing the concept of comparing models with large-scale cumulative empirical data from infants, as quantified by meta-analyses conducted across a large number of individual behavioral studies. We formalize the connection between measurable model and human behavior, and then present a conceptual framework for meta-analytic evaluation of computational models. We exemplify the meta-analytic model evaluation approach with two modeling experiments on infant-directed speech preference and native/non-native vowel discrimination.Peer reviewe

Trepo - Institutional Repository of Tampere University