Learning to Understand Child-directed and Adult-directed Speech
Speech directed to children differs from adult-directed speech in linguistic
aspects such as repetition, word choice, and sentence length, as well as in
aspects of the speech signal itself, such as prosodic and phonemic variation.
Human language acquisition research indicates that child-directed speech helps
language learners. This study explores the effect of child-directed speech when
learning to extract semantic information from speech directly. We compare the
task performance of models trained on adult-directed speech (ADS) and
child-directed speech (CDS). We find indications that CDS helps in the initial
stages of learning, but eventually, models trained on ADS reach comparable task
performance, and generalize better. The results suggest that this is at least
partially due to linguistic rather than acoustic properties of the two
registers, as we see the same pattern when looking at models trained on
acoustically comparable synthetic speech.
Comment: The authors found an error in the preprocessing of transcriptions before they
were fed to SBERT. After correction, the experiments were rerun, and the updated
results can be found in this version. Importantly, most scores were affected only to a
small degree (performance was slightly worse), and the effect was consistent across
conditions. Therefore, the general patterns remain the same.
Linguistic constraints on statistical word segmentation: The role of consonants in Arabic and English
Statistical learning is often taken to lie at the heart of many cognitive tasks, including the acquisition of language. One particular task in which probabilistic models have achieved considerable success is the segmentation of speech into words. However, these models have mostly been tested against English data, and as a result little is known about how a statistical learning mechanism copes with input regularities that arise from the structural properties of different languages. This study focuses on statistical word segmentation in Arabic, a Semitic language in which words are built around consonantal roots. We hypothesize that segmentation in such languages is facilitated by tracking consonant distributions independently from intervening vowels. Previous studies have shown that human learners can track consonant probabilities across intervening vowels in artificial languages, but it is unknown to what extent this ability would be beneficial in the segmentation of natural language. We assessed the performance of a Bayesian segmentation model on English and Arabic, comparing consonant-only representations with full representations. In addition, we examined to what extent structurally different proto-lexicons reflect adult language. The results suggest that for a child learning a Semitic language, separating consonants from vowels is beneficial for segmentation. These findings indicate that probabilistic models require appropriate linguistic representations in order to effectively meet the challenges of language acquisition.
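The consonant-only representation described above can be illustrated with a toy preprocessing step: project each phonemicized utterance onto its consonantal tier before handing it to the segmenter. The vowel inventory below is a simplified, orthography-based assumption, not the paper's phonemic coding.

```python
VOWELS = set("aeiou")  # simplified, orthography-based assumption

def consonant_tier(utterance):
    """Project an utterance onto its consonantal tier by deleting
    vowels, so a segmenter tracks consonant co-occurrence only."""
    return "".join(c for c in utterance if c not in VOWELS)

# Two forms of the Arabic root k-t-b collapse onto the same tier,
# which is what makes root tracking easier on this representation:
assert consonant_tier("kataba") == "ktb"   # "he wrote"
assert consonant_tier("kutiba") == "ktb"   # "it was written"
```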
Interactive Language Learning by Robots: The Transition from Babbling to Word Forms
The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. 
This work shows the potential of human-robot interaction systems in studies of the dynamics of early language acquisition.
Disentangling sequential from hierarchical learning in Artificial Grammar Learning: Evidence from a modified Simon Task
In this paper we probe the interaction between sequential and hierarchical learning by investigating implicit learning in a group of school-aged children. We administered a serial reaction time task, in the form of a modified Simon Task in which the stimuli were organised following the rules of two distinct artificial grammars, specifically Lindenmayer systems: the Fibonacci grammar (Fib) and the Skip grammar (a modification of the former). The choice of grammars is determined by the goal of this study, which is to investigate how sensitivity to structure emerges in the course of exposure to an input whose surface transitional properties (by hypothesis) bootstrap structure. The studies conducted to date have been mainly designed to investigate low-level superficial regularities, learnable in purely statistical terms, whereas hierarchical learning has not yet been effectively investigated. The possibility of directly pinpointing the interplay between sequential and hierarchical learning is instead at the core of our study: we presented children with two grammars, Fib and Skip, which share the same transitional regularities, thus providing identical opportunities for sequential learning, while crucially differing in their hierarchical structure. More particularly, there are specific points in the sequence (k-points), which, despite giving rise to the same transitional regularities in the two grammars, support hierarchical reconstruction in Fib but not in Skip. In our protocol, children were simply asked to perform a traditional Simon Task, and they were completely unaware of the real purposes of the task. Results indicate that sequential learning occurred in both grammars, as shown by the decrease in reaction times throughout the task, while differences were found in the sensitivity to k-points: these, we contend, play a role in hierarchical reconstruction in Fib, whereas they are devoid of structural significance in Skip.
More particularly, we found that children were faster at k-points in sequences produced by Fib, thus providing an entirely new kind of evidence for the hypothesis that implicit learning involves an early activation of strategies of hierarchical reconstruction, based on a straightforward interplay with the statistically-based computation of transitional regularities over sequences of symbols.
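A Lindenmayer system like the Fibonacci grammar rewrites every symbol in parallel at each generation. In one standard formulation, assumed here for illustration, the rules are 0 → 1 and 1 → 10, so string lengths follow the Fibonacci sequence:

```python
def lsystem(axiom, rules, generations):
    """Rewrite every symbol in parallel according to `rules`
    for the given number of generations (Lindenmayer system)."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(c, c) for c in s)
    return s

# Fibonacci grammar in one common formulation (an assumption here):
FIB = {"0": "1", "1": "10"}

for g in range(6):
    print(lsystem("0", FIB, g))
# prints: 0, 1, 10, 101, 10110, 10110101 (lengths 1, 1, 2, 3, 5, 8)
```

A "Skip"-style variant would use different rewrite rules that preserve the surface transitional regularities while removing the hierarchical self-similarity that k-points exploit.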
Stabilising determinants in the transmission of phonotactic systems: Diachrony and acquisition of coda clusters in Dutch and Afrikaans
The phonotactic system of Afrikaans underwent multiple changes in its diachronic development. While some consonant clusters were lost, others still surface in contemporary Afrikaans. In this paper, we investigate to what extent articulatory differences between the segments of a cluster contribute to its successful transmission. We proceed in two steps. First, we analyse the respective effects of differences in manner of articulation, place of articulation and voicing on the age at which a cluster is acquired by analysing Dutch acquisition data. Second, we investigate the role that these articulatory differences play in the diachronic frequency development from Dutch to Afrikaans. We demonstrate that large differences in manner of articulation between segments contribute to a cluster's success in acquisition and diachrony. In contrast, large differences in place of articulation have impeding effects, while voicing difference shows a more complicated behaviour.
Keywords: Dutch/Afrikaans phonotactics, articulatory difference, first-language acquisition, diachronic change
Machine learning in diachronic corpus phonology: mining verse data to infer trajectories in English phonotactics
Machine learning is a powerful method when working with large data sets such as diachronic corpora. However, as opposed to standard techniques from inferential statistics like regression modeling, machine learning is less commonly used among phonological corpus linguists. This paper discusses three different machine learning techniques (k-nearest-neighbors classifiers; Naïve Bayes classifiers; artificial neural networks) and how they can be applied to diachronic corpus data to address specific phonological questions. To illustrate the methodology, I investigate Middle English schwa deletion and when and how it potentially triggered reduction of final /mb/ clusters in English.
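Of the three techniques, the k-nearest-neighbors classifier is the simplest to sketch: classify each corpus item by majority vote among its closest training examples in feature space. The features and labels below are invented for illustration and are not the paper's actual variables.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k training
    items (features, label) closest in Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: hypothetical (frequency, stress) features per word form,
# labelled by whether schwa was deleted -- purely illustrative.
train = [((0.9, 1.0), "deleted"), ((0.8, 0.9), "deleted"),
         ((0.1, 0.2), "retained"), ((0.2, 0.1), "retained")]
print(knn_predict(train, (0.85, 0.95)))  # -> "deleted"
```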
Assessing the effect of ambiguity in compositionality signaling on the processing of diphones
Consonantal diphones differ as to their ambiguity (whether or not they indicate
morphological complexity reliably by occurring exclusively either within or across morphemes)
and lexicality (how frequently they occur within morphemes rather than across
morpheme boundaries). This study empirically investigates the influence of ambiguity and
lexicality on the processing speed of consonantal diphones in speech perception. More
specifically, its goal is to test the predictions of the Strong Morphonotactic Hypothesis,
which asserts that phonotactic processing is influenced by morphological structure, and to
clarify the two conceptions thereof present in extant research. In two discrimination task
experiments, it is found that the processing speed of cross-morpheme diphones decreases
with their ambiguity, but there is no processing difference between primarily cross-morphemic
and morpheme-internal diphones. We conclude that the predictions of the
Strong Morphonotactic Hypothesis are borne out only partially, and we discuss the
discrepancies.
The cross-linguistic performance of word segmentation models over time.
We select three word segmentation models with psycholinguistic foundations - transitional probabilities, the diphone-based segmenter, and PUDDLE - which track phoneme co-occurrence and positional frequencies in input strings, and in the case of PUDDLE build lexical and diphone inventories. The models are evaluated on caregiver utterances in 132 CHILDES corpora representing 28 languages and 11.9 million words. PUDDLE shows the best performance overall, albeit with wide cross-linguistic variation. We explore the reasons for this variation, fitting regression models to performance scores with linguistic properties which capture lexico-phonological characteristics of the input: word length, utterance length, diversity in the lexicon, the frequency of one-word utterances, the regularity of phoneme patterns at word boundaries, and the distribution of diphones in each language. These properties together explain four-tenths of the observed variation in segmentation performance, a strong outcome and a solid foundation for studying further variables which make the segmentation task difficult.
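The transitional-probability strategy can be sketched in a few lines: estimate the forward probability of each adjacent phoneme pair from the corpus, then posit a word boundary wherever that probability dips. The below-the-mean boundary criterion is an illustrative assumption, not the paper's exact threshold.

```python
from collections import Counter

def train_tp(utterances):
    """Estimate forward transitional probabilities P(b | a)
    over adjacent phonemes in whitespace-free utterances."""
    bigrams, unigrams = Counter(), Counter()
    for u in utterances:
        for a, b in zip(u, u[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

def segment(u, tp):
    """Insert a boundary wherever the transitional probability of a
    phoneme pair falls below the utterance mean (an illustrative
    stand-in for the usual local-minimum heuristic)."""
    scores = [tp.get((a, b), 0.0) for a, b in zip(u, u[1:])]
    if not scores:
        return [u]
    mean = sum(scores) / len(scores)
    words, start = [], 0
    for i, s in enumerate(scores):
        if s < mean:
            words.append(u[start:i + 1])
            start = i + 1
    words.append(u[start:])
    return words
```

For example, a model trained on utterances in which "a" is always followed by "b" will split "abab" into "ab" + "ab", because the within-word pair has high probability and the across-word pair does not.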
Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner
Spectacular progress in the information processing sciences (machine learning, wearable sensors) promises to revolutionize the study of cognitive development. Here, we analyse the conditions under which "reverse engineering" language development, i.e., building an effective system that mimics the infant's achievements, can contribute to our scientific understanding of early language development. We argue that, on the computational side, it is important to move from toy problems to the full complexity of the learning situation, and to take as input as faithful reconstructions of the sensory signals available to infants as possible. On the data side, accessible but privacy-preserving repositories of home data have to be set up. On the psycholinguistic side, specific tests have to be constructed to benchmark humans and machines at different linguistic levels. We discuss the feasibility of this approach and present an overview of current results.