3,487 research outputs found

    Adaptation of Whisper models to child speech recognition

    Automatic Speech Recognition (ASR) systems often struggle to transcribe child speech because the large child speech datasets needed to train accurate child-friendly ASR models are lacking. However, large amounts of annotated adult speech data exist and have been used to create multilingual ASR models such as Whisper. Our work explores whether such models can be adapted to child speech to improve ASR for children. In addition, we compare Whisper child-adaptations with finetuned self-supervised models, such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech compared to non-finetuned Whisper models. Additionally, self-supervised wav2vec2 models that have been finetuned on child speech outperform Whisper finetuning. Comment: Accepted in Interspeech 202

    Pragmatic functions of lengthenings and filled pauses in the adult-directed speech of Hungarian children

    The two most common disfluencies of spontaneous speech, vowel lengthenings (VLE) and non-lexicalized filled pauses (NLFP), were investigated in the adult-directed speech of eight Hungarian children. Though VLE and NLFP might seem to be similar vocalizations, recent investigations have shown that their occurrences might differ remarkably in child speech and may also change as a function of age. Based on these findings, the present study performed a functional analysis of VLEs and NLFPs. It was hypothesized that in child speech the two phenomena have roles not only in speech planning but also in discourse management, and that they show functional distribution. The analysis provided evidence that VLE is more common than NLFP. VLE often tends to mark discourse events and may play a role in turn-final floor-holding strategies, while NLFP is mostly connected to speech planning and occasionally may also participate in turn-taking gestures.

    Plurals in child speech

    The development of plurals in two German-speaking children was analysed, based on observational data. It was found that (1) plurals were supplied in 90% of the obligatory contexts somewhere between Stage IV and Stage V; (2) plurals were not functionally distinguished from singulars, occurring also in singular contexts; (3) the predominant morphological deviations were of the type in which an additional plural marker was attached to an already correct plural; (4) when referring to a single object or event, formally correct plural utterances were often constructed, partly because of as yet unestablished verb conjugation rules. It was argued that the children were learning plurals by rote, conditioned by morphological complexity which cannot be subsumed under any general rule.

    CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment

    This paper describes the design and development of CUCHILD, a large-scale Cantonese corpus of child speech. The corpus contains spoken words collected from 1,986 child speakers aged 3 to 6 years. The speech materials include 130 words of 1 to 4 syllables in length. The speakers include both typically developing (TD) children and children with speech disorders. The intended use of the corpus is to support scientific and clinical research, as well as technology development related to child speech assessment. The design of the corpus, including selection of words, participant recruitment, the data acquisition process, and data pre-processing, is described in detail. The results of acoustical analysis are presented to illustrate the properties of child speech. Potential applications of the corpus in automatic speech recognition, phonological error detection and speaker diarization are also discussed. Comment: Accepted to INTERSPEECH 2020, Shanghai, China

    HMM-based synthesis of child speech

    The synthesis of child speech presents challenges both in the collection of data and in the building of a synthesiser from that data. Because only limited data can be collected, and the domain of that data is constrained, it is difficult to obtain the type of phonetically-balanced corpus usually used in speech synthesis. As a consequence, building a synthesiser from this data is difficult. Concatenative synthesisers are not robust to corpora with many missing units (as is likely when the corpus content is not carefully designed), so we chose to build a statistical parametric synthesiser using the HMM-based system HTS. This technique has previously been shown to perform well for limited amounts of data, and for data collected under imperfect conditions. We compared six different configurations of the synthesiser, using both speaker-dependent and speaker-adaptive modelling techniques, and using varying amounts of data. The output from these systems was evaluated alongside natural and vocoded speech in a Blizzard-style listening test.

    Simulating optional infinitive errors in child speech through the omission of sentence-internal elements

    A new version of the MOSAIC model of syntax acquisition is presented. The modifications to the model aim to address two weaknesses in its earlier simulations of the Optional Infinitive phenomenon: an over-reliance on questions in the input as the source for Optional Infinitive errors, and the use of an utterance-final bias in learning (recency effect), without a corresponding utterance-initial bias (primacy effect). Where the old version only produced utterance-final phrases, the new version of MOSAIC learns from both the left and right edge of the utterance, and associates utterance-initial and utterance-final phrases. The new model produces both utterance-final phrases and concatenations of utterance-final and utterance-initial phrases. MOSAIC now also differentiates between phrases learned from declarative and interrogative input. It will be shown that the new version is capable of simulating the Optional Infinitive phenomenon in English and Dutch without relying on interrogative input. Unlike the previous version of MOSAIC, the new version is also capable of simulating cross-linguistic variation in the occurrence of Optional Infinitive errors in Wh-questions.

    Negative input for grammatical errors: effects after a lag of 12 weeks

    Effects of negative input for 13 categories of grammatical error were assessed in a longitudinal study of naturalistic adult-child discourse. Two-hour samples of conversational interaction were obtained at two points in time, separated by a lag of 12 weeks, for 12 children (mean age 2;0 at the start). The data were interpreted within the framework offered by Saxton’s (1997; 2000) contrast theory of negative input. Corrective input was associated with subsequent improvements in the grammaticality of child speech for three of the target structures. No effects were found for two forms of positive input: non-contingent models, where the adult produces target structures in non-error-contingent contexts; and contingent models, where grammatical forms follow grammatical child usages. The findings lend support to the view that, in some cases at least, the structure of adult-child discourse yields information on the bounds of grammaticality for the language-learning child.

    Polysemy and brevity versus frequency in language

    The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.