82 research outputs found

    Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics

    Full text link
    Automatic assessment of reading fluency using automatic speech recognition (ASR) holds great potential for early detection of reading difficulties and subsequent timely intervention. Precise assessment tools are required, especially for languages other than English. In this study, we evaluate six state-of-the-art ASR-based systems, built on Kaldi and Whisper, for automatically assessing Dutch oral reading accuracy. Results show that our most successful system reached substantial agreement with human evaluations (MCC = .63). The same system reached the highest correlation between forced-decoding confidence scores and word correctness (r = .45). This system's language model (LM) consisted of manual orthographic transcriptions and reading prompts of the test data, which shows that including reading errors in the LM improves assessment performance. We discuss the implications for developing automatic assessment systems and identify possible avenues for future research.
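
    The two headline figures here, Matthews correlation (MCC) and Pearson's r, are standard metrics over word-level decisions. As a minimal sketch, and with entirely invented data rather than the study's own, they can be computed like this:

```python
# Sketch: word-level agreement metrics for oral reading assessment.
# All values below are invented; 1 = word judged as read correctly.
from scipy.stats import pearsonr
from sklearn.metrics import matthews_corrcoef

human_labels  = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # expert judgements per word
system_labels = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]  # ASR-based system decisions

# Agreement between the system and human evaluators on word correctness
mcc = matthews_corrcoef(human_labels, system_labels)

# Correlation between forced-decoding confidence scores and word correctness
confidences = [0.91, 0.88, 0.12, 0.35, 0.22, 0.95, 0.80, 0.55, 0.77, 0.93]
r, p = pearsonr(confidences, human_labels)

print(f"MCC = {mcc:.2f}, r = {r:.2f} (p = {p:.3f})")
```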

    Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

    Full text link
    Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. We evaluated their performance on read and extemporaneous speech of native and non-native Dutch children. We also investigated the utility of using ASR technology to provide insight into the children's pronunciation and fluency. The results show that recent, pre-trained transformer-based ASR models achieve acceptable performance from which detailed feedback on phoneme pronunciation quality can be extracted, despite the challenging nature of child and non-native speech.
    Comment: Published at SLATE 2023, ESMAD, Politécnico do Porto, Portugal, 26-28 June 2023, pp. 11:1-11:
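
    The abstract does not name its evaluation metric, but ASR systems such as Wav2Vec2.0 and Whisper are conventionally compared by word error rate (WER). A self-contained sketch of the standard edit-distance computation, on invented strings:

```python
# Sketch: word error rate (WER), the usual metric for comparing ASR systems.
# The example reference/hypothesis pair is invented.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution/match
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("de kat zit op de mat", "de kat zat op mat"))  # 2 edits / 6 words ≈ 0.33
```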

    Normative Data of Dutch Idiomatic Expressions: Subjective Judgments You Can Bank on

    Get PDF
    The processing of idiomatic expressions is a topical issue in empirical research. Various factors have been found to influence idiom processing, such as idiom familiarity and idiom transparency. Information on these variables is usually obtained through norming studies. Studies investigating the effect of various properties on idiom processing have led to ambiguous results. This may be due to the variability of operationalizations of the idiom properties across norming studies, which in turn may affect the reliability of the subjective judgements. However, not all studies that collected normative data on idiomatic expressions investigated their reliability, and studies that did address the reliability of subjective ratings used various measures and produced mixed results. In this study, we investigated the reliability of subjective judgements, the relation between subjective and objective idiom frequency, and the impact of these dimensions on participants' idiom knowledge by collecting normative data on five subjective idiom properties (Frequency of Exposure, Meaning Familiarity, Frequency of Usage, Transparency, and Imageability) from 390 native speakers, and objective corpus frequencies, for 374 Dutch idiomatic expressions. For reliability, we compared the measures calculated in previous studies with the D-coefficient, a metric taken from Generalizability Theory. High reliability was found for all subjective dimensions. One reliability metric, Krippendorff's alpha, generally produced lower values, while similar values were obtained for the three other measures (Cronbach's alpha, the Intraclass Correlation Coefficient, and the D-coefficient). Advantages of the D-coefficient are that it can be applied to unbalanced research designs and used to estimate the minimum number of raters required to obtain reliable ratings. Slightly higher coefficients were observed for the so-called experience-based dimensions (Frequency of Exposure, Meaning Familiarity, and Frequency of Usage) than for the content-based dimensions (Transparency and Imageability). In addition, fewer raters were required to obtain reliable ratings for the experience-based dimensions. Subjective and objective frequency appeared to be poorly correlated, while all subjective idiom properties and objective frequency turned out to affect idiom knowledge. Meaning Familiarity, Subjective and Objective Frequency of Exposure, Frequency of Usage, and Transparency contributed positively to idiom knowledge, while a negative effect was found for Imageability. We discuss these relationships in more detail and give methodological recommendations with respect to the procedures and the measures used to calculate reliability.
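
    Of the reliability measures compared above, Cronbach's alpha is the most widely known. As a minimal sketch (with invented ratings, and assuming raters are treated as classical test items, which may differ from the study's exact setup), it can be computed from an idioms-by-raters matrix:

```python
# Sketch: Cronbach's alpha for subjective idiom ratings.
# Rows = idioms, columns = raters, values = invented 1-5 familiarity ratings.
import numpy as np

ratings = np.array([
    [5, 4, 5, 5],
    [2, 1, 2, 3],
    [4, 4, 3, 4],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
])

k = ratings.shape[1]                         # number of raters
rater_var = ratings.var(axis=0, ddof=1)      # variance of each rater's scores
total_var = ratings.sum(axis=1).var(ddof=1)  # variance of per-idiom totals
alpha = (k / (k - 1)) * (1 - rater_var.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")     # ~0.95 for this toy matrix
```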

    A corpus-based study of Spanish L2 mispronunciations by Japanese speakers

    Get PDF
    In a companion paper (Carranza et al.) submitted to this conference, we discuss the importance of collecting specific L1-L2 speech corpora for the sake of developing effective Computer Assisted Pronunciation Training (CAPT) programs. In this paper we examine this point more deeply by reporting on a study aimed at compiling and analysing such a corpus to draw up an inventory of recurrent pronunciation errors to be addressed in a CAPT application that makes use of Automatic Speech Recognition (ASR). In particular, we discuss some of the results obtained in the analyses of this corpus and some of the methodological issues we had to deal with. The corpus features 8.9 hours of spontaneous, semi-spontaneous and read speech recorded from 20 Japanese students of Spanish L2. The speech data were segmented and transcribed at the orthographic, canonical-phonemic and narrow-phonetic levels using the Praat software [1]. We adopted the SAMPA phonemic inventory adapted to Spanish [2] for the phonemic transcription, and added 11 new symbols and 7 diacritics taken from X-SAMPA [3] for the narrow-phonetic transcription. Non-linguistic phenomena and incidents were also annotated with XML tags in independent tiers. Standards for transcribing and annotating non-native spontaneous speech ([4], [5]), as well as the error-encoding system used in the project, are also addressed. A total of 13,410 errors were segmented, aligned with the canonical-phonemic and narrow-phonetic tiers, and annotated following an encoding system that specifies the type of error (substitution, insertion or deletion), the affected phone, and the preceding and following phonemic contexts in which the error occurred. We checked the accuracy of the transcriptions by asking two other annotators to transcribe a subset of the speech material and calculating intertranscriber agreement coefficients. The data were automatically retrieved with Praat scripts and statistically analysed with R. The resulting frequency ratios obtained for the most frequent errors and their most frequent contexts of appearance were statistically tested to determine their significance. We report on the analyses of the combined annotations and draw up an inventory of errors that should be addressed in the training. We then consider how ASR can be employed to properly detect these errors, and suggest possible exercises that may be included in the training to remediate the errors identified.
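
    The error-encoding scheme described above (error type, affected phone, preceding and following phonemic context) maps naturally onto a simple record. The sketch below shows one hypothetical way such annotations could be tallied into frequency counts; the field names and example records are mine, not the project's.

```python
# Sketch: tallying annotated L2 pronunciation errors by type, phone and
# context. The record layout and example entries are hypothetical.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class PronError:
    kind: str   # "substitution", "insertion" or "deletion"
    phone: str  # affected canonical phone (SAMPA)
    left: str   # preceding phonemic context
    right: str  # following phonemic context

errors = [
    PronError("substitution", "r", "a", "o"),  # e.g. trill realised as a flap
    PronError("deletion", "d", "a", "o"),
    PronError("substitution", "r", "e", "a"),
]

by_type = Counter(e.kind for e in errors)
by_phone_in_context = Counter((e.phone, e.left, e.right) for e in errors)
print(by_type.most_common())
print(by_phone_in_context.most_common(3))
```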

    Comparison between expert listeners and continuous speech recognizers in selecting pronunciation variants

    Get PDF
    In this paper, the performance of an automatic transcription tool is evaluated. The transcription tool is a continuous speech recognizer (CSR) which can be used to select pronunciation variants (i.e. to detect insertions and deletions of phones). The performance of the CSR was compared to a reference transcription based on the judgments of expert listeners. We investigated to what extent the degree of agreement between the listeners and the CSR was affected by employing various sets of phone models (PMs). Overall, the PMs agree more closely with the listeners when pronunciation variation is modeled. However, the various sets of PMs lead to different results for insertion and deletion processes. Furthermore, we found that, to a certain degree, word error rates can be used to predict which set of PMs to use in the transcription tool.
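
    Detecting insertions and deletions of phones amounts to aligning a canonical transcription against a realised pronunciation variant. A minimal sketch using off-the-shelf sequence alignment follows; the phone strings are invented, and this illustrates the task rather than the CSR's actual mechanism.

```python
# Sketch: finding phone insertions and deletions by aligning a canonical
# transcription with a pronunciation variant. Phone strings are invented.
from difflib import SequenceMatcher

canonical = ["n", "A", "t", "y", "r", "l", "@", "k"]  # full form
realised  = ["n", "A", "t", "y", "k"]                 # reduced variant

matcher = SequenceMatcher(a=canonical, b=realised, autojunk=False)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op == "delete":
        print("deleted:", canonical[i1:i2])
    elif op == "insert":
        print("inserted:", realised[j1:j2])
    elif op == "replace":
        print("substituted:", canonical[i1:i2], "->", realised[j1:j2])
```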

    Directions for the future of technology in pronunciation research and teaching

    Get PDF
    This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2) pronunciation research and teaching should be enhanced comprehensibility and intelligibility, as opposed to native-likeness. Three main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research, with a special focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider how these insights pave the way for researchers and developers working to create research-informed, computer-assisted pronunciation teaching resources. We conclude with predictions for future developments.

    Assessing transcription agreement: Methodological aspects

    Full text link

    Oral proficiency training in Dutch L2: The contribution of ASR-based corrective feedback

    Get PDF