16 research outputs found

    MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

    Get PDF
    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
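    The minimal Python sketch below illustrates how figures like these are conventionally computed in MDD evaluation: detection accuracy, false rejection rate, and diagnostic accuracy derived from per-phone comparisons of the canonical, human-annotated, and system-recognized labels. The record format and function name are illustrative assumptions, not taken from the thesis itself.

    # A minimal sketch of standard MDD evaluation metrics; the per-phone record
    # format is an assumption for illustration only.
    def mdd_metrics(records):
        """records: list of dicts with keys
             'canonical'  - phone the speaker should have produced
             'annotated'  - phone the human transcriber heard
             'recognized' - phone the MDD system recognized
        """
        ta = fr = fa = tr = correct_diag = 0
        for r in records:
            actually_correct = (r['annotated'] == r['canonical'])
            system_accepts = (r['recognized'] == r['canonical'])
            if actually_correct and system_accepts:
                ta += 1                 # true acceptance
            elif actually_correct and not system_accepts:
                fr += 1                 # false rejection
            elif not actually_correct and system_accepts:
                fa += 1                 # false acceptance
            else:
                tr += 1                 # true rejection
                if r['recognized'] == r['annotated']:
                    correct_diag += 1   # diagnosis matches the annotator's label
        total = ta + fr + fa + tr
        return {
            'detection_accuracy': (ta + tr) / total,
            'false_rejection_rate': fr / (ta + fr) if (ta + fr) else 0.0,
            'diagnostic_accuracy': correct_diag / tr if tr else 0.0,
        }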

    Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors

    Full text link
    Speech disorders in children can affect their fluency and intelligibility. Delay in their diagnosis and treatment increases the risk of social impairment and learning disabilities. With the significant shortage of Speech and Language Pathologists (SLPs), there is increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability. However, the scarcity and unreliable annotation of disordered child speech corpora, along with the high acoustic variation in child speech data, have impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. Therefore, this thesis investigates two types of detection systems that can be achieved with minimal dependency on annotated mispronounced speech data. First, a novel approach that adopts paralinguistic features, which represent the prosodic, spectral, and voice quality characteristics of the speech, was proposed to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech using a binary Support Vector Machine (SVM) classifier. As paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal. Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors made due to SSDs and to provide low-level diagnostic information that can be used to construct formative feedback and a detailed diagnostic report. Unlike existing MDD methods, where detection and diagnosis are performed at the phoneme level, the proposed method performs MDD at the speech attribute level, namely the manners and places of articulation. The speech attribute features describe the involved articulators and their interactions when making a speech sound, allowing a low-level description of the pronunciation error to be provided. Two novel methods to model speech attributes are further proposed in this thesis: a frame-based (phoneme-alignment) method leveraging the Multi-Task Learning (MTL) criterion and training a separate model for each attribute, and an alignment-free, jointly learnt method based on the Connectionist Temporal Classification (CTC) sequence-to-sequence criterion. The proposed techniques have been evaluated using standard and publicly accessible adult and child speech corpora, while the MDD method has been validated using L2 speech corpora.
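    A minimal sketch of the first step, segment-level TD/SSD classification with a binary SVM, is shown below. It assumes paralinguistic functionals (prosodic, spectral, and voice quality measures) have already been extracted into one fixed-length vector per segment; the feature values, dimensions, and variable names are placeholders rather than the thesis's actual setup.

    # Segment-level TD vs. SSD classification with a binary SVM on precomputed
    # paralinguistic functionals. All data below are placeholders.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 88))    # placeholder feature matrix (e.g., 88 functionals per segment)
    y = rng.integers(0, 2, size=200)  # 0 = typically developing, 1 = speech sound disordered

    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, class_weight='balanced'))
    scores = cross_val_score(clf, X, y, cv=5, scoring='f1')  # segment-level cross-validation
    print('mean F1 over folds:', scores.mean())

    Subject-level decisions could then be obtained, for instance, by majority voting over a speaker's segment-level predictions.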

    Design and evaluation of mobile computer-assisted pronunciation training tools for second language learning

    Get PDF
    The quality of speech technology (automatic speech recognition, ASR, and text-to-speech, TTS) has improved considerably and, consequently, an increasing number of computer-assisted pronunciation training (CAPT) tools have incorporated it. However, pronunciation is one area of teaching that has not been developed sufficiently, since there is scarce empirical evidence assessing the effectiveness of tools and games that include speech technology for pronunciation training and teaching. This PhD thesis addresses the design and validation of an innovative CAPT system for smart devices for training second language (L2) pronunciation. In particular, it aims to improve learners' L2 pronunciation at the segmental level through a specific set of methodological choices, such as connecting the learner's first and second languages (L1–L2), minimal pairs, a training cycle of exposure–perception–production, individualistic and social approaches, and the inclusion of ASR and TTS technology. The experimental research conducted by applying these methodological choices with real users validates the efficiency of the CAPT prototypes developed for the four main experiments of this dissertation. Data are gathered automatically by the CAPT systems to give users immediate, specific feedback and to analyze all results. The protocols, metrics, algorithms, and methods needed to statistically analyze and discuss the results are also detailed. The two main L2s tested during the experimental procedure are American English and Spanish. The different CAPT prototypes designed and validated in this thesis, and the methodological choices they implement, allow the relative pronunciation improvement of the individuals who trained with them to be measured accurately. Raters' subjective scores and the CAPT tool's objective scores show a strong correlation, which will make it possible in the future to assess large amounts of data while reducing human costs. Results also show intensive practice, reflected in the significant number of activities carried out. In the controlled experiments, students who worked with the CAPT tool achieved greater pronunciation improvement than their peers in the traditional in-classroom instruction group. In the challenge-based CAPT learning game proposed, the most active players in the competition kept playing until the end and achieved significant pronunciation improvement.
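    As an illustration of two of the analyses described above, the hedged sketch below computes each learner's relative pronunciation improvement from pre- and post-test scores and the Pearson correlation between raters' subjective scores and the tool's objective scores; all numbers and variable names are placeholders, not data from the experiments.

    # Relative pronunciation improvement and rater-vs-tool score correlation.
    # All arrays are placeholder data.
    import numpy as np
    from scipy.stats import pearsonr

    pre = np.array([55.0, 62.0, 48.0, 70.0])   # pre-test pronunciation scores (0-100)
    post = np.array([68.0, 71.0, 60.0, 74.0])  # post-test scores for the same learners
    relative_improvement = (post - pre) / pre * 100.0
    print('relative improvement (%):', np.round(relative_improvement, 1))

    rater_scores = np.array([3.2, 4.1, 2.8, 4.5])     # mean subjective ratings (e.g., 1-5 scale)
    capt_scores = np.array([61.0, 78.0, 55.0, 83.0])  # tool's objective scores
    r, p = pearsonr(rater_scores, capt_scores)
    print(f'Pearson r = {r:.2f}, p = {p:.3f}')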

    Diagnostic CALL tool for Arabic learners

    Full text link

    Improving mispronunciation detection and enriching diagnostic feedback for non-native learners of Mandarin

    Get PDF
    Computer-assisted pronunciation training (CAPT) systems have been designed to help students improve their speaking skills by providing automatic pronunciation scores and diagnostic feedback. Their mispronunciation detection performance depends heavily on the quality of the ASR acoustic model trained with a non-native corpus and on the binary detectors that verify whether the current pronunciation is correctly articulated. Meanwhile, their diagnostic ability depends on the choice of the modeled units (e.g., phone, articulation manner, and place) and on whether the decisions made by the selected verifiers/classifiers are interpretable. In this thesis, we present our efforts to improve mispronunciation detection for Mandarin and to enrich the diagnostic feedback for second language learners. The problem is tackled from the perspectives of acoustic modeling, verification, and feedback generation for Mandarin phones and tones. For the acoustic modeling part, speech attributes and soft targets are respectively proposed to help resolve the hard-assignment labels of phones and tones, which are not optimal for describing irregular non-native pronunciations. Subsequently, multi-source information or a better-trained acoustic model can provide more accurate features for mispronunciation detectors. Experimental results show that the enhanced features bring consistent improvements for Mandarin phone/tone mispronunciation detection. For the verification part, the segmental pronunciation representation, usually calculated by frame-level averaging in a DNN, is instead learned by the memory components of a BLSTM, which uses sequential context information directly to embed a sequence of pronunciation scores into a pronunciation vector and improve the performance of mispronunciation detectors. This improvement is observed in both the phone and tone mispronunciation detection tasks. For the feedback generation part, with the help of phone-, articulatory-, and tone-level posterior scores and interpretable decision trees, we can visualize non-native mispronunciations and provide comprehensive feedback, including articulation manner, place, and pitch contour-related diagnostic information, to help L2 learners. Experimental results confirm that the proposed decision trees can provide accurate diagnostic feedback.
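    The PyTorch sketch below illustrates the verification idea of replacing frame-level averaging with a BLSTM that embeds a sequence of frame-level pronunciation scores into a fixed pronunciation vector feeding a binary mispronunciation detector; the layer sizes, input dimension, and class name are illustrative assumptions rather than the thesis's actual architecture.

    # A BLSTM embeds variable-length frame-level score sequences into a fixed
    # pronunciation vector for a binary mispronunciation detector. Dimensions
    # below are placeholders.
    import torch
    import torch.nn as nn

    class BlstmPronunciationDetector(nn.Module):
        def __init__(self, n_frame_scores=40, hidden=64):
            super().__init__()
            self.blstm = nn.LSTM(input_size=n_frame_scores, hidden_size=hidden,
                                 batch_first=True, bidirectional=True)
            self.detector = nn.Linear(2 * hidden, 1)  # binary correct/mispronounced

        def forward(self, frame_scores):
            # frame_scores: (batch, n_frames, n_frame_scores) for one phone/tone segment
            _, (h_n, _) = self.blstm(frame_scores)
            # concatenate final forward and backward hidden states -> pronunciation vector
            pron_vector = torch.cat([h_n[0], h_n[1]], dim=-1)
            return torch.sigmoid(self.detector(pron_vector)).squeeze(-1)

    model = BlstmPronunciationDetector()
    dummy_segments = torch.randn(8, 25, 40)  # 8 segments, 25 frames each, 40 scores per frame
    print(model(dummy_segments).shape)       # torch.Size([8]) - mispronunciation probabilities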

    Austronesian and other languages of the Pacific and South-east Asia: an annotated catalogue of theses and dissertations

    Get PDF

    Text Complexity Levels and Second Language Reading Performance in Indonesia

    Get PDF
    This study examined the effect of text complexity on the L2 reading performance of Indonesian students with different language proficiency levels. Four passages, two of low complexity and two of high complexity, were given to participants. Text complexity levels were analysed using Flesch's reading ease index or grade level formula (Flesch, 1948, 1951, 1979), Text-Evaluator designed by the Educational Testing Service (2013), and Coh-Metrix version 3.0 indexes (McNamara, Louwerse, Cai & Graesser, 2013). Additionally, Pearson's correlation, regression, and one-way ANOVA were employed. A total of 1,054 university EFL students participated in this study. The findings revealed that text complexity correlated moderately with, and contributed significantly to, L2 reading. Keywords: text complexity, reading comprehension, readability
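    The sketch below shows the kind of computation involved: Flesch's (1948) Reading Ease formula applied to passage-level counts, plus Pearson correlation and one-way ANOVA via SciPy. The counts, scores, and group data are placeholder values, not the study's data, and a real pipeline would use a dedicated syllable counter.

    # Flesch Reading Ease plus the correlation and ANOVA steps, on placeholder data.
    from scipy.stats import pearsonr, f_oneway

    def flesch_reading_ease(total_words, total_sentences, total_syllables):
        # Flesch (1948): 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)
        return 206.835 - 1.015 * (total_words / total_sentences) \
                       - 84.6 * (total_syllables / total_words)

    print(flesch_reading_ease(total_words=180, total_sentences=12, total_syllables=260))

    # Correlation between passage complexity and comprehension scores (placeholder data)
    complexity = [35.2, 48.7, 62.1, 78.9]
    scores = [54.0, 61.0, 70.0, 83.0]
    print(pearsonr(complexity, scores))

    # One-way ANOVA across proficiency groups (placeholder score lists)
    low, mid, high = [52, 55, 49], [63, 61, 66], [78, 74, 80]
    print(f_oneway(low, mid, high))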