
    MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

    This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system for evaluating the pronunciation of Mandarin-accented English speech. Errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) were comprehensively detected and diagnosed using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns of L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR-based MDD system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%. The results demonstrate the advantage of articulatory features both in revealing the significant contributors to mispronunciation and in improving the performance of MDD systems.
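Figures of this kind are commonly computed from a phone-level comparison of the canonical pronunciation, the expert annotation, and the recogniser output. The following is a minimal sketch under that assumption; the abstract does not state the exact definitions used, and the names `MDDCounts`, `count_outcomes`, and `mdd_metrics`, as well as the toy phone sequences, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MDDCounts:
    true_accept: int = 0    # pronounced correctly, accepted by the system
    false_reject: int = 0   # pronounced correctly, flagged as an error
    false_accept: int = 0   # mispronounced, accepted by the system
    correct_diag: int = 0   # mispronounced, flagged, error phone identified
    diag_error: int = 0     # mispronounced, flagged, wrong error phone


def count_outcomes(canonical, annotated, recognized):
    """Tally outcomes over three aligned, equal-length phone sequences."""
    if not (len(canonical) == len(annotated) == len(recognized)):
        raise ValueError("sequences must be aligned to the same length")
    counts = MDDCounts()
    for canon, annot, recog in zip(canonical, annotated, recognized):
        spoken_ok = annot == canon   # what the expert transcribed
        system_ok = recog == canon   # what the system decided
        if spoken_ok and system_ok:
            counts.true_accept += 1
        elif spoken_ok:
            counts.false_reject += 1
        elif system_ok:
            counts.false_accept += 1
        elif recog == annot:
            counts.correct_diag += 1
        else:
            counts.diag_error += 1
    return counts


def _ratio(num, den):
    # Avoid ZeroDivisionError on empty categories.
    return num / den if den else float("nan")


def mdd_metrics(c):
    """Assumed definitions: accuracy over all phones, diagnosis over true rejections."""
    true_reject = c.correct_diag + c.diag_error
    total = c.true_accept + c.false_reject + c.false_accept + true_reject
    return {
        "detection_accuracy": _ratio(c.true_accept + true_reject, total),
        "diagnostic_accuracy": _ratio(c.correct_diag, true_reject),
        "false_rejection_rate": _ratio(c.false_reject, c.true_accept + c.false_reject),
    }


# Toy example (invented data): "this thing" with two learner errors.
canonical = ["dh", "ih", "s", "th", "ih", "ng"]
annotated = ["d", "ih", "s", "s", "ih", "ng"]
recognized = ["d", "ih", "z", "t", "ih", "ng"]
metrics = mdd_metrics(count_outcomes(canonical, annotated, recognized))
print(metrics)
```

On this toy input the sketch yields a detection accuracy of 5/6, a diagnostic accuracy of 0.5, and a false rejection rate of 0.25.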

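    Speech-recognition-based strategies for online Basque CALL (Computer-Assisted Language Learning) systems [original title: Hizketa-ezagutzan oinarritutako estrategiak, euskarazko online OBHI (Ordenagailu Bidezko Hizkuntza Ikaskuntza) sistemetarako]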

    211 p. (English), 217 p. (Basque). This thesis examines two implementations of automatic speech recognition for Basque in Computer-Assisted Language Learning (CALL; OBHI in Basque) systems: Computer-Assisted Pronunciation Training (CAPT; OBEL in Basque) and Spoken Grammar Practice (SGP; AGP in Basque). In the classic CAPT system, the user is asked to read a sentence aloud and receives a score for each phoneme in return. For SGP, we propose the Word-by-Word Sentence Verification (WWSV; HHEE in Basque) technique, a system that verifies exercises as they are being solved. Both systems are built on utterance verification techniques, such as the Goodness of Pronunciation (GOP) score. Implementing these systems requires training acoustic models, for which we used the Basque Speecon-like database, the only publicly available database for Basque. To obtain good acoustic models, the database had to be adapted by creating a dictionary with alternatives, and staged training was also tried. This yielded a phoneme error rate (PER) of 12.21%. The first system was tested under laboratory conditions and achieved competitive results. However, the goal for the CAPT and SGP systems in this thesis is deployment in a client/server architecture, so that learners can access them from anywhere. To make this possible, HTML5 specifications were used to send audio to the server as it is recorded, and a new online Cepstral Mean and Variance Normalisation (CMVN) technique was proposed to reduce the effect of channel differences in the audio signals recorded by users. This technique is based on a method introduced in this thesis, Multi Normalization Scoring (MNS), and a new online Voice Activity Detector (VAD) based on the same method was also proposed.
Finally, different parameters were evaluated using neural networks, and the GOP score was found to be the most effective.
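For readers unfamiliar with the GOP score mentioned above, the sketch below shows one common posterior-based formulation: the average log posterior of the expected phoneme over its aligned frames, minus that of the best-scoring phoneme. This is an assumption for illustration only; the abstract does not say which GOP variant the thesis uses, and the function name `gop_score` and the toy posteriors are hypothetical.

```python
import math


def gop_score(posteriors, phone_index, eps=1e-10):
    """GOP for one aligned segment.

    posteriors: list of frames, each a list of per-phoneme posteriors.
    phone_index: index of the phoneme the learner was expected to say.
    Returns a value <= 0; 0 means the expected phoneme was the best match.
    """
    if not posteriors:
        raise ValueError("segment has no frames")
    n_frames = len(posteriors)
    n_phones = len(posteriors[0])
    # Average log posterior of every phoneme over the segment.
    mean_log_post = [
        sum(math.log(max(frame[q], eps)) for frame in posteriors) / n_frames
        for q in range(n_phones)
    ]
    return mean_log_post[phone_index] - max(mean_log_post)


# Toy segment (invented numbers): two frames, three phoneme classes.
segment = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
]
print(gop_score(segment, 0))  # expected phoneme is the best match
print(gop_score(segment, 1))  # expected phoneme is a poor match
```

A per-phoneme score like this is typically compared against a threshold to decide whether to flag the phoneme; the threshold value would have to be tuned on data.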

    Automatic Screening of Childhood Speech Sound Disorders and Detection of Associated Pronunciation Errors

    Speech disorders in children can affect their fluency and intelligibility. Delay in their diagnosis and treatment increases the risk of social impairment and learning disabilities. With the significant shortage of Speech and Language Pathologists (SLPs), there is increasing interest in Computer-Aided Speech Therapy tools with automatic detection and diagnosis capability. However, the scarcity and unreliable annotation of disordered child speech corpora, along with the high acoustic variability of child speech, have impeded the development of reliable automatic detection and diagnosis of childhood speech sound disorders. Therefore, this thesis investigates two types of detection systems that can be achieved with minimum dependency on annotated mispronounced speech data. First, a novel approach was proposed that uses paralinguistic features, which represent the prosodic, spectral, and voice quality characteristics of the speech, to perform segment- and subject-level classification of Typically Developing (TD) and Speech Sound Disordered (SSD) child speech with a binary Support Vector Machine (SVM) classifier. As paralinguistic features are both language- and content-independent, they can be extracted from an unannotated speech signal. Second, a novel Mispronunciation Detection and Diagnosis (MDD) approach was introduced to detect the pronunciation errors caused by SSDs and to provide low-level diagnostic information that can be used in constructing formative feedback and a detailed diagnostic report. Unlike existing MDD methods, where detection and diagnosis are performed at the phoneme level, the proposed method achieved MDD at the speech attribute level, namely the manner and place of articulation. The speech attribute features describe the articulators involved and their interactions when making a speech sound, allowing a low-level description of the pronunciation error to be provided.
Two novel methods to model speech attributes are further proposed in this thesis: a frame-based (phoneme-alignment) method leveraging the Multi-Task Learning (MTL) criterion and training a separate model for each attribute, and an alignment-free, jointly learnt method based on the Connectionist Temporal Classification (CTC) sequence-to-sequence criterion. The proposed techniques have been evaluated using standard and publicly accessible adult and child speech corpora, while the MDD method has been validated using L2 speech corpora.
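To illustrate what diagnosis at the speech attribute level can look like, the sketch below compares the manner and place of articulation of an expected and a produced consonant. It is only an illustration: it looks attributes up from phoneme labels in a small, simplified table, whereas the thesis models the attributes directly from speech. The `ATTRIBUTES` table and the `diagnose` function are hypothetical.

```python
# Simplified manner/place table for a handful of English consonants
# (illustrative subset, not the attribute inventory used in the thesis).
ATTRIBUTES = {
    "s": ("fricative", "alveolar"),
    "t": ("stop", "alveolar"),
    "k": ("stop", "velar"),
    "th": ("fricative", "dental"),
    "f": ("fricative", "labiodental"),
    "r": ("approximant", "alveolar"),
    "w": ("approximant", "labio-velar"),
}


def diagnose(expected, produced):
    """Return the attributes that differ as {attribute: (expected, produced)}."""
    exp_manner, exp_place = ATTRIBUTES[expected]
    got_manner, got_place = ATTRIBUTES[produced]
    errors = {}
    if exp_manner != got_manner:
        errors["manner"] = (exp_manner, got_manner)
    if exp_place != got_place:
        errors["place"] = (exp_place, got_place)
    return errors


print(diagnose("s", "t"))  # manner error: fricative produced as stop
print(diagnose("k", "t"))  # place error: velar produced as alveolar
print(diagnose("s", "s"))  # no error
```

Feedback built from such output can name the articulatory change needed (for example, moving the constriction further back) rather than only naming the substituted phoneme.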