
    Automatic Pronunciation Assessment -- A Review

    Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth of language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment at both the phonemic and prosodic levels. We categorize the main challenges observed in prominent research trends, and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work.
    Comment: 9 pages, accepted to EMNLP Findings

    Analysis Of Variation In The Number Of MFCC Features In Contrast To LSTM In The Classification Of English Accent Sounds

    Various studies have classified English accents using both traditional and modern classifiers, and prior work on voice classification and voice recognition has generally used the MFCC method for voice feature extraction. The stages of this study were: importing the dataset, preprocessing the data, extracting MFCC features, training the model, testing model accuracy, and displaying a confusion matrix for model accuracy, followed by an analysis of the classification results. Across the 10 tests on the test set, the highest accuracy was 64.96%, obtained with 17 MFCC features. The tests also yielded some important findings: MFCC coefficient counts from twelve to twenty showed overfitting, with model training repeatedly producing high accuracy while classification testing produced low accuracy. Increasing the number of MFCC features also greatly enlarges the dimensionality of the sound features; with the large number of features obtained, the MFCC method has a weakness in determining the appropriate number of features.
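    A minimal sketch of the pipeline described in this abstract (MFCC extraction with a configurable number of coefficients, followed by an LSTM classifier), assuming a generic accent-labelled audio dataset; the file paths, layer sizes, and number of accent classes are illustrative placeholders, not the authors' code.

    ```python
    import numpy as np
    import librosa
    import tensorflow as tf

    def extract_mfcc(path, n_mfcc=17, max_frames=200):
        """Load a clip and return a (max_frames, n_mfcc) MFCC matrix,
        padded/truncated in time so every example has the same shape."""
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
        if mfcc.shape[0] < max_frames:
            mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
        return mfcc[:max_frames]

    def build_lstm(n_mfcc=17, max_frames=200, n_accents=3):
        """Simple LSTM classifier over the MFCC sequence."""
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=(max_frames, n_mfcc)),
            tf.keras.layers.LSTM(64),
            tf.keras.layers.Dense(n_accents, activation="softmax"),
        ])

    # model = build_lstm(n_mfcc=17)
    # model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
    #               metrics=["accuracy"])
    # model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)
    ```

    Varying `n_mfcc` (e.g. from twelve to twenty, as in the study) changes the feature dimension fed to the LSTM, which is the knob whose effect on accuracy and overfitting the paper analyses.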

    Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

    Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness Of Pronunciation feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on Librispeech.
    Comment: Accepted at ICASSP 2022. Code at https://github.com/YuanGongND/gopt ; interactive Colab demo at https://colab.research.google.com/github/YuanGongND/gopt/blob/master/colab/GOPT_GPU.ipynb
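    A minimal sketch of the Goodness Of Pronunciation (GOP) feature that a GOPT-style model takes as input, assuming frame-level phone log-posteriors from an ASR acoustic model and a forced alignment are already available; the function names and the particular GOP variant shown (target phone evidence minus best competitor) are assumptions for illustration, not the paper's exact feature extractor.

    ```python
    import numpy as np

    def gop_score(log_posteriors, frame_segment, canonical_phone):
        """log_posteriors: (T, P) frame-level log phone posteriors from the
        acoustic model; frame_segment: (start, end) frames aligned to the
        canonical phone; canonical_phone: its index in the phone set.
        Returns a classic GOP: mean log posterior of the canonical phone
        minus the mean log posterior of the best-scoring phone per frame."""
        start, end = frame_segment
        seg = log_posteriors[start:end]              # (t, P)
        canonical = seg[:, canonical_phone].mean()   # evidence for the target phone
        best = seg.max(axis=1).mean()                # evidence for the best competitor
        return canonical - best                      # <= 0; closer to 0 = better pronounced

    # Example: a 20-frame segment over a hypothetical 40-phone inventory.
    # lp = np.log(np.random.dirichlet(np.ones(40), size=20))
    # print(gop_score(lp, (0, 20), canonical_phone=5))
    ```

    Phone-level GOP vectors like this are what the Transformer then aggregates, with multi-task heads predicting scores for the different aspects and granularities.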

    A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

    Automatic Pronunciation Assessment (APA) plays a vital role in Computer-assisted Pronunciation Training (CAPT) when evaluating a second language (L2) learner's speaking proficiency. However, an apparent downside of most de facto methods is that they parallelize the modeling process throughout different speech granularities without accounting for the hierarchical and local contextual relationships among them. In light of this, a novel hierarchical approach is proposed in this paper for multi-aspect and multi-granular APA. Specifically, we first introduce the notion of sup-phonemes to explore more subtle semantic traits of L2 speakers. Second, a depth-wise separable convolution layer is exploited to better encapsulate the local context cues at the sub-word level. Finally, we use a score-restraint attention pooling mechanism to predict the sentence-level scores and optimize the component models with a multitask learning (MTL) framework. Extensive experiments carried out on a publicly-available benchmark dataset, viz. speechocean762, demonstrate the efficacy of our approach in relation to some cutting-edge baselines.
    Comment: Accepted to Interspeech 202
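    A minimal PyTorch sketch of two ingredients named in this abstract: a depth-wise separable convolution over sub-word representations and an attention-pooling head that maps token features to a sentence-level score. Layer sizes, the pooling formulation, and the omission of the score-restraint term are assumptions for illustration, not the authors' exact architecture.

    ```python
    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Depth-wise conv (one filter per channel) followed by a 1x1
        point-wise conv that mixes channels."""
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                       padding=kernel_size // 2, groups=channels)
            self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

        def forward(self, x):            # x: (batch, seq_len, channels)
            x = x.transpose(1, 2)        # Conv1d expects (batch, channels, seq_len)
            x = self.pointwise(self.depthwise(x))
            return x.transpose(1, 2)

    class AttentionPoolingScore(nn.Module):
        """Weight each token with a learned attention score, pool, and
        regress a single sentence-level score."""
        def __init__(self, channels):
            super().__init__()
            self.attn = nn.Linear(channels, 1)
            self.score = nn.Linear(channels, 1)

        def forward(self, x):            # x: (batch, seq_len, channels)
            w = torch.softmax(self.attn(x), dim=1)   # (batch, seq_len, 1)
            pooled = (w * x).sum(dim=1)              # (batch, channels)
            return self.score(pooled).squeeze(-1)    # (batch,)

    # feats = torch.randn(8, 50, 128)   # 8 utterances, 50 sub-word tokens, 128-dim
    # feats = DepthwiseSeparableConv(128)(feats)
    # sentence_scores = AttentionPoolingScore(128)(feats)
    ```

    In a multitask setup, several such heads (one per aspect and granularity) would share the encoder and be optimized jointly, which is the MTL framework the abstract refers to.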