Improving the Performance of Low-resourced Speaker Identification with Data Preprocessing

Author: Hay Mar Soe Naing
Win Lai Lai Phyu
Win Pa Pa
Publication venue: ITB Journal Publisher
Publication date: 01/12/2023

Automatic speaker identification is used to tackle everyday security problems. Speech data collection is an essential but very challenging task for under-resourced languages like Burmese. Speech quality is crucial to accurately recognizing the speaker’s identity. This work attempted to find the optimal speech quality appropriate for Burmese tone, in order to enhance identification relative to other, more richly resourced languages, based on Mel-frequency cepstral coefficients (MFCCs). A Burmese speech dataset was created as part of our work because no appropriate dataset was available. To achieve better performance, we preprocessed the recordings to the quality best suited not only to Burmese tone but also to nine other Asian languages, enabling multilingual speaker identification. The performance of the preprocessed data was evaluated against the original data using a time delay neural network (TDNN) together with a subsampling technique that reduces time complexity in model training. Experiments were conducted and analyzed on speech datasets of ten Asian languages to reveal the effectiveness of the data preprocessing. The evaluation showed that the system using the preprocessed dataset outperformed the one using the original dataset, with improvements in equal error rate (EER).

Journal of ICT Research and Applications

Directory of Open Access Journals
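
The abstract above reports its results as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `equal_error_rate` and the toy scores are hypothetical, and it assumes that higher scores mean "same speaker".

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from verification trial scores.

    target_scores: scores of same-speaker trials (assumed higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    Returns (eer, threshold) at the candidate threshold where the false
    acceptance and false rejection rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection: a same-speaker trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scoring at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores, purely illustrative (not taken from the paper).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means the two error types balance at a lower rate, which is the sense in which the preprocessed data is said to improve on the original.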

Automatic Speech Recognition on Spontaneous Interview Speech

Author: Naing Hay Mar Soe
Pa Win Pa
Publication date: 22/02/2018

This paper presents a spontaneous speech recognition system for the Myanmar language. Automatic speech recognition (ASR) on some controlled speech has achieved almost human performance. However, performance on spontaneous speech drops drastically due to the diversity of speaking styles, speaking rate, the presence of additive and non-linear distortion, accents, and weakened articulation. In this study, we built a recognizer for Myanmar interview speech using the classical Gaussian Mixture Model based Hidden Markov Model (HMM-GMM) approach. We investigated the effect of varying the acoustic features and the number of senones and Gaussian densities on Myanmar interview speech. According to these experiments, we achieved a best Word Error Rate (WER) of 20.47%.

MERAL Portal
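
The Word Error Rate (WER) quoted in the abstract above is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that definition only; it is not the authors' scoring tool, the function name `word_error_rate` is hypothetical, and it assumes both transcripts are already segmented into whitespace-separated words.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative English example: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.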