
    A Review of Accent-Based Automatic Speech Recognition Models for E-Learning Environment

    The adoption of electronic learning (e-learning) as a method of disseminating knowledge in the global educational system is growing rapidly. It has shifted knowledge acquisition from conventional classrooms and tutors to distributed e-learning, which gives learners more convenient and flexible access to a wide range of learning resources. However, despite the adaptive advantages of learner-centric e-learning content, the distributed e-learning environment has unconsciously adopted a few international languages as the languages of communication among participants, even though these participants speak with various accents (mother-tongue influence). Accommodating these accents has led to the introduction of accent-based automatic speech recognition (ASR) into e-learning to mitigate the effects of accent differences. This paper reviews over 50 research papers to determine the progress made in the design and implementation of accent-based ASR models for e-learning between 2001 and 2021. The analysis shows that 50% of the reviewed models adopted English, 46.50% adopted the major Chinese and Indian languages, and 3.50% adopted Swedish as the mode of communication. The review therefore finds that the majority of ASR models are centred on European, American and Asian accents, while unconsciously excluding the accent peculiarities of the less technologically resourced continents.

    Accent identification of Malaysian and Nigerian English based on acoustic features

    Purpose - This paper studies the acoustic features of energy, pitch and formants of Malaysian and Nigerian English vowels, with the aim of effective accent identification using multiple linear regression (MLR) and linear discriminant analysis (LDA) classifiers, in order to improve ASR performance on accented speech. Because accent is a foremost source of ASR performance degradation, it has received great attention from ASR researchers. The majority of ASR applications were developed with speech samples from native English speakers, without considering the fact that most of their potential users speak English as a second language with a marked accent; hence their poor performance on accented speech. Previous studies have shown that correctly recognizing the accent greatly enhances ASR performance on accented speech data. In a study of 14 regional accents of British English, Hanani, Russell, and Carey (2013) achieved a performance increase of 5.58%. A study by Vergyri, Lamel, and Gauvain (2010) on six regional accents of English reported an average word error rate (WER) of 41.43%, which was reduced to 27% when an accent identification module was incorporated. Several studies have explored acoustic features of speech such as energy, pitch, formants, mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC) to establish the differences between regional or cross-ethnic accents, with the aim of understanding these differences well enough to enhance ASR performance. The studies reviewed above show that accent is a hurdle to ASR performance and, consequently, a barrier to the wide acceptance and use of ASR in real-life situations. It is therefore pertinent that accent be given adequate research attention, with a view to enhancing ASR performance on accented speech, which will in turn promote its wide acceptability and applicability globally.
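The LDA classifier named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one feature vector per vowel token laid out as [energy (dB), F0 (Hz), F1, F2, F3 (Hz)], and the function names `fit_lda` and `predict_lda`, the accent labels and all feature values are hypothetical and generated synthetically rather than measured. It requires NumPy and does not cover the MLR classifier.

```python
import numpy as np


def fit_lda(X, y):
    """Fit a linear discriminant analysis (LDA) classifier.

    X: (n_samples, n_features) acoustic features, one row per vowel token.
    y: (n_samples,) accent labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, d = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance, shared by all classes (the LDA assumption).
    pooled = np.zeros((d, d))
    for c, mu in zip(classes, means):
        diff = X[y == c] - mu
        pooled += diff.T @ diff
    pooled /= n - len(classes)
    inv_cov = np.linalg.pinv(pooled)
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, inv_cov, log_priors


def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, means, inv_cov, log_priors = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # delta_k(x) = x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)
    scores = (X @ inv_cov @ means.T
              - 0.5 * np.sum((means @ inv_cov) * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]


# Synthetic, HYPOTHETICAL data: [energy dB, F0 Hz, F1 Hz, F2 Hz, F3 Hz] per token.
rng = np.random.default_rng(0)
centres = {"accent_A": [70, 180, 550, 1700, 2600],
           "accent_B": [66, 140, 700, 1400, 2450]}
spread = np.array([2, 15, 40, 80, 100])
X = np.vstack([rng.normal(c, spread, size=(40, 5)) for c in centres.values()])
y = np.repeat(list(centres), 40)

model = fit_lda(X, y)
train_accuracy = np.mean(predict_lda(model, X) == y)
print(predict_lda(model, list(centres.values())), train_accuracy)
```

Sharing one covariance matrix across accents is what makes the decision boundaries linear; with real recordings, accuracy should be measured on held-out speakers rather than on the training tokens as in this toy run.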

    Acoustic Analysis of Nigerian English Vowels Based on Accents

    Accent has been widely acknowledged as a major source of automatic speech recognition (ASR) performance degradation. Most ASR applications were developed with speech samples from native English speakers, without regard to the fact that the majority of their potential users speak English as a second language with a marked accent. Nigeria, like most nations colonized by Britain, uses English as its official language despite being a multi-ethnic nation. This work explores the acoustic features of energy, fundamental frequency and the first three formants of the three major ethnic groups of Nigeria, based on features extracted from five pure English vowels produced by Nigerian subjects. The research aims to determine whether the pronunciations of the three major ethnic nationalities in Nigeria differ, in order to aid the development of ASR that is robust to the Nigerian English (NE) accent. The results show significant differences between the mean values of the pure English vowels as pronounced by the three major ethnic groups: Hausa, Ibo and Yoruba. These differences can be exploited to enhance the performance of ASR in recognizing NE.
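The abstract does not say which statistical test established the significant differences. One common choice for comparing a mean across three groups is a one-way analysis of variance (ANOVA); the sketch below assumes that test and computes its F statistic in pure Python. The function name `one_way_anova_f` and the F1 values are hypothetical illustrations, not data from the study.

```python
def one_way_anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each token around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL F1 values (Hz) of one vowel for three speaker groups.
f1_by_group = [[300, 310, 320], [340, 350, 360], [380, 390, 400]]
print(one_way_anova_f(f1_by_group))  # (48.0, 2, 6)
```

For these toy groups F = 48.0 with (2, 6) degrees of freedom, well above the F(2, 6) critical value of about 5.14 at α = 0.05, so the group means would be judged different. A library routine such as `scipy.stats.f_oneway` also returns the p-value.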

    Master of Science

    Presently, speech recognition is gaining worldwide popularity in applications such as Google Voice, speech-to-text reporting (speech-to-text transcription, video captioning, real-time transcription), hands-free computing and video games. Research has been done for several years and many speech recognizers have been built; however, most of them fail to recognize speech accurately. Consider the well-known application Google Voice, which helps users search the web by voice. Although Google Voice does a good job of transcribing spoken words, it does not accurately recognize words spoken with different accents. Because several accents are evolving around the world, it is essential to train speech recognizers to recognize accented speech. Accent classification is the problem of classifying the accents of a given language. This thesis explores various methods of identifying accents. We introduce a new concept of clustering windows of a speech signal and learning a distance metric, using a specific distance measure over phonetic strings, to classify the accents. A language structure is incorporated to learn this distance metric. We also show how kernel approximation algorithms help in learning a distance metric.
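The abstract does not name the distance measure, so the sketch below substitutes a plain, unweighted Levenshtein (edit) distance over phone sequences and classifies a query with a 1-nearest-neighbour rule. It does not learn a metric, cluster windows or use kernel approximation; `edit_distance`, `classify_accent` and the ARPAbet-style reference transcriptions are hypothetical illustrations.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def classify_accent(query, references):
    """1-nearest-neighbour: label of the reference closest to the query."""
    return min(references, key=lambda ref: edit_distance(query, ref[0]))[1]


# HYPOTHETICAL ARPAbet-style transcriptions of "park the car".
references = [
    ("p aa r k dh ax k aa r".split(), "rhotic"),
    ("p aa k dh ax k aa".split(), "non_rhotic"),
]
query = "p ae r k dh ax k aa r".split()
print(classify_accent(query, references))  # rhotic
```

Here the query is one substitution away from the rhotic reference and three edits away from the non-rhotic one. A learned metric could, for example, replace the uniform unit costs with costs that reflect which phone confusions are typical of each accent.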

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for various conditions, ranging from the straightforward scenario in which the target language is present in written form on the Internet and the mapping between speech and written language is close, up to the difficult scenario in which no written form for the target language exists.
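For the straightforward scenario, where spelling maps closely to pronunciation, a dictionary entry can be derived from letter-to-sound rules. The sketch below is a hypothetical illustration of that idea and not the dissertation's method: `RULES` describes an imaginary language, and `g2p` and `build_dictionary` are illustrative names. Rules are applied by greedy longest match so that a digraph such as "sh" wins over its single letters.

```python
# HYPOTHETICAL letter-to-sound rules for an imaginary language whose spelling
# maps closely to pronunciation; real rules would come from linguistic resources.
RULES = {
    "sh": "S", "ch": "tS",
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "c": "k", "h": "h", "k": "k", "m": "m", "n": "n", "s": "s", "t": "t",
}


def g2p(word, rules=RULES):
    """Convert a word to phones by greedy longest match over grapheme rules."""
    word = word.lower()
    max_len = max(len(g) for g in rules)
    phones, i = [], 0
    while i < len(word):
        # Try the longest grapheme first so that "sh" beats "s" + "h".
        for size in range(min(max_len, len(word) - i), 0, -1):
            grapheme = word[i:i + size]
            if grapheme in rules:
                phones.append(rules[grapheme])
                i += size
                break
        else:
            raise ValueError(f"no rule covers {word[i]!r} in {word!r}")
    return phones


def build_dictionary(words, rules=RULES):
    """Map each distinct word to a space-separated pronunciation."""
    return {w: " ".join(g2p(w, rules)) for w in sorted(set(words))}


print(build_dictionary(["shukan", "machi", "shukan"]))
# {'machi': 'm a tS i', 'shukan': 'S u k a n'}
```

This approach degrades as the mapping between spelling and speech becomes less regular, and it cannot be applied at all when the language has no written form.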