
    Towards using CMU Sphinx Tools for the Holy Quran recitation verification

    Automatic Speech Recognition (ASR) technology is used in many different applications that help simplify interaction with a wide range of devices. This paper investigates the use of a simplified set of phonemes in an ASR system applied to the Holy Quran. The Carnegie Mellon University Sphinx 4 tools were used to train and evaluate an acoustic model on Holy Quran recitations that are widely available online. The acoustic model was built using a simplified list of phonemes instead of the commonly used Romanized set, in order to simplify the training process. The experiments resulted in Word Error Rates (WER) as low as 1.5%, even with a very small set of audio files in the training phase.
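The WER metric reported above is standardly computed as the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch (the function name and the romanized example words are illustrative, not from the Sphinx toolkit):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five gives WER = 0.2.
print(wer("bismillah ar rahman ar rahim",
          "bismillah ar rahman al rahim"))  # → 0.2
```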

    Arabic Speaker-Independent Continuous Automatic Speech Recognition Based on a Phonetically Rich and Balanced Speech Corpus

    This paper describes and proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) Arabic native speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools; the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the best-performing acoustic model uses continuous observation probability densities of 16 Gaussian mixture components, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers but different sentences, the system obtained word recognition accuracies of 92.67% and 93.88% and Word Error Rates (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained word recognition accuracies of 95.92% and 96.29% and WERs of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, the system obtained word recognition accuracies of 89.08% and 90.23% and WERs of 15.59% and 14.44% with and without diacritical marks, respectively.
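The bi-gram language model mentioned above rests on simple maximum-likelihood counting over the training transcripts: P(w2 | w1) = count(w1 w2) / count(w1). A hedged sketch of that counting step, using an invented toy corpus rather than the paper's Arabic sentences:

```python
from collections import Counter

# Toy transcript corpus with sentence-boundary markers (illustrative only).
sentences = [["<s>", "open", "file", "</s>"],
             ["<s>", "open", "door", "</s>"],
             ["<s>", "open", "file", "</s>"]]

# Count unigrams and adjacent word pairs across all sentences.
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))

def bigram_prob(w1: str, w2: str) -> float:
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("open", "file"))  # "open file" occurs 2 of the 3 times "open" appears
```

A real toolkit additionally smooths these counts so unseen word pairs do not receive zero probability.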

    Amazigh Spoken Digit Recognition using a Deep Learning Approach based on MFCC

    The field of speech recognition has made human-machine voice interaction more convenient. Recognizing spoken digits is particularly useful for communication that involves numbers, such as providing a registration code, cellphone number, score, or account number. This article discusses our experience with Amazigh Automatic Speech Recognition (ASR) using a deep learning-based approach. Our method uses a convolutional neural network (CNN) with Mel-Frequency Cepstral Coefficients (MFCC) to analyze audio samples and generate spectrograms. We gathered a database of numerals from zero to nine spoken by 42 native Amazigh speakers, men and women between the ages of 20 and 40, to recognize Amazigh numerals. Our experimental results demonstrate that spoken digits in Amazigh can be recognized with an accuracy of 91.75%, 93% precision, and 92% recall. These preliminary outcomes are very satisfactory given the size of the training database, which motivates us to further enhance the system's performance in order to attain a higher recognition rate. Our findings align with those reported in the existing literature.
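The three figures the abstract reports (accuracy, precision, recall) come from comparing predicted digit labels against the true ones. A minimal sketch of those metrics for a multi-class task, with precision and recall macro-averaged over classes; the labels below are made up for illustration:

```python
def metrics(y_true, y_pred):
    """Return (accuracy, macro-precision, macro-recall) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    # Macro averaging: every digit class contributes equally.
    return acc, sum(precisions) / len(labels), sum(recalls) / len(labels)

# Six invented utterances over three digit classes; one is misrecognized.
acc, prec, rec = metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```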

    On Developing an Automatic Speech Recognition System for Commonly used English Words in Indian English

    Speech is one of the easiest and fastest ways to communicate. Recognition of speech by computer for various languages is a challenging task. The accuracy of Automatic Speech Recognition (ASR) systems remains one of the key challenges, even after years of research. Accuracy varies with speaker and language variability, vocabulary size, and noise, and also with design choices such as the speech database, feature extraction techniques, and performance evaluation. This paper describes the development of a speaker-independent isolated automatic speech recognition system for Indian English. The acoustic model is built using Carnegie Mellon University (CMU) Sphinx tools. The corpus is based on the most commonly used English words in everyday life. The speech database includes recordings of 76 Punjabi speakers (north-west Indian English accent). After testing, the system obtained an accuracy of 85.20% when trained using 128 GMMs (Gaussian Mixture Models).
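A GMM-based acoustic model like the one above scores each feature frame by its log-likelihood under a weighted mixture of Gaussians. A hedged, pure-Python sketch with diagonal covariances, two components, and invented parameters (the paper's models use 128 components per state):

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    """log p(x) = logsumexp_k [ log w_k + log N(x; mean_k, var_k) ]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    hi = max(terms)  # subtract the max for numerical stability
    return hi + math.log(sum(math.exp(t - hi) for t in terms))

# Two made-up mixture components in a 2-dimensional feature space.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# A frame near a component mean scores higher than a distant one.
near = gmm_loglik([0.0, 0.0], weights, means, variances)
far = gmm_loglik([10.0, 10.0], weights, means, variances)
```

In decoding, these per-frame scores are combined across an HMM's states to score whole words.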

    Arabic Continuous Speech Recognition System using Sphinx-4

    Speech is the most natural form of human communication, and speech processing has been one of the most exciting areas of signal processing. Speech recognition technology has made it possible for computers to follow human voice commands and understand human languages. The main goal of the speech recognition field is to develop techniques and systems for speech input to machines and to apply that speech in many applications. Arabic is one of the most widely spoken languages in the world: statistics show that it is the first language (mother tongue) of 206 million native speakers, ranked fourth after Mandarin, Spanish, and English. In spite of its importance, research effort on Arabic Automatic Speech Recognition (ASR) is unfortunately still inadequate [7]. This thesis proposes and describes an efficient and effective framework for designing and developing a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The developed system is based on the Carnegie Mellon University Sphinx tools. To build the system, we develop three basic components. The first is the dictionary, which contains all possible phonetic pronunciations of any word in the domain vocabulary. The second is the language model, which captures the properties of a sequence of words by means of a probability distribution and predicts the next word in a speech sequence. The last is the acoustic model, created by taking audio recordings of speech and their text transcriptions and using software to build statistical representations of the sounds that make up each word. The system uses a phonetically rich and balanced database containing 367 sentences, a total of 14,232 words. The phonetic dictionary contains about 23,841 definitions corresponding to the database words, and the language model contains 14,233 uni-grams, 32,813 bi-grams, and 37,771 tri-grams. 
The engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models.
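The dictionary component described above maps every in-vocabulary word to one or more phone sequences, so a first practical step is detecting out-of-vocabulary words in a transcript. A minimal, assumed sketch (the romanized entries and phone symbols are invented examples, not the thesis's 23,841-entry dictionary):

```python
# Each word maps to a list of pronunciation variants (lists of phones).
dictionary = {
    "kitab": [["K", "I", "T", "AA", "B"]],
    "qalam": [["Q", "A", "L", "A", "M"],
              ["Q", "A", "L", "A", "M", "U"]],  # variant with case ending
}

def phones_for(word):
    """Return all pronunciation variants of a word, or None if it is OOV."""
    return dictionary.get(word)

# Flag out-of-vocabulary words before decoding a transcript.
transcript = ["kitab", "madrasa", "qalam"]
oov = [w for w in transcript if phones_for(w) is None]
print(oov)  # → ['madrasa']
```

Words flagged this way must be added to the dictionary (with pronunciations) before the language and acoustic models can use them.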

    A reflection on the design and user acceptance of Tamil talk

    Tamil talk is a speech-to-text application designed from a perspective of language and philosophy. This paper takes an indigenous approach to reflecting on the design and user acceptance of Tamil talk, drawing on the literature to critically examine both the design and the potential user acceptance of the application. It takes a multidisciplinary approach and explores the influence of factors such as language shift, language maintenance, and philosophy in the context of user acceptance of speech-to-text. The application may appeal to a section of native Tamil speakers, as suggested in the literature, but there are complex challenges that need further research. Future work will develop the application to conform to the conceptual framework and test it widely with native speakers to arrive at a more precise prediction of user acceptance.