429 research outputs found

    A Machine Learning Model for the Identification of the Holy Quran Reciter Utilizing K-Nearest Neighbor and Artificial Neural Networks

    Holy Quran Reciter Identification is the task of identifying the reciter of the Holy Quran based on various features of the acoustic wave. The Holy Quran is the Muslim community's Holy Book, and listening to or reading it is one of the obligatory activities for Muslims. This research proposes a machine learning model for identifying the Holy Quran reciter. The presented system comprises the essential phases of a voice recognition system: data acquisition, preprocessing, feature extraction, and classification. The dataset consists of the voices of ten known reciters, who are leaders of prayers in the Holy masjids of Madinah and Makkah. The audio dataset is analyzed using mel frequency cepstral coefficients (MFCC). Artificial neural network (ANN) and k-nearest neighbor (KNN) classifiers are employed for classification, with pitch used as a feature to train both classifiers. The proposed system is validated using two chapters chosen from the Holy Quran, and the results show a high level of accuracy. With the ANN classifier, the proposed system achieved 98.5% accuracy for chapter 7 and 97.2% accuracy for chapter 32. With KNN, the accuracy is 97.02% for chapter 7 and 96.07% for chapter 32. The system’s performance is then compared with the use of support vector machines (SVM) for recognizing Quranic reciters. The comparison revealed that ANN is a better machine learning algorithm for voice recognition than SVM.
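As a rough illustration of the k-nearest neighbor classification step described in this abstract, the sketch below labels a query feature vector by majority vote among its nearest training vectors. The two-dimensional features (a mean pitch value and one averaged MFCC), the reciter labels, and all numbers are hypothetical placeholders, not data or settings from the paper.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    # Sort training samples by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (mean pitch in Hz, one averaged MFCC value).
# These numbers are made up for illustration only.
train = [
    ((110.0, -3.1), "reciter_A"),
    ((112.5, -2.9), "reciter_A"),
    ((108.0, -3.4), "reciter_A"),
    ((150.0, 1.2), "reciter_B"),
    ((148.5, 0.9), "reciter_B"),
    ((152.0, 1.5), "reciter_B"),
]

print(knn_predict(train, (111.0, -3.0)))
```

In practice the features would usually be scaled before computing distances, since pitch in Hz would otherwise dominate the smaller cepstral values.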

    A Statistical Learning Approach to Evidence the Acoustic Miracles in the Holy Quran Using Audio Features

    This paper presents a novel approach for exploring the intrinsic acoustic properties of the Holy Quran, in an attempt to provide further evidence of the miraculous nature of the Quran. The study uses a dataset composed of recitations of three chapters of the Quran by seven prominent reciters. A novel statistical approach is used to detect the correlation between the reciters' recitations of the three chapters (Quranic Surahs). The study utilizes the Mel-Frequency Cepstral Coefficients (MFCCs) feature to detect common patterns among the recitations. The main measurement indexes used in this study are the correlation and the Euclidean Distance (ED) between the means of the MFCCs and delta-delta MFCCs. The study reveals a strong correlation and short distance between all recitations for one verse at a time, and a relatively high correlation and short distance for two or more verses. Furthermore, the study lays a foundation for detecting and formulating acoustic clusters for sequential verses in the Holy Quran.
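The two measurement indexes named in this abstract can be sketched as follows: average the per-frame coefficient vectors of each recitation, then compute the Pearson correlation and the Euclidean distance between the two mean vectors. The choice of Pearson correlation, the function names, and the tiny frame matrices are assumptions made for illustration; the abstract does not specify them.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length coefficient vectors, one per frame."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.dist(x, y)


# Made-up frame matrices standing in for the MFCCs of two recitations.
recitation_1 = [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
recitation_2 = [[2.0, 3.0, 5.0], [2.0, 3.5, 5.5]]

m1, m2 = mean_vector(recitation_1), mean_vector(recitation_2)
print(pearson(m1, m2), euclidean(m1, m2))
```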

    Analysis of momentous fragmentary formants in Talaqi-like neoteric assessment of Quran recitation using MFCC miniature features of Quranic syllables

    Speech recognition systems built on a variety of approaches and techniques are increasingly used in human-machine interaction applications. Building on this, a computerized assessment system that identifies errors in reading the Qur’an can be developed to take advantage of the technology that exists today. Based on Quranic syllable utterances, which contain Tajweed rules that generally consist of Makhraj (articulation process), Sifaat (letter features or pronunciation) and Harakat (pronunciation extension), this paper attempts to present the technological capabilities for realizing Quranic recitation assessment. Its main focus is the transformation of the digital signal of the Quranic voice together with the identification of reading errors (based on the Law of Tajweed). This involves several stages: representation of the Quranic syllable-based Recitation Speech Signal (QRSS), feature extraction, a non-phonetic transcription Quranic Recitation Acoustic Model (QRAM), and threshold classification. MFCC-Formants are used in a miniature state, hybridized with three bands, to represent QRSS combined vowels and consonants. A human-guided threshold classification approach is used to assess recitation based on Quranic syllables; the threshold classification performance for the low, medium, and high band groups is 87.27%, 86.86% and 86.33%, respectively.

    Building a neural speech recognizer for quranic recitations

    This work is an effort towards building a Neural Speech Recognizer (NSR) system for Quranic recitations that can be used effectively by anyone regardless of gender and age. Although many recitations are available online, most of them are recorded by professional adult male reciters, which means that an ASR system trained on such datasets would not work for female or child reciters. We address this gap by adopting a benchmark dataset of audio recordings of Quranic recitations by both genders and different ages. Using this dataset, we build several speaker-independent NSR systems based on the DeepSpeech model and evaluate them using word error rate (WER). The goal is to show how an NSR system trained and tuned on a dataset of one gender performs on a test set from the other gender. Unfortunately, the number of female recitations in our dataset is rather small, while the number of male recitations is much larger. In the first set of experiments, we avoid the imbalance between the two genders by down-sampling the male part to match the female part. On this small subset of our dataset, the system gives 0.968 WER when trained on male recitations and tested on female recitations, and 0.406 WER when tested on male recitations. Conversely, training the system on female recitations gives 0.966 WER when tested on male recitations and 0.608 WER when tested on female recitations.
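For readers unfamiliar with the metric reported in this abstract, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition; the example strings are placeholders, and the paper's own evaluation tooling is not specified in the abstract.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder transcripts: one substitution and one deletion over four words.
print(wer("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.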

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
    A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.
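The F1-scores quoted in this abstract are harmonic means of precision and recall. The sketch below shows that general computation over sets of labeled edges; representing each edge as a (dependent, head, label) tuple is an assumption made for illustration, and the thesis's exact scoring scheme for hybrid structures is not given in the abstract.

```python
def f1_score(gold, predicted):
    """Harmonic mean of precision and recall over two collections of edges."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical edges as (dependent index, head index, relation label).
gold = {(1, 0, "root"), (2, 1, "subj"), (3, 1, "obj")}
predicted = {(1, 0, "root"), (2, 1, "subj"), (3, 2, "obj"), (4, 1, "adj")}

# Two of four predicted edges are correct (precision 1/2),
# and two of three gold edges are recovered (recall 2/3).
print(f1_score(gold, predicted))
```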

    Parallel corpus multi stream question answering with applications to the Qu'ran

    Question-Answering (QA) is an important research area concerned with developing an automated process that answers questions posed by humans in a natural language. QA is a shared task for the Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP) communities. A technical review of different QA system models and methodologies reveals that a typical QA system consists of several components that accept a natural language question from a user and deliver its answer(s) back to the user. Existing systems have usually been aimed at structured or unstructured data collected from everyday English text, i.e. text collected from television programmes, news wires, conversations, novels and other similar genres. Despite all the up-to-date research in the subject area, none of the existing QA systems has been tested on a parallel corpus of religious text with the aim of question answering. Religious text has peculiar characteristics and features which make it more challenging for traditional QA methods than other kinds of text. This thesis proposes the PARMS (Parallel Corpus Multi Stream) Methodology, a novel method that applies existing advanced IR techniques and combines them with NLP methods and additional semantic knowledge to implement QA for a parallel corpus. A parallel corpus involves the use of multiple forms of the same corpus, where each form differs from the others in a certain aspect, e.g. translations of a scripture from one language to another by different translators. Additional semantic knowledge can be regarded as a stream of information related to a corpus. PARMS uses multiple streams of semantic knowledge, including a general ontology (WordNet) and domain-specific ontologies (QurTerms, QurAna, QurSim). This additional knowledge has been used in embedded form for Query Expansion, Corpus Enrichment and Answer Ranking.
    The PARMS Methodology has wider applications. This thesis applies it to the Quran, the core text of Islam, as a first case study. The PARMS Method uses a parallel corpus comprising ten different English translations of the Quran. An individual Quranic verse is treated as an answer to questions asked in a natural language, English. This thesis also implements the PARMS QA Application as a proof of concept for the PARMS methodology. The PARMS Methodology aims to evaluate the range of semantic knowledge streams separately and in combination, and also to evaluate alternative subsets of the data source: QA from one stream vs. the parallel corpus. Results show that the use of a parallel corpus and multiple streams of semantic knowledge has clear advantages. To the best of my knowledge, this method is the first of its kind, and it is expected to serve as a benchmark for further research in this area.
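To make the ideas of query expansion and ranking over a parallel corpus more concrete, here is a toy sketch. It is not the PARMS implementation: the synonym table stands in for a semantic knowledge stream such as WordNet, the scoring is plain term overlap, and the corpus entries are invented neutral sentences rather than actual translations. Every name in the block is hypothetical.

```python
# Hypothetical synonym table standing in for a semantic knowledge stream.
SYNONYMS = {
    "charity": {"alms", "giving"},
    "patience": {"perseverance"},
}


def expand(query):
    """Add known synonyms of each query term to the set of search terms."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms


def rank(query, corpus):
    """Rank item ids by their best term overlap across parallel versions."""
    terms = expand(query)
    scores = {
        item_id: max(len(terms & set(text.lower().split())) for text in versions)
        for item_id, versions in corpus.items()
    }
    # Highest score first; ties are broken alphabetically by item id.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Each item has several parallel wordings (invented placeholder sentences).
corpus = {
    "item_1": ["give alms to the poor", "be generous with the needy"],
    "item_2": ["be steadfast in hardship", "show perseverance in hardship"],
}

print(rank("charity", corpus))
```

Taking the maximum over versions means an item is retrieved if any one of its parallel wordings matches, which is one simple way a parallel corpus can improve recall over a single version.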

    Quranic Arabic Semantic Search Model Based on Ontology of Concepts

    The Holy Quran is the essential resource for Islamic sciences and the Arabic language. Therefore, numerous Quranic search applications have been built to facilitate the retrieval of knowledge from the Quran. This thesis presents a novel Arabic Quran semantic search model. First, this thesis evaluates existing search tools constructed for the Holy Quran against 13 criteria covering search features, output features, the precision of the retrieved verses, recall, database size, and types of database contents. Then, the study reviews the existing Quran ontologies and compares them against 11 criteria. Deficits were found in all of these ontologies; moreover, no single Quranic ontology covers most of the knowledge in the Quran. Therefore, I developed a new Arabic-English Quran ontology from ten datasets related to the Quran, such as Quran chapter and verse names, Quran word meanings, and Quran topics. The main aim of developing a Quranic ontology is to facilitate the retrieval of knowledge from the Quran. Additionally, the Quran ontology enriches the raw Arabic and English Quran text with Islamic semantic tags. Furthermore, I developed the first Annotated Corpus of Quran Questions and Answers in Arabic. This corpus has 2200 question-answer pairs collected from trusted Islamic sources. Each pair is labelled with 5 tags; examples are the question type (either factoid or descriptive), the topic of the question based on the Quran ontology, and the question class. Finally, the thesis explains a new semantic search model for the Arabic Quran based on my Quran ontology. This model aims to overcome limitations in the existing Quran search applications. The search tool employs both Information Retrieval techniques and semantic search technologies. The performance of this search model is evaluated using the Annotated Corpus of Arabic Quran Questions and Answers.