8 research outputs found

    Sunnah Arabic Corpus: Design and Methodology.

    The Sunnah Arabic Corpus is an annotated linguistic resource that consists of 144K words/170K tokens of Hadith narratives (utterances attributed to the Prophet Mohammed) extracted from the book Riyāḍu Aṣṣāliḥīn. As a first layer of annotation, the corpus has been fully diacritized. In addition, each orthographic word/token is segmented into its syntactic words, and each syntactic word is tagged with its part of speech and multiple morphological features. Several Hadith translations in different languages are provided and aligned at the narrative/paragraph level. The Sunnah Arabic Corpus follows the standards of the successful Quranic Arabic Corpus (corpus.quran.com). It is freely available under the Creative Commons Attribution-ShareAlike 4.0 International License.

    Mini-batch k-Means versus k-Means to Cluster English Tafseer Text: View of Al-Baqarah Chapter

    Al-Quran is the primary text of Muslims' religion and practice. Millions of Muslims around the world use it as their reference guide, and both lay Muslims and Islamic scholars draw knowledge from it. Al-Quran has been translated into many languages, including English, by several translators. Each translator brings his own ideas, comments and statements to the interpretation (Tafseer) of the verses. This paper therefore clusters the English translation of the Tafseer using text clustering, a text mining method that groups related documents into the same cluster. The study applied two clustering algorithms, mini-batch k-means and k-means, to explain and define the links between keywords, known as features or concepts, for the Al-Baqarah chapter of 286 verses. The dataset was preprocessed, and features were extracted using TF-IDF (Term Frequency-Inverse Document Frequency) and PCA (Principal Component Analysis). The results show two- and three-dimensional cluster plots that assign seven cluster categories (k=7) to the Tafseer. The running time of the mini-batch k-means algorithm (0.05485 s) is shorter than that of the k-means algorithm (0.23334 s). Finally, 'god', 'people', and 'believe' were the most frequent features.
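The abstract above combines TF-IDF features with mini-batch k-means. The following is a minimal standard-library sketch of those two steps, not the authors' implementation: the function names (`tfidf`, `nearest`, `mini_batch_kmeans`), the smoothed IDF formula, the batch size, and the toy sentences are all assumptions made for illustration, and the PCA step is omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows): smoothed TF-IDF weights, each row L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def nearest(point, centres):
    """Index of the centre with the smallest squared Euclidean distance to point."""
    return min(
        range(len(centres)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centres[j])),
    )


def mini_batch_kmeans(points, k, batch_size=4, iterations=50, seed=0):
    """Mini-batch k-means: update centres from small random batches."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # how many points each centre has absorbed so far
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then move the centres.
        assigned = [nearest(p, centres) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate shrinks over time
            centres[j] = [(1 - eta) * c + eta * x for c, x in zip(centres[j], p)]
    return centres, [nearest(p, centres) for p in points]


# Invented placeholder sentences, not taken from any Tafseer text.
toy_docs = [
    "god guides the people who believe",
    "the people believe in god",
    "charity is spent on the poor",
    "spend charity on the poor and needy",
]
vocab, rows = tfidf(toy_docs)
centres, labels = mini_batch_kmeans(rows, k=2)
print(labels)
```

Each iteration touches only one batch instead of every document, which is the usual reason mini-batch k-means runs faster than standard k-means on larger collections. The timings quoted in the abstract are the authors' own and are not reproduced by this sketch.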

    Using Arabic Numbers (Singular, Dual, and Plurals) Patterns To Enhance Question Answering System Results

    In information retrieval, it is difficult to answer a user's question directly, because a search engine retrieves a ranked list of documents that contain any keyword or phrase from the query. The user must then make an extra effort to search for the answer inside those documents, and there may be no answer at all. The alternative to a search engine is a question answering system, which retrieves the exact answer to the question in natural language, if one is found. A question answering system accepts a question in natural language and then runs several processes to extract the exact answer. In general, it is composed of three main components: a question classification module, an information retrieval module and an answer extraction module. Here, a question answering system is applied to the Holy Quran, which is written and cited in Arabic, and some characteristics of the Arabic language are used to enhance answer extraction. One of these important characteristics is grammatical number: singular, dual and plural. A prototype was built that uses special patterns to process number in Arabic, which enhances the answers by adding more words and meanings. A corpus of questions and their answers from the Holy Quran was used to test the system.
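The abstract does not list the patterns the prototype uses. As a rough illustration of the general idea, the sketch below expands a query term with naive dual and sound-plural surface forms built by suffixation on undiacritized text. The function names `number_variants` and `expand_query` are hypothetical, and the rules are deliberately simplistic assumptions: broken plurals, which are very common in Arabic, are not handled, and no check is made that a generated form is a real word.

```python
def number_variants(singular):
    """Return naive dual and sound-plural forms for an undiacritized singular noun."""
    if singular.endswith("ة"):
        # Feminine nouns ending in taa marbuta: it becomes taa before the suffix.
        stem = singular[:-1]
        return {
            "dual": [stem + "تان", stem + "تين"],
            "plural": [stem + "ات"],
        }
    # Otherwise assume a masculine noun that takes the sound plural.
    return {
        "dual": [singular + "ان", singular + "ين"],
        "plural": [singular + "ون", singular + "ين"],
    }


def expand_query(terms):
    """Add dual and plural variants to each term, keeping order and dropping repeats."""
    expanded = []
    for term in terms:
        forms = number_variants(term)
        for form in [term] + forms["dual"] + forms["plural"]:
            if form not in expanded:
                expanded.append(form)
    return expanded


print(expand_query(["مسلم"]))
```

Without diacritics the oblique dual and the oblique masculine plural are spelled the same, so the expansion keeps only one copy of that form.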

    Ensemble Morphosyntactic Analyser for Classical Arabic

    Classical Arabic (CA) is an influential language in the lives of Muslims around the world. It is the language of two sources of Islamic law: the Quran and the Sunnah, the collection of traditions and sayings attributed to the Prophet Mohammed. However, classical Arabic in general, and the Sunnah in particular, is underexplored and under-resourced in the field of computational linguistics. This study examines the possible directions for adapting existing tools, specifically morphological analysers designed for modern standard Arabic (MSA), to classical Arabic. Morphological analysers for CA are limited, as is the data for evaluating them. In this study, we adapt existing analysers and create a validation data-set from the Sunnah books. Inspired by the advances in deep learning and the promising results of ensemble methods, we developed a systematic method for transferring morphological analysis that is capable of handling different labelling systems and various sequence lengths. We handpicked the best four open access MSA morphological analysers. Data generated from these analysers are evaluated before and after adaptation through the existing Quranic Corpus and the Sunnah Arabic Corpus. The findings are as follows: first, it is feasible to analyse under-resourced languages using existing comparable language resources given a small but sufficient set of annotated text. Second, analysers typically generate different errors, and this could be exploited. Third, an explicit alignment of sequences and mapping of labels is not necessary to achieve comparable accuracies given a training dataset of sufficient size. Adapting existing tools is easier than creating tools from scratch. The resulting quality depends on the training data size and on the number and quality of input taggers. The pipeline architecture performs worse than the end-to-end neural network architecture because of error propagation and limitations on the output format.
    A valuable tool and data for annotating classical Arabic are made freely available.
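The observation that analysers make different errors is what any ensemble relies on. The sketch below shows the simplest possible combiner, a per-token majority vote. It is not the method described in the abstract, which uses neural architectures and needs no explicit alignment or label mapping: this sketch assumes the opposite, namely that every analyser's output is already aligned token by token and mapped to one shared tagset. The function name `majority_vote` and the tag labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine aligned tag sequences (one per analyser) by per-token majority."""
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all tag sequences must have the same length")
    voted = []
    for position_tags in zip(*predictions):
        # Take the most frequent tag proposed for this token position.
        voted.append(Counter(position_tags).most_common(1)[0][0])
    return voted


# Three hypothetical analysers tagging the same three tokens.
outputs = [
    ["NOUN", "VERB", "PREP"],
    ["NOUN", "NOUN", "PREP"],
    ["ADJ", "VERB", "PREP"],
]
print(majority_vote(outputs))
```

Here each analyser makes at most one mistake, and on a different token, so the vote recovers a sequence that no single wrong tag can override. A learned combiner can go further by weighting analysers differently, which a plain vote cannot do.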

    Proceeding 2nd International Seminar on Linguistics

    LANGUAGE AND CIVILIZATION: PROCEEDING OF THE 2nd INTERNATIONAL SEMINAR ON LINGUISTICS

    PROCEEDING THE 2nd INTERNATIONAL SEMINAR ON LINGUISTICS (ISOL-2): Language and Civilization

    ISOL is a biennial international seminar held by the Linguistics Graduate Program of Faculty of Humanity, Andalas University in collaboration with the Linguistic Society of Indonesia (MLI), Unand Chapter. ISOL aims to provide a discussion platform for linguists and language observers across Indonesia. Its main objective is to enhance the exchange of research and new approaches in language studies. The seminar is also open to interested people from outside Indonesia. The theme of the 2nd ISOL is Language and Civilization. Civilization is the process by which a society or place reaches an advanced stage of social development and organization. It is also defined as the society, culture, and way of life of a particular area. Over time, the word civilization has come to imply something beyond organization. It refers to a particular shared way of thinking about the world as well as a reflection on that world in art, literature, drama and a host of other cultural happenings. A civilization is any complex state society characterized by urban development, social stratification, symbolic forms of communication and a perceived separation from and domination over the natural environment. To advance civilization is to construct a new social reality which emerges through language. In other words, social reality is the operational expression of words and of the meanings that society has agreed upon for them. Language is itself a social construct, a component of social reality. Thus, like all social constructs and conventions, it can be changed.