
Compression versus Machine Learning for Classifying Modern Arabic Code-Switching in Social Media and Classical Arabic Hadith

Author: Tarmom Taghreed Awad
Publication date: 01/10/2022

This thesis aims to enrich Arabic resources by building several Arabic corpora and making them freely available to the Arabic research community. To this end, the Bangor Arabic–English code-switching (BAEC) corpus, the Saudi Dialect Corpus (SDC), the Egyptian Dialect Corpus (EDC) and the Non-Authentic Hadith (NAH) corpus were built. The thesis carries out the detection of code-switching in Arabic varieties and dialects on social media platforms in order to evaluate the prediction by partial matching (PPM) compression approach, comparing it with a support vector machine (SVM) classifier using character-based and word-based approaches. The aim was to test PPM compression on Modern Standard Arabic (MSA) and Arabic dialects before applying it to Hadith. To the best of our knowledge, no previous study on detecting code-switching between Arabic and English using PPM compression has been published. The experimental results show that PPM compression achieved higher accuracy than the SVM classifier when the training corpus correctly represented the language or dialect being studied. Classification experiments on Arabic Hadith were then performed to evaluate the PPM compression approach against machine learning and deep learning approaches. Two classification tasks were addressed: Hadith components classification and Hadith authenticity classification. For the former, the experimental results show that deep learning classifiers can achieve higher classification accuracy than the other classifiers under study, although their execution time was high. For the latter, the experimental results showed that the Isnad (the chain of narrators) was the part of a Hadith that allowed the most effective automatic determination of authenticity. In addition, the results showed that the Matan (the text of the Hadith) can be used to judge Hadiths with up to 85% accuracy.
These experiments were novel in their approach to Hadith authenticity classification because they investigated the use of the character-based text compression scheme PPM and deep learning (DL) classifiers. Finally, the thesis also investigated the automatic segmentation of Arabic Hadith using PPM compression. The experiments showed that PPM was effective in segmenting Hadith into its two main components when tested on several Hadith corpora with different structures. The main innovation of these experiments was their use of a character-based text compression method to segment the Hadiths.

White Rose E-theses Online
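The compression-based classification evaluated in the thesis above can be illustrated as a minimum-code-length classifier: one character model is trained per class, and a text is assigned to the class whose model encodes it in the fewest bits. The sketch below is a simplified, static order-k PPM with escape method C and no exclusions. The class `PPMModel`, the helper `classify`, the order of 3, the assumed alphabet size and the toy training strings are all hypothetical; the abstract does not describe the thesis's actual implementation or settings.

```python
import math
from collections import defaultdict


class PPMModel:
    """Hypothetical, simplified order-k PPM model (escape method C, no exclusions).

    Counts are collected once by train() and then used statically to
    estimate code lengths; a real PPM coder updates them adaptively.
    """

    def __init__(self, order=3, alphabet_size=65536):
        self.order = order
        # Assumed alphabet size for the order -1 (uniform) fallback.
        self.alphabet_size = alphabet_size
        # counts[context][symbol] -> frequency of symbol after context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        # Record every symbol under all contexts of length 0..order.
        for i, ch in enumerate(text):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[text[i - k:i]][ch] += 1

    def _symbol_bits(self, history, ch):
        bits = 0.0
        # Try the longest available context first, escaping to shorter ones.
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                continue  # context never seen: nothing to escape from
            total = sum(table.values())
            distinct = len(table)
            if ch in table:
                # PPMC: P(symbol) = c(symbol) / (n + d)
                return bits - math.log2(table[ch] / (total + distinct))
            # PPMC: P(escape) = d / (n + d)
            bits -= math.log2(distinct / (total + distinct))
        # Order -1: uniform distribution over the assumed alphabet.
        return bits + math.log2(self.alphabet_size)

    def code_length(self, text):
        """Estimated number of bits needed to encode text under this model."""
        return sum(
            self._symbol_bits(text[max(0, i - self.order):i], ch)
            for i, ch in enumerate(text)
        )


def classify(text, models):
    """Return the label whose model gives the shortest code length."""
    return min(models, key=lambda label: models[label].code_length(text))


if __name__ == "__main__":
    en = PPMModel()
    en.train("the quick brown fox jumps over the lazy dog " * 5)
    ar = PPMModel()
    ar.train("ذهب الولد الى المدرسة صباحا " * 5)
    print(classify("the lazy fox", {"en": en, "ar": ar}))
```

A full PPM coder would also update its counts while coding and apply symbol exclusion; both are left out here to keep the comparison of code lengths easy to follow.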

Sparse Radial Sampling LBP for Writer Identification

Author: Bagdanov Andrew D.
Karatzas Dimosthenis
Liwicki Marcus
Nicolaou Anguelos
Publication date: 23/04/2015

In this paper we present Sparse Radial Sampling Local Binary Patterns, a variant of Local Binary Patterns (LBP) for text-as-texture classification. By adapting and extending the standard LBP operator to the particularities of text, we obtain a generic text-as-texture classification scheme and apply it to writer identification. In experiments on the CVL and ICDAR 2013 datasets, the proposed feature set demonstrates state-of-the-art (SOA) performance. Among the SOA methods, the proposed method is the only one based on dense extraction of a single local feature descriptor. This makes it fast and applicable at the earliest stages of a document image analysis (DIA) pipeline, without the need for segmentation, binarization, or extraction of multiple features.
Comment: Submitted to the 13th International Conference on Document Analysis and Recognition (ICDAR 2015).

arXiv.org e-Print Archive

Crossref
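The LBP operator that the paper above extends can be sketched as follows: each pixel is compared with P neighbours sampled on a circle of radius R, the comparison bits form a P-bit code, and the normalised histogram of codes serves as a texture descriptor. This is the standard circular LBP with nearest-neighbour sampling, not the Sparse Radial Sampling variant itself, whose exact sampling scheme the abstract does not describe. The function name `lbp_histogram` and its defaults are hypothetical.

```python
import math


def lbp_histogram(image, radius=1, points=8):
    """Standard circular LBP histogram (hypothetical helper, not SRS-LBP).

    image: list of rows of grey values; neighbours are taken with
    nearest-neighbour rounding rather than bilinear interpolation.
    """
    h, w = len(image), len(image[0])
    # Integer offsets of the sampling points on a circle of the given radius.
    offsets = []
    for p in range(points):
        angle = 2 * math.pi * p / points
        dy = int(round(-radius * math.sin(angle)))
        dx = int(round(radius * math.cos(angle)))
        offsets.append((dy, dx))

    hist = [0] * (1 << points)
    # Skip a border of width `radius` so every neighbour lies inside the image.
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            center = image[y][x]
            code = 0
            for p, (dy, dx) in enumerate(offsets):
                # Set bit p when the neighbour is at least as bright as the centre.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << p
            hist[code] += 1

    # Normalise so images of different sizes give comparable descriptors.
    total = sum(hist)
    return [c / total for c in hist] if total else hist
```

Because the descriptor is a single histogram computed densely over the image, it needs no prior segmentation, which matches the motivation given in the abstract.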

Compression-based Parts-of-Speech Tagger for the Arabic Language

Author: Alkhazi Ibrahim
Publication date: 18/12/2019

Bangor University Research Portal