12 research outputs found

Domain Classification for Marathi Blog Articles using Deep Learning

Author: Kiran N. Girase et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 02/11/2023

With the exponential growth of online content, particularly blog articles, effective techniques for automatically categorizing articles into relevant domains have become increasingly important. Natural language processing (NLP), machine learning (ML), and deep learning (DL) are driving solutions to this challenge. The proposed system, based on NLP and DL, uses a long short-term memory (LSTM) classifier for domain classification and compares it with existing multiclass classification techniques. The LSTM model achieves accuracies of around 94% and 91% on two data sets: a Marathi news article data set and a financial article data set. The proposed model is compared with several other models, including naïve Bayes (NB), XGBoost, support vector machine (SVM), and random forest (RF). The best result is achieved by the combination of dataset and the LSTM deep learning algorithm

International Journal on Recent and Innovation Trends in Computing and Communication

Arabic Text Mining

Author: AL-Ghuribi Sumaia Mohammed, Noah Shahrul Azman Mohd
Publication date: 04/11/2022

The rapid growth of the internet has increased the number of online texts, including texts in the Arabic language. This enormous amount of text must be organized into classes to make analysis and text retrieval easier. Text classification is, therefore, a key component of text mining. There are numerous systems and approaches for categorizing literature in English, European languages (French, German, Spanish), and Asian languages (Chinese, Japanese). In contrast, there are relatively few studies on categorizing Arabic literature due to the difficulty of the Arabic language. In this work, a brief explanation of key ideas relevant to Arabic text mining is given, and then a new classification system for the Arabic language is presented using light stemming and Classifier Naïve Bayesian (CNB). Texts from two classes, politics and sports, are included in our corpus. Some texts were added to the system, and the system classified them correctly, demonstrating its effectiveness

arXiv.org e-Print Archive
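
The abstract above describes a Naïve Bayes classifier that separates politics texts from sports texts. As a rough illustration only, the following is a minimal sketch of a standard multinomial Naïve Bayes classifier with add-one smoothing. It is not the authors' CNB system: the function names `train_nb` and `predict_nb`, the toy English corpus, and the assumption that tokens have already been stemmed are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect the counts needed by a multinomial Naive Bayes model.

    docs is a list of (tokens, label) pairs. Tokens are assumed to be
    already tokenized and stemmed (stemming itself is not shown here).
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior from class frequencies in the training data.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # tokens never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus with the two classes named in the abstract.
train = [
    (["election", "vote", "parliament"], "politics"),
    (["minister", "vote", "law"], "politics"),
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "goal"], "sports"),
]
model = train_nb(train)
print(predict_nb(model, ["goal", "team"]))
print(predict_nb(model, ["vote", "law"]))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents are longer than a few tokens.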

An improved Arabic text classification method using word embedding

Author: Bahassine Said, El Beggar Omar, Kissi Mohamed, Sabri Tarik
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2024

Feature selection (FS) is a widely used method for removing redundant or irrelevant features to improve classification accuracy and decrease the model’s computational cost. In this paper, we present an improved method (referred to hereafter as RARF) for Arabic text classification (ATC) that employs the term frequency-inverse document frequency (TF-IDF) and Word2Vec embedding technique to identify words that have a particular semantic relationship. In addition, we have compared our method with four benchmark FS methods, namely principal component analysis (PCA), linear discriminant analysis (LDA), chi-square, and mutual information (MI). Support vector machine (SVM), k-nearest neighbors (K-NN), and naive Bayes (NB) are the three machine-learning-based algorithms used in this work. Two different Arabic datasets are utilized to perform a comparative analysis of these algorithms. This paper also evaluates the efficiency of our method for ATC on the basis of performance metrics, viz. accuracy, precision, recall, and F-measure. Results revealed that the highest accuracy, 94.75%, was achieved by the SVM classifier applied to the Khaleej-2004 Arabic dataset, while the same classifier recorded an accuracy of 94.01% on the Watan-2004 Arabic dataset

Institute of Advanced Engineering and Science
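
Chi-square is one of the benchmark feature selection methods named in the abstract above. As a hedged illustration, the sketch below scores a single term against a single class using the textbook 2x2 contingency-table form of the chi-square statistic. It does not reproduce the RARF method or the paper's experimental setup: the function name `chi_square`, the presence/absence treatment of terms, and the toy corpus are assumptions made for the example.

```python
def chi_square(docs, term, label):
    """Chi-square association between a term and a class.

    docs is a list of (tokens, doc_label) pairs. The statistic is
    computed from a 2x2 table of term presence versus class membership.
    """
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document outside class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document outside class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term or class is constant; no association measurable
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy corpus.
corpus = [
    (["vote", "law", "news"], "politics"),
    (["vote", "election"], "politics"),
    (["goal", "team", "news"], "sports"),
    (["goal", "coach"], "sports"),
]
for term in ["goal", "news"]:
    print(term, chi_square(corpus, term, "sports"))
```

In a feature selection setting, terms would be ranked by this score and only the top-scoring ones kept as features. A term spread evenly across classes, such as "news" in the toy corpus, scores zero and would be discarded first.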

Text Classification on Islamic Jurisprudence using Machine Learning Techniques

Author: Petir Papilo
Publication date: 01/01/2019

Analisis Harga Pokok Produksi Rumah Pada

Performance Analysis of Machine Learning Approaches in Automatic Classification of Arabic Language

Author: S. Alharithi Fahd
Publication venue: Arab Journals Platform
Publication date: 29/04/2023

Text classification (TC) is a crucial subject. The number of digital files available on the internet is enormous. The goal of TC is to categorize texts into a series of predetermined groups. The number of studies conducted on English data is significantly higher than the number conducted on Arabic data. Therefore, this research analyzes the performance of automatic TC of the Arabic language using Machine Learning (ML) approaches. Further, the Single-label Arabic News Articles Datasets (SANAD) are introduced, which contain three different datasets, namely Akhbarona, Khaleej, and Arabiya. Initially, the collected texts are pre-processed through tokenization and stemming. In this research, three kinds of stemming are employed, namely light stemming, Khoja stemming, and no stemming, to evaluate the effect of the pre-processing technique on Arabic TC performance. Moreover, feature extraction and feature weighting are performed; in feature weighting, terms are weighted by the term frequency-inverse document frequency (tf-idf) method. In addition, this research selects C4.5, Support Vector Machine (SVM), and Naïve Bayes (NB) as classification algorithms. The results indicated that the SVM and NB methods attained higher accuracy than the C4.5 method. NB achieved the maximum accuracy, at 99.9%

Arab Journals Platform
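
The abstract above weights terms with tf-idf before classification. The following is a minimal sketch of one common tf-idf variant, with term frequency normalized by document length and idf computed as log(N / df). The abstract does not say which variant the study used, so the formula, the function name `tf_idf`, and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) *
    log(N / document frequency), one of several tf-idf variants.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term once
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical toy documents, assumed to be tokenized and stemmed already.
documents = [
    ["news", "economy", "market"],
    ["news", "match", "goal"],
    ["news", "market"],
]
weights = tf_idf(documents)
for w in weights:
    print(w)
```

A term that appears in every document, such as "news" here, gets weight zero under this variant, while a term confined to one document gets the highest idf. This is why tf-idf tends to emphasize the words that distinguish one category from another.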

Harnessing Deep Learning Techniques for Text Clustering and Document Categorization

Author: Kancherla Gangadhara Rao, Paladugu Rama Krishna
Publication venue: Auricle Global Society of Education and Research
Publication date: 20/09/2023

This research paper delves into the realm of deep text clustering algorithms with the aim of enhancing the accuracy of document classification. In recent years, the fusion of deep learning techniques and text clustering has shown promise in extracting meaningful patterns and representations from textual data. This paper provides an in-depth exploration of various deep text clustering methodologies, assessing their efficacy in improving document classification accuracy. Delving into the core of deep text clustering, the paper investigates various feature representation techniques, ranging from conventional word embeddings to contextual embeddings furnished by BERT and GPT models. By critically reviewing and comparing these algorithms, we shed light on their strengths, limitations, and potential applications. Through this comprehensive study, we offer insights into the evolving landscape of document analysis and classification, driven by the power of deep text clustering algorithms. Through an original synthesis of existing literature, this research serves as a beacon for researchers and practitioners in harnessing the prowess of deep learning to enhance the accuracy of document classification endeavors

International Journal on Recent and Innovation Trends in Computing and Communication

Semantic Classification of Multidialectal Arabic Social Media

Author: Rishel Tom
Publication venue: The Aquila Digital Community
Publication date: 01/05/2021

Arabic is one of the most widely used languages in the world, but due in part to its morphological and syntactic richness, resources for automated processing of Arabic are relatively rare. Arabic takes three primary forms: Classical Arabic as seen in the Qur’an and other classical texts; Modern Standard Arabic (MSA) as seen in newspapers, formal documents, and other written text intended for widespread distribution; and dialectal Arabic as used in common speech and informal communication. Social media posts are often written in informal language and may include non-standard spellings, abbreviations, emoticons, hashtags, and emojis. Dialectal Arabic is commonly used in social media. Semantic classification is the task of assigning a label to a text based on its primary semantic content. Given the increased use of dialectal Arabic on social media platforms in recent years, there is an urgent need for semantic classification of dialectal Arabic. Even compared to MSA, there are few resources for automated processing of dialectal Arabic. Prior work on automated processing of dialectal Arabic is limited to only one or two dialects. One of the major obstacles to semantic classification of multi-dialectal Arabic is the lack of a large, multi-dialectal, tagged corpus. To the best of our knowledge, there are no automated processes for semantic classification of multi-dialectal Arabic social media texts. We gather a data set of more than one million tweets collected from 449 accounts located in 12 Arabic-speaking countries. We group those tweets into 21,791 documents by country, account, and month. We first construct a query to represent a particular semantic concept. Then, using Latent Semantic Analysis (LSA), we rank the documents by semantic similarity to the query. Next, we use that ranking to train a deep neural network classifier to identify documents whose text is semantically similar to the query. Experiments demonstrate that this approach to semantic classification of multi-dialectal Arabic achieves an overall accuracy of 98.075% and a positive accuracy of 88.178%. The source code and the data set are provided on GitHub at https://github.com/therishel/ArabLeader

Aquila Digital Community (University of Southern Mississippi, USM)

The 1st International Electronic Conference on Algorithms

Publication venue: MDPI AG
Publication date: 06/05/2022

This book presents 22 of the accepted presentations at the 1st International Electronic Conference on Algorithms, which was held completely online from September 27 to October 10, 2021. It contains 16 proceeding papers as well as 6 extended abstracts. The works presented in the book cover a wide range of fields dealing with the development of algorithms. Many of the contributions are related to machine learning, in particular deep learning. Another main focus among the contributions is on problems dealing with graphs and networks, e.g., in connection with evacuation planning problems

Directory of Open Access Books (DOAB)