An Intelligent System For Arabic Text Categorization
Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. This paper presents an intelligent Arabic text categorization system built on machine learning algorithms. Several algorithms for stemming and feature selection are evaluated. The documents are represented using several term weighting schemes, and the k-nearest neighbor and Rocchio classifiers are used for the classification step. Experiments are performed on a self-collected corpus, and the results show that the suggested hybrid of statistical and light stemmers is the most suitable stemming algorithm for the Arabic language. The results also show that a hybrid of document frequency and information gain is the preferable feature selection criterion and that normalized tf-idf is the best weighting scheme. Finally, the Rocchio classifier outperforms the k-nearest neighbor classifier in the classification step. The experimental results indicate that the proposed model is efficient and gives a generalization accuracy of about 98%.
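The combination the abstract reports as best, normalized tf-idf weighting with a Rocchio classifier, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the simplified centroid form of Rocchio (each class is represented by the mean of its document vectors, with no negative-example term), a smoothed idf of `log(N/df) + 1`, and cosine similarity for the final decision. The function names and the English toy corpus are hypothetical, and stemming and feature selection are omitted.

```python
import math
from collections import Counter, defaultdict

def weight(doc, idf):
    """L2-normalized tf-idf vector (as a dict) for one tokenized document."""
    tf = Counter(t for t in doc if t in idf)  # ignore terms unseen in training
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def fit_idf(docs):
    """Smoothed inverse document frequency (assumed form: log(N/df) + 1)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def train_rocchio(vectors, labels):
    """Simplified Rocchio: one centroid (mean vector) per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {lab: {t: w / counts[lab] for t, w in terms.items()}
            for lab, terms in sums.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, centroids, idf):
    """Assign the class whose centroid is most similar to the document."""
    vec = weight(doc, idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

# Hypothetical toy corpus (already tokenized), for illustration only.
train_docs = [["goal", "match", "team"], ["team", "player", "goal"],
              ["market", "stock", "price"], ["price", "bank", "market"]]
train_labels = ["sport", "sport", "economy", "economy"]

idf = fit_idf(train_docs)
train_vectors = [weight(d, idf) for d in train_docs]
centroids = train_rocchio(train_vectors, train_labels)
```

Calling `classify(["goal", "team"], centroids, idf)` then returns the label of the nearest centroid. A k-nearest neighbor classifier would instead compare the query against every training vector, which is one reason the centroid approach is cheaper at prediction time.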
An Experimental Study on Sentiment Classification of Moroccan dialect texts in the web
With the rapid growth of social media use, automatically collecting users'
feedback has become a crucial task for evaluating their tendencies and
behaviors online. Despite the wide availability of such information and the
increasing number of Arabic-speaking users, few studies have addressed Arabic
dialects. The purpose of this paper is to study the opinions and emotions
expressed in real Moroccan texts, specifically YouTube comments, using
well-known and commonly used sentiment analysis methods. We present our work
on classifying Moroccan dialect comments with Machine Learning (ML) models,
based on a YouTube Moroccan dialect dataset that we collected and manually
annotated. Applying several text preprocessing and data representation
techniques, we compare the classification results of the most commonly used
supervised classifiers: k-nearest neighbors (KNN), Support Vector Machine
(SVM), and Naive Bayes (NB), as well as deep learning (DL) classifiers such as
Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM).
Experiments were performed on both raw and preprocessed data to show the
importance of preprocessing. The experimental results show that DL models
perform better on Moroccan dialect than classical approaches, reaching an
accuracy of 90%.
Comment: 13 pages, 5 tables, 2 figures