26,480 research outputs found

    An Intelligent System For Arabic Text Categorization

    Get PDF
    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system is presented. Machine learning algorithms are used in this system. Many algorithms for stemming and feature selection are tried. Moreover, the document is represented using several term weighting schemes and finally the k-nearest neighbor and Rocchio classifiers are used for classification process. Experiments are performed over self collected data corpus and the results show that the suggested hybrid method of statistical and light stemmers is the most suitable stemming algorithm for Arabic language. The results also show that a hybrid approach of document frequency and information gain is the preferable feature selection criterion and normalized-tfidf is the best weighting scheme. Finally, Rocchio classifier has the advantage over k-nearest neighbor classifier in the classification process. The experimental results illustrate that the proposed model is an efficient method and gives generalization accuracy of about 98%

    An Experimental Study on Sentiment Classification of Moroccan dialect texts in the web

    Full text link
    With the rapid growth of the use of social media websites, obtaining the users' feedback automatically became a crucial task to evaluate their tendencies and behaviors online. Despite this great availability of information, and the increasing number of Arabic users only few research has managed to treat Arabic dialects. The purpose of this paper is to study the opinion and emotion expressed in real Moroccan texts precisely in the YouTube comments using some well-known and commonly used methods for sentiment analysis. In this paper, we present our work of Moroccan dialect comments classification using Machine Learning (ML) models and based on our collected and manually annotated YouTube Moroccan dialect dataset. By employing many text preprocessing and data representation techniques we aim to compare our classification results utilizing the most commonly used supervised classifiers: k-nearest neighbors (KNN), Support Vector Machine (SVM), Naive Bayes (NB), and deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LTSM). Experiments were performed using both raw and preprocessed data to show the importance of the preprocessing. In fact, the experimental results prove that DL models have a better performance for Moroccan Dialect than classical approaches and we achieved an accuracy of 90%.Comment: 13 pages, 5 tables, 2 figure
    • …
    corecore