134 research outputs found
An Experimental Study on Sentiment Classification of Moroccan dialect texts in the web
With the rapid growth of the use of social media websites, obtaining the
users' feedback automatically became a crucial task to evaluate their
tendencies and behaviors online. Despite this great availability of
information, and the increasing number of Arabic users only few research has
managed to treat Arabic dialects. The purpose of this paper is to study the
opinion and emotion expressed in real Moroccan texts precisely in the YouTube
comments using some well-known and commonly used methods for sentiment
analysis. In this paper, we present our work of Moroccan dialect comments
classification using Machine Learning (ML) models and based on our collected
and manually annotated YouTube Moroccan dialect dataset. By employing many text
preprocessing and data representation techniques we aim to compare our
classification results utilizing the most commonly used supervised classifiers:
k-nearest neighbors (KNN), Support Vector Machine (SVM), Naive Bayes (NB), and
deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and
Long Short-Term Memory (LTSM). Experiments were performed using both raw and
preprocessed data to show the importance of the preprocessing. In fact, the
experimental results prove that DL models have a better performance for
Moroccan Dialect than classical approaches and we achieved an accuracy of 90%.Comment: 13 pages, 5 tables, 2 figure
A Hybrid Method of Linguistic and Statistical Features for Arabic Sentiment Analysis
تحليل الآراء هي عملية إيجاد تصنيف إيجابي أو سلبي لنص يحتمل احتوائه على آراء. اللغة العربية واحدة من اللغات التي تضخم محتواها بشكل كبير في العقد السابق وخصوصا مع تصاعد وسائل الاتصال الاجتماعي مثل تويتر، فيسبوك وآخرين. دراسات كثيرة عاينت مهمة تحليل الآراء في اللغة العربية باستخدام تقنيات متعددة. أحد أكفأ الطرق المستخدمة في الدراسات السابقة كانت تعود لتقنيات تعلم الآلة وذلك لقدرتها على بناء قاعدة من التعلم من الحالات السابقة. مع ذلك هنالك قضايا كثيرة ممكن أن تواجه تقنيات تعلم الآلة في مهمة تحليل الرأي. واحدة من هذه القضايا هي كيفية إيجاد خصائص دقيقة في اللغة العربية التي بدورها ممكن أن تساعد على التفريق بين الآراء السلبية والإيجابية. هذه الدراسة تهدف الى اقتراح خليط من الادوات اللغوية والاحصائية في سبيل الحصول على خصائص مميزة لتحليل الرأي في اللغة العربية. الأدوات اللغوية تحتوي على تقنيات إرجاع الكلمة لأصلها وتصنيف الكلمات بالنسبة لنوعها النحوي، بينما الادوات الاحصائية تحتوي على تقنيات إيجاد أكثر الكلمات ترددا. تمت التجاربباستخدام قاعدة بيانات لآراء باللغة العربية . بالإضافة الى ذلك، تم استخدام ثلاث أنواع من تقنيات تعلم الآلة وهم (اس في ام)، (كي ان ان) و (ام اي). النتائج أظهرت بأن الـ (اس في ام) تفوقت على الطرق الأخرى باستخدام الخصائص المقترحة وذلك بحصولها على دقة تساوي 72.15 بالمئة. تشير هذه النتائج الى فائدة استخدام الـ (اس في ام) مع الخصائص المقترحة في تصنيف الآراء باللغة العربية. Sentiment analysis refers to the task of identifying polarity of positive and negative for particular text that yield an opinion. Arabic language has been expanded dramatically in the last decade especially with the emergence of social websites (e.g. Twitter, Facebook, etc.). Several studies addressed sentiment analysis for Arabic language using various techniques. The most efficient techniques according to the literature were the machine learning due to their capabilities to build a training model. Yet, there is still issues facing the Arabic sentiment analysis using machine learning techniques. Such issues are related to employing robust features that have the ability to discriminate the polarity of sentiments. This paper proposes a hybrid method of linguistic and statistical features along with classification methods for Arabic sentiment analysis. Linguistic features contains stemming and POS tagging, while statistical contains the TF-IDF. A benchmark dataset of Arabic tweets have been used in the experiments. In addition, three classifiers have been utilized including SVM, KNN and ME. Results showed that SVM has outperformed the other classifiers by obtaining an f-score of 72.15%. This indicates the usefulness of using SVM with the proposed hybrid features
Customer sentiment analysis for Arabic social media using a novel ensemble machine learning approach
Arabic’s complex morphology, orthography, and dialects make sentiment analysis difficult. This activity makes it harder to extract text attributes from short conversations to evaluate tone. Analyzing and judging a person’s emotional state is complex. Due to these issues, interpreting sentiments accurately and identifying polarity may take much work. Sentiment analysis extracts subjective information from text. This research evaluates machine learning (ML) techniques for understanding Arabic emotions. Sentiment analysis (SA) uses a support vector machine (SVM), Adaboost classifier (AC), maximum entropy (ME), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), logistic regression (LR), and naive Bayes (NB). A model for the ensemble-based sentiment was developed. Ensemble classifiers (ECs) with 10-fold cross-validation out-performed other machine learning classifiers in accuracy (A), specificity (S), precision (P), F1 score (FS), and sensitivity (S).
A review of sentiment analysis research in Arabic language
Sentiment analysis is a task of natural language processing which has
recently attracted increasing attention. However, sentiment analysis research
has mainly been carried out for the English language. Although Arabic is
ramping up as one of the most used languages on the Internet, only a few
studies have focused on Arabic sentiment analysis so far. In this paper, we
carry out an in-depth qualitative study of the most important research works in
this context by presenting limits and strengths of existing approaches. In
particular, we survey both approaches that leverage machine translation or
transfer learning to adapt English resources to Arabic and approaches that stem
directly from the Arabic language
Arabic sentiment analysis using GCL-based architectures and a customized regularization function
Sentiment analysis aims to extract emotions from textual data; with the proliferation of various social media platforms and the flow of data, particularly in the Arabic language, significant challenges have arisen, necessitating the development of various frameworks to handle issues. In this paper, we firstly design an architecture called Gated Convolution Long (GCL) to perform Arabic Sentiment Analysis. GCL can overcome difficulties with lengthy sequence training samples, extracting the optimal features that help improve Arabic sentiment analysis performance for binary and multiple classifications. The proposed method trains and tests in various Arabic datasets; The results are better than the baselines in all cases. GCL includes a Custom Regularization Function (CRF), which improves the performance and optimizes the validation loss. We carry out an ablation study and investigate the effect of removing CRF. CRF is shown to make a difference of up to 5.10% (2C) and 4.12% (3C). Furthermore, we study the relationship between Modern Standard Arabic and five Arabic dialects via a cross-dialect training study. Finally, we apply GCL through standard regularization (GCL+L1, GCL+L2, and GCL+LElasticNet) and our Lnew on two big Arabic sentiment datasets; GCL+Lnew gave the highest results (92.53%) with less performance time
Sentiment analysis in SemEval: a review of sentiment identification approaches
ocial media platforms are becoming the foundations of social interactions including messaging and opinion expression. In this regard, sentiment analysis techniques focus on providing solutions to ensure the retrieval and analysis of generated data including sentiments, emotions, and discussed topics. International competitions such as the International Workshop on Semantic Evaluation (SemEval) have attracted many researchers and practitioners with a special research interest in building sentiment analysis systems. In our work, we study top-ranking systems for each SemEval edition during the 2013-2021 period, a total of 658 teams participated in these editions with increasing interest over years. We analyze the proposed systems marking the evolution of research trends with a focus on the main components of sentiment analysis systems including data acquisition, preprocessing, and classification. Our study shows an active use of preprocessing techniques, an evolution of features engineering and word representation from lexicon-based approaches to word embeddings, and the dominance of neural networks and transformers over the classification phasefostering the use of ready-to-use models. Moreover, we provide researchers with insights based on experimented systems which will allow rapid prototyping of new systems and help practitioners build for future SemEval editions
Bridging the Domain Gap for Stance Detection for the Zulu language
Misinformation has become a major concern in recent last years given its
spread across our information sources. In the past years, many NLP tasks have
been introduced in this area, with some systems reaching good results on
English language datasets. Existing AI based approaches for fighting
misinformation in literature suggest automatic stance detection as an integral
first step to success. Our paper aims at utilizing this progress made for
English to transfers that knowledge into other languages, which is a
non-trivial task due to the domain gap between English and the target
languages. We propose a black-box non-intrusive method that utilizes techniques
from Domain Adaptation to reduce the domain gap, without requiring any human
expertise in the target language, by leveraging low-quality data in both a
supervised and unsupervised manner. This allows us to rapidly achieve similar
results for stance detection for the Zulu language, the target language in this
work, as are found for English. We also provide a stance detection dataset in
the Zulu language. Our experimental results show that by leveraging English
datasets and machine translation we can increase performances on both English
data along with other languages.Comment: accepted to Intellisy
ArAutoSenti: Automatic annotation and new tendencies for sentiment classification of Arabic messages
The file attached to this record is the author's final peer reviewed version.A corpus-based sentiment analysis approach for messages written in Arabic and its dialects is presented and implemented. The originality of this approach resides in the automation construction of the annotated sentiment corpus, which relies mainly on a sentiment lexicon that is also constructed automatically. For the classification step, shallow and deep classifiers are used with features being extracted applying word embedding models. For the validation of the constructed corpus, we proceed with a manual reviewing and it was found that 85.17% were correctly annotated. This approach is applied on the under-resourced Algerian dialect and the approach is tested on two external test corpora presented in the literature. The obtained results are very
encouraging with an F1-score that is up to 88% (on the first test corpus) and up to 81% (on the second test corpus). These results respectively represent a 20% and a 6% improvement, respectively, when compared with existing work in the research literature
- …