    A Hybrid Method of Linguistic and Statistical Features for Arabic Sentiment Analysis

    Sentiment analysis is the task of classifying the polarity (positive or negative) of a text that expresses an opinion. Arabic content has grown dramatically over the last decade, especially with the rise of social media sites such as Twitter and Facebook. Several studies have addressed sentiment analysis for Arabic using various techniques. According to the literature, the most effective techniques are machine learning methods, owing to their ability to build a model from training examples. Yet there are still issues facing Arabic sentiment analysis with machine learning techniques. Such issues relate to finding robust features that can discriminate the polarity of sentiments. This paper proposes a hybrid method of linguistic and statistical features, along with classification methods, for Arabic sentiment analysis. The linguistic features include stemming and POS tagging, while the statistical features include TF-IDF. A benchmark dataset of Arabic tweets was used in the experiments. In addition, three classifiers were used: SVM, KNN and ME. Results showed that SVM outperformed the other classifiers, obtaining an f-score of 72.15%. This indicates the usefulness of using SVM with the proposed hybrid features.
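
    The abstract names TF-IDF as its statistical feature without giving a formula. The sketch below assumes one common variant (relative term frequency multiplied by the natural log of inverse document frequency) and uses made-up English tokens in place of stemmed Arabic tweets; the function name `tf_idf` and the sample data are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical pre-tokenized documents (stand-ins for stemmed tweets).
docs = [
    ["good", "service", "good", "food"],
    ["bad", "service"],
    ["good", "price"],
]
weights = tf_idf(docs)
```

    In the second document, "bad" appears in only one document and so receives a higher weight than "service", which appears in two. A term present in every document gets weight zero under this variant.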

    Classification of Encouragement (Targhib) And Warning (Tarhib) Using Sentiment Analysis on Classical Arabic

    The Holy Qur’an is the main religious text of Islam. The Qur’an has its own methods of Targhib (encouragement) and Tarhib (warning), which are important features of the text. Most Quranic verses urge and encourage people to do right and good deeds, and warn them against committing evil and bad deeds. Classifying a text into two opposing opinions has previously been applied to the problem of sentiment analysis. Here, it is applied to distinguishing between Targhib (encouragement) and Tarhib (warning) verses in the Qur’an. Each verse of the Qur’an can be treated as an encouragement, a warning or neutral. The language of the Holy Qur’an is one of the most challenging natural languages for sentiment analysis. The aim of this work is to classify the verses of encouragement and warning using sentiment analysis and NLP techniques. Several approaches are used in sentiment analysis classification, such as the machine learning approach, the lexicon-based approach and the hybrid approach. To this end, a machine learning approach was applied, and the impact of different techniques such as POS tagging, N-grams and correlation-based feature selection was evaluated. An accuracy of 95.6% was achieved using Naïve Bayes (NB) and 91.5% using Support Vector Machines (SVM). This study is significant for extracting information and knowledge from the Holy Qur’an, both for researchers in the field of Islamic studies and for non-specialized researchers.

    Sentiment Analysis of Spanish Words of Arabic Origin Related to Islam: A Social Network Analysis

    From the arrival of Muslims in 711 until their expulsion in the 1600s, the Arabic language was present in Spain for more than eight centuries. Although social networks have become a valuable resource for mining sentiments, there is no previous research investigating the layman’s sentiment towards Spanish words of Arabic etymology related to Islamic terminology. This study aims to analyze Spanish words of Arabic origin related to Islam. A random sample of 4586 out of 45860 tweets was used to evaluate general sentiment towards some of these words. An expert-predefined Spanish lexicon of around 6800 seed adjectives was used to conduct the analysis. Results indicate a generally positive sentiment towards several Spanish words of Arabic etymology related to Islam. By implementing both a qualitative and a quantitative methodology to analyze tweets’ sentiments towards Spanish words of Arabic etymology, this research adds breadth and depth to the debate over Arabic linguistic influence on Spanish vocabulary.
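
    The abstract does not say how the adjective lexicon was applied to each tweet. A minimal sketch of one simple lexicon-based scheme follows, assuming each lexicon entry carries a polarity of +1 or -1 and a tweet is labelled by the sign of the summed polarities; the four-word `LEXICON` and the function `polarity` are hypothetical stand-ins, not the study's resources.

```python
import re

# Hypothetical toy lexicon; the study describes around 6800 seed adjectives.
LEXICON = {"bueno": 1, "hermoso": 1, "malo": -1, "horrible": -1}


def polarity(text, lexicon=LEXICON):
    """Label a text by the sign of its summed word polarities."""
    # \w matches Unicode letters in Python 3, so accented words stay whole.
    tokens = re.findall(r"\w+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = polarity("Un hermoso alcázar")
```

    Words missing from the lexicon contribute nothing, so a tweet with no listed adjectives is labelled neutral under this assumption.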

    Anti-Russia or anti-Ukraine: How do Twitter users feel about the ongoing conflict between August 2022 and February 2023? A sentiment analysis approach

    The research presented in this thesis investigates the shifting sentiment among Twitter users regarding the Ukraine-Russia conflict between August 2022 and February 2023. To understand this variation in sentiment and public opinion, we reviewed the literature on the Ukraine-Russia relationship going back to 1991, the year of the Soviet Union's dissolution. Employing a combination of descriptive analysis techniques, Sentiment Analysis, Topic Modelling, and Machine Learning algorithms such as Logistic Regression, Decision Tree, Naïve Bayes, AdaBoost, and XGBoost, we examined the evolving Anti-Ukraine and Anti-Russia sentiments expressed by Twitter users during the second six months of the conflict. Our findings revealed that, within our datasets, tweets expressing Anti-Ukraine sentiments were more prevalent than those expressing Anti-Russia sentiments. Notably, the XGBoost model showed the best performance metrics, achieving an accuracy of 90% on the dataset covering August and September 2022 and 93% on the dataset covering February 2023.

    Sentiment Analysis for micro-blogging platforms in Arabic

    Sentiment Analysis (SA) concerns the automatic extraction and classification of sentiments conveyed in a given text, i.e. labelling a text instance as positive, negative or neutral. SA research has attracted increasing interest in the past few years due to its numerous real-world applications. The recent interest in SA is also fuelled by the growing popularity of social media platforms (e.g. Twitter), as they provide large amounts of freely available and highly subjective content that can be readily crawled. Most previous SA work has focused on English, with considerable success. In this work, we focus on studying SA in Arabic, a less-resourced language. This work reports on a wide set of investigations for SA in Arabic tweets, systematically comparing three existing approaches that have been shown to be successful in English. Specifically, we report experiments evaluating fully-supervised (SL), distant-supervision-based (DS), and machine-translation-based (MT) approaches for SA. The investigations cover training SA models on manually-labelled (i.e. in SL methods) and automatically-labelled (i.e. in DS methods) data-sets. In addition, we explored an MT-based approach that utilises existing off-the-shelf SA systems for English with no need for training data, assessing the impact of translation errors on the performance of SA models, which has not been previously addressed for Arabic tweets. Unlike previous work, we benchmark the trained models against an independent test-set of >3.5k instances collected at different points in time to account for topic-shift issues in the Twitter stream. Despite the challenging noisy medium of Twitter and the mixed use of Dialectal and Standard forms of Arabic, we show that our SA systems are able to attain performance scores on Arabic tweets that are comparable to those of state-of-the-art SA systems for English tweets.
    The thesis also investigates the role of a wide set of features, including syntactic, semantic, morphological, language-style and Twitter-specific features. We introduce a set of affective-cues/social-signals features that capture information about the presence of contextual cues (e.g. prayers, laughter, etc.) to correlate them with the sentiment conveyed in an instance. Our investigations reveal a generally positive impact of utilising these features for SA in Arabic. Specifically, we show that a rich set of morphological features, which has not been previously used, extracted using a publicly-available morphological analyser for Arabic, can significantly improve the performance of SA classifiers. We also demonstrate the usefulness of language-independent features (e.g. Twitter-specific) for SA. Our feature-sets outperform results reported in previous work on a previously built data-set.

    Framework for sentiment analysis of Arabic text
