
    AT-ODTSA: a Dataset of Arabic Tweets for Open Domain Targeted Sentiment Analysis

    In the field of sentiment analysis, most research has conducted experiments on datasets collected from Twitter for a specific language. Few datasets have been collected for detecting sentiments expressed in Arabic tweets. Moreover, very few of these datasets are suitable for recent research directions such as target-dependent sentiment analysis and open-domain targeted sentiment analysis. There is thus a pressing need for reliable datasets acquired specifically for open-domain targeted sentiment analysis in Arabic. In this paper, we introduce AT-ODTSA, a dataset of Arabic Tweets for Open-Domain Targeted Sentiment Analysis, which includes Arabic tweets along with labels that specify the targets (topics) and sentiments (opinions) expressed in the collected tweets. To the best of our knowledge, our work presents the first dataset that is manually annotated for Arabic open-domain targeted sentiment analysis. We also present a detailed statistical analysis of the dataset. The AT-ODTSA dataset is suitable for training numerous machine learning models such as a deep learning-based model

    Improving Sentiment Analysis in Arabic Using Word Representation

    The complexities of the Arabic language in morphology, orthography and dialects make sentiment analysis for Arabic more challenging. Extracting text features from short messages like tweets in order to gauge sentiment makes the task even more difficult. In recent years, deep neural networks have often been employed and have shown very good results in sentiment classification and natural language processing applications. Word embedding, or distributed word representation, is a current and powerful tool for capturing the words closest to one another in context. In this paper, we describe how we construct Word2Vec models from a large Arabic corpus obtained from ten newspapers in different Arab countries. By applying different machine learning algorithms and convolutional neural networks with different text feature selections, we report improved accuracy of sentiment classification (91%-95%) on our publicly available Arabic language health sentiment dataset [1]. Comment: Authors' accepted version of submission for ASAR 201

    Comparative Evaluation of Sentiment Analysis Methods Across Arabic Dialects

    Sentiment analysis in Arabic is challenging due to the complex morphology of the language. The task becomes more challenging when considering Twitter data, which contain significant amounts of noise: the use of Arabizi, code-switching and dialects that vary significantly across the Arab world, the use of non-textual objects to express sentiments, and the frequent occurrence of misspellings and grammatical mistakes. Modeling sentiment in Twitter should become easier when we understand the characteristics of Twitter data and how its usage varies from one Arab region to another. We describe our effort to create the first Multi-Dialect Arabic Sentiment Twitter Dataset (MD-ArSenTD), which is composed of tweets collected from 12 Arab countries and annotated for sentiment and dialect. We use this dataset to analyze tweets collected from Egypt and the United Arab Emirates (UAE), with the aim of discovering distinctive features that may facilitate sentiment analysis. We also perform a comparative evaluation of different sentiment models on Egyptian and UAE tweets. These models are based on feature engineering and deep learning, and have already achieved state-of-the-art accuracies in English sentiment analysis. Results indicate the superior performance of deep learning models, the importance of morphological features in Arabic NLP, and that handling dialectal Arabic leads to different outcomes depending on the country from which the tweets are collected. This work was made possible by NPRP 6-716-1-138 grant from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

    Using Deep Learning Networks to Predict Telecom Company Customer Satisfaction Based on Arabic Tweets

    Information systems are transforming businesses, which use modern technologies to build new business models based on digital solutions; these ultimately lead to the design of novel socio-economic systems. Sentiment analysis is, in this context, a thriving research area. This paper is a case study of Saudi telecommunications (telecom) companies, using sentiment analysis to measure customer satisfaction based on a corpus of Arabic tweets. This paper compares, for the first time for Saudi social media in telecommunication, the most popular machine learning approach, support vector machine (SVM), with two deep learning approaches: long short-term memory (LSTM) and gated recurrent unit (GRU). This study used LSTM and GRU with two different implementations, adding an attention mechanism and character encoding. The study concluded that the bidirectional GRU with attention mechanism achieved better performance and allowed detection of customer satisfaction in the telecommunication domain with high accuracy

    A semi-supervised approach for sentiment analysis of Arab(ic+izi) messages: Application to the Algerian dialect

    In this paper, we propose a semi-supervised approach for sentiment analysis of Arabic and its dialects. This approach is based on a sentiment corpus, constructed automatically and reviewed manually by native speakers of the Algerian dialect. It consists of constructing and applying a set of deep learning algorithms to classify the sentiment of Arabic messages as positive or negative. It was applied to Facebook messages written in Modern Standard Arabic (MSA) as well as in the Algerian dialect (DALG, a low-resource dialect spoken by more than 40 million people), in both Arabic and Arabizi scripts. To handle Arabizi, we consider both options: transliteration (largely used in the research literature for handling Arabizi) and translation (never used in the research literature for handling Arabizi). To highlight the effectiveness of a semi-supervised approach, we carried out different experiments using both corpora for training (i.e. the corpus constructed automatically and the one that was reviewed manually). The experiments were done on many test corpora dedicated to MSA/DALG, which were proposed and evaluated in the research literature. Both shallow and deep learning classifiers are used, such as Random Forest (RF), Logistic Regression (LR), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). These classifiers are combined with word embedding models such as Word2vec and fastText that were used for sentiment classification. Experimental results (F1 score up to 95% for intrinsic experiments and up to 89% for extrinsic experiments) showed that the proposed system outperforms the existing state-of-the-art methodologies (the best improvement is up to 25%)