SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis
Data annotation is an important but time-consuming and costly procedure. To
sort a text into two classes, the very first thing we need is a good annotation
guideline, establishing what is required to qualify for each class. In the
literature, the difficulties associated with appropriate data annotation have
been underestimated. In this paper, we present a novel approach to
automatically construct an annotated sentiment corpus for the Algerian dialect (a
Maghrebi Arabic dialect). The construction of this corpus is based on an
Algerian sentiment lexicon that is also constructed automatically. The
presented work deals with the two widely used scripts on Arabic social media:
Arabic and Arabizi. The proposed approach automatically constructs a sentiment
corpus containing 8000 messages (where 4000 are dedicated to Arabic and 4000 to
Arabizi). The achieved F1-score is up to 72% and 78% for the Arabic and Arabizi
test sets, respectively. Ongoing work is aimed at integrating a transliteration
process for Arabizi messages to further improve the obtained results.
Comment: To appear in the 9th International Conference on Brain Inspired
Cognitive Systems (BICS 2018)
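The lexicon-driven labelling idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the function name `label_message`, the toy lexicon entries, and the rule of summing word polarities are all assumptions made for the example.

```python
# Hypothetical toy lexicon: token -> polarity (+1 positive, -1 negative).
# The entries are placeholders, not taken from the paper's lexicon.
TOY_LEXICON = {"pos_word": 1, "neg_word": -1}


def label_message(message, lexicon):
    """Label a message by summing the polarity of its known tokens.

    Returns "positive", "negative", or None when the message carries
    no clear signal and should be left out of the corpus.
    """
    tokens = message.lower().split()
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def build_corpus(messages, lexicon):
    """Keep only messages that receive a label, paired with that label."""
    corpus = []
    for msg in messages:
        label = label_message(msg, lexicon)
        if label is not None:
            corpus.append((msg, label))
    return corpus
```

Under these assumptions, messages with no lexicon hits or with balanced polarity are discarded rather than labelled, which trades corpus size for label reliability.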
Developing resources for sentiment analysis of informal Arabic text in social media
Natural Language Processing (NLP) applications such as text categorization, machine translation, sentiment analysis, etc., need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis specifically for Arabic text in social media. A distinctive feature of the corpora and lexicons developed is that they are derived from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA): informal Arabic originating from and potentially mixing a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis within different social media and their resulting characteristics. In addition to providing useful NLP data sets for Dialectal Arabic, the work also contributes to understanding the approach to developing corpora and lexicons.
Comparative Evaluation of Sentiment Analysis Methods Across Arabic Dialects
Sentiment analysis in Arabic is challenging due to the complex morphology of the language. The task becomes more challenging when considering Twitter data that contain significant amounts of noise, such as the use of Arabizi, code-switching and different dialects that vary significantly across the Arab world, the use of non-textual objects to express sentiments, and the frequent occurrence of misspellings and grammatical mistakes. Modeling sentiment in Twitter should become easier when we understand the characteristics of Twitter data and how its usage varies from one Arab region to another. We describe our effort to create the first Multi-Dialect Arabic Sentiment Twitter Dataset (MD-ArSenTD), which is composed of tweets collected from 12 Arab countries, annotated for sentiment and dialect. We use this dataset to analyze tweets collected from Egypt and the United Arab Emirates (UAE), with the aim of discovering distinctive features that may facilitate sentiment analysis. We also perform a comparative evaluation of different sentiment models on Egyptian and UAE tweets. These models are based on feature engineering and deep learning, and have already achieved state-of-the-art accuracies in English sentiment analysis. Results indicate the superior performance of deep learning models, the importance of morphological features in Arabic NLP, and that handling dialectal Arabic leads to different outcomes depending on the country from which the tweets are collected. This work was made possible by NPRP 6-716-1-138 grant from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Evaluating Lexical Similarity to build Sentiment Similarity
In this article, we propose to evaluate the lexical similarity information provided by word representations against several opinion resources using traditional Information Retrieval tools. Word representations have been used to build and to extend opinion resources such as lexicons and ontologies, and their performance has been evaluated on sentiment analysis tasks. We question this method by measuring the correlation between the sentiment proximity provided by opinion resources and the semantic similarity provided by word representations using different correlation coefficients. We also compare the neighbors found in word representations and lists of similar opinion words. Our results show that the proximity of words in state-of-the-art word representations is not very effective to build sentiment similarity.
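One way to picture the correlation measurement described in this abstract is the sketch below. It is an illustration under stated assumptions, not the authors' protocol: cosine similarity stands in for semantic similarity, sentiment proximity is assumed to be `1 - |p_a - p_b|` over polarity scores in [0, 1], and Spearman's coefficient is chosen as one example of a correlation coefficient. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentiment_proximity(p_a, p_b):
    """Assumed proximity: close to 1 when two polarity scores are similar."""
    return 1.0 - abs(p_a - p_b)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))
```

In use, one would compute `cosine` and `sentiment_proximity` for the same list of word pairs and pass the two resulting lists to `spearman`; a coefficient near zero would be consistent with the abstract's conclusion that embedding proximity tracks sentiment similarity poorly.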
Arabic Opinion Mining Using a Hybrid Recommender System Approach
Recommender systems nowadays are playing an important role in the delivery of
services and information to users. Sentiment analysis (also known as opinion
mining) is the process of determining the attitude of textual opinions, whether
they are positive, negative or neutral. Data sparsity is a major
issue for recommender systems because of insufficient user ratings or
missing data about users or items. This research proposes a hybrid approach
combining sentiment analysis and recommender systems to tackle the
data sparsity problem by predicting product ratings from user reviews
using text mining and NLP techniques. This research focuses especially on
Arabic reviews, where the model is evaluated using the Opinion Corpus for Arabic
(OCA) dataset. Our system was efficient, and it showed a good accuracy of
nearly 85 percent in predicting ratings from reviews.