20 research outputs found

    SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

    Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline, establishing what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for the Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two widely used scripts on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (4000 in Arabic and 4000 in Arabizi). The achieved F1-score is up to 72% and 78% on the Arabic and Arabizi test sets, respectively. Ongoing work aims to integrate a transliteration process for Arabizi messages to further improve the obtained results. Comment: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018)
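    The abstract above describes labelling messages automatically from a sentiment lexicon. A minimal sketch of that general idea follows; it is not the authors' method. The toy lexicon, the whitespace tokenizer, the sum-of-polarities rule, and the function names `auto_label` and `build_corpus` are all assumptions made for illustration.

```python
# Placeholder lexicon: token -> polarity (+1 positive, -1 negative).
# These entries are illustrative only and are not taken from the paper.
TOY_LEXICON = {"mlih": 1, "khayeb": -1}


def auto_label(message, lexicon):
    """Return 'positive', 'negative', or None when the lexicon gives no signal."""
    tokens = message.lower().split()  # naive whitespace tokenization (assumption)
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no lexicon hit, or positive and negative hits cancel out


def build_corpus(messages, lexicon):
    """Keep only the messages that received a label, paired with that label."""
    corpus = []
    for msg in messages:
        label = auto_label(msg, lexicon)
        if label is not None:
            corpus.append((msg, label))
    return corpus


if __name__ == "__main__":
    print(build_corpus(["film mlih", "film khayeb", "film"], TOY_LEXICON))
```

    Messages without any lexicon match are dropped rather than forced into a class, which trades corpus size for label quality; whether the paper makes the same choice is not stated in the abstract.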

    Developing resources for sentiment analysis of informal Arabic text in social media

    Natural Language Processing (NLP) applications such as text categorization, machine translation, and sentiment analysis need annotated corpora and lexicons to check quality and performance. This paper describes the development of resources for sentiment analysis specifically for Arabic text in social media. A distinctive feature of the corpora and lexicons developed is that they are derived from informal Arabic that does not conform to grammatical or spelling standards. We refer to Arabic social media content of this sort as Dialectal Arabic (DA): informal Arabic originating from, and potentially mixing, a range of different individual dialects. The paper describes the process adopted for developing corpora and sentiment lexicons for sentiment analysis within different social media and their resulting characteristics. In addition to providing useful NLP data sets for Dialectal Arabic, the work also contributes to understanding the approach to developing corpora and lexicons

    Comparative Evaluation of Sentiment Analysis Methods Across Arabic Dialects

    Sentiment analysis in Arabic is challenging due to the complex morphology of the language. The task becomes more challenging when considering Twitter data, which contain significant amounts of noise: the use of Arabizi, code-switching and different dialects that vary significantly across the Arab world, the use of non-textual objects to express sentiments, and the frequent occurrence of misspellings and grammatical mistakes. Modeling sentiment in Twitter should become easier when we understand the characteristics of Twitter data and how its usage varies from one Arab region to another. We describe our effort to create the first Multi-Dialect Arabic Sentiment Twitter Dataset (MD-ArSenTD), which is composed of tweets collected from 12 Arab countries and annotated for sentiment and dialect. We use this dataset to analyze tweets collected from Egypt and the United Arab Emirates (UAE), with the aim of discovering distinctive features that may facilitate sentiment analysis. We also perform a comparative evaluation of different sentiment models on Egyptian and UAE tweets. These models are based on feature engineering and deep learning, and have already achieved state-of-the-art accuracies in English sentiment analysis. Results indicate the superior performance of deep learning models, the importance of morphological features in Arabic NLP, and that handling dialectal Arabic leads to different outcomes depending on the country from which the tweets are collected. This work was made possible by NPRP 6-716-1-138 grant from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

    Evaluating Lexical Similarity to build Sentiment Similarity

    In this article, we propose to evaluate the lexical similarity information provided by word representations against several opinion resources using traditional Information Retrieval tools. Word representations have been used to build and to extend opinion resources such as lexicons and ontologies, and their performance has been evaluated on sentiment analysis tasks. We question this method by measuring the correlation between the sentiment proximity provided by opinion resources and the semantic similarity provided by word representations using different correlation coefficients. We also compare the neighbors found in word representations with lists of similar opinion words. Our results show that the proximity of words in state-of-the-art word representations is not very effective for building sentiment similarity
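    The correlation measurement described in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes cosine similarity as the semantic measure, the negated absolute polarity difference as the sentiment proximity, and Spearman's rank coefficient as one of the possible correlation coefficients. The function names and data layout (`vectors` and `polarity` dictionaries keyed by word) are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sentiment_vs_semantic(vectors, polarity):
    """Correlate embedding similarity with sentiment proximity over word pairs."""
    words = sorted(set(vectors) & set(polarity))
    semantic, sentiment = [], []
    for a, b in combinations(words, 2):
        semantic.append(cosine(vectors[a], vectors[b]))
        # Closer polarity scores mean higher sentiment proximity (assumption).
        sentiment.append(-abs(polarity[a] - polarity[b]))
    return spearman(semantic, sentiment)
```

    A coefficient near zero on a real lexicon and real embeddings would be in line with the abstract's conclusion that embedding proximity is a weak proxy for sentiment similarity.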

    Arabic Opinion Mining Using a Hybrid Recommender System Approach

    Recommender systems now play an important role in the delivery of services and information to users. Sentiment analysis (also known as opinion mining) is the process of determining the attitude of textual opinions, whether they are positive, negative or neutral. Data sparsity is a major issue for recommender systems because of insufficient user ratings or the absence of data about users or items. This research proposes a hybrid approach combining sentiment analysis and recommender systems to tackle the data sparsity problem by predicting the rating of products from user reviews using text mining and NLP techniques. This research focuses especially on Arabic reviews, and the model is evaluated using the Opinion Corpus for Arabic (OCA) dataset. Our system was efficient and showed a good accuracy of nearly 85 percent in predicting ratings from reviews