6 research outputs found
A review on corpus annotation for Arabic sentiment analysis
Mining publicly available data for meaning and value is an important
research direction within social media analysis. To automatically analyze
collected textual data, manual effort is needed before a machine learning algorithm can classify text effectively. This effort consists of annotating the text, that is, adding a label to each data entry. Arabic is one of the fastest-growing languages in sentiment analysis research, despite limited resources and scarce annotated corpora. In this paper, we review the annotation processes carried out in Arabic sentiment analysis papers. A total of 27 papers published between
2010 and 2016 were reviewed.
Different valuable tools for Arabic sentiment analysis: a comparative evaluation
Arabic natural language processing (ANLP) is a subfield of artificial intelligence (AI) that aims to build applications for the Arabic language, such as Arabic sentiment analysis (ASA): the task of classifying the feelings and emotions expressed in a text to determine the writer's attitude (neutral, negative or positive). Researchers working on ASA often adopt tools without justifying the choice, or select libraries based on their familiarity with a specific programming language. Because Java and Python offer an abundance of libraries for ANLP, and for ASA in particular, we focus on these two languages in our research work. This paper presents an in-depth comparative evaluation of Python and Java libraries to identify the most useful ones for ASA. Based on a wide range of influential works in the ASA domain, we find that NLTK, Gensim and TextBlob are the most useful Python libraries for the ASA task. Among Java tools, we conclude that Weka and CoreNLP are the most used and achieve good results in this research domain.
A review of sentiment analysis research in Arabic language
Sentiment analysis is a task of natural language processing which has
recently attracted increasing attention. However, sentiment analysis research
has mainly been carried out for the English language. Although Arabic is
ramping up as one of the most used languages on the Internet, only a few
studies have focused on Arabic sentiment analysis so far. In this paper, we
carry out an in-depth qualitative study of the most important research works in
this context by presenting limits and strengths of existing approaches. In
particular, we survey both approaches that leverage machine translation or
transfer learning to adapt English resources to Arabic and approaches that stem
directly from the Arabic language.
Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers
The flexibility in mobile communications allows customers to quickly switch from one service provider to
another, making customer churn one of the most critical challenges for the data and voice telecommunication
service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia
decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses.
Many studies correlate customer satisfaction with customer churn. Telecom companies have depended
on historical customer data to measure customer churn. However, historical data does not reveal current
customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing
churn rates are inadequate and face several issues, particularly in the Saudi market.
This research was conducted to understand the relationship between customer satisfaction and customer churn
and how social media mining can be used to measure customer satisfaction and predict customer churn.
This research conducted a systematic review to address the problems of churn prediction models and their
relation to Arabic Sentiment Analysis. The findings show that current churn models do not integrate
structural data frameworks with real-time analytics to target customers in real time. In addition, the findings
show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic
language itself, its complexity, and lack of resources.
As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies,
comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted
from a larger Twitter dataset collected by me to capture text characteristics in social media. I developed a
new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits
the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and
churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in
Saudi telecom companies. Different fields, such as education, have
different features, which makes applying the proposed model to them interesting because it is based on text mining.
Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach
Despite the enormous increase in the number of Arabic posts on social networks, sentiment analysis research into extracting opinions from these posts lags behind that for the English language. This is largely attributed to the challenges in processing the morphologically complex Arabic language and the scarcity of Arabic NLP tools and resources. The task is further complicated when analysing dialectal Arabic, which does not abide by the formal grammatical structure. Based on semantic modelling of the target domain's knowledge and multi-factor lexicon-based sentiment analysis, the intent of this research is to use a hybrid approach, integrating linguistic and machine learning methods, for sentiment classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets focusing on the unemployment domain was collected and annotated manually. The tweets cover different Arabic dialects in Saudi Arabia, for which a comprehensive Arabic sentiment lexicon was constructed. This approach to sentiment analysis also integrated a novel light stemming mechanism towards improved Saudi dialectal Arabic stemming. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve the accuracy of the classifications. These operational techniques were applied to a central problem of sentiment analysis in dialectal Arabic in order to assess analytical performance across social media channels that are vulnerable to semantic and colloquial variations.
Finally, this study presented a new hybrid approach to sentiment analysis in which domain knowledge is used in two ways to combine computational linguistics and machine learning: the first method integrates the problem domain's semantic knowledge base into the machine learning training feature set, while the second uses the outcome of the lexicon-based sentiment classification in the training of the machine learning methods. By integrating these techniques into a single, hybridised solution, a greater degree of accuracy and consistency was achieved than by applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectal Arabic text.
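As a rough illustration of the ideas in this abstract, the sketch below shows one way a multi-factor lexicon score (sentiment words, emoji, negations, intensifiers) could be computed and then appended to a bag-of-words feature vector, in the spirit of the second hybrid method. It is not the authors' algorithm: the lexicon entries, the negation and intensifier lists, the boost value of 1.5, and the function names are all hypothetical placeholders, and supplications and stemming are left out for brevity.

```python
# Toy resources. Every entry here is a hypothetical placeholder,
# not taken from the study's lexicon.
LEXICON = {"ممتازة": 1.0, "سيئة": -1.0}   # sentiment words ("excellent", "bad")
EMOJI = {"😡": -1.0, "😍": 1.0}            # emoji polarities
NEGATIONS = {"مش", "ما", "لا"}             # negation particles
INTENSIFIERS = {"جدا"}                     # "very"; follows the word it modifies
BOOST = 1.5                                # assumed intensifier multiplier


def lexicon_score(text):
    """Sum word and emoji polarities, applying negation and intensifiers."""
    tokens = text.split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in EMOJI:
            total += EMOJI[tok]
            continue
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        # A negation particle directly before the word flips its polarity.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        # An intensifier directly after the word strengthens it.
        if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
            polarity *= BOOST
        total += polarity
    return total


def label(score):
    """Map a numeric score to a sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def hybrid_features(text, vocabulary):
    """Bag-of-words counts with the lexicon score appended as one extra feature."""
    tokens = text.split()
    counts = [tokens.count(word) for word in vocabulary]
    return counts + [lexicon_score(text)]
```

A vector produced by `hybrid_features` could then be passed to any standard classifier, so that the learner sees both surface word counts and the lexicon-based outcome.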