7 research outputs found
An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter
Sentiment analysis (SA) refers as computational and natural language processing techniques used to extract subjective information expressed in a text. In this SA study, three main problems are addressed: a) absence of resources on Palestinian Arabic dialect (PAL), b) emergence of new sentiment words, hence decreases the performance of sentiment analysis models when applied on tweets collected, and c) handling valence shifter words were not thoroughly addressed in Arabic sentiment analysis. Therefore, this study aims to construct a PAL lexicon for Palestinian tweets and to design an Expandable and Up-to-date Lexicon for Arabic (EULA). A new valence shifter rules in enhancing the performance of lexicon-based sentiment analysis on Arabic tweets is also been constructed. In this study, a PAL lexicon is built by using phonology matching algorithm while EULA is constructed by harnessing a general lexicon on a tweets dataset to find new terms and predict its polarity through some linguistic rules. Furthermore, a set of rules are proposed to handle the valence shifters words by applying rules to find the scope of words, and shifting value that is produced by these words. Palestinian and Arabic tweets datasets from March to May 2018 are used to evaluate the proposed idea. Experimental results indicate that the proposed PAL lexicon has produced better results compared to other lexicons when tested on Palestinian dataset. Meanwhile, EULA enhanced the performance of lexicon-based approach to be competitive with machine learning approach. Moreover, applying the proposed valence shifter rules have increased overall performance of 5% on average. The new proposed PAL sentiment lexicon is able to handle Palestinian’s dialects. Furthermore, the EULA has overcome the emergence of new slang words in social media. Moreover, the constructed valence shifter rules are capable to handle negation, intensifiers and contrasts in enhancing the performance of Arabic sentiment analysis
MULDASA:Multifactor Lexical Sentiment Analysis of Social-Media Content in Nonstandard Arabic Social Media
The semantically complicated Arabic natural vocabulary, and the shortage of available techniques and skills to capture Arabic emotions from text hinder Arabic sentiment analysis (ASA). Evaluating Arabic idioms that do not follow a conventional linguistic framework, such as contemporary standard Arabic (MSA), complicates an incredibly difficult procedure. Here, we define a novel lexical sentiment analysis approach for studying Arabic language tweets (TTs) from specialized digital media platforms. Many elements comprising emoji, intensifiers, negations, and other nonstandard expressions such as supplications, proverbs, and interjections are incorporated into the MULDASA algorithm to enhance the precision of opinion classifications. Root words in multidialectal sentiment LX are associated with emotions found in the content under study via a simple stemming procedure. Furthermore, a feature–sentiment correlation procedure is incorporated into the proposed technique to exclude viewpoints expressed that seem to be irrelevant to the area of concern. As part of our research into Saudi Arabian employability, we compiled a large sample of TTs in 6 different Arabic dialects. This research shows that this sentiment categorization method is useful, and that using all of the characteristics listed earlier improves the ability to accurately classify people’s feelings. The classification accuracy of the proposed algorithm improved from 83.84% to 89.80%. Our approach also outperformed two existing research projects that employed a lexical approach for the sentiment analysis of Saudi dialect
Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications
Negation and speculation are universal linguistic phenomena that affect the performance of Natural Language Processing (NLP) applications, such as those for opinion mining and information retrieval, especially in biomedical data. In this article, we review the corpora annotated with negation and speculation in various natural languages and domains. Furthermore, we discuss the ongoing research into recent rule-based, supervised, and transfer learning techniques for the detection of negating and speculative content. Many English corpora for various domains are now annotated with negation and speculation; moreover, the availability of annotated corpora in other languages has started to increase. However, this growth is insufficient to address these important phenomena in languages with limited resources. The use of cross-lingual models and translation of the well-known languages are acceptable alternatives. We also highlight the lack of consistent annotation guidelines and the shortcomings of the existing techniques, and suggest alternatives that may speed up progress in this research direction. Adding more syntactic features may alleviate the limitations of the existing techniques, such as cue ambiguity and detecting the discontinuous scopes. In some NLP applications, inclusion of a system that is negation- and speculation-aware improves performance, yet this aspect is still not addressed or considered an essential step
Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers
The flexibility in mobile communications allows customers to quickly switch from one service provider to
another, making customer churn one of the most critical challenges for the data and voice telecommunication
service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia
decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses.
Many studies correlate customer satisfaction with customer churn. The Telecom companies have depended
on historical customer data to measure customer churn. However, historical data does not reveal current
customer satisfaction or future likeliness to switch between telecom companies. Current methods of analysing
churn rates are inadequate and faced some issues, particularly in the Saudi market.
This research was conducted to realize the relationship between customer satisfaction and customer churn
and how to use social media mining to measure customer satisfaction and predict customer churn.
This research conducted a systematic review to address the churn prediction models problems and their
relation to Arabic Sentiment Analysis. The findings show that the current churn models lack integrating
structural data frameworks with real-time analytics to target customers in real-time. In addition, the findings
show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic
language itself, its complexity, and lack of resources.
As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies,
comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted
from a larger Twitter dataset collected by me to capture text characteristics in social media. I developed a
new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits
the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and
churn prediction. This is the first work using Twitter mining to predict potential customer loss (churn) in
Saudi telecom companies, which has not been attempted before. Different fields, such as education, have
different features, making applying the proposed model is interesting because it based on text-mining
Recommended from our members
Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach
Despite the enormous increase in the number of Arabic posts on social networks, the sentiment analysis research into extracting opinions from these posts lags behind that for the English language. This is largely attributed to the challenges in processing the morphologically complex Arabic natural language and the scarcity of Arabic NLP tools and resources. This complex task is further exacerbated when analysing dialectal Arabic that do not abide by the formal grammatical structure. Based on the semantic modelling of the target domain’s knowledge and multi-factor lexicon-based sentiment analysis, the intent of this research is to use a hybrid approach, integrating linguistic and machine learning methods for sentiment analysis classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets was collected focusing on the unemployment domain, which is annotated manually. The tweets cover different dialectal Arabic in Saudi Arabia for which a comprehensive Arabic sentiment lexicon was constructed. This approach to sentiment analysis also integrated a novel light stemming mechanism towards improved Saudi dialectal Arabic stemming. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve the accuracy of the classifications. Applying this model to a central problem of sentiment analysis in dialectical Arabic, these operational techniques were deployed in order to assess analytical performance across social media channels which are vulnerable to semantic and colloquial variations. Finally, this study presented a new hybrid approach to sentiment analysis where domain knowledge is utilised in two methods to combine computational linguistics and machine learning; the first method integrates the problem domain semantic knowledgebase in the machine learning training features set, while the second uses the outcome of the lexicon-based sentiment classification in the training of the machine learning methods. By integrating these techniques into a single, hybridised solution, a greater degree of accuracy and consistency was achieved than applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectical Arabic text