5 research outputs found
Improving Sentiment Analysis in Arabic Using Word Representation
The complexities of the Arabic language in morphology, orthography and dialects
make sentiment analysis for Arabic more challenging. Extracting text features
from short messages such as tweets in order to gauge sentiment
makes the task even more difficult. In recent years, deep neural networks have
often been employed and have shown very good results in sentiment classification and
natural language processing applications. Word embedding, or the distributed word
representation approach, is a current and powerful tool for capturing words that
are close to one another in context. In this paper, we describe how we construct Word2Vec
models from a large Arabic corpus obtained from ten newspapers in different
Arab countries. By applying different machine learning algorithms and
convolutional neural networks with different text feature selections, we report
improved accuracy of sentiment classification (91%-95%) on our publicly
available Arabic language health sentiment dataset [1].
Comment: Authors' accepted version of submission for ASAR 201
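As a rough illustration of how word vectors can feed a sentiment classifier, the sketch below averages the vectors of a message's tokens and assigns the label of the nearest class centroid by cosine similarity. It is not the paper's method: the paper reports machine learning algorithms and convolutional neural networks, whereas this is a simpler nearest-centroid stand-in. The three-dimensional vectors, the English tokens and the names `TOY_VECTORS`, `CENTROIDS`, `mean_vector` and `classify` are all hypothetical.

```python
from math import sqrt

# Hypothetical 3-dimensional embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "helpful": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "painful": [-0.8, 0.3, 0.1],
    "doctor": [0.0, 0.9, 0.2],
}


def mean_vector(tokens, vectors):
    """Average the vectors of the known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(tokens, centroids, vectors):
    """Return the label whose centroid is most similar to the message vector."""
    doc = mean_vector(tokens, vectors)
    if doc is None:
        return None  # no token has an embedding, so no decision is made
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Assumed class centroids, built here from a few seed words per label.
CENTROIDS = {
    "positive": mean_vector(["good", "helpful"], TOY_VECTORS),
    "negative": mean_vector(["bad", "painful"], TOY_VECTORS),
}
```

In practice the toy dictionary would be replaced by vectors from a trained embedding model, and the centroid rule by a trained classifier.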
Different valuable tools for Arabic sentiment analysis: a comparative evaluation
Arabic natural language processing (ANLP) is a subfield of artificial intelligence (AI) that aims to build applications for the Arabic language, such as Arabic sentiment analysis (ASA): the task of classifying the feelings and emotions expressed in a text to determine the writer's attitude (neutral, negative or positive). Researchers working on ASA often use tools without explaining why they chose them, or pick a set of libraries based on their familiarity with a specific programming language. Because Java and Python offer an abundance of libraries for ANLP, and for ASA in particular, we rely on these two languages in our research work. This paper presents an in-depth comparative evaluation of Python and Java libraries to identify the most useful ones for ASA. Based on a large body of influential work in the ASA domain, we conclude that NLTK, Gensim and TextBlob are the most useful Python libraries for the ASA task. Among Java tools, we conclude that Weka and CoreNLP are the most widely used and that they achieve good results in this research domain.
Sentiment analysis in tweets
Advisor: Jacques Wainer
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: Sentiment analysis is a field of study that has recently grown in popularity due to the growth of the Internet and of the content generated by its users. More recently, social networks have emerged, where people post their opinions in colloquial and compact language. This is the case on Twitter, a communication tool that can easily be used as a source of information for various automated sentiment inference tools. Research efforts have been directed at treating sentiment analysis in social networks as a classification problem, where there is no consensus on which classifier is best or on the best form of pre-processing (the configuration produced by the feature engineering process), among other questions.
The objective of this dissertation is to investigate the influence of some pre-processing techniques, the TF-IDF technique, the volume of the training set and ensemble techniques on the accuracy of some supervised classifiers.
Degree: Master's (Mestrado) in Computer Science (Ciência da Computação)
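For readers unfamiliar with the TF-IDF technique named in the abstract, the following is a minimal sketch of one common formulation. It assumes length-normalized term frequency and the unsmoothed inverse document frequency log(N / df). The dissertation's exact variant is not stated in the abstract, and the function name `tf_idf` is hypothetical.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = log(N / df);
    libraries often apply smoothed variants instead.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights
```

Under this formulation a term that occurs in every document gets weight zero, which is why smoothed variants are often preferred for small collections.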
Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers
The flexibility in mobile communications allows customers to quickly switch from one service provider to
another, making customer churn one of the most critical challenges for the data and voice telecommunication
service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia
decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses.
Many studies correlate customer satisfaction with customer churn. Telecom companies have depended
on historical customer data to measure customer churn. However, historical data does not reveal current
customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing
churn rates are inadequate and face several issues, particularly in the Saudi market.
This research was conducted to understand the relationship between customer satisfaction and customer churn
and how to use social media mining to measure customer satisfaction and predict customer churn.
This research conducted a systematic review to address the problems of churn prediction models and their
relation to Arabic Sentiment Analysis. The findings show that current churn models do not integrate
structural data frameworks with real-time analytics to target customers in real time. In addition, the findings
show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic
language itself, its complexity, and its lack of resources.
As a result, I have constructed the first gold-standard corpus of Saudi tweets related to telecom companies,
comprising 20,000 manually annotated tweets. It serves as a dialect sentiment lexicon extracted
from a larger Twitter dataset that I collected to capture the characteristics of social media text. I developed a
new ASA prediction model for telecommunication that fills the gaps identified in the ASA literature and fits
the telecommunication field. The proposed model proved effective for Arabic sentiment analysis and
churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in
Saudi telecom companies. Other fields, such as education, have
different features, which makes applying the proposed model to them of interest, because the model is based on text mining.
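Contribution à l’amélioration de la recherche d’information par utilisation des méthodes sémantiques: application à la langue arabe [Contribution to improving information retrieval using semantic methods: application to the Arabic language]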
An information retrieval system is a set of programs and modules that interfaces with the user to accept and interpret a query, search the index and return a ranking of the selected documents to that user. The greatest challenge for such a system, however, is coping with the large volume of multimodal and multilingual information available through document collections or the web in order to find the items that best match users' needs. This work presents two contributions. In the first, we propose a new approach to query reformulation in the context of Arabic information retrieval. The principle is to represent the query as a weighted semantic tree, whose nodes represent concepts (synsets) linked by semantic relations, in order to better identify the user's information need. The tree is built using pseudo-relevance feedback combined with the Arabic WordNet semantic resource. Experimental results show a good improvement in the performance of the information retrieval system. In the second contribution, we propose a new approach to building a test collection for Arabic information retrieval. The approach combines the pooling strategy, using search engines, with the Naïve Bayes machine learning classification algorithm. For the experiments, we created a new test collection consisting of a document base of 632 documents and 165 queries with their relevance judgments across several topics. The experiments also showed the effectiveness of the Bayesian classifier for recovering document relevance; moreover, it performed well after semantic enrichment of the document base with the word2vec model.
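The pooling step mentioned in this abstract can be sketched as follows, assuming the usual definition: the candidate set for relevance judgment is the union of the top-ranked documents returned by several search engines for the same query. This covers only the pooling step, not the Naïve Bayes classification the thesis combines it with. The function name `build_pool`, the `depth` parameter and the document identifiers are hypothetical.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` document ids from each ranked run.

    `runs` is a list of rankings (one list of document ids per search
    engine). Ids are returned in first-seen order, without duplicates.
    """
    pool = []
    seen = set()
    for ranking in runs:
        for doc_id in ranking[:depth]:
            if doc_id not in seen:
                seen.add(doc_id)
                pool.append(doc_id)
    return pool
```

Only the pooled documents would then be judged for relevance, by assessors or, as the abstract describes, with the help of a classifier.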