13 research outputs found

    Arabic Language Sentiment Analysis on Health Services

    The social media phenomenon has produced a massive amount of valuable data that is available online and easy to access. Many users share images, videos, comments, reviews, news and opinions on different social networking sites, with Twitter being one of the most popular. Data collected from Twitter is highly unstructured, and extracting useful information from tweets is a challenging task. Twitter has a huge number of Arabic users who mostly write their tweets in Arabic. While there has been a lot of research on sentiment analysis in English, the amount of research and the number of datasets in Arabic are limited. This paper introduces an Arabic language dataset of opinions on health services, collected from Twitter. The paper first details the process of collecting the data from Twitter, and then the process of filtering, pre-processing and annotating the Arabic text in order to build a large sentiment analysis dataset in Arabic. Several machine learning algorithms (Naive Bayes, Support Vector Machine and Logistic Regression), alongside deep and convolutional neural networks, were used in our sentiment analysis experiments on this health dataset. Comment: Authors accepted version of submission for ASAR 201

    Sentiment Analysis in Karonese Tweet using Machine Learning

    Recently, many social media users have expressed their conditions, ideas, and emotions in local languages on social media, for example via tweets or status updates. Because of the large volume of text, sentiment analysis is used to identify opinions, ideas, or thoughts from social media, and sentiment analysis research has also been widely applied to local languages. Karonese is one of the largest local languages in North Sumatera, Indonesia, and Karo society actively uses the language on Twitter. This study proposes two things: a Karonese tweet dataset for classification, and an analysis of sentiment in Karonese. Several machine learning algorithms are implemented in this research: Logistic Regression, Naive Bayes, K-nearest neighbor, and Support Vector Machine (SVM). The Karonese tweets were obtained from Twitter timelines based on several keywords and hashtags. Transcribers drawn from ethnic figures helped annotate the Karo tweets into three classes: positive, negative, and neutral. To find the best model, several scenarios were run with various splits of training and test data. The SVM algorithm achieved higher accuracy, precision, recall, and F1 scores than the others. As this is preliminary research on sentiment analysis for the Karonese language, much future work remains to improve it

    Sentiment Analysis Using the Long Short-Term Memory (LSTM) Architecture on the Citayam Fashion Week Phenomenon

    Sentiment analysis of text aims to determine whether a text carries positive, negative, or neutral emotion. The results of the analysis can be used as input for decisions about an issue. The Citayam Fashion Week phenomenon, which was widely debated in Indonesia, particularly in July 2022, calls for such an analysis. The dataset consists of tweets by Indonesian users collected with the keyword Citayam Fashion Week. Each tweet is then labeled as positive, negative, or neutral based on an Indonesian-language lexicon. This study produces a model that can be used to classify Indonesian-language tweets into positive, negative, or neutral sentiment categories with respect to public views and opinions on the Citayam Fashion Week phenomenon. The model is built using Long Short-Term Memory (LSTM). The accuracy of the resulting LSTM model is fairly good, at 88%

    Improving Sentiment Analysis in Arabic Using Word Representation

    The complexities of the Arabic language in morphology, orthography and dialects make sentiment analysis for Arabic more challenging. Extracting text features from short messages such as tweets in order to gauge sentiment makes the task even more difficult. In recent years, deep neural networks have often been employed and have shown very good results in sentiment classification and natural language processing applications. Word embedding, or distributed word representation, is a current and powerful tool for grouping words that are close to each other in context. In this paper, we describe how we construct Word2Vec models from a large Arabic corpus obtained from ten newspapers in different Arab countries. By applying different machine learning algorithms and convolutional neural networks with different text feature selections, we report improved accuracy of sentiment classification (91%-95%) on our publicly available Arabic language health sentiment dataset [1]. Comment: Authors accepted version of submission for ASAR 201

    An expandable Arabic lexicon and valence shifter rules for sentiment analysis on twitter

    Sentiment analysis (SA) refers to computational and natural language processing techniques used to extract subjective information expressed in a text. This SA study addresses three main problems: a) the absence of resources for the Palestinian Arabic dialect (PAL); b) the emergence of new sentiment words, which decreases the performance of sentiment analysis models when they are applied to collected tweets; and c) the handling of valence shifter words, which has not been thoroughly addressed in Arabic sentiment analysis. The study therefore aims to construct a PAL lexicon for Palestinian tweets and to design an Expandable and Up-to-date Lexicon for Arabic (EULA). New valence shifter rules are also constructed to enhance the performance of lexicon-based sentiment analysis on Arabic tweets. The PAL lexicon is built using a phonology matching algorithm, while EULA is constructed by applying a general lexicon to a tweet dataset to find new terms and predict their polarity through linguistic rules. Furthermore, a set of rules is proposed to handle valence shifter words by determining the scope of these words and the shift in value they produce. Palestinian and Arabic tweet datasets from March to May 2018 are used to evaluate the proposed approach. Experimental results indicate that the proposed PAL lexicon produced better results than other lexicons when tested on the Palestinian dataset. Meanwhile, EULA enhanced the performance of the lexicon-based approach to be competitive with the machine learning approach. Moreover, applying the proposed valence shifter rules increased overall performance by 5% on average. The proposed PAL sentiment lexicon is able to handle Palestinian dialects, EULA copes with the emergence of new slang words in social media, and the constructed valence shifter rules are capable of handling negation, intensifiers and contrasts, enhancing the performance of Arabic sentiment analysis
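
The general idea of lexicon-based scoring with valence shifters (negation, intensifiers, contrasts) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny English lexicon, the shifter word lists, the two-token scope, and the "keep the clause after the last contrast word" rule are hypothetical and are not the PAL lexicon, EULA, or the rules proposed in the paper.

```python
# Toy lexicon and shifter lists: illustrative assumptions, not taken from the paper.
LEXICON = {"good": 1.0, "helpful": 1.0, "bad": -1.0, "slow": -1.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
CONTRAST = {"but"}
SCOPE = 2  # hypothetical: a shifter affects sentiment words up to 2 tokens after it


def score(text: str) -> float:
    """Sum lexicon polarities after applying simple valence shifter rules."""
    tokens = text.lower().split()

    # Contrast rule (assumed): only the clause after the last contrast word counts.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in CONTRAST:
            tokens = tokens[i + 1:]
            break

    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        # Look back over the shifter scope and adjust the word's polarity.
        for prev in tokens[max(0, i - SCOPE):i]:
            if prev in NEGATORS:
                value = -value  # negation flips the sign
            elif prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]  # intensifier scales the magnitude
        total += value
    return total
```

With these toy rules, `score("not very good")` evaluates to `-1.5` (negated and intensified), and `score("the staff was helpful but very slow")` also evaluates to `-1.5`, because only the clause after "but" is scored.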

    Sehaa: A big data analytics tool for healthcare symptoms and diseases detection using Twitter, Apache Spark, and Machine Learning

    Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language

    Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

    Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. As a result, opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Although several languages, including English, have been the subject of many studies, little work has been conducted on the Arabic language. The morphological complexities and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This novel study was motivated by the accomplishments of deep learning algorithms and word embeddings in the field of English sentiment analysis. Extensive experiments were conducted based on supervised machine learning in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning algorithms, convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM, were introduced. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested using two benchmark Arabic datasets: Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LABR) for book reviews, with different setups. Comparative experiments utilized the three models with two word embeddings and different setups of the datasets. The main novelty of this study is to explore the effectiveness of using various word embeddings and different setups of benchmark datasets relating to balance, imbalance, and binary and multi-classification aspects. Findings showed that the best results were obtained in most cases when applying the fastText word embedding using the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, the proposed CNN model outperformed the LSTM and CNN-LSTM models on the benchmark HARD dataset, with the three models achieving 94.69%, 94.63%, and 94.54% accuracy with fastText, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and fastText, they still outperformed other researchers' state-of-the-art outcomes on the same dataset