Comparison of the Support Vector Machine Method and the Lexicon Method for Indonesian-Language Sentiment Analysis
Sharing information has become increasingly easy with social media, and one popular platform is Twitter. It is designed specifically for voicing opinions and expressing feelings within a limited number of characters, namely 280. Posts on the platform describe people's problems or feelings and contain hidden knowledge, so sentiment analysis is needed to uncover the meaning of these sentences. Several methods can be used to analyze the sentiment of a sentence, such as the lexicon-based approach and the knowledge-based approach using the Support Vector Machine (SVM) algorithm. The two methods work on different principles. This study compares the SVM and lexicon-based approaches on an Indonesian-language sentiment dataset. Across several experimental scenarios on a dataset of 4,000 samples, SVM performed better at classifying positive and negative sentiment, reaching an accuracy of 98.5% with unigram feature extraction and an 80:20 dataset split. The lexicon-based approach performed less well, with a highest accuracy of 78.43%. This is because the positive-word dictionary is small, containing only half as many entries as the negative-word dictionary, so positive words could not be recognized well. A dictionary that assigns a score to every word is more accurate for sentiment analysis than one without scores.
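The abstract's final point, that a dictionary assigning a score to each word outperforms one that only lists positive and negative words, can be illustrated with a minimal sketch. The toy word lists, weights, and function names below are hypothetical and are not the dictionaries used in the study.

```python
# Hypothetical toy dictionaries for illustration; not the study's actual lexicons.
POSITIVE_WORDS = {"good", "happy"}
NEGATIVE_WORDS = {"bad", "sad", "terrible"}
WORD_SCORES = {"good": 2, "happy": 3, "bad": -2, "sad": -2, "terrible": -4}


def to_label(score):
    """Map a numeric sentiment score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_unscored(tokens):
    """Unscored dictionary: every hit counts as +1 (positive) or -1 (negative)."""
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return to_label(pos - neg)


def classify_scored(tokens):
    """Scored dictionary: each word contributes its own weight."""
    return to_label(sum(WORD_SCORES.get(t, 0) for t in tokens))


sentence = "very happy though a little sad".split()
print(classify_unscored(sentence))  # neutral  (1 - 1 = 0)
print(classify_scored(sentence))    # positive (3 - 2 = 1)
```

With equal-weight hits the strong positive word and the mild negative word cancel out, whereas the scored dictionary preserves the difference in intensity.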
An Improved Sentiment Classification Approach for Measuring User Satisfaction toward Governmental Services’ Mobile Apps Using Machine Learning Methods with Feature Engineering and SMOTE Technique
Analyzing the sentiment of Arabic text remains a major research challenge because of the particular characteristics and complexity of the Arabic language. Few studies have addressed Arabic sentiment analysis (ASA) compared with English or other languages written in Latin script. In addition, most existing ASA studies analyze datasets collected from Twitter, while little attention has been paid to the large volume of reviews of governmental and commercial mobile applications on Google Play and the App Store. For instance, in response to the COVID-19 pandemic, the government of Saudi Arabia developed several mobile applications in healthcare, education, and other sectors. To address this gap, this paper analyzes users’ opinions of six applications in the healthcare sector. An improved sentiment classification approach is proposed for measuring user satisfaction with governmental services’ mobile apps, using machine learning (ML) models with different preprocessing methods. The Arb-AppsReview dataset, which contains 51k reviews, was collected from these six applications on Google Play and the App Store. Several feature engineering approaches were then applied: the Bing Liu lexicon, AFINN, the MPQA Subjectivity Lexicon, bag of words (BoW), term frequency-inverse document frequency (TF-IDF), and Google’s pre-trained Word2Vec. The SMOTE technique was also applied to balance the dataset. Five ML models were then used to classify the sentiment of the reviews. The experimental results showed that the highest accuracy (94.38%) was obtained by a support vector machine (SVM) using the SMOTE technique with all features concatenated.
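The balancing step can be sketched with the core SMOTE (Synthetic Minority Over-sampling Technique) idea: each synthetic sample lies at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbors. This is a minimal standard-library sketch; the function name `smote`, the parameters `k` and `seed`, and the toy points are illustrative assumptions, and the abstract does not say which SMOTE implementation or settings were used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a random minority
    sample and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbors of `base` within the minority class, excluding itself
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

To balance two classes, `n_synthetic` would typically be the size difference between the majority and minority classes. Oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.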
Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach
Despite the enormous increase in the number of Arabic posts on social networks, sentiment analysis research on extracting opinions from these posts lags behind that for English. This is largely attributed to the challenges of processing Arabic, a morphologically complex natural language, and to the scarcity of Arabic NLP tools and resources. The task becomes harder still when analysing dialectal Arabic, which does not follow formal grammatical structure. Building on semantic modelling of the target domain’s knowledge and on multi-factor lexicon-based sentiment analysis, this research uses a hybrid approach that integrates linguistic and machine learning methods to classify the sentiment of dialectal Arabic. First, a dataset of dialectal Arabic tweets on the topic of unemployment was collected and manually annotated. The tweets cover the different Arabic dialects used in Saudi Arabia, for which a comprehensive Arabic sentiment lexicon was constructed. The approach also incorporates a novel light-stemming mechanism to improve the stemming of Saudi dialectal Arabic. Next, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, and supplications) to improve classification accuracy. These techniques were applied to a central problem of sentiment analysis in dialectal Arabic to assess performance across social media channels, which are prone to semantic and colloquial variation.
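A multi-factor lexicon scorer of this kind can be sketched as a single pass over the tokens, where negations and intensifiers modify the next sentiment word while emoji and supplications contribute their own scores. This is a minimal sketch, not the study's algorithm: the rules, the English placeholder words, the transliterated supplication token `ya_rab`, and every weight are hypothetical.

```python
# All entries and weights are hypothetical placeholders, not the study's lexicon.
WORD_SCORES = {"good": 1.0, "hope": 1.0, "bad": -1.0, "jobless": -1.0}
EMOJI_SCORES = {"😊": 1.0, "😢": -1.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "no", "never"}
SUPPLICATIONS = {"ya_rab": -0.5}  # hypothetical transliterated supplication and weight


def score_post(tokens):
    """Sum sentiment contributions from words, emoji, and supplications."""
    total = 0.0
    multiplier = 1.0  # pending intensifier, applied to the next sentiment word
    negate = False    # pending negation, flips the next sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            multiplier *= INTENSIFIERS[tok]
        elif tok in EMOJI_SCORES:
            total += EMOJI_SCORES[tok]   # emoji are scored on their own
        elif tok in SUPPLICATIONS:
            total += SUPPLICATIONS[tok]  # supplications are scored on their own
        elif tok in WORD_SCORES:
            value = WORD_SCORES[tok] * multiplier
            total += -value if negate else value
            multiplier, negate = 1.0, False  # modifiers are consumed by this word
    return total


def classify(tokens):
    """Map the aggregate score to a polarity label."""
    score = score_post(tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


print(score_post(["not", "very", "good"]))  # -1.5
```

Keeping each factor in its own table makes it easy to tune one factor's weight without touching the others.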
Finally, the study presents a new hybrid approach in which domain knowledge is used in two ways to combine computational linguistics and machine learning: the first incorporates the problem domain’s semantic knowledge base into the machine learning training feature set, while the second uses the output of the lexicon-based sentiment classification to train the machine learning methods. Integrating these techniques into a single hybrid solution achieved greater accuracy and consistency than applying either approach independently, offering a pragmatic solution to sentiment classification of dialectal Arabic text.
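The two integration methods can be sketched as extra columns appended to a unigram feature vector: a count of domain knowledge-base terms, and the label produced by a lexicon classifier. Appending the lexicon output as a feature is one plausible reading of "uses the outcome ... in the training"; the abstract does not specify the mechanism. The function names, domain terms, and lexicon below are hypothetical.

```python
def build_vocabulary(docs):
    """Sorted list of unique unigrams across the training documents."""
    return sorted({tok for doc in docs for tok in doc})


def lexicon_label(tokens, lexicon):
    """Output of a simple lexicon-based classifier: +1, -1, or 0."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    return (score > 0) - (score < 0)


def hybrid_features(tokens, vocab, domain_terms, lexicon):
    """Unigram counts + domain knowledge-base hits + lexicon classifier output."""
    counts = [tokens.count(term) for term in vocab]
    domain_hits = sum(1 for t in tokens if t in domain_terms)
    return counts + [domain_hits, lexicon_label(tokens, lexicon)]


train_docs = [["no", "jobs", "here"], ["new", "jobs", "good"]]
vocab = build_vocabulary(train_docs)  # ['good', 'here', 'jobs', 'new', 'no']
domain_terms = {"jobs"}               # hypothetical domain knowledge-base entries
lexicon = {"good": 1, "no": -1}       # hypothetical sentiment lexicon
X = [hybrid_features(doc, vocab, domain_terms, lexicon) for doc in train_docs]
print(X)  # [[0, 1, 1, 0, 1, 1, -1], [1, 0, 1, 1, 0, 1, 1]]
```

The resulting rows can be passed to any standard classifier (for example an SVM) in place of plain unigram vectors.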