    Sentiment Analysis Classification Of Political Parties On Twitter Using Gated Recurrent Unit Algorithm And Natural Language Processing

    General elections cannot be separated from public discussion of political parties, which ranges from surveys to expressions of sentiment. Current survey results need in-depth validation of their accuracy, and sentiment analysis is used here to validate the findings of survey institutions. Five political parties are used as datasets in this study: Partai Demokrasi Indonesia Perjuangan (PDIP), Gerakan Indonesia Raya Party (Gerindra), Golongan Karya Party (Golkar), Partai Kebangkitan Bangsa (PKB), and Nasional Demokrat Party (Nasdem). The Gated Recurrent Unit (GRU) algorithm is implemented in this research to classify sentiment toward these parties. The highest results obtained with the GRU algorithm were 56.50% accuracy, 72.76% precision, and 100% recall.
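
The three metrics reported in this abstract can be illustrated with a minimal sketch. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they are not taken from the study, which does not describe how its scores were computed or averaged.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Return (accuracy, precision, recall), treating one label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels, invented for illustration only.
y_true = ["positive", "positive", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "positive", "positive", "negative"]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

In this toy case every truly positive item is found, so recall is perfect, while one false positive lowers precision. A classifier that over-predicts the positive class can show the same pattern.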

    Pembentukan Dataset Token Sentimen Berdasarkan Akun Instagram Brand Elektronik Menggunakan K-Nearest Neighbors

    Abstract. Generating Sentiment Token Dataset Based on Electronics Brand Instagram Account using K-Nearest Neighbors. Instagram is currently one of the most popular social media platforms for businesses and brand owners to promote their products. Because Instagram is a two-way communication platform, people can respond to any promotional content posted on it. These responses take a variety of forms and frequently carry positive or negative sentiment. This study identifies the words used in each type of sentiment, then uses the K-NN method to construct a token dataset by grouping those words under labels according to sentiment type. Based on the tests performed, the accuracy of the dataset for K = 1 is 33.38% (positive), 59.96% (negative), and 56.60% (neutral). Keywords: sentiment analysis, K-Nearest Neighbors, dataset, Instagram
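
The K-NN labelling step with K = 1 can be sketched as follows. This is a minimal illustration that assumes a bag-of-words representation and cosine similarity, neither of which is specified in the abstract. The helper names and the toy comments are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a comment into a bag-of-words token count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=1):
    """Label `text` by majority vote among its k most similar training comments."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: cosine(vectorize(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled comments, invented for illustration only.
train = [
    ("great phone love it", "positive"),
    ("battery is terrible", "negative"),
    ("when is the release date", "neutral"),
]
prediction = knn_predict(train, "love this great camera")
```

With K = 1 the label comes from a single nearest neighbour, so one unusual training comment can flip the result. That is one plausible reason to compare several values of K.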

    Adapting Machine Learning Techniques for Developing Automatic Q&A Interaction Module for Translation Robots based on NLP

    This study focuses on developing an automatic question and answer (Q&A) interaction module for computer-based translation robots. The goal of the research is to enhance the capability of translation robots to interact with users in a more human-like way, particularly by providing more efficient and accurate translations. This paper proposes Conditional Random Field Discriminative Analysis (CRFDA) for feature extraction to equip a translation robot with Q&A. The proposed CRFDA model applies discriminative analysis to the CRF model. The CRF model uses a bi-directional classifier to estimate the feature vector. Finally, classification is performed with a voting-based classification model. The performance of the CRFDA model is examined on the Named Entities (NEs) in the TempVal1 & 2 dataset. The extraction uses strict and relaxed feature models, for exact matches and slight variations respectively. The simulation analysis shows that the proposed CRFDA model achieves a classification accuracy of 91%, which is significantly higher than state-of-the-art techniques.

    Sentiment analysis of Arabic social media texts: A machine learning approach to deciphering customer perceptions

    Sentiment analysis (SA) is a subfield of artificial intelligence that entails natural language processing. It has become increasingly significant because it discerns the emotional tone of reviews, categorising them as positive, neutral, or negative. In the highly competitive coffee industry, understanding customer sentiment and perception is paramount for businesses seeking to optimise their product offerings. Traditional methods of market analysis often fall short of capturing the nuanced views of consumers, necessitating a more sophisticated approach to sentiment analysis. This research is motivated by the need for a nuanced understanding of customer sentiments across various coffee products, enabling companies to make informed decisions regarding product promotion, improvement, and discontinuation. However, sentiment analysis faces a challenge when analysing Arabic text because of the language's extraordinarily complex inflectional and derivational morphology. To address this challenge, we have developed a new method designed to improve the precision and effectiveness of Arabic sentiment analysis, specifically focusing on understanding customer opinions about various coffee products on social media platforms like Twitter. We gathered 10,646 Twitter reviews of various coffee products and applied feature extraction using term frequency-inverse document frequency (TF-IDF) and minimum redundancy maximum relevance (MRMR). Subsequently, we performed sentiment analysis using four supervised learning algorithms: k-nearest neighbor, support vector machine, decision tree, and random forest. The classifications produced by these algorithms were aggregated via ensemble learning to give the final results. Our results demonstrated an increase in prediction accuracy, with our method achieving 95.95% accuracy with hard voting and 94.51% with soft voting.
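
The difference between hard and soft voting can be sketched as follows. This minimal example assumes each of four classifiers returns a probability per class. The probabilities are invented for illustration, and only two classes are used to keep the example short.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Return the label with the highest mean predicted probability."""
    labels = probabilities[0].keys()
    mean = {label: sum(p[label] for p in probabilities) / len(probabilities) for label in labels}
    return max(mean, key=mean.get)


# Hypothetical per-classifier outputs for one review (not from the study).
probs = [
    {"positive": 0.55, "negative": 0.45},
    {"positive": 0.60, "negative": 0.40},
    {"positive": 0.05, "negative": 0.95},
    {"positive": 0.52, "negative": 0.48},
]
hard = hard_vote([max(p, key=p.get) for p in probs])
soft = soft_vote(probs)
```

Here three weakly confident classifiers win the hard vote, while one highly confident classifier tips the soft vote the other way. This is why the two schemes can report different accuracies on the same data.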

    Sentiment Analysis of Tweets Before the 2024 Elections in Indonesia Using Bert Language Models

    A general election is one of the crucial moments for a democratic country such as Indonesia. Good election preparation can increase people's participation in the general election. In this study, we conduct a sentiment analysis of Indonesian public opinion on the upcoming 2024 election using Twitter data and the IndoBERT model. This study aims to help the government and related institutions understand public perception, so that they can obtain valuable insights to better prepare for elections, including evaluating election policies, developing campaign strategies, increasing voter engagement, addressing issues and conflicts, and increasing transparency and public trust. The main contribution of this study is threefold: (i) the application of the state-of-the-art transformer-based model IndoBERT to sentiment analysis in the political domain; (ii) the empirical evaluation of the IndoBERT model against machine learning and lexicon-based models; and (iii) the creation of a new dataset for sentiment analysis in the political domain. Our Twitter data shows that the Indonesian public mostly reacts neutrally (83.7%) towards the upcoming 2024 election. The experimental results demonstrate that IndoBERT large-p1 is the best-performing model, achieving an accuracy of 83.5%. It improves on our baseline systems by 48.5% and 46.49% for TextBlob, 2.5% and 14.49% for Multinomial Naïve Bayes, and 3.5% and 13.49% for Support Vector Machine in terms of accuracy and F-1 score, respectively.

    Mining Geotagged Tweets: Tracking Spatiotemporal Variation of Mental Health in Canada during COVID-19 Pandemic

    This thesis explores and analyzes the evolution of the pandemic as a stressor on mental health in Canada by monitoring sentiment polarity dynamics, emotion trends, and changes in the keywords discussed on Twitter from January 2020 to December 2022. Leveraging the surging amount of geotagged social media data, this study deploys a combination of machine learning, geospatial mapping, and social sensing as a new approach to observe, quantify, and evaluate the evolution of nationwide emotion trends and psychological status along the COVID-19 pandemic timeline in Canada, interpret the underlying key factors and events, and thus inform how to mentally “re-start” in the post-pandemic era. The proposed methods include social sensing, large-scale sentiment polarity detection, emotion classification, keyword analysis, and kernel density mapping. After processing, the dataset consists of 430,399 geotagged tweets discussing pandemic subjects posted by Canadian users from January 1, 2020 to December 31, 2022. The results reveal that the overall sentiment and emotion composition was most optimistic during the early half of the pandemic, from early spring 2020 to summer 2021, and declined from then until the end of 2022, sending a warning signal about public mental well-being. Beneath this trend, several driving events emerged: the declaration of a state of emergency in March 2020, the peak of vaccine hesitancy in November 2020, the release of a new vaccine mandate in January 2022, and the Freedom Convoy, which lasted from January 2022 to February 2022. The results also indicate an observable geospatial disparity in the shifting patterns and overall mental health levels between Montréal, a French-dominant region, and Vancouver, Calgary, Edmonton, Toronto, and Ottawa-Gatineau, which are English-dominant or bilingual regions.
Also, with a delay in the peaks and troughs of its sentiment polarity, Toronto displays a slightly different mood from the other English-speaking cities. Last but not least, we propose two action strategies, promoting education on the importance of vaccine behaviours and rebalancing COVID-19 restrictions, to boost public confidence regarding the pandemic and rebuild psychological resilience in the current post-pandemic era. As the first work tracking the long-term mental health of Canada as a country during the pandemic, this study supports the conclusion that, as the global economy starts to recover and the number of cases gradually comes under control with the availability of vaccines, the public psychological condition is not recovering as fast as the economy and physical health in today’s post-pandemic world.
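
The kernel density mapping named among the methods in this abstract can be sketched as a Gaussian kernel density estimate over tweet locations. The function name, the bandwidth, the planar treatment of coordinates, and the sample points are all assumptions for illustration; the thesis abstract does not describe its kernel or parameters.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Estimate point density at `query` with an isotropic 2D Gaussian kernel."""
    qx, qy = query
    total = 0.0
    for px, py in points:
        squared_distance = (qx - px) ** 2 + (qy - py) ** 2
        total += math.exp(-squared_distance / (2 * bandwidth ** 2))
    # Normalise so the estimate integrates to 1 over the plane.
    return total / (len(points) * 2 * math.pi * bandwidth ** 2)


# Hypothetical planar coordinates: three clustered points and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
near_cluster = gaussian_kde_2d(points, (0.0, 0.0), bandwidth=0.5)
far_from_all = gaussian_kde_2d(points, (2.5, 2.5), bandwidth=0.5)
```

Evaluating the estimate on a grid of query points gives a density surface that can be drawn as a heat map. A larger bandwidth smooths the surface, and a smaller one emphasises local clusters.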

    Fuzzy natural language similarity measures through computing with words

    A vibrant area of research is the understanding of human language by machines so that they can converse with humans to achieve set goals. Human language is naturally fuzzy, with words meaning different things to different people depending on the context. Fuzzy words are words with a subjective meaning, typically used in everyday natural language dialogue; they are often ambiguous and vague, and depend on an individual’s perception. Fuzzy Sentence Similarity Measures (FSSM) are algorithms that compare two or more short texts containing fuzzy words and return a numeric measure of the similarity of meaning between them. The motivation for this research is to create a new FSSM called FUSE (FUzzy Similarity mEasure). FUSE is an ontology-based similarity measure that uses Interval Type-2 Fuzzy Sets to model relationships between categories of human perception-based words. Four versions of FUSE (FUSE_1.0 – FUSE_4.0) have been developed, investigating the presence of linguistic hedges, the expansion of fuzzy categories and their use in natural language, the incorporation of logical operators such as ‘not’, and the introduction of the fuzzy influence factor. FUSE has been compared to several state-of-the-art, traditional semantic similarity measures (SSMs), which do not consider the presence of fuzzy words. FUSE has also been compared to the only published FSSM, FAST (Fuzzy Algorithm for Similarity Testing), which has a limited dictionary of fuzzy words and uses Type-1 Fuzzy Sets to model relationships between categories of human perception-based words. Results on several published and gold-standard datasets show that FUSE improves on the limitations of traditional SSMs and the FAST algorithm by achieving a higher correlation with the average human rating (AHR).
To validate FUSE in the context of a real-world application, versions of the algorithm were incorporated into a simple Question & Answer (Q&A) dialogue system (DS), referred to as FUSION, to evaluate the improvement in natural language understanding. FUSION was tested on two different scenarios using human participants, and the results were compared to a traditional SSM known as STASIS. The DS experiments showed a True rating of 88.65%, compared with an average True rating of 61.36% for STASIS. These results show that the FUSE algorithm can be used within real-world applications; evaluation of the DS showed an improvement in natural language understanding, allowing semantic similarity to be calculated more accurately from natural user responses. The key contributions of this work can be summarised as follows. First, the development of a new methodology to model fuzzy words using Interval Type-2 fuzzy sets, leading to the creation of a fuzzy dictionary for nine fuzzy categories, a useful resource for other researchers in natural language processing and Computing with Words, and for other fuzzy applications such as semantic clustering. Second, the development of an FSSM known as FUSE, which was expanded over four versions, investigating the incorporation of linguistic hedges, the expansion of fuzzy categories and their use in natural language, the inclusion of logical operators such as ‘not’, and the introduction of the fuzzy influence factor. Third, the integration of the FUSE algorithm into a simple Q&A DS, referred to as FUSION, which demonstrated that an FSSM can be used in a real-world practical implementation, making FUSE and its fuzzy dictionary generalisable to other applications.
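
The evaluation against the average human rating (AHR) described in this abstract can be sketched as a correlation calculation. This minimal example assumes Pearson's r, which the abstract does not name, and uses invented ratings. The names `measure_a` and `measure_b` are hypothetical stand-ins for two similarity measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented similarity scores for four sentence pairs (not from the study).
human = [0.1, 0.4, 0.5, 0.9]       # average human rating per pair
measure_a = [0.2, 0.5, 0.6, 1.0]   # tracks the human ratings closely
measure_b = [0.9, 0.1, 0.8, 0.2]   # disagrees with the human ratings
r_a = pearson(human, measure_a)
r_b = pearson(human, measure_b)
```

A measure whose scores rise and fall with the human ratings gets a coefficient near 1. Under this assumption, that is the sense in which one measure "correlates better with AHR" than another.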