    BBM, or fuel oil, is one of the essential needs of the Indonesian people. The government's policy of increasing fuel prices has drawn many opinions from the public. Twitter is one of the social media platforms that Indonesians often use to express opinions on a topic. In this study, sentiment analysis was carried out on public opinion from Twitter regarding the fuel price increase policy. The research is expected to help characterize public opinion on the policy as positive, neutral, or negative sentiment. The sentiment analysis method used is the Support Vector Machine (SVM) classification algorithm. The accuracy of SVM alone was compared with its accuracy after adding a feature selection step, for which Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) were used. After several experiments with the three methods, SVM with the Radial Basis Function (RBF) kernel produced the best accuracy, 71.2%. SVM with the RBF kernel combined with PSO obtained an accuracy of 68.84%, and SVM with the RBF kernel combined with GA obtained 69.52%.
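
    The RBF kernel named in the abstract above scores the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The following is a minimal sketch of that formula only, not the study's implementation; the gamma value and the three example vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Return exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical term-weight vectors for three tweets (not data from the study).
tweet_a = [1.0, 0.0, 2.0]
tweet_b = [1.0, 0.0, 2.0]
tweet_c = [0.0, 3.0, 0.0]

same = rbf_kernel(tweet_a, tweet_b)       # identical vectors give the maximum, 1.0
different = rbf_kernel(tweet_a, tweet_c)  # distant vectors give a value near 0
```

    A larger gamma makes the similarity fall off faster with distance, which is why gamma is usually tuned together with the SVM's regularization parameter.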

    Sentiment Analysis to Measure Celebrity Endorsment’s Effect using Support Vector Machine Algorithm

    Celebrity endorsement is a phenomenon in which companies advertise their products using the services of celebrities, who take advantage of their popularity to promote the company's brand or product through social media. In the case studied here, KFC used a celebrity endorsement to make its menu more popular, choosing to work with Raditya Dika to promote its latest menu item, KFC Salted Egg Chicken. This study examines whether public sentiment towards the product changes after such a celebrity endorsement, using text mining and sentiment analysis. Several algorithms can be used to perform sentiment analysis, one of which is the Support Vector Machine (SVM). SVM was chosen because it has been quite accurate in various studies. SVM also takes into account various features of the document, including features that rarely appear in it, which reduces the loss of information from the data. The data used in this research are YouTube and Twitter comments about KFC Salted Egg Chicken. The sentiment analysis was carried out in several steps: text preprocessing, feature extraction, classification, and evaluation. The resulting model was tested and evaluated before and after the endorsement in terms of accuracy, precision, recall, and F1-measure. Before the endorsement, the accuracy, precision, recall, and F1-measure were 67.83%, 69%, 68%, and 66%, respectively. After the endorsement, they were 74.06%, 74%, 74%, and 74%, respectively. These results indicate that SVM is accurate for sentiment analysis. Moreover, the study found no significant change in public sentiment regarding the product before and after the celebrity endorsement.
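
    The four evaluation measures reported above can be computed from true and predicted labels as sketched below. The abstract does not say how per-class scores were averaged, so macro-averaging is an assumption here, and the function name `evaluate` and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumed)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy labels for illustration only, not the study's data.
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
scores = evaluate(y_true, y_pred)
```

    Reporting precision and recall next to accuracy matters when the classes are imbalanced, since accuracy alone can look good for a model that mostly predicts the majority class.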


    Religious lectures are religious presentations delivered verbally to the community by a person with religious knowledge, with the aim that the knowledge conveyed can be understood. Ustadz Abdul Somad is a preacher known at various levels of society, but not all of his lectures are accepted by the public, as shown by the positive and negative comments on social media from people who like or dislike them. To address this problem, sentiment analysis was applied using the Support Vector Machine algorithm. The purpose of this study is to compare feature selection using Particle Swarm Optimization and Information Gain. Feature selection with Particle Swarm Optimization resulted in an accuracy of 80.57%, precision of 85.45%, and recall of 79.52%, while feature selection with Information Gain resulted in an accuracy of 79.78%, precision of 78.47%, and recall of 78.43%. Based on these results, it can be concluded that feature selection with Particle Swarm Optimization gives better accuracy than feature selection with Information Gain.
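
    Information Gain, one of the two feature selection methods compared above, ranks a term by how much knowing whether it appears in a comment reduces the entropy of the sentiment labels. The sketch below shows that calculation under simple assumptions (each comment is a set of tokens, and only presence or absence counts); the toy comments are hypothetical and are not the study's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(documents, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    with_term = [lab for doc, lab in zip(documents, labels) if term in doc]
    without_term = [lab for doc, lab in zip(documents, labels) if term not in doc]
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:
            conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical tokenized comments, one set of tokens per comment.
documents = [
    {"good", "lecture"},
    {"good", "clear"},
    {"bad", "lecture"},
    {"boring", "lecture"},
]
labels = ["pos", "pos", "neg", "neg"]

ig_good = information_gain(documents, labels, "good")
ig_lecture = information_gain(documents, labels, "lecture")

# Rank the vocabulary so the most informative terms can be kept.
vocabulary = sorted(set().union(*documents))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(documents, labels, t),
                reverse=True)
```

    Here "good" separates the two classes perfectly and so gets the highest score, while "lecture" appears in both classes and scores much lower.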

    Stock price change prediction using news text mining

    Along with the advent of the Internet as a new way of propagating news in digital format came the need to understand this data and transform it into information. This work presents a computational framework that aims to predict intraday changes in stock prices, given the occurrence of news articles related to the companies listed in the Dow Jones Index. For this task, an automated process that gathers, cleans, labels, classifies, and simulates investments was developed. This process integrates existing data mining and text mining algorithms with newly proposed techniques for aligning news articles with stock prices, for pre-processing, and for classifier ensembles. The experimental results, in terms of classification measures and the Cumulative Return obtained through investment simulation, outperformed the other results found in an extensive review of the related literature. This work also argues that accuracy as a classification measure, and the incorrect use of the cross-validation technique, have very little to contribute in terms of investment recommendations for the financial market. Altogether, the developed methodology and results contribute to the state of the art in this emerging research field, demonstrating that the correct use of text mining techniques is an applicable alternative for predicting stock price movements in the financial market.
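
    The Cumulative Return used above as an evaluation measure compounds the return of each simulated trade. The abstract does not describe the trading rule, so the sketch below assumes a simple one: the classifier emits +1 (buy), -1 (sell short), or 0 (no trade) for each period, and the position earns that period's price change. The function name and the numbers are hypothetical.

```python
def cumulative_return(signals, price_changes):
    """Compound per-period returns of a signal-following strategy.

    signals: +1 (long), -1 (short), or 0 (no position) per period.
    price_changes: fractional price change per period, e.g. 0.02 for +2%.
    """
    if len(signals) != len(price_changes):
        raise ValueError("signals and price_changes must have the same length")
    growth = 1.0
    for signal, change in zip(signals, price_changes):
        growth *= 1.0 + signal * change
    return growth - 1.0

# Hypothetical predictions and observed price changes for four periods.
signals = [1, -1, 0, 1]
price_changes = [0.02, -0.01, 0.05, -0.03]
total = cumulative_return(signals, price_changes)
```

    In this toy run the classifier is right in two of the three periods it trades, yet the compounded return is slightly negative because the single wrong call is the largest move, which illustrates why accuracy alone says little about investment performance.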

    Evolutionary Multiobjective Feature Selection for Sentiment Analysis

    Sentiment analysis is one of the prominent research areas in data mining and knowledge discovery, which has proven to be an effective technique for monitoring public opinion. The big data era with a high volume of data generated by a variety of sources has provided enhanced opportunities for utilizing sentiment analysis in various domains. In order to take best advantage of the high volume of data for accurate sentiment analysis, it is essential to clean the data before the analysis, as irrelevant or redundant data will hinder extracting valuable information. In this paper, we propose a hybrid feature selection algorithm to improve the performance of sentiment analysis tasks. Our proposed sentiment analysis approach builds a binary classification model based on two feature selection techniques: an entropy-based metric and an evolutionary algorithm. We have performed comprehensive experiments in two different domains using a benchmark dataset, Stanford Sentiment Treebank, and a real-world dataset we have created based on World Health Organization (WHO) public speeches regarding COVID-19. The proposed feature selection model is shown to achieve significant performance improvements in both datasets, increasing classification accuracy for all utilized machine learning and text representation technique combinations. Moreover, it achieves over 70% reduction in feature size, which provides efficiency in computation time and space.

    Sentiment Polarity Classification of Comments on Korean News Articles Using Feature Reweighting

    In general, comments on online news articles contain subjective feelings or opinions about the article. The content of the original article therefore has an important influence on recognizing and classifying the sentiment of its comments. Building on this observation, this thesis proposes a feature reweighting method that uses the content of the original article and a sentiment lexicon, and uses it in a binary sentiment classification method for comments on Korean news articles. The reweighting method draws on several feature sets: sentiment words contained in the comments, features related to the sentiment lexicon and the body of the news article, and the category information of the news article. The sentiment lexicon is a Korean sentiment lexicon; since none was publicly available, it was built from an existing English sentiment lexicon. The proposed binary sentiment classification uses machine learning, which generally requires a training corpus, and sentiment classification in particular requires a corpus tagged with positive or negative sentiment. Since no public Korean sentiment corpus was available either, the corpus was built by the author. The machine learning methods used are Naïve Bayes, k-NN, and SVM, and the feature selection methods are Document Frequency, the χ² statistic, and Information Gain. The results confirm that the sentiment words contained in a comment and the body of the article it refers to are very effective features for sentiment classification.

    A Hybrid Method of Linguistic and Statistical Features for Arabic Sentiment Analysis

    Sentiment analysis refers to the task of identifying the positive or negative polarity of a text that expresses an opinion. Arabic content has expanded dramatically in the last decade, especially with the emergence of social websites (e.g. Twitter, Facebook, etc.). Several studies have addressed sentiment analysis for the Arabic language using various techniques. According to the literature, the most efficient techniques are machine learning techniques, owing to their ability to build a model from training examples. Yet there are still issues facing Arabic sentiment analysis with machine learning techniques. These issues relate to employing robust features that can discriminate the polarity of sentiments. This paper proposes a hybrid method of linguistic and statistical features along with classification methods for Arabic sentiment analysis. The linguistic features comprise stemming and POS tagging, while the statistical features comprise TF-IDF. A benchmark dataset of Arabic tweets was used in the experiments. In addition, three classifiers were used: SVM, KNN, and ME. The results showed that SVM outperformed the other classifiers, obtaining an f-score of 72.15%. This indicates the usefulness of SVM with the proposed hybrid features.

    Urdu News Content Classification Using Machine Learning Algorithms

    As the world has become a global village, the flow of news has increased in both volume and speed. It is necessary to engage computing machines to assist people in dealing with this massive amount of data. The availability of different types of news and similar material on the Internet serves as a source of information for billions of users. Millions of people in our subcontinent speak and understand Urdu. Several classification techniques are available and have been applied to classify English news into categories such as politics, education, and medicine. Plenty of research has been done in multiple languages, but Urdu remains underexplored due to a lack of resources. This research evaluates the performance of twelve (12) different machine learning classifiers on the Urdu news text classification problem. The analysis was performed on a relatively large and recent collection of Urdu text that contains over 0.15 million (153,050) labeled instances of eight different classes. In addition, after applying pre-processing techniques, the TF-IDF weighting technique was adopted for feature selection and data extraction. After evaluating the various machine learning methods, the SVM outperforms the other eleven algorithms with an accuracy of 91.37%. We also compare its results with other classifiers such as linear SVM, logistic regression, SGD, Naïve Bayes, ridge regression, and a few others.
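
    The TF-IDF weighting mentioned above gives a term a high weight when it is frequent in one document but rare across the collection. The sketch below uses the plain textbook variant (relative term frequency times the natural log of N over document frequency); the abstract does not say which variant was used, library implementations often add smoothing, and the tokenized documents here are hypothetical English placeholders rather than Urdu data.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized news snippets (placeholders, not the Urdu corpus).
documents = [
    ["election", "vote", "party"],
    ["match", "score", "party"],
    ["election", "result", "vote"],
]
weights = tf_idf(documents)
```

    In the second document, "match" occurs in only one of the three documents and so outweighs "party", which occurs in two; a term present in every document would get weight zero under this variant.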