    The Influence of Stemming on Indonesian Tweet Sentiment Analysis

    Stemming is commonly used in research on text mining, information retrieval, and natural language processing. However, there are indications that stemming does not significantly influence accuracy in text classification. Hence, this research investigates the influence of the stemming process on Indonesian tweet sentiment analysis. Specifically, this work examines the difference in effect between two conditions: pre-processing with stemming and pre-processing without stemming. The experiments show that, for SVM, accuracy with stemming in pre-processing was 0.67% and 1.34% higher than with pre-processing without stemming, whereas for Naive Bayes the differences were 0.23% and 1.12%. Finally, this research shows that stemming does not raise the accuracy with either the SVM or the Naive Bayes algorithm.


    To combat the Covid-19 epidemic, the government issued laws governing vaccination implementation. The regulation was issued as Minister of Health Regulation Number 10 of 2021. The program has drawn both support and opposition, so it needs to be evaluated through public feedback. The opinions and narratives that individuals share on social media sites like Twitter can be used to gather that feedback. This work seeks to construct a model to assess public opinion of the Covid-19 booster vaccination, using a lexicon-based technique to identify sentiment in tweet data. Naïve Bayes and Logistic Regression are the classification techniques employed in this study. A comparison of the two methods' results shows that Logistic Regression, with an accuracy of 72%, is superior to Naïve Bayes, which has an accuracy of 70%. A total of 607 tweets were processed. The model was tested on tweets posted from January 1 to July 30, 2022, for its ability to interpret public opinion on Twitter. The model found that people's attitudes toward the Covid-19 booster shot tended to be favorable. Future research can extend this work by adding datasets.
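
The lexicon-based labeling step described above can be illustrated with a minimal sketch. The tiny `LEXICON` dictionary, the `lexicon_label` function, and the three-way positive/negative/neutral rule are assumptions made for illustration. The abstract does not specify the lexicon or the scoring rule used in the study.

```python
import re

# Hypothetical toy lexicon (word -> polarity); a real study would load a
# full sentiment lexicon for the language of the tweets.
LEXICON = {
    "good": 1, "safe": 1, "effective": 1,
    "bad": -1, "painful": -1, "afraid": -1,
}


def lexicon_label(tweet: str) -> str:
    """Label a tweet by summing the polarity of the words found in the lexicon."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Labels produced this way could then serve as training targets for classifiers such as Naïve Bayes or Logistic Regression.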

    Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

    Before conducting a research project, researchers must find the trends and state of the art in their research field. However, that is not necessarily an easy job, partly due to the lack of specific tools to filter the required information by time range. This study aims to provide a solution to that problem by applying topic modeling to data scraped from Google Scholar between 2010 and 2019. We utilized Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build topic models and employed the coherence score method to determine how many different topics there are in each year's data. We also provided a visualization of the topic interpretation and the word distribution for each topic, as well as its relevance, using word clouds and PyLDAvis. In the future, we expect to add more features to show the relevance and interconnections between topics, making it even easier for researchers to use this tool in their research projects.
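
As a rough illustration of the TF-IDF weighting mentioned above, the sketch below computes per-document term weights in plain Python. The function name `tf_idf` and the specific variant (length-normalized term frequency times `log(N / df)`) are assumptions. The abstract does not state which TF-IDF variant was used, and the LDA step itself is not shown here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that occurs in every document receives weight zero, which is one reason TF-IDF is often used to down-weight ubiquitous words before topic modeling.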

    Analysis of Stemming Influence on Indonesian Tweet Classification

    Stemming is commonly used by researchers in natural language processing areas such as text mining, text classification, and information retrieval. In information retrieval, stemming may help to raise retrieval performance. However, there are indications that stemming does not significantly influence accuracy in text classification. Therefore, this paper further analyzes the influence of stemming on tweet classification in Bahasa Indonesia. This work compares the accuracy obtained under two conditions: pre-processing with stemming and pre-processing without stemming. The contribution of this research is to identify a better pre-processing task in order to obtain good accuracy in text classification. In the experiments, all accuracy results in tweet classification tend to decrease. The stemming task does not raise the accuracy with either the SVM or the Naive Bayes algorithm. Therefore, this work concludes that the stemming process does not significantly affect accuracy.
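
The two experimental conditions (pre-processing with and without stemming) can be sketched as a single pipeline with a switch. The `toy_stem` function below is a hypothetical suffix stripper written only for illustration. It is not a real Indonesian stemmer and is not the stemmer used in the paper, which the abstract does not name.

```python
import re

# Hypothetical suffix list, for illustration only; real Indonesian stemmers
# also handle prefixes, confixes, and dictionary lookups.
SUFFIXES = ("nya", "kan", "lah", "an", "i")


def toy_stem(token: str) -> str:
    """Strip at most one suffix, keeping a stem of at least four letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str, use_stemming: bool) -> list:
    """Lowercase and tokenize a tweet, optionally stemming each token."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [toy_stem(t) for t in tokens] if use_stemming else tokens
```

Running the same classifier on the output of `preprocess(tweet, True)` and `preprocess(tweet, False)` gives the kind of paired comparison the abstract describes.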


    Twitter has been widely used by people from all walks of life in recent years. The public's habit of posting tweets to assess public figures is one medium that represents public response to those figures. Ahead of general elections, certain parties usually want to know the sentiment and response toward public figures. The public figures assessed are those considered worthy and capable of being elected as leaders. Therefore, this study analyzes Indonesian-language tweets that discuss public figures. The analysis is carried out by classifying tweets that contain public sentiment about particular figures. The classification method used in this study is the Naive Bayes Classifier, combined with features for detecting negation and with weighting based on term frequency and TF-IDF. Tweet classification in this study is obtained from a combination of sentiment classes and category classes. Sentiment classification consists of positive and negative, while category classification consists of capability, integrity, and acceptability. Test results on the application built and on the RapidMiner tool show that term frequency gives better accuracy than the TF-IDF feature. The Support Vector Machine method yields better accuracy than the Naive Bayes method in both sentiment classification and category classification. Nevertheless, overall both Support Vector Machine and Naive Bayes perform reasonably well for tweet classification.
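
A minimal sketch of a Naive Bayes classifier with negation handling and term-frequency counts is shown below. The negation word list, the `NOT_` prefix scheme, and the class and function names are assumptions made for illustration. The abstract does not describe how negation detection was implemented in the study.

```python
import math
from collections import Counter, defaultdict

# Assumed list of Indonesian negation words; illustrative, not exhaustive.
NEGATIONS = {"tidak", "bukan", "tak"}


def mark_negation(tokens):
    """Fold a negation word into the next token, e.g. tidak jujur -> NOT_jujur.

    A negation word with no following token is dropped.
    """
    out, negate = [], False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + token if negate else token)
        negate = False
    return out


class NaiveBayes:
    """Multinomial Naive Bayes over term counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(mark_negation(doc))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, class_count in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            logp = math.log(class_count / n_docs)
            for term in mark_negation(doc):
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                logp += math.log(
                    (self.term_counts[label][term] + 1) / (total + len(self.vocab))
                )
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Treating `tidak jujur` as a single feature lets the classifier separate it from `jujur`, which a plain bag-of-words model would count as the same evidence.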


    Tweet data has been widely used in text mining research, including text classification. However, most tweet data is still dirty and contains a lot of noise. Therefore, pre-processing of tweets is very important. One pre-processing method used to reduce noise in tweets is stopword removal. This study compares the accuracy obtained with pre-processing that includes stopword removal against pre-processing that does not. This is done to determine the significance of the stopword removal stage in the classification of Indonesian text. Two pre-processing models are used, one involving stopword removal and the other without it. The experimental results show that removing stopwords during pre-processing can improve classification performance, as evidenced by an increase in accuracy.
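
Stopword removal itself is a simple filtering step, sketched below. The short `STOPWORDS` set is a hypothetical sample of common Indonesian function words chosen for illustration. The abstract does not say which stopword list the study used.

```python
# Hypothetical sample of Indonesian stopwords; a real list is much longer.
STOPWORDS = frozenset({"yang", "dan", "di", "ke", "dari", "ini", "itu"})


def remove_stopwords(tokens):
    """Return the tokens that are not in the stopword list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in STOPWORDS]
```

For example, `remove_stopwords(["film", "ini", "bagus", "dan", "lucu"])` keeps only the content words `film`, `bagus`, and `lucu`.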

    Clustering on Twitter: case study Twitter account of higher education institution in Indonesia

    Recently, higher education institutions have been using Twitter as one of the tools to enhance their communication network. This paper aims to cluster Twitter data retrieved from the official Twitter accounts of higher education institutions in Indonesia. We expect to obtain valuable information from the tweets posted. We use Twitter hashtags as the basis for clustering. We collected data from n=10 institutions that have an official account on Twitter. The Affinity Propagation algorithm was employed to perform the clustering task. According to the clustering results, we conclude that higher education institutions in Indonesia mostly use Twitter to post general information, news, agendas, announcements, information for new students, and achievements.
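
One way to turn hashtags into input for a similarity-based clustering algorithm is sketched below. The use of Jaccard similarity between hashtag sets is an assumption made for illustration. The abstract does not state how similarity was computed, and the Affinity Propagation step itself is not shown.

```python
import re


def extract_hashtags(tweet: str) -> set:
    """Return the lowercased hashtags found in a tweet, without the # sign."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", tweet)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_matrix(tweets):
    """Pairwise hashtag similarity between tweets, as a list of lists."""
    tags = [extract_hashtags(t) for t in tweets]
    return [[jaccard(x, y) for y in tags] for x in tags]
```

A precomputed matrix of this kind is the sort of input that similarity-based methods such as Affinity Propagation operate on.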

    Attention-based CNN-BiLSTM for Dialect Identification on Javanese Text

    This study proposes a hybrid deep learning model called attention-based CNN-BiLSTM (ACBiL) for dialect identification on Javanese text. Our ACBiL model comprises an input layer, a convolution layer, a max pooling layer, a batch normalization layer, a bidirectional LSTM layer, an attention layer, a fully connected layer, and a softmax layer. In the attention layer, we applied a hierarchical attention network using word-level and sentence-level attention to capture the level of importance of the content. For comparison, we also experimented with several other classical machine learning and deep learning approaches. Among the classical machine learning approaches, Linear Regression with unigrams achieved the best performance, with an average accuracy of 0.9647. In addition, we observed that the deep learning models significantly outperformed the traditional machine learning models. Our experiments showed that the ACBiL architecture achieved the best performance among the deep learning methods, with an accuracy of 0.9944.
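
The core idea of an attention layer, weighting each hidden state by its importance and summing the result, can be sketched in plain Python. This is a simplified single-level version with dot-product scoring against an assumed `context` vector. It is not the hierarchical word-level and sentence-level attention of the ACBiL model, whose exact formulation the abstract does not give.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Pool a sequence of vectors into one vector using attention weights.

    hidden_states: list of equal-length vectors (e.g. BiLSTM outputs).
    context: assumed context vector used to score each state.
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(context)
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights
```

In a trained model the context vector is learned, so the weights indicate which tokens or sentences contributed most to the prediction.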