251 research outputs found

SENTIMENT ANALYSIS ON E-SPORTS FOR EDUCATION CURRICULUM USING NAIVE BAYES AND SUPPORT VECTOR MACHINE

Author: Alkhalifi Yuris
Ardianto Rian
Gata Windu
Nugraha Fitra Septia
Rivanie Tri
Publication venue: 'Faculty of Computer Science, Universitas Indonesia'
Publication date: 01/07/2020

E-sports education is not just about playing games; it also covers game creation, development, marketing, research, and other forms of education aimed at training skills and providing knowledge that fosters character. The opinions expressed by the public can take the form of support, criticism, and suggestions. The very large volume of comments needs to be analyzed accurately in order to separate positive and negative sentiments. This research measures public opinion on e-sports education by separating positive and negative sentiments, so that valuable information can be extracted from social media. The data used in this study were obtained by crawling Twitter. The study uses two classification algorithms, Naïve Bayes and Support Vector Machine. In the comparison, Naïve Bayes with SMOTE achieves an accuracy of 70.32% and an AUC of 0.954, while Support Vector Machine with SMOTE achieves an accuracy of 66.92% and an AUC of 0.832. From these results it can be concluded that Naïve Bayes has a higher accuracy than Support Vector Machine, with a difference of 3.4 percentage points. The Naïve Bayes algorithm can thus better predict the achievement of e-sports for students' learning curriculum.

Jurnal Ilmu Komputer dan Informasi

Detecting Popularity of Ideas and Individuals in Online Community

Author: Hsiang Chien-Yi
Publication venue: 'Purdue University (bepress)'
Publication date: 01/08/2018

Research in the last decade has prioritized the effects of online texts and online behaviors on the prediction of user information. However, previous research overlooks the overall meaning of online texts and more detailed features of users' online behaviors. The purpose of this research is to detect adopted ideas, the popularity of ideas, and the popularity of individuals by identifying the overall meaning of online texts and the centrality features based on users' online interactions within an online community. To gain insight into the research questions, the online discussions on the MyStarbucksIdea website are examined. MyStarbucksIdea, launched in 2008, encouraged people to submit new ideas for improving Starbucks' products and services, and Starbucks has adopted hundreds of ideas from this crowdsourcing platform. Based on the example of the MyStarbucksIdea community, a new document representation approach, Doc2Vec, combined with users' centrality features was utilized in this research. It is also essential to study the surface-level features of online texts, the sentiment features of online texts, and the features of users' online behaviors to determine idea adoption as well as the popularity of ideas and individuals in the online community. Furthermore, supervised machine learning approaches, including Logistic Regression, Support Vector Machine, and Random Forest, with adjustments for the imbalanced classes, served as the classifiers for the experiments. The results of the experiments showed that the classifications of idea adoption, the popularity of ideas, and the popularity of individuals were all considered successful. The overall meaning of idea texts and users' centrality features were most accurate in detecting adopted ideas and the popularity of ideas. The overall meaning of idea texts and the features of users' online behaviors were most accurate in detecting the popularity of individuals.
These results are in accord with those of previous studies, which used behavioral and textual features to predict user information, and they enhance the previous studies' results by providing the new document embedding approach and the centrality features. The models used in this research can become a much-needed tool for popularity prediction in future research.
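
As a rough illustration of what a centrality feature can look like, the sketch below computes normalized degree centrality from a list of user interactions. The abstract does not say which centrality measures were used, so degree centrality, the function name, and the example users are all assumptions made for illustration.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected interaction graph.

    `edges` is a list of (user_a, user_b) pairs; the users here are hypothetical.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Divide by the maximum possible number of neighbours (n - 1).
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}

# Hypothetical comment/reply interactions between four community members.
interactions = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"), ("ben", "cy")]
centrality = degree_centrality(interactions)
```

In a pipeline like the one described, such per-user scores would be concatenated with the document embedding of each idea text before classification.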

Dampak SMOTE terhadap Kinerja Random Forest Classifier berdasarkan Data Tidak seimbang (The Impact of SMOTE on Random Forest Classifier Performance on Imbalanced Data)

Author: Desnelita Yenny
Erlin Erlin
Nasution Nurliana
Suryati Laili
Zoromi Fransiskus
Publication venue: 'STMIK Bumigora Mataram'
Publication date: 31/07/2022

In machine learning applications it is very common to encounter datasets with varying degrees of imbalance, ranging from slight and moderate to extreme. Most machine learning models trained on imbalanced data are biased, giving high accuracy on the majority class and low accuracy on the minority class. The aim of this study is to evaluate the impact of SMOTE (Synthetic Minority Oversampling Technique) on a Random Forest classifier for predicting heart disease. A dataset of 299 records from the UCI Machine Learning Repository was used to build a prediction model based on 12 independent variables and 1 dependent variable. The minority class in the training dataset was oversampled using SMOTE. The model was evaluated not only with Accuracy and Precision but also with alternative performance measures such as Sensitivity, F1-score, Specificity, G-Mean, and Youden's Index, which are better suited to imbalanced data. The results show that SMOTE reduced overfitting while improving the performance of the Random Forest model on all indicators: Accuracy increased by 3.45%, Precision by 4.8%, Sensitivity by 7.1%, F1-score by 4.8%, Specificity by 2.1%, G-Mean by 4.4%, and Youden's Index by 6.3%. This study shows that when choosing a classifier built on a machine learning algorithm such as Random Forest, class skew in the data needs to be taken into account and balanced to obtain better performance.
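
The imbalance-aware measures named in this abstract can all be derived from a binary confusion matrix. The sketch below uses the standard definitions of Sensitivity, Specificity, G-Mean, and Youden's Index; the function name and the counts are hypothetical and are not taken from the study.

```python
import math

def imbalance_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, G-Mean and Youden's Index from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Geometric mean of the two rates; low if either class is handled badly.
        "g_mean": math.sqrt(sensitivity * specificity),
        # Youden's J statistic: 0 for a chance-level classifier, 1 for a perfect one.
        "youden_index": sensitivity + specificity - 1,
    }

# Hypothetical counts for a test split with 20 positives and 40 negatives.
m = imbalance_metrics(tp=15, fn=5, tn=36, fp=4)
```

Unlike plain accuracy, both G-Mean and Youden's Index drop sharply when the minority class is mostly misclassified, which is why they are preferred for skewed data.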

Unfolding the influencing factors and dynamics of overall hotel scores

Author: Botelho Miguel Tavares
Publication venue
Publication date: 11/10/2019

The hospitality and tourism industry has been boosted by hotel review sites, which has led to increasing demands on the part of tourists. We extracted more than thirty thousand reviews from Tripadvisor to understand the variations in customers' perceptions of high/low end and chain/independent hotels and the aspects in which this variation is most evident. We used sentiment analysis to assign a score to the aspects of each review. We compared machine learning algorithms, namely random forest, decision tree, and decision tree with AdaBoost, to predict the overall score. Then, we used the Gini index to understand the aspects that most influence the overall score. Finally, we compared the reviews across temporal windows over time with the Jaccard index to characterize the dynamics of customer satisfaction, focusing on three aspects: "Service", "Location" and "Sleep". By correlating the hotel's responses with the users' reviews, we wanted to demonstrate their impact on customers' perception of hotel quality. The best performance was achieved by the decision trees, which indicated that "Service" is the most influential aspect for satisfaction, while "Location" and "Sleep" were considered less important. By identifying the moments of drastic change, we verified that "Service" is also the aspect most related to the overall score. These analyses allow hotel management to track the trends of tourists' assessments in each category. Generally speaking, the focus should be on "Service". However, an analysis, for a particular hotel, of the dynamics of the overall score in comparison with its category would be advantageous.
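
The Jaccard index mentioned here measures the overlap between two sets. The abstract does not specify exactly what is compared between temporal windows, so the sketch below assumes, purely for illustration, that each window is summarized as a set of frequently mentioned terms; the function name and the data are hypothetical.

```python
def jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)

# Hypothetical term sets extracted from reviews in two consecutive time windows.
window_q1 = {"service", "location", "sleep", "breakfast"}
window_q2 = {"service", "sleep", "pool"}
similarity = jaccard_index(window_q1, window_q2)
```

A low similarity between consecutive windows would flag a period in which what customers write about has shifted, which is one way to locate moments of drastic change.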

Sentiment Analysis of Customers' Reviews Using a Hybrid Evolutionary SVM-Based Approach in an Imbalanced Data Distribution

Author: Al-Zoubi Ala' M.
Faris Hossam
Obiedat Ruba
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/02/2022

Online media has an increasing presence in restaurants' activities through social media websites, coinciding with an increase in customers' reviews of these restaurants. These reviews have become the main source of information for both customers and decision-makers in this field. Any customer looking for such places will check their reviews first, which usually affects their final choice. In addition, customers' experiences can be enhanced by utilizing other customers' suggestions. Consequently, customers' reviews can influence the success of a restaurant business, since they are considered the final judgment of the overall quality of any restaurant. Thus, decision-makers need to analyze their customers' underlying sentiments in order to meet their expectations and improve the restaurants' services in terms of food quality, ambiance, price range, and customer service. The number of reviews available for various products and services has increased dramatically, and so has the need for automated methods to collect and analyze these reviews. Sentiment Analysis (SA) is a field of machine learning that helps analyze and predict the sentiments underlying these reviews. SA for customers' reviews usually faces the challenge of imbalanced datasets, as the majority of these sentiments fall into supporters or resistors of the product or service. This work proposes a hybrid approach that combines the Support Vector Machine (SVM) algorithm with Particle Swarm Optimization (PSO) and different oversampling techniques to handle the imbalanced data problem. SVM is applied as a machine learning classification technique to predict the sentiments of reviews by optimizing the dataset, which contains different reviews of several restaurants in Jordan. Data were collected from Jeeran, a well-known social network for Arabic reviews.
A PSO technique is used to optimize the weights of the features, and four different oversampling techniques, namely the Synthetic Minority Oversampling Technique (SMOTE), SVM-SMOTE, Adaptive Synthetic Sampling (ADASYN), and borderline-SMOTE, were examined to produce an optimized dataset and solve the imbalance problem of the dataset. This study shows that the proposed PSO-SVM approach produces the best results compared to different classification techniques in terms of accuracy, F-measure, G-mean, and Area Under the Curve (AUC), for different versions of the datasets.
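
The core idea shared by the SMOTE family is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal sketch of that interpolation step only, not the implementation used in the paper; the function name, parameters, and data points are assumptions for illustration.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours.

    `minority` is a list of at least two feature tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-dimensional minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=3)
```

Variants such as borderline-SMOTE and ADASYN differ mainly in which minority points are chosen as the base and how many synthetic samples each one receives.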

Repositorio Institucional Universidad de Granada

Comparing Machine Learning and Deep Learning Techniques for Text Analytics: Detecting the Severity of Hate Comments Online

Author: Ioannou A
Marshan A
Mohamed Nizar FN
Spanaki K
Publication venue: Springer Nature
Publication date: 24/11/2023

Data Availability: The data used in this work is a public dataset. Copyright © The Author(s) 2023. Social media platforms have become an increasingly popular tool for individuals to share their thoughts and opinions with other people. However, people very often misuse social media by posting abusive comments. Abusive and harassing behaviours can have adverse effects on people's lives. This study takes a novel approach to combating harassment on online platforms by detecting the severity of abusive comments, which has not been investigated before. The study compares the performance of machine learning models such as Naïve Bayes, Random Forest, and Support Vector Machine with deep learning models such as Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM). Moreover, we investigate the effect of text pre-processing on the performance of the machine and deep learning models. The feature set for the abusive comments was built using unigrams and bigrams for the machine learning models and word embeddings for the deep learning models. The comparison of the models' performances showed that Random Forest with bigrams achieved the best overall performance, with an accuracy of 0.94, a precision of 0.91, a recall of 0.94, and an F1 score of 0.92. The study develops an efficient model to detect the severity of abusive language on online platforms, offering important implications for both theory and practice. This research did not use any funding (public or private).
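
To make the unigram and bigram feature set concrete, the sketch below counts word n-grams in a single comment. The whitespace tokenization, the function name, and the example sentence are simplifying assumptions; the abstract does not describe the study's actual pre-processing or vectorization.

```python
from collections import Counter

def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default) in a text."""
    tokens = text.lower().split()  # naive whitespace tokenization
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical comment; bigrams such as "not welcome" keep the negation attached.
features = ngram_counts("you are not welcome here")
```

Bigrams preserve short-range word order that unigrams lose, which may be one reason a bigram feature set helps on abusive-language tasks.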

Brunel University Research Archive

Hybrid approach: naive bayes and sentiment VADER for analyzing sentiment of mobile unboxing video comments

Author: V. D Chaithra
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2019

The revolution in social media has attracted users to video sharing sites like YouTube, the most popular social media site where people view, share, and interact by commenting on videos. Users share various types of videos, such as songs, movie trailers, news, and entertainment. Nowadays the most trending videos are unboxing videos, in particular unboxings of mobile phones, which get more views, likes/dislikes, and comments. Analyzing the comments on mobile unboxing videos reveals the viewers' opinions of the mobile phone. Studying the sentiment expressed in these comments shows whether the mobile phone is getting positive or negative feedback. A hybrid approach combining the lexicon approach Sentiment VADER and the machine learning algorithm Naive Bayes is applied to the comments to predict the sentiment. Sentiment VADER has a good impact on the Naive Bayes classifier in predicting the sentiment of a comment. The classifier achieves an accuracy of 79.78% and an F1 score of 83.72%.

ZENODO

Enhancing prediction of user stance for social networks rumors

Author: ElKorany Abeer
Ezzat Cherry A.
Khaled Kholoud
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2023

The spread of social media has led to a massive change in the way information is dispersed. It provides organizations and individuals with wider opportunities for collaboration, but it also enables malicious users and attention seekers to spread rumors and fake news. Understanding user stances in rumor posts is very important for identifying the veracity of the underlying content, as news can go viral in a few seconds, which can lead to mass panic and confusion. In this paper, different machine learning techniques were utilized to enhance the prediction of user stance across a conversation thread towards a given rumor on the Twitter platform. We utilized both conversation thread features and features related to the users who participated in the conversation in order to predict the users' stances, in terms of supporting, denying, querying, or commenting (SDQC), towards the source tweet. Furthermore, different datasets for the stance-prediction task were explored to handle the data imbalance problem, and data augmentation for minority classes was applied to enhance the results. The proposed framework outperforms the state-of-the-art results with a macro F1-score of 0.7233.
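
The macro F1-score reported here averages the per-class F1 scores with equal weight, so minority stances count as much as the dominant one. The sketch below computes it for the four SDQC labels on made-up predictions; the function name and data are hypothetical and unrelated to the paper's results.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

labels = ["support", "deny", "query", "comment"]
# Hypothetical thread in which "comment" dominates and "deny" is rare.
y_true = ["comment", "comment", "comment", "comment", "support", "deny", "query", "support"]
y_pred = ["comment", "comment", "comment", "support", "support", "comment", "query", "support"]
score = macro_f1(y_true, y_pred, labels)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the accuracy is 0.75 while the macro F1 is lower, because the single "deny" reply is missed entirely and its F1 of zero pulls the average down.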

Interpretable Classification of Wiki-Review Streams

Author: Burguillo-Rial Juan Carlos
García-Méndez Silvia
Leal Fátima
Malheiro Benedita
Publication venue: IEEE
Publication date: 13/12/2023

Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and F-measure).
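
One way to picture incremental editor profiling on a stream is a read-then-update loop: for each incoming review, the current profile is read as a feature before the outcome is known, and then the profile is updated with that outcome. The sketch below is only an assumed illustration of this pattern using a running revert rate; the class, its methods, and the events are hypothetical and do not reproduce the paper's features or models.

```python
from collections import defaultdict

class EditorProfiles:
    """Running per-editor statistics, updated one event at a time."""

    def __init__(self):
        self.reviews = defaultdict(int)
        self.reverts = defaultdict(int)

    def revert_rate(self, editor):
        # Fraction of this editor's past reviews that were reverted (0.0 if unseen).
        n = self.reviews[editor]
        return self.reverts[editor] / n if n else 0.0

    def update(self, editor, reverted):
        self.reviews[editor] += 1
        self.reverts[editor] += int(reverted)

# Hypothetical stream of (editor, was_reverted) events in arrival order.
stream = [("alice", False), ("bob", True), ("bob", True), ("alice", False), ("bob", False)]
profiles = EditorProfiles()
rates_seen = []
for editor, reverted in stream:
    rates_seen.append(profiles.revert_rate(editor))  # feature available before the label
    profiles.update(editor, reverted)                # then learn from the outcome
```

Reading the profile before updating it keeps the feature free of information from the event being classified, which matters when the same stream is used for both evaluation and learning.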