251 research outputs found

    SENTIMENT ANALYSIS ON E-SPORTS FOR EDUCATION CURRICULUM USING NAIVE BAYES AND SUPPORT VECTOR MACHINE

    E-sports education is not just about playing games; it also involves creation, development, marketing, research, and other forms of education aimed at training skills and providing knowledge that fosters character. Public opinion on it can take the form of support, criticism, and input. The very large volume of comments needs to be analyzed accurately in order to separate positive and negative sentiments. This research was conducted to measure opinions, that is, to separate positive and negative sentiments towards e-sports education, so that valuable information can be extracted from social media. The data used in this study were obtained by crawling Twitter. The study uses two classification algorithms, Naïve Bayes and Support Vector Machine. Comparing the two shows that Naïve Bayes with SMOTE achieves an accuracy of 70.32% and an AUC of 0.954, while Support Vector Machine with SMOTE achieves an accuracy of 66.92% and an AUC of 0.832. From these results it can be concluded that Naïve Bayes has higher accuracy than Support Vector Machine, with a difference of 3.4 percentage points. The Naïve Bayes algorithm can thus better predict the achievement of e-sports for students' learning curriculum.

    Detecting Popularity of Ideas and Individuals in Online Community

    Research in the last decade has prioritized the effects of online texts and online behaviors on user information prediction. However, previous research overlooks the overall meaning of online texts and more detailed features of users’ online behaviors. The purpose of this research is to detect adopted ideas, the popularity of ideas, and the popularity of individuals by identifying the overall meaning of online texts and centrality features based on users’ online interactions within an online community. To gain insight into the research questions, the online discussions on the MyStarbucksIdea website are examined. MyStarbucksIdea, launched in 2008, encouraged people to submit new ideas for improving Starbucks’ products and services, and Starbucks adopted hundreds of ideas from this crowdsourcing platform. Based on the example of the MyStarbucksIdea community, a new document representation approach, Doc2Vec, combined with the users’ centrality features, was utilized in this research. It is also essential to study the surface-level features of online texts, their sentiment features, and the features of users’ online behaviors to determine idea adoption as well as the popularity of ideas and individuals in the online community. Supervised machine learning approaches, including Logistic Regression, Support Vector Machine, and Random Forest, with adjustments for imbalanced classes, served as the classifiers for the experiments. The results showed that the classifications of idea adoption, the popularity of ideas, and the popularity of individuals were all considered successful. The overall meaning of idea texts and users’ centrality features were most accurate in detecting adopted ideas and the popularity of ideas. The overall meaning of idea texts and the features of users’ online behaviors were most accurate in detecting the popularity of individuals. These results accord with previous studies that used behavioral and textual features to predict user information, and they extend those studies’ results by providing the new document embedding approach and the centrality features. The models used in this research can become a much-needed tool for popularity prediction in future research.

    The Impact of SMOTE on Random Forest Classifier Performance with Imbalanced Data (Dampak SMOTE terhadap Kinerja Random Forest Classifier berdasarkan Data Tidak seimbang)

    In machine learning applications, it is very common to find datasets with varying degrees of imbalance, from slight and moderate to extreme. Most machine learning models trained on imbalanced data will be biased, giving high accuracy on the majority class and low accuracy on the minority class. The aim of this study is to evaluate the impact of SMOTE (Synthetic Minority Oversampling Technique) on a Random Forest classifier for predicting heart disease. A dataset of 299 records from the UCI Machine Learning Repository was used to build a prediction model based on 12 independent variables and 1 dependent variable. The minority class in the training dataset was oversampled using SMOTE. The model was evaluated not only with Accuracy and Precision but also with alternative performance measures such as Sensitivity, F1-score, Specificity, G-Mean, and Youden's Index, which are better suited to imbalanced data. The results show that SMOTE reduced overfitting while improving the performance of the Random Forest model on all indicators. Accuracy improved by 3.45%, Precision by 4.8%, Sensitivity by 7.1%, F1-score by 4.8%, Specificity by 2.1%, G-Mean by 4.4%, and Youden's Index by 6.3%. This study demonstrates that when choosing a classifier built on a machine learning algorithm such as Random Forest, class skew in the data needs to be taken into account and balanced for better performance.
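
    The imbalance-aware measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fn count the positive (here assumed minority) class,
    fp/tn count the negative (majority) class.
    """
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    youden_index = sensitivity + specificity - 1
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
        "youden_index": youden_index,
    }


# Hypothetical counts for illustration only (not the paper's results).
m = imbalance_metrics(tp=20, fn=5, fp=10, tn=65)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

    Because G-Mean and Youden's Index combine sensitivity and specificity, a model that ignores the minority class scores poorly on them even when its accuracy looks high.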

    Unfolding the influencing factors and dynamics of overall hotel scores

    The hospitality and tourism industry has been boosted by hotel review sites, which has led to increasingly demanding tourists. We extracted more than thirty thousand reviews from Tripadvisor to understand how customers' perceptions vary between high-end and low-end hotels and between chain and independent hotels, and in which aspects this variation is most evident. We used sentiment analysis to assign a score to the aspects of each review. We compared machine learning algorithms, namely random forest, decision tree, and decision tree with AdaBoost, to predict the overall score. We then used the Gini index to identify the aspects that most influence the overall score. Finally, we compared reviews across temporal windows using the Jaccard index to characterize the dynamics of customer satisfaction, focusing on three aspects: "Service", "Location" and "Sleep". By correlating the hotels' responses with the users' reviews, we sought to demonstrate their impact on customers' perception of hotel quality. The best performance was achieved by the decision trees, which indicated that "Service" is the most influential aspect for satisfaction, while "Location" and "Sleep" were considered less important. By identifying the moments of drastic change, we verified that "Service" is also the aspect most closely related to the overall score. These analyses allow hotel management to track trends in tourists' assessments in each category. Generally speaking, hotels should focus on "Service". However, for a particular hotel, analyzing the dynamics of its overall score in comparison with its category would be advantageous.
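
    The Jaccard index used above to compare temporal windows is the size of the intersection of two sets divided by the size of their union. The sketch below assumes each window is summarized as a set of terms; that representation, the function name `jaccard`, and the sample windows are illustrative assumptions, since the abstract does not say what exactly is compared.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical term sets for three consecutive time windows.
windows = [
    {"service", "staff", "clean"},
    {"service", "staff", "breakfast"},
    {"noise", "location", "parking"},
]

# Similarity between each window and the next; a sharp drop hints at a change.
similarities = [jaccard(x, y) for x, y in zip(windows, windows[1:])]
print(similarities)
```

    A low value between consecutive windows would mark a candidate moment of drastic change in what reviewers discuss.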

    Sentiment Analysis of Customers' Reviews Using a Hybrid Evolutionary SVM-Based Approach in an Imbalanced Data Distribution

    Online media has an increasing presence in restaurants' activities through social media websites, coinciding with an increase in customers' reviews of these restaurants. These reviews have become the main source of information for both customers and decision-makers in this field. Any customer seeking such places will check their reviews first, which usually affects the final choice. In addition, customers' experiences can be enhanced by utilizing other customers' suggestions. Consequently, customers' reviews can influence the success of a restaurant business, since they are considered the final judgment of the overall quality of any restaurant. Decision-makers therefore need to analyze their customers' underlying sentiments in order to meet their expectations and improve the restaurants' services in terms of food quality, ambiance, price range, and customer service. The number of reviews available for various products and services has increased dramatically, and so has the need for automated methods to collect and analyze them. Sentiment Analysis (SA) is a field of machine learning that helps analyze and predict the sentiments underlying these reviews. SA for customers' reviews usually faces the challenge of imbalanced datasets, since the majority of sentiments fall into either supporters or resistors of the product or service. This work proposes a hybrid approach that combines the Support Vector Machine (SVM) algorithm with Particle Swarm Optimization (PSO) and different oversampling techniques to handle the imbalanced data problem. SVM is applied as a machine learning classification technique to predict the sentiments of reviews by optimizing the dataset, which contains reviews of several restaurants in Jordan. Data were collected from Jeeran, a well-known social network for Arabic reviews. PSO is used to optimize the weights of the features, and four oversampling techniques, namely the Synthetic Minority Oversampling Technique (SMOTE), SVM-SMOTE, Adaptive Synthetic Sampling (ADASYN), and borderline-SMOTE, were examined to produce an optimized dataset and solve its imbalance problem. This study shows that the proposed PSO-SVM approach produces the best results compared to different classification techniques in terms of accuracy, F-measure, G-mean, and Area Under the Curve (AUC), for different versions of the datasets.
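
    The core idea behind SMOTE, the first of the oversampling techniques listed above, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified, standard-library sketch of that idea only; it is not the study's implementation, and the function name `smote_sketch`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (excluding base itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D minority class (illustrative values only).
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

    Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.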

    Hybrid approach: naive bayes and sentiment VADER for analyzing sentiment of mobile unboxing video comments

    The social media revolution has drawn users towards video sharing sites like YouTube, the most popular social media site where people view, share, and interact by commenting on videos. Users share various types of videos, such as songs, movie trailers, news, and entertainment. Among the most popular today are unboxing videos, particularly unboxings of mobile phones, which attract more views, likes/dislikes, and comments. Analyzing the comments on mobile unboxing videos reveals viewers' opinions of the mobile phone. Studying the sentiment expressed in these comments shows whether the phone is receiving positive or negative feedback. A hybrid approach combining the lexicon-based Sentiment VADER and the machine learning algorithm Naive Bayes is applied to the comments to predict the sentiment. Sentiment VADER has a good impact on the Naive Bayes classifier in predicting the sentiment of a comment. The classifier achieves an accuracy of 79.78% and an F1 score of 83.72%.

    Enhancing prediction of user stance for social networks rumors

    The spread of social media has led to a massive change in the way information is dispersed. It provides organizations and individuals with wider opportunities for collaboration, but it has also given rise to malicious users and attention seekers who spread rumors and fake news. Understanding user stances in rumor posts is very important for assessing the veracity of the underlying content, as news can go viral in a few seconds, which can lead to mass panic and confusion. In this paper, different machine learning techniques were utilized to enhance the prediction of user stance, through a conversation thread, towards a given rumor on the Twitter platform. We utilized both conversation thread features and features related to the users who participated in the conversation in order to predict the users’ stances towards the source tweet, in terms of supporting, denying, querying, or commenting (SDQC). Furthermore, different datasets for the stance-prediction task were explored to handle the data imbalance problem, and data augmentation for minority classes was applied to enhance the results. The proposed framework outperforms the state-of-the-art results with a macro F1-score of 0.7233.
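
    The macro F1-score reported above averages per-class F1 with equal weight, so rare stance classes count as much as frequent ones. Below is a minimal sketch of the standard definition; the function name `macro_f1`, the choice to score a class with no true or predicted instances as 0.0, and the example labels are assumptions for illustration and do not come from the paper.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0.0 when the class never appears (assumption)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical SDQC labels, skewed towards "comment".
labels = ["support", "deny", "query", "comment"]
y_true = ["comment", "comment", "comment", "comment",
          "support", "deny", "query", "query"]
y_pred = ["comment", "comment", "comment", "support",
          "support", "comment", "query", "comment"]

score = macro_f1(y_true, y_pred, labels)
print(f"macro F1 = {score:.3f}")
```

    In this toy example the completely missed "deny" class pulls the macro average well below the plain accuracy, which is why the measure suits imbalanced stance data.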

    Interpretable Classification of Wiki-Review Streams

    Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained values near 90% for all evaluation metrics (accuracy, precision, recall, and F-measure).