SENTIMENT LABELING AND TEXT CLASSIFICATION MACHINE LEARNING FOR WHATSAPP GROUP

Authors: Defit Sarjon, Susandri Susandri, Tajuddin Muhammad
Publication venue: LPPM Nusa Mandiri
Publication date: 21/08/2023

The use of WhatsApp Group (WAG) for communication is increasing. WAG communication data can be analyzed from various perspectives; however, this data is imported in the form of unstructured text files. The aim of this research is to explore the potential of the SentiWordNet lexicon for labeling the sentiment of WAG data from "Alumni94" as positive, negative, or neutral, and to train and test machine learning text classification models on the labeled data. Training and testing were conducted on six models: Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors (KNN), Linear Support Vector Machine (SVM), and Artificial Neural Network. The labeling results indicate that neutral sentiment is the majority with 7588 samples, followed by 1617 positive and 324 negative samples. Among all the models, Random Forest showed the best precision (83%), with a recall of 64%. Decision Tree had slightly lower precision (80%) but slightly higher recall (66%), and exhibited a better f-measure of 71%. In the accuracy evaluation, the Random Forest and Decision Tree models performed notably better than the others, achieving an accuracy of 89% in classifying new messages. This research demonstrates the potential of the SentiWordNet lexicon and machine learning for sentiment analysis of WAG data using the Random Forest and Decision Tree models.
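
The lexicon-based labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the lexicon below is a small hypothetical stand-in for SentiWordNet, and the function name `label_message`, the tokenization, and the zero threshold are all assumptions made for illustration.

```python
# Toy polarity lexicon: word -> (positive score, negative score).
# These entries are hypothetical and are NOT taken from SentiWordNet.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "thanks": (0.5, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "late": (0.0, 0.25),
    "sorry": (0.125, 0.5),
}


def label_message(text, lexicon=TOY_LEXICON, threshold=0.0):
    """Label a message as positive, negative, or neutral.

    The label comes from the sum of (positive - negative) scores
    over all tokens found in the lexicon. Unknown tokens score zero.
    """
    # Naive whitespace tokenization with punctuation stripped (an assumption).
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    score = 0.0
    for token in tokens:
        pos, neg = lexicon.get(token, (0.0, 0.0))
        score += pos - neg
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for message in [
        "Thanks, that was a good meeting!",
        "Sorry, I will be late",
        "See you at the reunion tomorrow",
    ]:
        print(message, "->", label_message(message))
```

Messages with no lexicon hits fall into the neutral class, which is one plausible reason a chat corpus labeled this way ends up dominated by neutral samples. The labels produced by such a step would then serve as training targets for the classifiers.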

ejournal.nusamandiri.ac.id (STMIK Nusa Mandiri)

Mobile sentiment analysis

Authors: Chambers L., Gaber M., Pechenizkiy M., Tromp E.
Publication date: 10/09/2012

Portsmouth University Research Portal (Pure)

Scalable Privacy-Compliant Virality Prediction on Twitter

Authors: Kowalczyk Damian Konrad, Larsen Jan
Publication date: 01/01/2019

The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention on Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage, and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy, and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal, and multilanguage sentiment prediction, while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysi

arXiv.org e-Print Archive

Online Research Database In Technology

Sentiment Analysis using an ensemble of Feature Selection Algorithms

Author: Bhagat Manankumar
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2018

Sentiment analysis, an active research area in text mining, is commonly used to determine the opinion of a person who has used a service or bought a product. It is the process of using computation to identify and categorize opinions expressed in a piece of text. Individuals post their opinions via reviews, tweets, comments, or discussions, all of which are unstructured information. Sentiment analysis gives a general summary of these reviews, which helps clients, individuals, and organizations make decisions. The main aim of this paper is to apply an ensemble approach to feature reduction methods from natural language processing and to perform the analysis based on the results. An ensemble approach is a process of combining two or more methodologies. The feature reduction methods used are Principal Component Analysis (PCA) for feature extraction and Pearson's chi-squared statistical test for feature selection. The main contribution of this paper is to test whether the combined use of careful feature selection and existing classification methodologies can yield better accuracy.
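
The chi-squared feature selection mentioned in this abstract can be sketched in a few lines. This is an illustrative example only, not the paper's implementation: it scores each term with Pearson's chi-squared statistic on a 2x2 term/class contingency table, and the function names `chi_squared` and `top_k_terms` and the toy documents are assumptions. The PCA step is omitted.

```python
def chi_squared(docs, labels, term, target):
    """Pearson chi-squared statistic for one term against one class.

    docs   : list of token sets, one per document
    labels : list of class labels, aligned with docs
    Uses the 2x2 table: term present/absent vs. class target/other.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term or class is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def top_k_terms(docs, labels, target, k):
    """Return the k terms with the highest chi-squared score.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = sorted({term for tokens in docs for term in tokens})
    scored = [(chi_squared(docs, labels, t, target), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = [
        {"great", "movie"},
        {"great", "plot"},
        {"boring", "movie"},
        {"boring", "plot"},
    ]
    labels = ["pos", "pos", "neg", "neg"]
    print(top_k_terms(docs, labels, "pos", 2))
```

Terms that occur equally often in both classes, such as "movie" in the toy corpus, score zero and are dropped first, which is the intuition behind using this test to shrink a bag-of-words vocabulary before classification.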

SJSU ScholarWorks

Multilingual Cross-domain Perspectives on Online Hate Speech

Authors: Daelemans Walter, De Pauw Guy, De Smedt Tom, Gwóźdź Maja, Jaki Sylvia, Kotzé Eduan, Saoud Leïla
Publication date: 01/01/2018

In this report, we present a study of eight corpora of online hate speech, by demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study. Comment: 24 page

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen