
    SENTIMENT LABELING AND TEXT CLASSIFICATION MACHINE LEARNING FOR WHATSAPP GROUP

    The use of WhatsApp Group (WAG) for communication is increasing. WAG communication data can be analyzed from various perspectives, but the data is imported as unstructured text files. This research explores the potential of the SentiWordNet lexicon for labeling the sentiment of WAG data from "Alumni94" as positive, negative, or neutral, and then trains and tests machine learning text classification models on the labeled data. Six models were trained and tested: Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors (KNN), Linear Support Vector Machine (SVM), and Artificial Neural Network. The labeling results show that neutral sentiment is the majority class with 7588 samples, followed by 1617 positive and 324 negative samples. Among all the models, Random Forest achieved the highest precision, 83%, with a recall of 64%. Decision Tree had slightly lower precision, 80%, but a recall of 66% and a better F-measure of 71%. Random Forest and Decision Tree also outperformed the other models in accuracy, achieving an accuracy of 89% in classifying new messages. This research demonstrates the potential of the SentiWordNet lexicon and machine learning for sentiment analysis of WAG data using the Random Forest and Decision Tree models.
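
    The lexicon-based labeling step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the tiny `TOY_LEXICON`, its scores, the `label_message` helper, and the zero threshold are all assumptions made up for the example. The real SentiWordNet assigns scores per synset rather than per word.

```python
# Hypothetical mini-lexicon: word -> (positive score, negative score).
# The values are invented for illustration; they are NOT SentiWordNet scores.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.875, 0.0),
    "thanks": (0.5, 0.0),
    "bad": (0.0, 0.625),
    "late": (0.0, 0.25),
    "sorry": (0.125, 0.5),
}


def label_message(text, lexicon=TOY_LEXICON, threshold=0.0):
    """Sum (positive - negative) over known words and map the total to a label."""
    score = 0.0
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
        pos, neg = lexicon.get(word, (0.0, 0.0))  # unknown words contribute 0
        score += pos - neg
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Made-up chat messages, one per expected class.
messages = [
    "Great reunion, thanks everyone!",
    "Sorry, I will be late.",
    "The meeting is at 7 pm.",
]
labels = [label_message(m) for m in messages]
print(list(zip(messages, labels)))
```

    Messages with no lexicon hits fall through to "neutral", which is one plausible reason a lexicon approach yields a neutral majority on short chat messages. The labels produced this way would then serve as training targets for the classifiers.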

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter becomes a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention in Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high accuracy supervisory signal and multilanguage sentiment prediction while respecting every privacy request applicable. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, already before including tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysis
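
    For readers unfamiliar with gradient boosted regression trees, the general idea can be sketched as below. This is a generic textbook version with depth-1 trees (stumps) and squared loss, and it is an assumption that it resembles the paper's framework at all. The single feature `xs`, the targets `ys`, and the helper names are hypothetical toy values.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # last value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbrt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction is the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Hypothetical toy data: one early-available feature and a popularity score.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.0, 0.0, 1.0, 1.0, 5.0, 6.0]

model = fit_gbrt(xs, ys)
baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
train_mse = mse([predict(model, x) for x in xs], ys)
print(baseline_mse, train_mse)
```

    Sorting items by `predict(model, x)` gives a ranking, which is how a regression model of this kind can be used for a ranking task. A real system would use many features and deeper trees.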

    Sentiment Analysis using an ensemble of Feature Selection Algorithms

    Sentiment Analysis, an ongoing research area in text mining, is commonly used to determine the opinion of a person who has used a service or bought a product. It is the process of computationally identifying and categorizing opinions expressed in a piece of text. Individuals post their opinions via reviews, tweets, comments, or discussions, which constitute unstructured information. Sentiment analysis gives a general summary of reviews, which helps clients, individuals, or organizations in decision making. The primary aim of this paper is to apply an ensemble approach to feature reduction methods from natural language processing and to analyze the results. An ensemble approach is a process of combining two or more methodologies. The feature reduction methods used are Principal Component Analysis (PCA) for feature extraction and the Pearson chi-squared statistical test for feature selection. The main contribution of this paper is to test experimentally whether the combined use of careful feature selection and existing classification methodologies can yield better accuracy.
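
    The chi-squared feature selection step mentioned in this abstract can be illustrated with a small sketch. It covers only the chi-squared part, not PCA, and it assumes binary term presence and a toy two-class corpus. The documents, labels, and the `chi_squared` helper are hypothetical and not taken from the paper.

```python
def chi_squared(docs, labels, term):
    """Pearson chi-squared statistic for the table: term presence vs. class label."""
    classes = sorted(set(labels))
    # observed[present][cls] counts documents by term presence and class.
    observed = {p: {c: 0 for c in classes} for p in (True, False)}
    for tokens, label in zip(docs, labels):
        observed[term in tokens][label] += 1
    n = len(docs)
    stat = 0.0
    for present in (True, False):
        row_total = sum(observed[present].values())
        for c in classes:
            col_total = observed[True][c] + observed[False][c]
            expected = row_total * col_total / n  # count expected under independence
            if expected > 0:
                stat += (observed[present][c] - expected) ** 2 / expected
    return stat


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"good", "phone"},
    {"good", "battery"},
    {"bad", "phone"},
    {"bad", "screen"},
]
labels = ["pos", "pos", "neg", "neg"]

vocabulary = sorted(set().union(*docs))
scores = {term: chi_squared(docs, labels, term) for term in vocabulary}

# Keep the k highest-scoring terms; ties are broken alphabetically.
k = 2
top_terms = sorted(vocabulary, key=lambda t: (-scores[t], t))[:k]
print(scores, top_terms)
```

    Terms that occur equally often in both classes, such as "phone" here, score zero and are dropped, while class-specific terms are kept. In an ensemble of the kind described, the selected features would be combined with features extracted by another method before classification.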

    Multilingual Cross-domain Perspectives on Online Hate Speech

    In this report, we present a study of eight corpora of online hate speech, by demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study. Comment: 24 pages
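
    Collocation extraction, one of the techniques listed in this abstract, is often done by scoring adjacent word pairs with pointwise mutual information (PMI). The sketch below shows that general technique on a neutral toy sentence. The report does not say which association measure was used, so PMI, the `min_count` cutoff, and the `bigram_pmi` helper are assumptions.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by PMI, skipping pairs seen fewer than min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    # Highest PMI first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Neutral toy text, already lowercased and tokenized on whitespace.
text = ("machine learning helps text mining and machine learning "
        "helps text analysis while text mining grows")
ranked = bigram_pmi(text.split())
for pair, score in ranked:
    print(pair, round(score, 3))
```

    Pairs whose words rarely appear apart, such as ("machine", "learning") here, score higher than pairs built from a word that also occurs in other contexts, such as ("text", "mining").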