Search CORE

6 research outputs found

Sentiment Analysis for Troll Activity Detection on Sina Weibo

Author: Jiang Zidong
Publication venue: SJSU ScholarWorks
Publication date: 20/05/2020

The impact of social media on the modern world is difficult to overstate. Virtually all companies and public figures have accounts on popular platforms such as Twitter and Facebook. In China, the micro-blogging service Sina Weibo is the most popular such platform. To counter negative publicity, Weibo trolls, the so-called "Water Army", can be hired to post deceptive comments. In recent years, troll detection and sentiment analysis have both been studied, but we are not aware of any research that considers troll detection based on sentiment analysis. In this research, we focus on troll detection via sentiment analysis, combined with other user activity data gathered on the Sina Weibo platform, where the content is mainly in Chinese. We implement techniques for Chinese sentence segmentation, word embeddings, and sentiment score calculation. We use these techniques to develop and test a sentiment-analysis approach to troll detection based on a variety of machine learning strategies. We generate and analyze experimental results; our proposed troll detection model achieves 89% accuracy on the dataset presented in this research. We also present a Chrome extension that implements the proposed technique, enabling real-time troll detection and troll-comment filtering while a user browses Sina Weibo tweets and comments.

SJSU ScholarWorks
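The abstract combines per-comment sentiment scores with user activity data. A minimal sketch of that idea follows, assuming comments are already segmented into words (the segmentation step itself is not shown). The lexicon, the negation words, and the helpers `sentiment_score` and `user_features` are hypothetical illustrations, not the thesis's actual resources or feature set, which also uses word embeddings.

```python
from dataclasses import dataclass

# Hypothetical toy sentiment lexicon (word -> polarity weight); a real system
# would use a curated Chinese sentiment dictionary or learned embeddings.
LEXICON = {"好": 1.0, "喜欢": 1.0, "支持": 0.5, "差": -1.0, "垃圾": -1.0, "骗子": -1.0}

# Hypothetical negation words that flip an immediately following sentiment word.
NEGATIONS = {"不", "没"}


def sentiment_score(tokens: list[str]) -> float:
    """Average polarity of the sentiment-bearing tokens in one segmented comment."""
    total, hits, negate = 0.0, 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in LEXICON:
            weight = LEXICON[tok]
            total += -weight if negate else weight
            hits += 1
        negate = False  # negation only applies to the very next token
    return total / hits if hits else 0.0


@dataclass
class UserFeatures:
    mean_sentiment: float    # average sentiment across the user's comments
    sentiment_spread: float  # max - min sentiment, a crude consistency signal
    comments_per_day: float  # simple activity feature


def user_features(comments: list[list[str]], active_days: int) -> UserFeatures:
    """Hypothetical per-user feature vector for a downstream troll classifier."""
    scores = [sentiment_score(c) for c in comments]
    mean = sum(scores) / len(scores) if scores else 0.0
    spread = max(scores) - min(scores) if scores else 0.0
    rate = len(comments) / active_days if active_days > 0 else 0.0
    return UserFeatures(mean, spread, rate)
```

Features of this kind could then be fed to any standard classifier; the thesis reports comparing several machine learning strategies, which this sketch does not reproduce.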

Mapping (Dis-)Information Flow about the MH17 Plane Crash

Authors: Augenstein Isabelle; Golovchenko Yevgeniy; Hartmann Mareike
Publication date: 01/01/2019

Digital media enable the fast sharing not only of information but also of disinformation. One prominent case of an event that led to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets or have used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Although we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis that underlines the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
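The abstract compares its neural classifier against a hashtag-based baseline. One simple form such a baseline could take is a majority vote over seed hashtags, sketched below. The hashtags in `SEED_TAGS` are placeholders and `hashtag_label` is a hypothetical helper; the paper's actual hashtag lists and baseline rules are not given in this abstract.

```python
import re
from collections import Counter

# Matches hashtags such as "#example_tag" (\w also covers Unicode letters and digits).
HASHTAG_RE = re.compile(r"#\w+")

# Hypothetical seed hashtags mapped to the two stances; placeholders only,
# not the hashtag lists used in the paper.
SEED_TAGS = {
    "#pro_ru_example": "pro-Russian",
    "#pro_ua_example": "pro-Ukrainian",
}


def hashtag_label(tweet: str) -> str:
    """Label a tweet by majority vote over known seed hashtags; abstain otherwise."""
    votes = Counter()
    for tag in HASHTAG_RE.findall(tweet.lower()):
        label = SEED_TAGS.get(tag)
        if label is not None:
            votes[label] += 1
    if not votes:
        return "unlabeled"  # no seed hashtag present
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "unlabeled"  # equal votes for both sides: abstain
    return ranked[0][0]
```

A baseline like this only covers tweets that carry a seed hashtag, which is one reason a text classifier can improve over it.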

Combating Misinformation on Social Media by Exploiting Post and User-level Information

Author: Mu Yida
Publication date: 01/02/2023

Misinformation on social media has a far-reaching negative impact on the public and on society. Given the large number of real-time posts on social media, traditional manual methods of misinformation detection are not viable. Therefore, computational (i.e., data-driven) approaches have been proposed to combat online misinformation. Previous work on computational misinformation analysis has mainly focused on employing natural language processing (NLP) techniques to develop misinformation detection systems at the post level (e.g., using text and propagation networks). However, it is also important to exploit user-level information on social media, as users play a significant role (e.g., posting, diffusing, or refuting) in spreading misinformation. The main aims of this thesis are to: (i) develop novel methods for analysing the behaviour of users who are likely to share or refute misinformation on social media; and (ii) predict and characterise unreliable stories that achieve high popularity on social media. To this end, we first highlight the limitations of the evaluation protocol used in popular post-level rumour detection benchmarks and propose evaluating such systems with chronological splits (i.e., accounting for temporal concept drift). At the user level, we introduce two novel tasks: (i) the early detection of Twitter users who are likely to share misinformation before they actually do so; and (ii) the identification and characterisation of active citizens who refute misinformation on social media. Finally, we develop a new dataset that enables the study of predicting the future popularity (e.g., number of likes, replies and retweets) of false rumours on Weibo.

White Rose E-theses Online
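The thesis proposes evaluating rumour detectors with chronological splits rather than random ones, so that models are tested on posts newer than anything they were trained on. A minimal sketch, assuming each post carries a creation timestamp; the `Post` fields, the helper name `chronological_split`, and the 70/10/20 proportions are illustrative assumptions, not the thesis's exact protocol.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    post_id: str
    created_at: datetime
    label: str  # e.g. "rumour" or "non-rumour" (illustrative)


def chronological_split(posts, train_frac=0.7, dev_frac=0.1):
    """Split posts by time so that every test post is newer than every training post."""
    if not (0 < train_frac and 0 <= dev_frac and train_frac + dev_frac < 1):
        raise ValueError("fractions must leave room for a test set")
    ordered = sorted(posts, key=lambda p: p.created_at)  # oldest first, no shuffling
    n_train = round(len(ordered) * train_frac)
    n_dev = round(len(ordered) * dev_frac)
    train = ordered[:n_train]
    dev = ordered[n_train:n_train + n_dev]
    test = ordered[n_train + n_dev:]
    return train, dev, test
```

Unlike a shuffled split, this keeps future events out of the training data, which is how temporal concept drift shows up in the evaluation.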

Rumor Stance Classification in Online Social Networks: A Survey on the State-of-the-Art, Prospects, and Future Challenges

Authors: Dadlani Aresh; Jami Sarina; Maham Behrouz; Sabermahani Mohammad M.; Sahebi Iman; Shariatpanahi Seyed P.
Publication date: 02/08/2022

The emergence of the Internet as a ubiquitous technology has facilitated the rapid evolution of social media into the leading virtual platform for communication, content sharing, and information dissemination. Although this technology has revolutionized the way news is delivered, it has also brought inevitable drawbacks. One such drawback is the spread of rumors on social media platforms, which can provoke doubt and fear among people. Debunking rumors before they spread widely has therefore become all the more essential. Over the years, many studies have been conducted to develop effective rumor verification systems. One line of such studies focuses on rumor stance classification: using users' viewpoints about a rumorous post to better predict the rumor's veracity. Relying on users' stances in the rumor verification task has gained importance because it has been shown to improve model performance significantly. In this paper, we conduct a comprehensive literature review of rumor stance classification in complex social networks. In particular, we present a thorough description of the approaches and mark the top performances. Moreover, we introduce multiple datasets available for this purpose and highlight their limitations. Finally, we discuss some challenges and future directions to stimulate further research efforts.
Comment: 13 pages, 2 figures, journa

arXiv.org e-Print Archive
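The survey describes using users' stances toward a rumorous post to help predict its veracity. The sketch below assumes the four-way support/deny/query/comment scheme used in the RumourEval shared tasks and turns reply stances into a distribution. The `heuristic_veracity` rule and its `margin` threshold are hypothetical and purely illustrative; the systems the survey reviews learn this mapping rather than hard-coding it.

```python
from collections import Counter

# Four-way stance scheme used in the RumourEval shared tasks.
STANCES = ("support", "deny", "query", "comment")


def stance_distribution(stances: list[str]) -> dict[str, float]:
    """Fraction of replies carrying each stance toward the source post."""
    counts = Counter(stances)
    unknown = set(counts) - set(STANCES)
    if unknown:
        raise ValueError(f"unknown stance labels: {sorted(unknown)}")
    total = len(stances)
    return {s: (counts[s] / total if total else 0.0) for s in STANCES}


def heuristic_veracity(stances: list[str], margin: float = 0.2) -> str:
    """Hypothetical rule: compare the share of supporting vs. denying replies."""
    dist = stance_distribution(stances)
    score = dist["support"] - dist["deny"]
    if score >= margin:
        return "likely true"
    if score <= -margin:
        return "likely false"
    return "unverified"
```

In practice the stance distribution would more likely serve as input features to a veracity classifier than as a fixed rule.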

Automated Assessment of the Aftermath of Typhoons Using Social Media Texts

Author: Chen Zi
Publication venue: UNSW, Sydney
Publication date: 01/01/2021

Disasters are one of the major threats to economies and human societies, causing substantial losses of human lives, property and infrastructure. Understanding, preventing and mitigating such disasters has been a persistent endeavour, and the popularization of social media offers new opportunities to enhance disaster management through crowd-sourcing. However, social media data is also characterized by extreme brevity, heavy noise, and informal language. The existing literature has either not completely addressed these disadvantages or has devoted vast manual effort to tackling them. The major focus of this research is constructing a holistic framework that exploits social media data for typhoon damage assessment. The scope of this research covers data collection, relevance classification, location extraction and damage assessment, with assorted approaches used to overcome the disadvantages of social media data. Moreover, semi-supervised or unsupervised approaches are prioritized in forming the framework in order to minimize manual intervention. In data collection, a query expansion strategy is adopted to optimize the recall of typhoon-relevant information retrieval. Multiple filtering strategies are developed to screen keywords and maintain relevance to the search topics as keywords are updated. A classifier based on a convolutional neural network is presented for relevance classification, with hashtags and word clusters as extra input channels to augment the information. In location extraction, a model is constructed by integrating Bidirectional Long Short-Term Memory and Conditional Random Fields. Feature noise correction layers and label smoothing are leveraged to handle the noisy training data.
Finally, a multi-instance multi-label classifier identifies damage relations in four categories, and the damage categories of a message are integrated with the damage description score to obtain a damage severity score for the message. A case study is conducted to verify the effectiveness of the framework. The outcomes indicate that the approaches and models developed in this study significantly improve the classification of social media texts, especially under semi-supervised or unsupervised learning. Moreover, the results of damage assessment from social media data are remarkably consistent with official statistics, which demonstrates the practicality of the proposed damage scoring scheme.

UNSWorks
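The abstract mentions label smoothing as one way to cope with noisy training labels in location extraction. The thesis abstract does not specify which variant is used; the sketch below shows the common uniform form, where a one-hot target becomes (1 - epsilon) * one_hot + epsilon / K over K classes, together with the cross-entropy it would be trained against. The function names and the default `epsilon = 0.1` are illustrative assumptions.

```python
import math


def smooth_labels(label_index: int, num_classes: int, epsilon: float = 0.1) -> list[float]:
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes."""
    if not 0 <= label_index < num_classes:
        raise ValueError("label_index out of range")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    target = [epsilon / num_classes] * num_classes  # spread epsilon over all classes
    target[label_index] += 1.0 - epsilon            # keep most mass on the given label
    return target


def cross_entropy(target: list[float], probs: list[float]) -> float:
    """Cross-entropy between a (smoothed) target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
```

For example, with four tag classes and epsilon = 0.1, a gold label becomes 0.925 on the annotated class and 0.025 on each other class, so a mislabeled token penalizes the model less sharply than a hard one-hot target would.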