330 research outputs found

Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks

Author: Hooi Bryan
Wu Jiaying
Publication date: 19/09/2022

As social media becomes a hotbed for the spread of misinformation, the crucial task of rumor detection has witnessed promising advances fostered by open-source benchmark datasets. Despite being widely used, we find that these datasets suffer from spurious correlations, which are ignored by existing studies and lead to severe overestimation of existing rumor detection performance. The spurious correlations stem from three causes: (1) event-based data collection and labeling schemes assign the same veracity label to multiple highly similar posts from the same underlying event; (2) merging multiple data sources spuriously relates source identities to veracity labels; and (3) labeling bias. In this paper, we closely investigate three of the most popular rumor detection benchmark datasets (i.e., Twitter15, Twitter16 and PHEME), and propose event-separated rumor detection as a solution to eliminate spurious cues. Under the event-separated setting, we observe that the accuracy of existing state-of-the-art models drops significantly by over 40%, becoming only comparable to a simple neural classifier. To better address this task, we propose Publisher Style Aggregation (PSA), a generalizable approach that aggregates publisher posting records to learn writing style and veracity stance. Extensive experiments demonstrate that our method outperforms existing baselines in terms of effectiveness, efficiency and generalizability. Comment: Accepted to ECML-PKDD 202
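
The event-separated setting described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `event_separated_split`, the `(event_id, text, label)` tuple layout, the `test_fraction` parameter and the toy posts are all assumptions made for illustration. The only idea taken from the abstract is that posts from one underlying event never appear in both the training and the test set.

```python
import random
from collections import defaultdict


def event_separated_split(samples, test_fraction=0.25, seed=0):
    """Split (event_id, text, label) samples so that no event spans both sets.

    Hypothetical helper: whole events, not individual posts, are assigned
    to either the training set or the test set.
    """
    by_event = defaultdict(list)
    for sample in samples:
        by_event[sample[0]].append(sample)

    # Shuffle event identifiers reproducibly, then hold out a share of them.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)
    n_test = max(1, round(len(events) * test_fraction))
    test_events = set(events[:n_test])

    train = [s for e in events if e not in test_events for s in by_event[e]]
    test = [s for e in events if e in test_events for s in by_event[e]]
    return train, test


# Toy data, invented for this example only.
samples = [
    ("event_a", "post 1", "rumor"),
    ("event_a", "post 2", "rumor"),
    ("event_b", "post 3", "non-rumor"),
    ("event_b", "post 4", "non-rumor"),
    ("event_c", "post 5", "rumor"),
    ("event_c", "post 6", "rumor"),
    ("event_d", "post 7", "non-rumor"),
    ("event_d", "post 8", "non-rumor"),
]
train, test = event_separated_split(samples)
```

A random post-level split of the same data would usually place near-duplicate posts from one event on both sides, which is the first cause of spurious correlation the abstract lists.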

arXiv.org e-Print Archive

Twitter data analysis as contribution to strategic foresight-The case of the EU Research Project “Foresight and Modelling for European Health Policy and Regulations” (FRESHER)

Publication venue: Springer
Publication date: 08/12/2016

Springer - Publisher Connector

Doctor of Philosophy

Author: Oh Chong Keat
Publication venue: University of Utah
Publication date: 01/05/2013

Due to the popularity of Web 2.0 and social media in the last decade, the percolation of user-generated content (UGC) has rapidly increased. In the financial realm, this has resulted in the emergence of virtual investing communities (VIC) open to the investing public. There is an ongoing debate among scholars and practitioners on whether such UGC contains valuable investing information or mainly noise. I investigate two major studies in my dissertation. First, I examine the relationship between peer influence and information quality in the context of individual characteristics in stock microblogging. Surprisingly, I discover that the set of individual characteristics that relate to peer influence is not synonymous with those that relate to high information quality. With respect to information quality, influentials who are frequently mentioned by peers due to their name value are likely to possess higher information quality, while those who are better at diffusing information via retweets are likely to be associated with lower information quality. Second, I propose a study to explore the predictability of stock microblog dimensions and features over stock price directional movements using data mining classification techniques. I find that the author-ticker-day dimension produces the highest predictive accuracy, suggesting that this dimension is able to capture both relevant author and ticker information as compared to author-day and ticker-day. In addition to these two studies, I also explore two topics: the network structure of co-tweeted tickers and sentiment annotation via crowdsourcing. I do this in order to understand and uncover new features as well as new outcome indicators, with the objective of improving the predictive accuracy of the classification or the saliency of the explanatory models. My dissertation work extends the frontier in understanding the relationship between financial UGC, specifically stock microblogging, and relevant phenomena as well as predictive outcomes.
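
The three aggregation dimensions named in this abstract (author-day, ticker-day and author-ticker-day) can be pictured as different grouping keys over the same posts. The sketch below is only an illustration under assumptions: the field names `author`, `ticker`, `day` and `text`, the helper `group_posts` and the sample posts are hypothetical and do not come from the dissertation.

```python
from collections import defaultdict

# Hypothetical mapping from dimension name to the fields forming its key.
DIMENSIONS = {
    "author-day": ("author", "day"),
    "ticker-day": ("ticker", "day"),
    "author-ticker-day": ("author", "ticker", "day"),
}


def group_posts(posts, dimension):
    """Group post texts by the key fields of the chosen dimension."""
    keys = DIMENSIONS[dimension]
    groups = defaultdict(list)
    for post in posts:
        groups[tuple(post[k] for k in keys)].append(post["text"])
    return dict(groups)


# Invented sample posts, for illustration only.
posts = [
    {"author": "alice", "ticker": "AAA", "day": "2012-01-03", "text": "bullish on $AAA"},
    {"author": "alice", "ticker": "BBB", "day": "2012-01-03", "text": "selling $BBB"},
    {"author": "bob", "ticker": "AAA", "day": "2012-01-03", "text": "$AAA looks weak"},
]

author_day = group_posts(posts, "author-day")
ticker_day = group_posts(posts, "ticker-day")
author_ticker_day = group_posts(posts, "author-ticker-day")
```

In this toy data the author-ticker-day grouping keeps all three posts apart, while the two coarser groupings each merge two of them, which is one way to read the abstract's remark that the finest dimension retains both author and ticker information.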

The University of Utah: J. Willard Marriott Digital Library

VIRAL TOPIC PREDICTION AND DESCRIPTION IN MICROBLOG SOCIAL NETWORKS

Author: BIAN JINGWEN
Publication date: 21/08/2015

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Knowledge Discovery and Complex Network Dynamics in Social Media Space

Author: Baagyere Edward Yellakuor
Hu Xiong
Qin Zhen
Zhiguang Qin
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/05/2015

Pattern discovery and correlation in text data have been a research hotbed in recent times. However, a composite model that captures patterns and correlations as a quantitative measure in social media space is yet to receive much research attention. The paper therefore analyzed social media data from Twitter about the 2014 FIFA World Cup both as lexical text and as a complex network system. Quantitatively, the 140-character upper bound in Twitter is found not to have a negative impact on the formation of ideas. As a lexical text, the following key statistics were confirmed: the distribution of the words in the corpus obeys Zipf's law, 3-character words accounted for almost 22% of the corpus, and the distribution of the article "the" also follows a Zipf or power law. Moreover, the three most frequent terms related to the World Cup event (url, worldcup, rt) account for about 14.5% of the corpus. In particular, the corpus is modeled as a network whose node set V is the set of vocabularies in the corpus and whose edge set is the set of bigrams (two-word phrases). An algorithm is developed and implemented in Python to obtain the bigrams from the corpus. Using concepts from graph theory, the bigram network is analyzed, and the results show compelling facts about the text network. Firstly, all the characteristics of complex networks known in the literature are observed in the bigram network. These include the degree distribution, which is observed to follow a power law with a degree exponent of 2.14. Secondly, the average path length of words is observed to be 4.78, which is within the "small world" category. Thirdly, other complex network characteristics such as eigenvector and betweenness centrality metrics are observed within the bigram network, both having weak power-law distributions as observed in other complex networks in the literature.
These findings call for studying the topological characteristics of text data and comparing their structural properties to those of known complex network metrics in the literature. The results will be of great importance in studying complex systems. The application areas of these findings are also numerous, ranging from information retrieval and data compression to information security. To the best of our knowledge, this is the first work that studied the textual and topological structure of text from a social media platform as a complex network and analyzed important topological properties of complex networks on it. Keywords: complex network, bigram, media space, Twitter, information science
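
The bigram network described in this abstract can be sketched in a few lines of Python. This is not the paper's algorithm: whitespace tokenization, lowercasing, treating the network as undirected, and the function names below are all assumptions, and the two sample tweets are invented. The sketch only shows the general technique of turning adjacent word pairs into edges and then reading off the degree distribution and the average shortest-path length.

```python
from collections import Counter, defaultdict, deque


def bigrams(tokens):
    """Return the adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def build_bigram_network(tweets):
    """Build an undirected adjacency map: words are nodes, bigrams are edges."""
    adjacency = defaultdict(set)
    for tweet in tweets:
        tokens = tweet.lower().split()  # assumed tokenization
        for left, right in bigrams(tokens):
            if left != right:  # skip self-loops such as "go go"
                adjacency[left].add(right)
                adjacency[right].add(left)
    return dict(adjacency)


def degree_distribution(adjacency):
    """Count how many nodes have each degree."""
    return Counter(len(neighbors) for neighbors in adjacency.values())


def average_path_length(adjacency):
    """Mean shortest-path length over connected node pairs, using BFS."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Invented sample tweets, for illustration only.
tweets = ["worldcup rt url", "rt url goal"]
network = build_bigram_network(tweets)
degrees = degree_distribution(network)
apl = average_path_length(network)
```

On a real corpus, the power-law exponent reported in the abstract would be estimated by fitting the tail of `degrees`; that fitting step is left out here because the abstract does not say how it was done.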

International Institute for Science, Technology and Education (IISTE): E-Journals

Unsupervised learning on social data

Author: Borutta Felix
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 11/03/2020

Microbloggers’ motivations in participatory journalism: A cross-cultural study of America and China

Author: Rui Jue
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2014

This phenomenological study focuses on the motivations of participatory journalists contributing on microblogs such as Twitter and Weibo. Although online user behavior and motivations have been studied before, few studies have examined the motivations of participatory journalists from their own perspective. Moreover, this study is one of the few to explore participatory journalists across different cultures (U.S. and China). The author conducted a total of 13 in-depth interviews with participatory journalists on microblogs from both countries and used a qualitative analysis method to identify the themes and patterns that emerged. Motivations such as earning respect, early adoption of technology, self-expression, relationship building, self-enhancement, branding and image building, and financial gain were discussed. De-motivational factors such as time constraints and self-censorship were presented. Motivational differences between the two groups of participants, including what the microblog account represents and the role of participatory journalists, were explained by two cultural differences: collectivism versus individualism, and power distance. Limitations and future research were also discussed.

University of Tennessee, Knoxville: Trace