75 research outputs found

Why Does China Allow Freer Social Media? Protests Versus Surveillance And Propaganda

Author: Qin B
Stromberg D
Wu Y
Publication venue: 'American Economic Association'
Publication date: 01/01/2017

In this paper, we document basic facts regarding public debates about controversial political issues on Chinese social media. Our documentation is based on a dataset of 13.2 billion blog posts published on Sina Weibo, the most prominent Chinese microblogging platform, during the 2009-2013 period. Our primary finding is that a shockingly large number of posts on highly sensitive topics were published and circulated on social media. For instance, we find millions of posts discussing protests, and these posts are informative in predicting the occurrence of specific events. We find an even larger number of posts with explicit corruption allegations, and that these posts predict future corruption charges of specific individuals. Our findings challenge a popular view that an authoritarian regime would relentlessly censor or even ban social media. Instead, the interaction of an authoritarian government with social media seems more complex.

HKU Scholars Hub

EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

Author: A Bruns
AM Azmi
BS Wasike
D Bodoff
D Elsweiler
Hind Almerekhi
J Benhardus
JL Fleiss
JR Landis
K Darwish
M Efron
M Rowe
M Sanderson
Maram Hasanain
Mucahid Kutlu
Reem Suwaileh
RL Brennan
Tamer Elsayed
W Magdy
Zhang Y
Publication date: 21/08/2017

This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.

arXiv.org e-Print Archive

Qatar University Institutional Repository

Crossref

Detecting Censorable Content on Sina Weibo: A Pilot Study

Author: Feldman Anna
Leberknight Chris
Ng Kei Yin
Publication venue: Montclair State University Digital Commons
Publication date: 09/07/2018

This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts that were published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a blog post would be censored on Sina Weibo.

Montclair State University Digital Commons

Analyzing Regrettable Communications on Twitter: Characterizing Deleted Tweets and Their Authors

Author: Bhattacharya Parantapa
Ganguly Niloy
Ghosh Saptarshi
Publication date: 23/12/2022

Over 500 million tweets are posted on Twitter each day, of which about 11% are deleted by the users who posted them. This widespread deletion of tweets raises a number of questions: What kind of content makes users want to delete it later? Are users of certain predispositions more likely to post regrettable tweets and delete them later? In this paper we provide a detailed characterization of tweets posted and then later deleted by their authors. We collected tweets from over 200 thousand Twitter users during a period of four weeks. Our characterization shows significant personality differences between users who delete their tweets and those who do not. We find that users who delete their tweets are more likely to be extroverted and neurotic while being less conscientious. Also, we find that deleted tweets, while containing less information and being less conversational, contain significant indications of regrettable content. Since users of online communication do not have instant social cues (like a listener's body language) to gauge the impact of their words, they are often delayed in employing repair strategies. Finally, we build a classifier which takes textual, contextual, and user features to predict whether a tweet will be deleted. The classifier achieves an F1-score of 0.78, and the precision increases when we consider response features of the tweets.

arXiv.org e-Print Archive

This Account Doesn’t Exist: Tweet Decay and the Politics of Deletion in the Brexit Debate

Author: Bastos M. T.
Publication venue: 'SAGE Publications'
Publication date: 01/05/2021

Literature on influence operations has identified metrics that are indicative of social media manipulation, but few studies have explored the lifecycle of low-quality information. We contribute to this literature by reconstructing nearly 3 million messages posted by 1 million users in the last days of the Brexit referendum campaign. While previous studies have found that on average only 4% of tweets disappear, we found that 33% of the tweets leading up to the referendum vote are no longer available. Only about half of the most active accounts that tweeted about the referendum continue to operate publicly, and 20% of all accounts are no longer active. We tested whether partisan content was more likely to disappear and found more messages from the Leave campaign that disappeared than the entire universe of tweets affiliated with the Remain campaign. We compare these results with an assorted set of 45 hashtags posted in the same period and find that political campaigns present much higher ratios of user and tweet decay. These results are validated by inspecting 2 million Brexit-related tweets posted over a period of nearly 4 years. The article concludes with an overview of these findings and recommendations for future research.

City Research Online

Event Detection in Twitter Using Multi Timing Chained Windows

Author: Mojiri Mohammad Mahdi
Ravanmehr Reza
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/05/2021

Twitter is a popular microblogging and social networking service. Twitter posts are continuously generated and well suited for knowledge discovery using different data mining techniques. We present a novel near real-time approach for processing tweets and detecting events. The proposed method, Multi Timing Chained Windows (MTCW), is independent of the language of the tweets. MTCW defines several Timing Windows and links them to each other like a chain. In this chain, the input of each larger window is the output of the smaller window before it. Using MTCW, events can be detected within a few minutes. To evaluate this idea, the required dataset was collected using the Twitter API. The evaluation results show the accuracy and effectiveness of our approach compared with other state-of-the-art methods for event detection on Twitter.
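
The chaining idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `TimingWindow`, `build_chain`, and `bursty_terms` are hypothetical, window sizes are expressed as a number of inputs rather than wall-clock time, and the burst rule (a term count jumping by a fixed ratio between consecutive windows) is an assumption, since the abstract does not say how events are scored. The only property taken from the abstract is that each larger window consumes the output of the smaller window before it.

```python
from collections import Counter


class TimingWindow:
    """One link in the chain: merges `capacity` inputs, then emits the result."""

    def __init__(self, capacity, next_window=None):
        self.capacity = capacity        # number of inputs merged into one output
        self.next_window = next_window  # the next, larger window (or None)
        self.pending = []               # inputs received since the last emit
        self.emitted = []               # every output this window has produced

    def push(self, counts):
        """Accept one term-count bucket; emit downstream when the window is full."""
        self.pending.append(counts)
        if len(self.pending) == self.capacity:
            merged = Counter()
            for bucket in self.pending:
                merged.update(bucket)
            self.pending = []
            self.emitted.append(merged)
            # The output of this window is the input of the next, larger one.
            if self.next_window is not None:
                self.next_window.push(merged)


def build_chain(capacities):
    """Build windows smallest-first and return the head (smallest) window."""
    head = None
    for capacity in reversed(capacities):
        head = TimingWindow(capacity, next_window=head)
    return head


def bursty_terms(previous, current, ratio=3.0, min_count=3):
    """Assumed burst rule: terms whose count grew by at least `ratio`."""
    return sorted(
        term
        for term, n in current.items()
        if n >= min_count and n >= ratio * max(previous.get(term, 0), 1)
    )
```

With `build_chain([2, 3])`, for example, the first window merges two incoming buckets (say, per-minute term counts) into one output, and the second window merges three of those outputs, so it spans six of the original buckets. Comparing consecutive entries of a window's `emitted` list with `bursty_terms` gives one simple way to flag candidate events at each time scale.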

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Using social media for social research: an introduction: May 2016

Publication venue: Social Media Research Group
Publication date: 01/01/2016

Digital Education Resource Archive