
    Linguistic Characteristics of Censorable Language on SinaWeibo

    This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics, and build a classifier that predicts censorship decisions independent of the discussion topic. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.
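
    The sketch below illustrates the kind of topic-independent setup this abstract describes: a classifier trained on a linguistic feature such as readability. It is a minimal illustration, not the authors' code; the scikit-learn dependency and the character-per-sentence readability proxy are assumptions made here for the example.

        # Minimal sketch (not the authors' implementation): train a classifier on a
        # topic-independent linguistic feature such as a crude readability proxy.
        # Assumes scikit-learn is installed; the feature choice is illustrative.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def readability_proxy(text: str) -> float:
            """Hypothetical readability proxy: average sentence length in characters."""
            normalized = text.replace("！", "。").replace("？", "。")
            sentences = [s for s in normalized.split("。") if s.strip()]
            if not sentences:
                return 0.0
            return sum(len(s) for s in sentences) / len(sentences)

        def featurize(posts):
            # One topic-independent feature per post; a real system would add more
            # (e.g. punctuation rate, pronoun use), in the spirit of the paper.
            return np.array([[readability_proxy(p)] for p in posts])

        # Placeholder data: 1 = censored, 0 = uncensored
        posts = ["这是一个例子。", "另一个更长的例子，句子也更长一些，阅读起来更费力。"]
        labels = [0, 1]

        clf = LogisticRegression().fit(featurize(posts), labels)
        print(clf.predict(featurize(["新的帖子。"])))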

    Detecting Censorable Content on Sina Weibo: A Pilot Study

    This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a blog post would be censored on Sina Weibo.
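
    Below is a minimal sketch of the kind of Naive Bayes setup the abstract describes. It is not the study's actual code or data; scikit-learn, the placeholder feature matrix, and the chosen features are assumptions for illustration only.

        # Minimal sketch of a Naive Bayes classifier over topic-independent features,
        # in the spirit of the study described above (not its actual code or data).
        # Assumes scikit-learn; accuracy on real Weibo data will of course differ.
        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.model_selection import cross_val_score

        # Placeholder feature matrix: rows = posts, columns = hypothetical
        # topic-independent features (sentence length, punctuation rate, pronoun rate).
        X = np.array([
            [12.0, 0.10, 0.02],
            [45.0, 0.03, 0.00],
            [18.0, 0.08, 0.01],
            [50.0, 0.02, 0.00],
        ])
        y = np.array([0, 1, 0, 1])  # 1 = censored, 0 = uncensored

        scores = cross_val_score(GaussianNB(), X, y, cv=2)
        print("mean accuracy:", scores.mean())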

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables the fast sharing not only of information but also of disinformation. One prominent case of an event that led to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets or have used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
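
    The sketch below shows what a hashtag-based labeling baseline, of the kind the neural classifier is compared against, might look like. The seed hashtag lists here are hypothetical and are not taken from the paper.

        # Minimal sketch of a hashtag-based labeling baseline (illustrative only;
        # the seed hashtag sets are hypothetical, not the authors' lists).
        PRO_RUSSIAN = {"#mh17truth"}                # hypothetical seed hashtags
        PRO_UKRAINIAN = {"#russiainvadedukraine"}   # hypothetical seed hashtags

        def label_tweet(text: str) -> str:
            # Collect lowercased hashtags and label by which seed set they match.
            tags = {tok.lower() for tok in text.split() if tok.startswith("#")}
            if tags & PRO_RUSSIAN and not tags & PRO_UKRAINIAN:
                return "pro-Russian"
            if tags & PRO_UKRAINIAN and not tags & PRO_RUSSIAN:
                return "pro-Ukrainian"
            return "neutral/unknown"

        print(label_tweet("New evidence on the crash #MH17truth"))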