92 research outputs found
Who let the trolls out? Towards understanding state-sponsored trolls
Recent evidence has emerged linking coordinated campaigns by state-sponsored actors to the manipulation of public opinion on the Web. Campaigns revolving around major political events are enacted via mission-focused "trolls." While trolls are involved in spreading disinformation on social media, there is little understanding of how they operate, what type of content they disseminate, how their strategies evolve over time, and how they influence the Web's information ecosystem. In this paper, we begin to address this gap by analyzing 10M posts by 5.5K Twitter and Reddit users identified as Russian and Iranian state-sponsored trolls. We compare the behavior of each group of state-sponsored trolls with a focus on how their strategies change over time, the different campaigns they embark on, and differences between the trolls operated by Russia and Iran. Among other things, we find: 1) that Russian trolls were pro-Trump while Iranian trolls were anti-Trump; 2) evidence that campaigns undertaken by such actors are influenced by real-world events; and 3) that the behavior of such actors is not consistent over time, hence detection is not straightforward. Using Hawkes Processes, we quantify the influence these accounts have on pushing URLs on four platforms: Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab. In general, Russian trolls were more influential and efficient in pushing URLs to all the other platforms with the exception of /pol/ where Iranians were more influential. Finally, we release our source code to ensure the reproducibility of our results and to encourage other researchers to work on understanding other emerging kinds of state-sponsored troll accounts on Twitter.
https://arxiv.org/pdf/1811.03130.pdf
Accepted manuscript
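The abstract above mentions Hawkes Processes as the tool for quantifying cross-platform influence but does not give the model details. The following is a minimal illustrative sketch of a multivariate Hawkes intensity, assuming an exponential decay kernel; the function name, the `weights` and `beta` parameters, and the example values are hypothetical and are not taken from the paper or its released code.

```python
import math

def hawkes_intensity(t, events, mu, weights, beta):
    """Return the event rate of each process at time t.

    events  -- list of (time, source_process) pairs
    mu      -- dict: process -> background rate (assumed constant)
    weights -- dict: (source, target) -> expected number of extra events
               on target caused by one event on source (hypothetical values)
    beta    -- decay rate of the assumed exponential kernel
    """
    rates = dict(mu)  # start from the background rates
    for t_i, source in events:
        if t_i >= t:
            continue  # only past events can excite the processes
        # Exponential kernel; integrates to 1 over (0, infinity)
        decay = beta * math.exp(-beta * (t - t_i))
        for target in rates:
            rates[target] += weights.get((source, target), 0.0) * decay
    return rates

# Hypothetical example: one URL posted on Twitter at time 0
example = hawkes_intensity(
    t=1.0,
    events=[(0.0, "twitter")],
    mu={"twitter": 0.1, "reddit": 0.2},
    weights={("twitter", "reddit"): 0.5},
    beta=1.0,
)
```

In this sketch a larger `weights[(source, target)]` means events on `source` raise the rate on `target` more strongly, which is the intuition behind using such weights as a measure of influence between platforms.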
The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans
A new era of Information Warfare has arrived. Various actors, including
state-sponsored ones, are weaponizing information on Online Social Networks to
run false information campaigns with targeted manipulation of public opinion on
specific topics. These false information campaigns can have dire consequences
to the public: mutating their opinions and actions, especially with respect to
critical world events like major elections. Evidently, the problem of false
information on the Web is a crucial one, and needs increased public awareness,
as well as immediate attention from law enforcement agencies, public
institutions, and in particular, the research community. In this paper, we take
a step in this direction by providing a typology of the Web's false information
ecosystem, comprising various types of false information, actors, and their
motives. We report a comprehensive overview of existing research on the false
information ecosystem by identifying several lines of work: 1) how the public
perceives false information; 2) understanding the propagation of false
information; 3) detecting and containing false information on the Web; and 4)
false information on the political stage. In this work, we pay particular
attention to political false information as: 1) it can have dire consequences
to the community (e.g., when election results are mutated) and 2) previous work
shows that this type of false information propagates faster and further when
compared to other types of false information. Finally, for each of these lines
of work, we report several future research directions that can help us better
understand and mitigate the emerging problem of false information dissemination
on the Web.
Large scale crowdsourcing and characterization of Twitter abusive behavior
In recent years online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or hateful language. Past scientific work focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration.
Accepted manuscript
Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models
Misinformation on YouTube is a significant concern, necessitating robust
detection strategies. In this paper, we introduce a novel methodology for video
classification, focusing on the veracity of the content. We convert the
conventional video classification task into a text classification task by
leveraging the textual content derived from the video transcripts. We employ
advanced machine learning techniques like transfer learning to solve the
classification challenge. Our approach incorporates two forms of transfer
learning: (a) fine-tuning base transformer models such as BERT, RoBERTa, and
ELECTRA, and (b) few-shot learning using sentence-transformers MPNet and
RoBERTa-large. We apply the trained models to three datasets: (a) YouTube
Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and
(c) Fake-News dataset (a collection of articles). Including the Fake-News
dataset extended the evaluation of our approach beyond YouTube videos. Using
these datasets, we evaluated how well the models distinguish valid information from
misinformation. The fine-tuned models yielded Matthews Correlation
Coefficient>0.81, accuracy>0.90, and F1 score>0.90 in two of three datasets.
Interestingly, the few-shot models outperformed the fine-tuned ones by 20% in
both Accuracy and F1 score for the YouTube Pseudoscience dataset, highlighting
the potential utility of this approach -- especially in the context of limited
training data.
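The abstract above reports Matthews Correlation Coefficient, accuracy, and F1 score. As a minimal sketch of how these three metrics relate to a binary confusion matrix, the following computes them from label lists; the function name and the toy labels are illustrative assumptions and do not come from the paper.

```python
import math

def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1, and MCC for binary labels (1 = misinformation)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # MCC uses all four cells, so it stays informative on imbalanced data
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Toy labels, invented for illustration only
scores = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                        [1, 1, 1, 0, 0, 0, 0, 1])
```

With these toy labels, accuracy and F1 are both 0.75 while MCC is 0.5, which shows why the three figures are usually reported together.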
What is Gab: A bastion of free speech or an alt-right echo chamber
H2020 Marie Skłodowska-Curie Action
"It is just a flu": assessing the effect of watch history on YouTube's pseudoscientific video recommendations
CNS-1942610 - National Science Foundation
Accepted manuscript
Disturbed YouTube for kids: characterizing and detecting inappropriate videos targeting young children
A large number of the most-subscribed YouTube channels target
children of very young age. Hundreds of toddler-oriented
channels on YouTube feature inoffensive, well produced, and
educational videos. Unfortunately, inappropriate content that
targets this demographic is also common. YouTube’s algorithmic
recommendation system regrettably suggests inappropriate
content because some of it mimics or is derived from otherwise
appropriate content. Considering the risk for early childhood
development, and an increasing trend in toddlers' consumption
of YouTube media, this is a worrisome problem.
In this work, we build a classifier able to discern inappropriate
content that targets toddlers on YouTube with 84.3% accuracy,
and leverage it to perform a first-of-its-kind, large-scale,
quantitative characterization that reveals some of the risks of
YouTube media consumption by young children. Our analysis
reveals that YouTube is still plagued by such disturbing videos
and its currently deployed counter-measures are ineffective in
terms of detecting them in a timely manner. Alarmingly, using
our classifier we show that young children are not only able,
but likely to encounter disturbing videos when they randomly
browse the platform starting from benign videos.
Accepted manuscript