Who let the trolls out? Towards understanding state-sponsored trolls
Recent evidence has linked coordinated campaigns by state-sponsored actors to efforts to manipulate public opinion on the Web. Campaigns revolving around major political events are enacted via mission-focused "trolls." While trolls are involved in spreading disinformation on social media, there is little understanding of how they operate, what type of content they disseminate, how their strategies evolve over time, and how they influence the Web's information ecosystem. In this paper, we begin to address this gap by analyzing 10M posts by 5.5K Twitter and Reddit users identified as Russian and Iranian state-sponsored trolls. We compare the behavior of each group of state-sponsored trolls with a focus on how their strategies change over time, the different campaigns they embark on, and differences between the trolls operated by Russia and Iran. Among other things, we find: 1) that Russian trolls were pro-Trump while Iranian trolls were anti-Trump; 2) evidence that campaigns undertaken by such actors are influenced by real-world events; and 3) that the behavior of such actors is not consistent over time, hence detection is not straightforward. Using Hawkes Processes, we quantify the influence these accounts have on pushing URLs on four platforms: Twitter, Reddit, 4chan's Politically Incorrect board (/pol/), and Gab. In general, Russian trolls were more influential and efficient in pushing URLs to all the other platforms, with the exception of /pol/, where Iranians were more influential. Finally, we release our source code to ensure the reproducibility of our results and to encourage other researchers to work on understanding other emerging kinds of state-sponsored troll accounts on Twitter.
https://arxiv.org/pdf/1811.03130.pdf (accepted manuscript)
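The influence estimation above rests on Hawkes processes, in which each event raises the short-term rate of future events. The paper fits multivariate processes across the four platforms; the sketch below shows only the univariate conditional intensity, with illustrative parameter values that are not taken from the paper:

```python
import math

def hawkes_intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process:
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    mu is the background rate; each past event (e.g. a troll posting
    a URL) adds an exponentially decaying bump to the rate."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

print(hawkes_intensity(0.5, []))        # background rate only → 0.1
print(hawkes_intensity(1.001, [1.0]))   # rate jumps just after an event
print(hawkes_intensity(3.0, [1.0]))     # and decays back toward mu
```

In the multivariate setting each platform has its own background rate and a matrix of excitation terms, so an event on one platform can raise the intensity on another; that cross-platform excitation is what the paper uses to attribute influence.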
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users, and their tweets make
up nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.
Comment: to appear in SocInfo 2017
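The reported 84.4% precision and 76.1% recall can be summarized by their harmonic mean; the abstract does not report F1, so this is a derived check, not a figure from the paper:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.844, 0.761), 3))  # harmonic mean ≈ 0.800
```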
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent
years, there has been also a growing research interest in automatically
distinguishing false rumors from factually true claims. Here, we propose a
general-purpose framework for fully-automatic fact checking using external
sources, tapping the potential of the entire Web as a knowledge source to
confirm or reject a claim. Our framework uses a deep neural network with LSTM
text encoding to combine semantic kernels with task-specific embeddings that
encode a claim together with pieces of potentially-relevant text fragments from
the Web, taking the source reliability into account. The evaluation results
show good performance on two different tasks and datasets: (i) rumor detection
and (ii) fact checking of the answers to a question in community question
answering forums.
Comment: RANLP-2017
An Army of Me: Sockpuppets in Online Discussion Communities
In online discussion communities, users can interact and share information
and opinions on a wide variety of topics. However, some users may create
multiple identities, or sockpuppets, and engage in undesired behavior by
deceiving others or manipulating discussions. In this work, we study
sockpuppetry across nine discussion communities, and show that sockpuppets
differ from ordinary users in terms of their posting behavior, linguistic
traits, as well as social network structure. Sockpuppets tend to start fewer
discussions, write shorter posts, use more personal pronouns such as "I", and
have more clustered ego-networks. Further, pairs of sockpuppets controlled by
the same individual are more likely to interact on the same discussion at the
same time than pairs of ordinary users. Our analysis suggests a taxonomy of
deceptive behavior in discussion communities. Pairs of sockpuppets can vary in
their deceptiveness, i.e., whether they pretend to be different users, or their
supportiveness, i.e., if they support arguments of other sockpuppets controlled
by the same user. We apply these findings to a series of prediction tasks,
notably, to identify whether a pair of accounts belongs to the same underlying
user or not. Altogether, this work presents a data-driven view of deception in
online discussion communities and paves the way towards the automatic detection
of sockpuppets.
Comment: 26th International World Wide Web Conference (WWW 2017)
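Two of the linguistic signals described above — shorter posts and heavier first-person pronoun use — are easy to compute per account. A toy extractor, with feature names and tokenization that are illustrative rather than the paper's actual pipeline:

```python
def sockpuppet_features(posts):
    """Toy per-account features: mean post length in words and the
    rate of first-person singular pronouns. Sockpuppets in the study
    tend to score low on the former and high on the latter."""
    words = [w.lower().strip('.,!?') for p in posts for w in p.split()]
    n = len(words) or 1
    first_person = {"i", "me", "my", "mine", "myself"}
    return {
        "mean_post_length": len(words) / max(len(posts), 1),
        "first_person_rate": sum(w in first_person for w in words) / n,
    }

feats = sockpuppet_features(["I think my point stands.", "I agree!"])
print(feats)  # {'mean_post_length': 3.5, 'first_person_rate': 0.42857...}
```

For the pair-matching task, features like these would be computed for both accounts and combined with interaction signals (e.g. posting in the same discussion within a short window), per the co-occurrence finding above.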
Detecting Toxicity in News Articles: Application to Bulgarian
Online media aim to reach an ever bigger audience and to attract an ever
longer attention span. This competition creates an environment that rewards
sensational, fake, and toxic news. To help limit their spread and impact, we
propose and develop a news toxicity detector that can recognize various types
of toxic content. While previous research primarily focused on English, here we
target Bulgarian. We created a new dataset by crawling a website that for five
years has been collecting Bulgarian news articles that were manually
categorized into eight toxicity groups. Then we trained a multi-class
classifier with nine categories: eight toxic and one non-toxic. We experimented
with different representations based on ELMo, BERT, and XLM, as well as with a
variety of domain-specific features. Due to the small size of our dataset, we
created a separate model for each feature type, and we ultimately combined
these models into a meta-classifier. The evaluation results show an accuracy of
59.0% and a macro-F1 score of 39.7%, which represent sizable improvements over
the majority-class baseline (Acc=30.3%, macro-F1=5.2%).Comment: Fact-checking, source reliability, political ideology, news media,
Bulgarian, RANLP-2019. arXiv admin note: text overlap with arXiv:1810.0176
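The gap between the baseline's accuracy and its macro-F1 follows directly from the metric: a majority-class predictor earns a nonzero F1 on only one of the nine classes. A quick check, assuming the majority class covers 30.3% of the test set:

```python
def majority_macro_f1(majority_share, n_classes):
    """Macro-F1 of a classifier that always predicts the majority class.
    For that class, precision = majority_share and recall = 1.0, so
    F1 = 2p / (p + 1); all other classes contribute F1 = 0."""
    f1_major = 2 * majority_share / (majority_share + 1)
    return f1_major / n_classes

# Acc = 30.3% over nine classes reproduces the reported macro-F1 of 5.2%.
print(round(majority_macro_f1(0.303, 9) * 100, 1))  # → 5.2
```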