Real-time Detection of Content Polluters in Partially Observable Twitter Networks
Content polluters, or bots that hijack a conversation for political or
advertising purposes, are a known problem for event prediction, election
forecasting, and distinguishing real news from fake news in social media
data. Identifying this type of bot is particularly challenging, with
state-of-the-art methods utilising large volumes of network data as features
for machine learning models. Such datasets are generally not readily available
in typical applications which stream social media data for real-time event
prediction. In this work we develop a methodology to detect content polluters
in social media datasets that are streamed in real-time. Applying our method to
the problem of civil unrest event prediction in Australia, we identify content
polluters from individual tweets, without collecting social network or
historical data from individual accounts. We identify some peculiar
characteristics of these bots in our dataset and propose metrics for
identification of such accounts. We then pose some research questions around
this type of bot detection, including: how good Twitter is at detecting content
polluters and how well state-of-the-art methods perform in detecting bots in
our dataset.
Comment: Accepted for publication in WWW '18 Companion: The 2018 Web
Conference Companion, April 23-27, 2018, Lyon, France