181 research outputs found
Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection
The arms race between spambots and spambot detectors consists of several cycles
(or generations): a new wave of spambots is created (and new spam is spread),
new spambot filters are derived, and old spambots mutate (or evolve) into new
species. Recently, with the spread of adversarial learning, a new practice is
emerging: deliberately manipulating target samples in order to build stronger
detection models. Here, we manipulate generations of Twitter
social bots, to obtain - and study - their possible future evolutions, with the
aim of eventually deriving more effective detection techniques. In detail, we
propose and experiment with a novel genetic algorithm for the synthesis of
online accounts. The algorithm creates synthetic, evolved versions of
current state-of-the-art social bots. Results demonstrate that the synthetic
bots do evade current detection techniques. However, they provide all the
elements needed to improve such techniques, making a proactive approach to
the design of social bot detection systems possible. Comment: This is the pre-final version of a paper accepted @ the 11th ACM
Conference on Web Science, June 30-July 3, 2019, Boston, USA
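The evolutionary loop sketched in this abstract can be illustrated as follows. The genome encoding (a vector of behavioural features), the toy detector, and every parameter below are assumptions for exposition only, not the paper's actual algorithm.

```python
import random

GENOME_LEN = 8  # e.g. posting rate, follower ratio, URL share, ...

def detector_score(genome):
    # Toy stand-in for a trained bot classifier: higher = more bot-like.
    return sum(genome) / len(genome)

def fitness(genome):
    # An evolved bot is fitter when the detector scores it as human-like.
    return 1.0 - detector_score(genome)

def mutate(genome, rate=0.2):
    return [min(1.0, max(0.0, g + random.uniform(-0.1, 0.1)))
            if random.random() < rate else g
            for g in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def evolve(pop_size=50, generations=30):
    random.seed(0)
    pop = [[random.random() for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()  # the "evolved bot" that best evades the toy detector
```

Because the top half of the population always survives, the best genome's detector score is non-increasing across generations, which mirrors the paper's point: synthesized descendants drift toward evading the very detector used to score them.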
Bot Electioneering Volume: Visualizing Social Bot Activity During Elections
It has been widely recognized that automated bots may have a significant
impact on the outcomes of national events. It is therefore important to raise
public awareness about the threat of bots on social media during major events,
such as the 2018 US midterm elections. To this end, we deployed a web
application to help the public explore the activities of likely bots on Twitter
on a daily basis. The application, called Bot Electioneering Volume (BEV),
reports on the level of likely bot activities and visualizes the topics
targeted by them. With this paper we release our code base for the BEV
framework, with the goal of facilitating future efforts to combat malicious
bots on social media. Comment: 3 pages, 3 figures. In submission
Should we agree to disagree about Twitter's bot problem?
Bots, simply defined as accounts controlled by automation, can be used as a
weapon for online manipulation and pose a threat to the health of platforms.
Researchers have studied online platforms to detect, estimate, and characterize
bot accounts. Concerns about the prevalence of bots were raised following Elon
Musk's bid to acquire Twitter. Twitter's recent estimate that 5% of
monetizable daily active users are bot accounts raised questions about its
methodology. This estimate is based on a specific count of active users and
relies on Twitter's criteria for bot accounts. In this work, we stress that
crucial questions need to be answered in order to make a proper estimate
and to compare different methodologies. We argue that assumptions about
bot-like behavior, the detection approach, and the population inspected can
all affect the estimated percentage of bots on Twitter. Finally, we emphasize the
responsibility of platforms to be vigilant, transparent, and unbiased in
dealing with threats that may affect their users. Comment: 22 pages, 5 figures
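The abstract's central argument can be made concrete with a toy calculation: the same per-account bot scores yield different prevalence figures depending on the decision threshold and on which population is inspected. All numbers below are fabricated for exposition.

```python
import random

random.seed(42)

# Hypothetical bot scores in [0, 1] for all registered accounts.
all_accounts = [random.betavariate(2, 5) for _ in range(10_000)]

# A biased subpopulation of "active users", where bot-like accounts
# are (by assumption here) less likely to be included.
active_users = [s for s in all_accounts if random.random() < (1 - s)]

def bot_prevalence(scores, threshold):
    """Fraction of accounts whose score meets the bot threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Same data, different assumptions -> different "percentages of bots".
est_loose  = bot_prevalence(all_accounts, threshold=0.5)
est_strict = bot_prevalence(all_accounts, threshold=0.8)
est_active = bot_prevalence(active_users, threshold=0.5)
```

Raising the threshold or restricting the population both lower the estimate, which is why two methodologies can honestly report very different bot percentages for the same platform.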
Arming the public with artificial intelligence to counter social bots
The increased relevance of social media in our daily life has been
accompanied by efforts to manipulate online conversations and opinions.
Deceptive social bots -- automated or semi-automated accounts designed to
impersonate humans -- have been successfully exploited for these kinds of
abuse. Researchers have responded by developing AI tools to arm the public in
the fight against social bots. Here we review the literature on different types
of bots, their impact, and detection methods. We use the case study of
Botometer, a popular bot detection tool developed at Indiana University, to
illustrate how people interact with AI countermeasures. A user experience
survey suggests that bot detection has become an integral part of the social
media experience for many users. However, barriers in interpreting the output
of AI tools can lead to fundamental misunderstandings. The arms race between
machine learning methods to develop sophisticated bots and effective
countermeasures makes it necessary to update the training data and features of
detection tools. We again use the Botometer case to illustrate both algorithmic
and interpretability improvements of bot scores, designed to meet user
expectations. We conclude by discussing how future AI developments may affect
the fight between malicious bots and the public. Comment: Published in Human Behavior and Emerging Technologies
Opportunities, Risks, and Applications of Open Source Intelligence in Cybersecurity and Cyberdefence
Intelligence gathering has transformed significantly in the digital age. A qualitative leap within this domain is the sophistication of Open Source Intelligence (OSINT), a paradigm that exploits publicly available information for planned and strategic objectives.
The main purpose of this PhD thesis is to motivate, justify and demonstrate OSINT as a reference paradigm that should complement the present and future of both civilian cybersecurity solutions and national and international cyberdefence strategies. The first objective concerns the critical examination and evaluation of the state of OSINT under the current digital revolution and the growth of Big Data and Artificial Intelligence (AI). The second objective is geared toward categorizing the security and privacy risks associated with OSINT. The third objective focuses on leveraging the advantages of OSINT in practical use cases by designing and implementing OSINT techniques to counter online threats, particularly those from social networks. The fourth objective explores the Dark web through the lens of OSINT, identifying and evaluating existing techniques for discovering Tor onion addresses, which enable access to Dark web sites hosted on the Tor network and could facilitate the monitoring of underground sites.
To achieve these objectives, we follow a methodology with clearly ordered steps. Firstly, a rigorous review of the existing literature addresses the first objective, focusing on the state of OSINT, its applications, and its challenges. This serves to identify existing research gaps and establish a solid foundation for an updated view of OSINT. Secondly, a critical part of the methodology involves assessing the potential security and privacy risks that could emerge from the misuse of OSINT by cybercriminals, including the use of AI to enhance cyberattacks, fulfilling the second objective. Thirdly, to provide practical evidence of the power of OSINT, we work on a Twitter use case in the context of the 2019 Spanish general election, designing and implementing OSINT methods to understand the behaviour and impact of automated accounts. Through AI and social media analysis, this process aims to detect social bots in the wild for further behaviour characterization and impact assessment, thus covering the third objective. The last effort is dedicated to the Dark web, reviewing different works in the literature related to the Tor network to identify and characterize the techniques for gathering onion addresses, essential for accessing anonymous websites, completing the fourth objective. This comprehensive methodology led to the publication of five scientific papers in peer-reviewed journals, collectively forming the basis of this PhD thesis.
As its main conclusions, this PhD thesis underlines the immense potential of OSINT as a strategic tool for problem-solving across many sectors. In the age of Big Data and AI, OSINT aids in deriving insights from vast, complex information sources such as social networks, online documents, web pages and even the corners of the Deep and Dark web. The practical use cases developed in this PhD thesis prove that incorporating OSINT into cybersecurity and cyberdefence is increasingly valuable. Social Media Intelligence (SOCMINT) helps to characterize social bots in disinformation contexts, which, in conjunction with AI, returns sophisticated results, such as the sentiment of organic content generated in social media or the political alignment of automated accounts. On the other hand, Dark Web Intelligence (DARKINT) enables gathering the links of anonymous Dark web sites. However, we also show in this PhD thesis that the development of OSINT carries its share of risks. Open data can be exploited for social engineering, spear-phishing, profiling, deception, blackmail, spreading disinformation or launching personalized attacks. Hence, the adoption of legal and ethical practices is also important.
A Decade of Social Bot Detection
On the morning of November 9th 2016, the world woke up to the shocking
outcome of the US Presidential elections: Donald Trump was the 45th President
of the United States of America. An unexpected event that still has tremendous
consequences all over the world. Today, we know that a minority of social bots,
automated social media accounts mimicking humans, played a central role in
spreading divisive messages and disinformation, possibly contributing to
Trump's victory. In the aftermath of the 2016 US elections, the world started
to realize the gravity of widespread deception in social media. Following
Trump's exploit, we witnessed the emergence of a strident dissonance between
the multitude of efforts for detecting and removing bots, and the increasing
effects that these malicious actors seem to have on our societies. This paradox
opens a burning question: What strategies should we enforce in order to stop
this social bot pandemic? In these times, during the run-up to the 2020 US
elections, the question appears more crucial than ever. What struck social,
political, and economic analysts after 2016, deception and automation, had
however been a matter of study for computer scientists since at least 2010. In this
work, we briefly survey the first decade of research in social bot detection.
Via a longitudinal analysis, we discuss the main trends of research in the
fight against bots, the major results that were achieved, and the factors that
make this never-ending battle so challenging. Capitalizing on lessons learned
from our extensive analysis, we suggest possible innovations that could give us
the upper hand against deception and manipulation. Studying a decade of
endeavours at social bot detection can also inform strategies for detecting and
mitigating the effects of other, more recent, forms of online deception, such
as strategic information operations and political trolls. Comment: Forthcoming in Communications of the ACM
A Deep Learning Approach for Robust Detection of Bots in Twitter Using Transformers
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
During the last decades, the volume of multimedia content posted in social networks has grown exponentially, and such information is immediately propagated and consumed by a significant number of users. In this scenario, the disruption of fake news providers and bot accounts spreading propaganda and sensitive content throughout the network has fostered applied research to automatically measure the reliability of social network accounts via Artificial Intelligence (AI). In this paper, we present a multilingual approach for addressing the bot identification task in Twitter via Deep Learning (DL) approaches to support end-users when checking the credibility of a certain Twitter account. To do so, several experiments were conducted using state-of-the-art Multilingual Language Models to generate an encoding of the text-based features of the user account, which is later concatenated with the rest of the metadata to build a potential input vector on top of a Dense Network denoted as Bot-DenseNet. Consequently, this paper assesses the language constraint from previous studies, where the encoding of the user account considered either only the metadata information or the metadata information together with some basic semantic text features. Moreover, the Bot-DenseNet produces a low-dimensional representation of the user account which can be used for any application within the Information Retrieval (IR) framework.
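The input construction described in this abstract can be sketched as follows. The multilingual language-model text encoding is stood in for by a random vector, and the layer widths and weights are untrained assumptions; this shows the data flow of concatenating text and metadata features, not the paper's trained Bot-DenseNet.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(dim=768):
    # Stand-in for a multilingual LM encoding of the account's tweets.
    return rng.normal(size=dim)

def encode_metadata(n_features=10):
    # Numeric profile metadata: followers, friends, account age, ...
    return rng.normal(size=n_features)

def relu(v):
    return np.maximum(0.0, v)

def bot_densenet_forward(x, widths=(256, 64)):
    # Dense layers over the concatenated input, then a sigmoid output.
    for w in widths:
        W = rng.normal(scale=0.05, size=(x.shape[0], w))
        x = relu(x @ W)
    W = rng.normal(scale=0.05, size=(x.shape[0], 1))
    logit = (x @ W).item()
    return 1.0 / (1.0 + np.exp(-logit))  # probability-like bot score

# Concatenate the text encoding with the metadata features.
features = np.concatenate([encode_text(), encode_metadata()])
p_bot = bot_densenet_forward(features)
```

The intermediate activations of such a network give the low-dimensional account representation the abstract mentions as reusable for other IR tasks.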
Problems of Social Effectiveness and Human Rights Protection in the Use of Artificial Intelligence for Social Scoring
The article investigates social effectiveness and the protection of human rights in the use of artificial intelligence for social scoring. Social scoring gives rise to a number of risks: unauthorized collection of personal data, intrusion into private life, discrimination against a particular person or social group, and unlawful use of personal data for advertising and other commercial purposes. The author argues that any official use of social scoring by the state or other actors must be built on the principles of informing citizens that social scoring is being conducted and that their personal data are being used; obtaining consent to social scoring; and openness and accessibility of information about the grounds for using the system, the principles of its operation, the personal data and other information about citizens used for scoring, and the possibilities for correcting data and contesting scoring results. It is important to prohibit unofficial, covert use of social scoring and to establish liability for violating these requirements and for violating human and civil rights. The article concludes that the risk of using data generated through social scoring to influence a person's decisions necessitates legislative restrictions on such actions, unless the person has explicitly expressed a desire for the information obtained to be used for such purposes.
Uncovering Coordinated Networks on Social Media
Coordinated campaigns are used to influence and manipulate social media
platforms and their users, a critical challenge to the free exchange of
information online. Here we introduce a general network-based framework to
uncover groups of accounts that are likely coordinated. The proposed method
constructs coordination networks based on arbitrary behavioral traces shared
among accounts. We present five case studies of influence campaigns in the
diverse contexts of U.S. elections, Hong Kong protests, the Syrian civil war,
and cryptocurrencies. In each of these cases, we detect networks of coordinated
Twitter accounts by examining their identities, images, hashtag sequences,
retweets, and temporal patterns. The proposed framework proves to be broadly
applicable to uncover different kinds of coordination across information
warfare scenarios.
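The core of the network-based framework described above can be sketched in a few lines: build a bipartite account-to-trace mapping, then project it onto accounts, linking those that share a behavioural trace (here, an identical hashtag sequence). The accounts and traces below are made-up examples.

```python
from collections import defaultdict
from itertools import combinations

traces = {
    "acct_a": ("#h1", "#h2", "#h3"),
    "acct_b": ("#h1", "#h2", "#h3"),  # identical sequence: suspicious
    "acct_c": ("#h1", "#h2", "#h3"),
    "acct_d": ("#x", "#y"),           # unrelated behaviour
}

def coordination_edges(traces):
    """Project the bipartite account-trace graph onto accounts."""
    by_trace = defaultdict(list)
    for account, trace in traces.items():
        by_trace[trace].append(account)
    edges = set()
    for accounts in by_trace.values():
        for a, b in combinations(sorted(accounts), 2):
            edges.add((a, b))
    return edges

edges = coordination_edges(traces)
# acct_a, acct_b and acct_c form a coordinated clique; acct_d stays isolated.
```

Because the trace is an arbitrary key, the same projection works for shared images, retweet sets, or temporal patterns, which is what makes the framework general across the case studies listed in the abstract.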