Disinformation 2.0 in the Age of AI: A Cybersecurity Perspective
With the explosive advancement of AI technologies in recent years, the
landscape of disinformation research is also expected to change rapidly. In
this viewpoint article, we present the notion of "disinformation 2.0" in the
age of AI: disinformation that is more targeted and personalized, whose
content is very difficult to distinguish from real news, and whose creation
and dissemination are accelerated by AI. We then discuss how disinformation
2.0 intersects with cybersecurity, and outline a possible layered
countermeasure to address the threat of disinformation 2.0 in a holistic
manner.
Detecting Social Media Manipulation in Low-Resource Languages
Social media have been deliberately used for malicious purposes, including
political manipulation and disinformation. Most research focuses on
high-resource languages, yet malicious actors share content across countries
and languages, including low-resource ones. Here, we investigate whether, and
to what extent, malicious actors can be detected in low-resource language
settings. We discovered that a large number of accounts posting in Tagalog
were suspended as part of Twitter's crackdown on interference operations
after the 2016 US Presidential election. By combining text embeddings and
transfer learning, our framework can detect, with promising accuracy,
malicious users posting in Tagalog without any prior knowledge of, or
training on, malicious content in that language. We first learn an embedding
model for each language, namely a high-resource language (English) and a
low-resource one (Tagalog), independently. Then, we learn a mapping between
the two latent spaces to transfer the detection model. We demonstrate that
the proposed approach significantly outperforms state-of-the-art models,
including BERT, and yields marked advantages in settings with very limited
training data, the norm when detecting malicious activity on online
platforms.
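
The cross-lingual transfer step can be pictured with a small sketch. The following Python snippet, a minimal illustration rather than the paper's actual pipeline, aligns a low-resource embedding space to a high-resource one via orthogonal Procrustes over assumed translation-pair anchors, then reuses an English-trained detector on mapped Tagalog vectors; all data and names are hypothetical.

```python
# Minimal sketch of cross-lingual transfer for malicious-account detection,
# assuming pre-trained embeddings per language and a bilingual anchor set
# to align the two latent spaces (orthogonal Procrustes). All names and
# data below are illustrative; the paper's exact pipeline may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

def procrustes_mapping(src, tgt):
    """Learn an orthogonal map W minimizing ||src @ W - tgt||_F
    from paired anchor vectors (e.g., translation pairs)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

rng = np.random.default_rng(0)
dim = 64

# Hypothetical anchors: embeddings of translation pairs in each language.
anchors_tl = rng.normal(size=(500, dim))                          # Tagalog
anchors_en = anchors_tl @ np.linalg.qr(rng.normal(size=(dim, dim)))[0]

W = procrustes_mapping(anchors_tl, anchors_en)  # Tagalog -> English space

# Detector trained only on English user embeddings with known labels.
X_en = rng.normal(size=(2000, dim))
y_en = (X_en[:, 0] > 0).astype(int)             # placeholder labels
clf = LogisticRegression(max_iter=1000).fit(X_en, y_en)

# At inference: map Tagalog users into the English space, then classify,
# requiring no Tagalog-labeled training data.
X_tl = rng.normal(size=(100, dim))
scores = clf.predict_proba(X_tl @ W)[:, 1]
print(scores[:5])
```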
How Does Twitter Account Moderation Work? Dynamics of Account Creation and Suspension During Major Geopolitical Events
Social media moderation policies are often at the center of public debate,
and their implementation and enactment are sometimes surrounded by a veil of
mystery. Unsurprisingly, due to limited platform transparency and data access,
relatively little research has been devoted to characterizing moderation
dynamics, especially in the context of controversial events and the platform
activity associated with them. Here, we study the dynamics of account creation
and suspension on Twitter during two global political events: Russia's invasion
of Ukraine and the 2022 French Presidential election. Leveraging a large-scale
dataset of 270M tweets shared by 16M users in multiple languages over several
months, we identify peaks of suspicious account creation and suspension, and we
characterize behaviours that more frequently lead to account suspension. We
show that large numbers of accounts are suspended within days of their
creation. Suspended accounts tend to interact mostly with legitimate users,
as opposed to other suspicious accounts, often making unwarranted and
excessive use of reply and mention features, and predominantly sharing spam
and harmful content. While we can only speculate about the specific causes of
any given account suspension, our findings shed light on patterns of platform
abuse and subsequent moderation during major events.
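
As a concrete illustration of one analysis step, identifying peaks of account creation, the short Python sketch below flags bursty days in a daily count series using a robust z-score rule; this is an assumed stand-in, not the study's actual method, and the series is invented.

```python
# Flag days with anomalously high account-creation volume. A median/MAD
# (robust z-score) rule is used here so the spike itself does not inflate
# the threshold; the paper's real peak-detection method is not specified.
import numpy as np

def burst_days(daily_counts, z_thresh=3.5):
    """Return indices of days whose creation volume is a robust-z outlier.
    Assumes MAD > 0, i.e., the series is not mostly constant."""
    counts = np.asarray(daily_counts, dtype=float)
    med = np.median(counts)
    mad = np.median(np.abs(counts - med))
    z = 0.6745 * (counts - med) / mad
    return np.flatnonzero(z > z_thresh)

# Hypothetical series: mostly quiet days with one suspicious spike.
series = [120, 130, 125, 118, 900, 140, 127]
print(burst_days(series))  # -> [4]
```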
Factuality Challenges in the Era of Large Language Models
The emergence of tools based on Large Language Models (LLMs), such as
OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered
immense public attention. These incredibly useful, natural-sounding tools mark
significant advances in natural language generation, yet they exhibit a
propensity to generate false, erroneous, or misleading content -- commonly
referred to as "hallucinations." Moreover, LLMs can be exploited for malicious
applications, such as generating false but credible-sounding content and
profiles at scale. This poses a significant challenge to society in terms of
the potential deception of users and the increasing dissemination of inaccurate
information. In light of these risks, we explore the kinds of technological
innovations, regulatory reforms, and AI literacy initiatives needed from
fact-checkers, news organizations, and the broader research and policy
communities. By identifying the risks, the imminent threats, and some viable
solutions, we seek to shed light on navigating various aspects of veracity in
the era of generative AI.
Spotting political social bots in Twitter: A use case of the 2019 Spanish general election
While social media has proven to be an exceptionally useful tool for
interacting with other people and for spreading helpful information quickly
and at scale, its great potential has also been leveraged with ill intent to
distort political elections and manipulate constituents. In this paper, we
analyzed the presence and behavior of social bots on Twitter in the context
of the November 2019 Spanish general election. Throughout our study, we
classified the involved users as social bots or humans, and examined their
interactions from both a quantitative (i.e., the amount of traffic generated
and the existing relations) and a qualitative (i.e., users' political
affinity and sentiment towards the most important parties) perspective.
Results demonstrated that a non-negligible number of those bots actively
participated in the election, supporting each of the five principal political
parties.
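
A minimal sketch of the kind of bot/human classification involved, assuming simple per-account features and placeholder labels (the study's actual detector is not specified here, and all features and data are illustrative):

```python
# Toy feature-based bot/human classifier. Features and labels are invented;
# a real detector would use many more behavioral and network signals.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical features: [tweets/day, followers-to-friends ratio, account age (days)]
X = rng.normal(loc=[20, 1.0, 800], scale=[15, 0.8, 500], size=(1000, 3))
y = rng.integers(0, 2, size=1000)  # placeholder bot(1)/human(0) labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score a new account: probability it is a bot under this toy model.
new_account = np.array([[150.0, 0.05, 12.0]])  # hyperactive, young, few followers
print(clf.predict_proba(new_account)[0, 1])
```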
Online Deep Learning from Doubly-Streaming Data
This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. A plausible idea for dealing with such data streams is to establish a relationship between the old and new feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional multimedia data with complex feature interplay, which suffers from a tradeoff between onlineness, which favors shallow learners, and expressiveness, which requires deep models. Motivated by this, we propose a novel OLD3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature-mapping relationship. A key trait of OLD3S is to treat model capacity as learnable, aiming to yield the optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams, in an online fashion. Both theoretical analysis and empirical studies substantiate the viability and effectiveness of our proposed approach. The code is available online at https://github.com/X1aoLian/OLD3S.
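
To make the feature-space bridging idea concrete, here is a minimal Python sketch: during the overlap window where both spaces are observed, it fits a linear reconstruction of the old features from the new ones, so a legacy learner can keep contributing after the old stream fades. The actual OLD3S learns a shared latent subspace with adaptive-depth networks; this linear least-squares stand-in, and all data in it, are purely illustrative.

```python
# Bridging evolving feature spaces: learn to reconstruct old features from
# new ones during the overlap period, then reuse the legacy model afterwards.
import numpy as np

rng = np.random.default_rng(2)
d_old, d_new, n_overlap = 10, 15, 300

# Hypothetical overlap-period data where both feature spaces coexist.
X_old = rng.normal(size=(n_overlap, d_old))
A = rng.normal(size=(d_old, d_new))
X_new = X_old @ A + 0.1 * rng.normal(size=(n_overlap, d_new))

# Least-squares map from the new space back to the old one.
M, *_ = np.linalg.lstsq(X_new, X_old, rcond=None)

# After the old stream fades, reconstruct old features from new arrivals
# and feed them to the legacy model alongside the new-feature learner.
x_new_t = rng.normal(size=(1, d_new))
x_old_reconstructed = x_new_t @ M
print(x_old_reconstructed.shape)  # (1, 10)
```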