Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
The advent of social media has given rise to numerous ethical challenges,
with hate speech among the most significant concerns. Researchers are
attempting to tackle this problem by leveraging hate-speech detection and
employing language models to automatically moderate content and promote civil
discourse. Unfortunately, recent studies have revealed that hate-speech
detection systems can be misled by adversarial attacks, raising concerns about
their resilience. While previous research has separately addressed the
robustness of these models under adversarial attacks and their
interpretability, there has been no comprehensive study exploring their
intersection. The novelty of our work lies in combining these two critical
aspects, leveraging interpretability to identify potential vulnerabilities and
enabling the design of targeted adversarial attacks. We present a comprehensive
and comparative analysis of adversarial robustness exhibited by various
hate-speech detection models. Our study evaluates the resilience of these
models against adversarial attacks using explainability techniques. To gain
insights into the models' decision-making processes, we employ the Local
Interpretable Model-agnostic Explanations (LIME) framework. Based on the
explainability results obtained by LIME, we devise and execute targeted attacks
on the text by leveraging the TextAttack tool. Our findings enhance the
understanding of the vulnerabilities and strengths exhibited by
state-of-the-art hate-speech detection models. This work underscores the
importance of incorporating explainability in the development and evaluation of
such models to enhance their resilience against adversarial attacks.
Ultimately, this work paves the way for creating more robust and reliable
hate-speech detection systems, fostering safer online environments and
promoting ethical discourse on social media platforms.
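The pipeline the abstract describes (explain a prediction, then perturb the words the explanation ranks highest) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy weights, the leave-one-out importance score (a simplified stand-in for LIME), and the character-swap perturbation (a stand-in for a TextAttack transformation) are all hypothetical.

```python
import math

# Hypothetical toy weights standing in for a trained detector (not from the paper).
TOY_WEIGHTS = {"awful": 2.0, "stupid": 1.5, "people": 0.1, "nice": -1.0}
BIAS = -1.0


def predict_proba(tokens):
    """Probability that the text is flagged, under the toy linear model."""
    score = BIAS + sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def word_importance(tokens):
    """Leave-one-out importance: drop in flagged probability when a word is removed.

    This is a simplified stand-in for a LIME explanation, not LIME itself.
    """
    base = predict_proba(tokens)
    return [base - predict_proba(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]


def perturb(word):
    """Swap the two middle characters so the token likely leaves the vocabulary."""
    if len(word) < 4:
        return word + "_"
    mid = len(word) // 2
    return word[:mid - 1] + word[mid] + word[mid - 1] + word[mid + 1:]


def targeted_attack(tokens, threshold=0.5, max_edits=3):
    """Perturb the most important words first until the prediction flips."""
    tokens = list(tokens)
    importance = word_importance(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    edits = 0
    for i in ranked[:max_edits]:
        if predict_proba(tokens) < threshold:
            break  # the detector no longer flags the text
        tokens[i] = perturb(tokens[i])
        edits += 1
    return tokens, edits


original = ["those", "people", "are", "awful", "and", "stupid"]
adversarial, n_edits = targeted_attack(original)
print(adversarial, n_edits, round(predict_proba(adversarial), 3))
```

Ranking words by importance before perturbing them is what makes the attack targeted: the edit budget is spent on the tokens the explanation identifies as driving the prediction, rather than on arbitrary positions.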
CEPS Task Force on Artificial Intelligence and Cybersecurity: Technology, Governance and Policy Challenges. Task Force Evaluation of the HLEG Trustworthy AI Assessment List (Pilot Version). CEPS Task Force Report, 22 January 2020
The Centre for European Policy Studies launched a Task Force on Artificial Intelligence (AI) and
Cybersecurity in September 2019. The goal of this Task Force is to bring attention to the market,
technical, ethical and governance challenges posed by the intersection of AI and cybersecurity,
focusing on both AI for cybersecurity and cybersecurity for AI. The Task Force is multi-stakeholder
by design and composed of academics, industry players from various sectors, policymakers and civil
society.
The Task Force is currently discussing issues such as the state and evolution of the application of AI
in cybersecurity and cybersecurity for AI; the debate on the role that AI could play in the dynamics
between cyber attackers and defenders; the increasing need for sharing information on threats and
how to deal with the vulnerabilities of AI-enabled systems; options for policy experimentation; and
possible EU policy measures to ease the adoption of AI in cybersecurity in Europe.
As part of these activities, this report assesses the Ethics Guidelines for Trustworthy AI of the
High-Level Expert Group (HLEG) on AI, presented on April 8, 2019. In particular, this report analyses and
makes suggestions on the Trustworthy AI Assessment List (Pilot version), a non-exhaustive list aimed
at helping the public and the private sector in operationalising Trustworthy AI. The list is composed
of 131 items that are supposed to guide AI designers and developers throughout the process of
design, development, and deployment of AI, although not intended as guidance to ensure
compliance with the applicable laws. The list is in its piloting phase and is currently undergoing a
revision that will be finalised in early 2020.
This report would like to contribute to this revision by addressing in particular the interplay between
AI and cybersecurity. This evaluation has been made according to specific criteria: whether and how
the items of the Assessment List refer to existing legislation (e.g. GDPR, EU Charter of Fundamental
Rights); whether they refer to moral principles (but not laws); whether they consider that AI attacks
are fundamentally different from traditional cyberattacks; whether they are compatible with
different risk levels; whether they are flexible enough in terms of clear/easy measurement and
implementation by AI developers and SMEs; and overall, whether they are likely to create obstacles
for the industry.
The HLEG is a diverse group, with more than 50 members representing different stakeholders, such
as think tanks, academia, EU Agencies, civil society, and industry, who were given the difficult task of
producing a simple checklist for a complex issue. The public engagement exercise looks successful
overall in that more than 450 stakeholders have signed in and are contributing to the process.
The next sections of this report present the items listed by the HLEG followed by the analysis and
suggestions raised by the Task Force (see list of the members of the Task Force in Annex 1).
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, also
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code from victim users,
facilitating the use of social engineering techniques to infect their machines.
Research showed that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings.
On Security and Sparsity of Linear Classifiers for Adversarial Settings
Machine-learning techniques are widely used in security-related applications,
like spam and malware detection. However, in such settings, they have been
shown to be vulnerable to adversarial attacks, including the deliberate
manipulation of data at test time to evade detection. In this work, we focus on
the vulnerability of linear classifiers to evasion attacks. This can be
considered a relevant problem, as linear classifiers have been increasingly
used in embedded systems and mobile devices for their low processing time and
memory requirements. We exploit recent findings in robust optimization to
investigate the link between regularization and security of linear classifiers,
depending on the type of attack. We also analyze the relationship between the
sparsity of feature weights, which is desirable for reducing processing cost,
and the security of linear classifiers. We further propose a novel octagonal
regularizer that allows us to achieve a proper trade-off between them. Finally,
we empirically show how this regularizer can improve classifier security and
sparsity in real-world application examples including spam and malware
detection.
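The link between weight structure and evasion can be illustrated with a small worked example. For a linear score w·x + b, the largest score decrease an attacker can cause with a perturbation of size eps is eps times the dual norm of w: the l1 norm of w for an l-infinity-bounded (dense) attack, and the l-infinity norm of w for an l1-bounded (sparse) attack. The sketch below checks this numerically. The weight vectors are hypothetical, and the form of `octagonal_penalty` (a convex combination of the l1 and l-infinity norms) is an assumption, since the abstract does not give the formula.

```python
def sign(v):
    """Sign of v as -1.0, 0.0 or 1.0."""
    return (v > 0) - (v < 0)


def score(w, b, x):
    """Linear decision score; positive means 'malicious' in this sketch."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def worst_case_shift(w, eps, attack_norm):
    """Largest score decrease achievable with a perturbation of norm eps."""
    if attack_norm == "linf":
        return eps * sum(abs(wi) for wi in w)  # dual norm of l-inf is l1
    if attack_norm == "l1":
        return eps * max(abs(wi) for wi in w)  # dual norm of l1 is l-inf
    raise ValueError("attack_norm must be 'linf' or 'l1'")


def evade_linf(w, x, eps):
    """Dense attack: move every feature by eps against the sign of its weight."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


def evade_l1(w, x, eps):
    """Sparse attack: spend the whole budget on the largest-magnitude weight."""
    k = max(range(len(w)), key=lambda i: abs(w[i]))
    x_adv = list(x)
    x_adv[k] -= eps * sign(w[k])
    return x_adv


def octagonal_penalty(w, rho):
    """Assumed form: (1 - rho) * ||w||_1 + rho * ||w||_inf, with 0 <= rho <= 1."""
    l1 = sum(abs(wi) for wi in w)
    linf = max(abs(wi) for wi in w)
    return (1.0 - rho) * l1 + rho * linf


# Two hypothetical weight vectors with the same l1 norm.
w_sparse = [2.0, 0.0, 0.0, 0.0]
w_dense = [0.5, 0.5, 0.5, 0.5]
x = [1.0, 1.0, 1.0, 1.0]
eps = 0.5

for name, w in (("sparse", w_sparse), ("dense", w_dense)):
    drop_dense_attack = score(w, 0.0, x) - score(w, 0.0, evade_linf(w, x, eps))
    drop_sparse_attack = score(w, 0.0, x) - score(w, 0.0, evade_l1(w, x, eps))
    print(name, drop_dense_attack, drop_sparse_attack, octagonal_penalty(w, 0.5))
```

In this toy setting both vectors are equally exposed to the dense attack, but the evenly spread weights lose far less score under the sparse attack. That tension between sparsity and robustness is the trade-off the abstract says the regularizer is meant to balance.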