Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models

Author: Dorr Bonnie J.
Harish Suhas
Kumbam Pranath Reddy
Perera Ian
Syed Sohaib Uddin
Thamminedi Prashanth
Publication date: 29/05/2023

The advent of social media has given rise to numerous ethical challenges, with hate speech among the most significant concerns. Researchers are attempting to tackle this problem by leveraging hate-speech detection and employing language models to automatically moderate content and promote civil discourse. Unfortunately, recent studies have revealed that hate-speech detection systems can be misled by adversarial attacks, raising concerns about their resilience. While previous research has separately addressed the robustness of these models under adversarial attacks and their interpretability, there has been no comprehensive study exploring their intersection. The novelty of our work lies in combining these two critical aspects, leveraging interpretability to identify potential vulnerabilities and enabling the design of targeted adversarial attacks. We present a comprehensive and comparative analysis of adversarial robustness exhibited by various hate-speech detection models. Our study evaluates the resilience of these models against adversarial attacks using explainability techniques. To gain insights into the models' decision-making processes, we employ the Local Interpretable Model-agnostic Explanations (LIME) framework. Based on the explainability results obtained by LIME, we devise and execute targeted attacks on the text by leveraging the TextAttack tool. Our findings enhance the understanding of the vulnerabilities and strengths exhibited by state-of-the-art hate-speech detection models. This work underscores the importance of incorporating explainability in the development and evaluation of such models to enhance their resilience against adversarial attacks. Ultimately, this work paves the way for creating more robust and reliable hate-speech detection systems, fostering safer online environments and promoting ethical discourse on social media platforms.
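The explanation-guided attack loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a toy lexicon scorer stands in for a real detection model, leave-one-out occlusion stands in for LIME, and a simple character swap stands in for a TextAttack transformation. All names, weights, and the threshold below are hypothetical.

```python
# Hypothetical toy lexicon scorer standing in for a real detection model.
TOXIC_WEIGHTS = {"stupid": 0.6, "awful": 0.3, "losers": 0.5}
THRESHOLD = 0.5  # assumed decision threshold


def score(tokens):
    """Sum the lexicon weights of the tokens (unknown tokens contribute 0)."""
    return sum(TOXIC_WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def is_flagged(tokens):
    """Return True when the toy model would flag the text."""
    return score(tokens) >= THRESHOLD


def occlusion_importance(tokens):
    """Importance of each token = score drop when that token is removed.

    This is a simpler stand-in for LIME, not LIME itself.
    """
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def perturb(word):
    """Character-level perturbation: swap the two characters around the middle."""
    if len(word) < 4:
        return word + "_"
    mid = len(word) // 2
    return word[:mid - 1] + word[mid] + word[mid - 1] + word[mid + 1:]


def targeted_attack(tokens, max_edits=3):
    """Perturb the most important token, repeating until the label flips."""
    tokens = list(tokens)
    for _ in range(max_edits):
        if not is_flagged(tokens):
            break
        importance = occlusion_importance(tokens)
        target = max(range(len(tokens)), key=lambda i: importance[i])
        if importance[target] <= 0:
            break  # no token contributes to the score; nothing to attack
        tokens[target] = perturb(tokens[target])
    return tokens


original = "those people are stupid losers".split()
adversarial = targeted_attack(original)
print(is_flagged(original), is_flagged(adversarial), adversarial)
```

The sketch only perturbs tokens that the explanation ranks as influential, which is the idea of using interpretability to target an attack; a real study would replace each stand-in with the actual model, explainer, and attack tool.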

arXiv.org e-Print Archive

CEPS Task Force on Artificial Intelligence and Cybersecurity Technology, Governance and Policy Challenges Task Force Evaluation of the HLEG Trustworthy AI Assessment List (Pilot Version). CEPS Task Force Report 22 January 2020

Author: Fantin Stephano
Ferreira Afonso
Pupillo Lorenzo
Publication date: 01/01/2020

The Centre for European Policy Studies launched a Task Force on Artificial Intelligence (AI) and Cybersecurity in September 2019. The goal of this Task Force is to bring attention to the market, technical, ethical and governance challenges posed by the intersection of AI and cybersecurity, focusing both on AI for cybersecurity and on cybersecurity for AI. The Task Force is multi-stakeholder by design and composed of academics, industry players from various sectors, policymakers and civil society. The Task Force is currently discussing issues such as the state and evolution of the application of AI in cybersecurity and cybersecurity for AI; the debate on the role that AI could play in the dynamics between cyber attackers and defenders; the increasing need for sharing information on threats and how to deal with the vulnerabilities of AI-enabled systems; options for policy experimentation; and possible EU policy measures to ease the adoption of AI in cybersecurity in Europe. As part of such activities, this report aims at assessing the High-Level Expert Group (HLEG) on AI Ethics Guidelines for Trustworthy AI, presented on April 8, 2019. In particular, this report analyses and makes suggestions on the Trustworthy AI Assessment List (Pilot version), a non-exhaustive list aimed at helping the public and the private sector in operationalising Trustworthy AI. The list is composed of 131 items that are supposed to guide AI designers and developers throughout the process of design, development, and deployment of AI, although not intended as guidance to ensure compliance with the applicable laws. The list is in its piloting phase and is currently undergoing a revision that will be finalised in early 2020. This report would like to contribute to this revision by addressing in particular the interplay between AI and cybersecurity. This evaluation has been made according to specific criteria: whether and how the items of the Assessment List refer to existing legislation (e.g. GDPR, EU Charter of Fundamental Rights); whether they refer to moral principles (but not laws); whether they consider that AI attacks are fundamentally different from traditional cyberattacks; whether they are compatible with different risk levels; whether they are flexible enough in terms of clear/easy measurement, implementation by AI developers and SMEs; and overall, whether they are likely to create obstacles for the industry. The HLEG is a diverse group, with more than 50 members representing different stakeholders, such as think tanks, academia, EU Agencies, civil society, and industry, who were given the difficult task of producing a simple checklist for a complex issue. The public engagement exercise looks successful overall in that more than 450 stakeholders have signed in and are contributing to the process. The next sections of this report present the items listed by the HLEG followed by the analysis and suggestions raised by the Task Force (see list of the members of the Task Force in Annex 1).

Archive of European Integration

Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks

Author: Biggio Battista
Giacinto Giorgio
Maiorca Davide
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Malware still constitutes a major threat in the cybersecurity landscape, also due to the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code from the victim users, facilitating the use of social engineering techniques to infect their machines. Research showed that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of such an arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware, and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how such findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Cagliari

On Security and Sparsity of Linear Classifiers for Adversarial Settings

Author: B Biggio
C Cortes
D Maiorca
F Sebastiani
F Zhang
H Xu
H Zou
R Bondell
S Sra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Machine-learning techniques are widely used in security-related applications, like spam and malware detection. However, in such settings, they have been shown to be vulnerable to adversarial attacks, including the deliberate manipulation of data at test time to evade detection. In this work, we focus on the vulnerability of linear classifiers to evasion attacks. This can be considered a relevant problem, as linear classifiers have been increasingly used in embedded systems and mobile devices for their low processing time and memory requirements. We exploit recent findings in robust optimization to investigate the link between regularization and security of linear classifiers, depending on the type of attack. We also analyze the relationship between the sparsity of feature weights, which is desirable for reducing processing cost, and the security of linear classifiers. We further propose a novel octagonal regularizer that allows us to achieve a proper trade-off between them. Finally, we empirically show how this regularizer can improve classifier security and sparsity in real-world application examples including spam and malware detection.
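The link between weight norms and evasion that this abstract refers to can be illustrated with a minimal sketch based on the standard dual-norm (Hölder) bound for a linear score w·x. It is not the paper's code and does not implement the octagonal regularizer, which the abstract does not define. The weight vectors, the budget, and the attack labels "dense" and "sparse" are hypothetical.

```python
def l1_norm(w):
    """Sum of absolute weights."""
    return sum(abs(v) for v in w)


def linf_norm(w):
    """Largest absolute weight."""
    return max(abs(v) for v in w)


def worst_case_score_drop(w, eps, attack):
    """Largest decrease of the linear score w.x under a bounded perturbation d.

    By the dual-norm bound:
      "dense"  attack, max_i |d_i| <= eps  -> drop = eps * ||w||_1
      "sparse" attack, sum_i |d_i| <= eps  -> drop = eps * ||w||_inf
    """
    if attack == "dense":
        return eps * l1_norm(w)
    if attack == "sparse":
        return eps * linf_norm(w)
    raise ValueError(f"unknown attack type: {attack}")


def sparse_attack_perturbation(w, eps):
    """Perturbation achieving the sparse bound: spend the whole budget on the
    single feature with the largest absolute weight, against its sign."""
    k = max(range(len(w)), key=lambda i: abs(w[i]))
    d = [0.0] * len(w)
    d[k] = -eps if w[k] > 0 else eps
    return d


# Two hypothetical weight vectors with the same l1 norm.
sparse_w = [2.0, 0.0, 0.0, 0.0]  # all weight on one feature
even_w = [0.5, 0.5, 0.5, 0.5]    # weight spread evenly
eps = 1.0

for name, w in (("sparse_w", sparse_w), ("even_w", even_w)):
    print(name,
          worst_case_score_drop(w, eps, "dense"),
          worst_case_score_drop(w, eps, "sparse"))
```

Under these assumptions the two classifiers are equally exposed to a dense attack, while the sparse weight vector loses four times as much score to an attacker who changes only a few features. This illustrates why sparsity and security can pull in different directions.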

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova