The Ethical Need for Watermarks in Machine-Generated Language
Watermarks should be introduced in the natural language outputs of AI systems
in order to maintain the distinction between human and machine-generated text.
The ethical imperative not to blur this distinction arises from the asemantic
nature of large language models and from human projections of emotional and
cognitive states onto machines, which may lead to manipulation, the spread of
falsehoods, or emotional distress. Enforcing this distinction requires
unobtrusive yet easily accessible marks of machine origin. We propose to
implement a code based on equidistant letter sequences. While no such code
exists in human-written texts, its presence in machine-generated ones would
provide exactly such a mark.
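To make the proposal concrete, the sketch below checks whether a codeword occurs as an equidistant letter sequence, the property a compliant generator would plant and any reader could verify. The codeword, skip range, and function name are illustrative assumptions, not parameters fixed by the paper.

def find_els(text, codeword, max_skip=50):
    """Return (start, skip) pairs at which `codeword` occurs as an
    equidistant letter sequence in `text` (letters only, case-folded)."""
    letters = [c.lower() for c in text if c.isalpha()]
    span = len(codeword) - 1
    hits = []
    for skip in range(1, max_skip + 1):
        for start in range(len(letters) - span * skip):
            if all(letters[start + i * skip] == codeword[i]
                   for i in range(len(codeword))):
                hits.append((start, skip))
    return hits

# Human text is very unlikely to contain a long agreed-upon codeword as
# an ELS; a generator can constrain its sampling so that one appears.
print(find_els("generated by a machine, not a human mind", "ahn", max_skip=10))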
Extracted BERT Model Leaks More Information than You Think!
The collection and availability of big data, combined with advances in pre-trained models (e.g., BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as APIs. Due to significant commercial interest, there has been a surge of attempts to steal remote services via model extraction. Although previous works have made progress in defending against model extraction attacks, there has been little discussion of their performance in preventing privacy leakage. This work bridges this gap by launching an attribute inference attack against the extracted BERT model. Our extensive experiments reveal that model extraction can cause severe privacy leakage even when victim models are equipped with advanced defensive strategies.
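A minimal structural sketch of such an attribute inference probe follows. The random vectors stand in for embeddings pulled from an extracted BERT copy, and every name and dimension here is an illustrative assumption rather than the paper's pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def extracted_embed(n, dim=768):
    # Stand-in for sentence embeddings from the extracted BERT copy;
    # a real attack feeds text through the stolen model instead.
    return rng.normal(size=(n, dim))

# Auxiliary records the adversary labels with the private attribute
# (e.g. a demographic flag), plus unlabeled victim inputs to attack.
X_aux, y_aux = extracted_embed(200), rng.integers(0, 2, size=200)

# If the extracted representations still encode the attribute, even a
# linear probe recovers it well above chance on victim inputs.
probe = LogisticRegression(max_iter=1000).fit(X_aux, y_aux)
print(probe.predict(extracted_embed(50))[:10])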
Single-Node Attack for Fooling Graph Neural Networks
Graph neural networks (GNNs) have shown broad applicability in a variety of
domains. Some of these domains, such as social networks and product
recommendations, are fertile ground for malicious users and behavior. In this
paper, we show that GNNs are vulnerable even in the extremely limited scenario
of a single-node adversarial example, in which the attacker cannot choose the
perturbed node. That is, an attacker can force the GNN to assign a chosen label
to any target node by only slightly perturbing a single, arbitrarily assigned
node elsewhere in the graph. When the
adversary is allowed to pick a specific attacker node, the attack is even more
effective. We show that this attack is effective across various GNN types, such
as GraphSAGE, GCN, GAT, and GIN, across a variety of real-world datasets, and
as a targeted and a non-targeted attack. Our code is available at
https://github.com/benfinkelshtein/SINGLE
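The repository contains the authors' implementation; purely to illustrate the attack pattern, here is a generic gradient-based sketch in plain PyTorch that perturbs one node's features until a two-hop target is classified as a chosen label. The toy graph, two-layer GCN, and unconstrained perturbation are simplifying assumptions (the paper's attack keeps perturbations small).

import torch

torch.manual_seed(0)
n, d, h, c = 6, 8, 16, 3                  # nodes, features, hidden, classes
X = torch.randn(n, d)
A = torch.eye(n)                          # adjacency with self-loops
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(1).sqrt()
A_hat = A / deg.outer(deg)                # symmetric normalisation

W1, W2 = torch.randn(d, h), torch.randn(h, c)   # frozen "trained" weights
def gcn(Xin):
    return A_hat @ torch.relu(A_hat @ Xin @ W1) @ W2

target, attacker, wanted = 2, 4, 0        # attacker sits two hops away
delta = torch.zeros(d, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.1)
for _ in range(200):
    Xp = X.clone()
    Xp[attacker] = X[attacker] + delta    # perturb only the attacker node
    loss = torch.nn.functional.cross_entropy(
        gcn(Xp)[target:target + 1], torch.tensor([wanted]))
    opt.zero_grad(); loss.backward(); opt.step()

Xp = X.clone(); Xp[attacker] = X[attacker] + delta.detach()
print(gcn(Xp)[target].argmax().item())    # ideally the chosen label, 0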
Red Teaming Language Model Detectors with Language Models
The prevalence and strong capability of large language models (LLMs) present
significant safety and ethical risks if exploited by malicious users. To
prevent the potentially deceptive usage of LLMs, recent works have proposed
algorithms to detect LLM-generated text and protect LLMs. In this paper, we
investigate the robustness and reliability of these LLM detectors under
adversarial attacks. We study two types of attack strategies: 1) replacing
certain words in an LLM's output with their synonyms given the context; 2)
automatically searching for an instructional prompt to alter the writing style
of the generation. In both strategies, we leverage an auxiliary LLM to generate
the word replacements or the instructional prompt. Different from previous
works, we consider a challenging setting where the auxiliary LLM can also be
protected by a detector. Experiments reveal that our attacks effectively
compromise the performance of all detectors in the study with plausible
generations, underscoring the urgent need to improve the robustness of
LLM-generated text detection systems.
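As a feel for attack strategy 1, the sketch below runs a greedy synonym-replacement loop. The synonym table and the length-based detector are toy stand-ins for the auxiliary LLM and detector APIs, and the function names are invented, not taken from the paper.

# Stand-ins so the sketch runs end-to-end; in the paper's setting both
# would be API calls to an auxiliary LLM and a real detector.
SYNONYMS = {"utilize": "use", "commence": "begin", "demonstrate": "show"}

def auxiliary_llm_synonym(word, context):
    return SYNONYMS.get(word, word)

def detector_score(text):
    # Toy detector: long, formal words look "machine-written".
    words = text.split()
    return sum(len(w) > 6 for w in words) / max(len(words), 1)

def synonym_attack(text, max_rounds=5):
    words, best = text.split(), detector_score(text)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(words)):
            cand = auxiliary_llm_synonym(words[i], " ".join(words))
            if cand == words[i]:
                continue
            trial = words[:i] + [cand] + words[i + 1:]
            score = detector_score(" ".join(trial))
            if score < best:          # keep only edits that fool the detector
                words, best, improved = trial, score, True
        if not improved:
            break
    return " ".join(words)

print(synonym_attack("we utilize and demonstrate the proposed method"))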
CROSS: a framework for cyber risk optimisation in smart homes
This work introduces a decision support framework, called Cyber Risk Optimiser for Smart homeS (CROSS), which advises both smart home users and smart home service providers on how to select an optimal portfolio of cyber security controls to counteract cyber attacks in a smart home, including both traditional cyber attacks and adversarial machine learning attacks. CROSS is based on a multi-objective bi-level two-stage optimisation. In stage-one optimisation, the problem is modelled as a multi-leader-follower game that considers both security and economic objectives: the provider selects a security portfolio to protect both itself and its users, while rational attackers target the weakest path. Stage-two optimisation is a Stackelberg security game that focuses on the additional security controls under the remit of smart home users. While CROSS can potentially be applied to other similar use cases, in this paper our aim is to address threats against artificial intelligence (AI) applications, as the use of AI in smart Internet of Things (IoT) devices introduces new cyber threats to home environments. Specifically, we have implemented and assessed CROSS in a smart heating use case in a prototypical AI-enabled IoT environment that combines characteristics and vulnerabilities currently present in existing commercial off-the-shelf (COTS) devices, demonstrating the selection of optimal decisions.
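To illustrate the Stackelberg structure in miniature, the sketch below brute-forces the defender's control portfolio under a budget while a rational attacker then targets the weakest remaining path. All controls, costs, impacts, and mitigation factors are invented for illustration; CROSS itself solves a much richer multi-objective bi-level problem.

from itertools import combinations

# control -> deployment cost; attack path -> base impact; both invented.
controls = {"firmware_updates": 30, "network_segmentation": 50,
            "model_hardening": 40}
paths = {"default_credentials": 80, "adversarial_ml": 60, "phishing": 50}
# control -> {path: fraction of that path's impact it removes}
mitigates = {"firmware_updates": {"default_credentials": 0.7},
             "network_segmentation": {"phishing": 0.5,
                                      "default_credentials": 0.2},
             "model_hardening": {"adversarial_ml": 0.8}}
BUDGET = 80

def residual(portfolio, path):
    impact = paths[path]
    for ctrl in portfolio:
        impact *= 1.0 - mitigates.get(ctrl, {}).get(path, 0.0)
    return impact

best = None
for r in range(len(controls) + 1):
    for portfolio in combinations(controls, r):
        if sum(controls[ctrl] for ctrl in portfolio) > BUDGET:
            continue
        # Follower's best response: attack the weakest remaining path.
        worst = max(residual(portfolio, p) for p in paths)
        if best is None or worst < best[0]:
            best = (worst, portfolio)

print(best)   # (residual risk, leader's optimal portfolio)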