Stealing Links from Graph Neural Networks
Graph data, such as chemical networks and social networks, may be deemed
confidential/private because the data owner often spends substantial resources
collecting the data, or because the data contains sensitive information, e.g.,
social relationships. Recently, neural networks have been extended to graph
data; these models are known as graph neural networks (GNNs). Due to their
superior performance, GNNs
have many applications, such as healthcare analytics, recommender systems, and
fraud detection. In this work, we propose the first attacks to steal a graph
from the outputs of a GNN model that is trained on the graph. Specifically,
given black-box access to a GNN model, our attacks can infer whether there
exists a link between any pair of nodes in the graph used to train the model.
We call our attacks link stealing attacks. We propose a threat model to
systematically characterize an adversary's background knowledge along three
dimensions, which together yield a comprehensive taxonomy of 8 different link
stealing attacks. We propose multiple novel methods to realize these 8 attacks.
Extensive experiments on 8 real-world datasets show that our attacks are
effective at stealing links, e.g., AUC (area under the ROC curve) is above 0.95
in multiple cases. Our results indicate that the outputs of a GNN model reveal
rich information about the structure of the graph used to train the model.
Comment: To appear in the 30th USENIX Security Symposium, August 2021, Vancouver, B.C., Canada.
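A minimal sketch of the unsupervised flavor of such an attack, resting on the intuition that two nodes joined by an edge tend to receive similar posteriors from the target model; `query_target_model` is a hypothetical stand-in for the black-box API, and the similarity threshold is illustrative rather than taken from the paper:

```python
import numpy as np

def query_target_model(node_features):
    """Hypothetical stand-in for the deployed black-box API: returns
    the target GNN's posterior (class-probability vector) for a node."""
    raise NotImplementedError

def link_score(post_u, post_v):
    """Linked nodes tend to receive similar posteriors, so posterior
    similarity can serve as an unsupervised link score."""
    post_u, post_v = np.asarray(post_u), np.asarray(post_v)
    return post_u @ post_v / (
        np.linalg.norm(post_u) * np.linalg.norm(post_v) + 1e-12)

def predict_link(features_u, features_v, threshold=0.9):
    """Predict that an edge (u, v) exists in the training graph when
    the two nodes' posteriors are sufficiently similar."""
    return link_score(query_target_model(features_u),
                      query_target_model(features_v)) >= threshold
```

Attacks assuming richer background knowledge can, for instance, replace the fixed threshold with a classifier trained on posterior-pair features from a shadow dataset.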
Uncertainty-Matching Graph Neural Networks to Defend Against Poisoning Attacks
Graph Neural Networks (GNNs), a generalization of neural networks to
graph-structured data, are often implemented using message passing between
entities of a graph. While GNNs are effective for node classification, link
prediction and graph classification, they are vulnerable to adversarial
attacks, i.e., a small perturbation to the structure can lead to a non-trivial
performance degradation. In this work, we propose Uncertainty-Matching GNN
(UM-GNN), which aims to improve the robustness of GNN models, particularly
against poisoning attacks to the graph structure, by leveraging epistemic
uncertainties from the message passing framework. More specifically, we propose
to build a surrogate predictor that does not directly access the graph
structure, but systematically extracts reliable knowledge from a standard GNN
through a novel uncertainty-matching strategy. Interestingly, this decoupling
makes UM-GNN immune to evasion attacks by design and yields significantly
improved robustness against poisoning attacks. Using empirical studies with
standard benchmarks and a suite of global and targeted attacks, we demonstrate
the effectiveness of UM-GNN, when compared to existing baselines including the
state-of-the-art robust GCN.
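A minimal sketch of the uncertainty-matching idea, assuming a PyG-style forward pass `gnn(x, edge_index)` and Monte Carlo dropout as the epistemic-uncertainty estimate; the names and the exact weighting scheme are illustrative, not the paper's implementation. The surrogate is any model over node features alone (e.g., an MLP), which is why it never sees the possibly poisoned edges:

```python
import torch
import torch.nn.functional as F

def mc_dropout_posterior(gnn, x, edge_index, n_samples=20):
    """Estimate the GNN's mean prediction and a per-node epistemic
    uncertainty via Monte Carlo dropout (dropout active at inference)."""
    gnn.train()  # keep dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([F.softmax(gnn(x, edge_index), dim=-1)
                             for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0).sum(dim=-1)

def uncertainty_matching_loss(surrogate_logits, gnn_mean, gnn_uncertainty):
    """Distill the GNN into a structure-free surrogate, down-weighting
    nodes on which the GNN is epistemically uncertain (and therefore
    more likely to be affected by a poisoned graph structure)."""
    log_q = F.log_softmax(surrogate_logits, dim=-1)
    kl = F.kl_div(log_q, gnn_mean, reduction="none").sum(dim=-1)
    weights = 1.0 / (1.0 + gnn_uncertainty)
    return (weights * kl).mean()
```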
10 Security and Privacy Problems in Self-Supervised Learning
Self-supervised learning has achieved revolutionary progress in the past
several years and is commonly believed to be a promising approach for
general-purpose AI. In particular, self-supervised learning aims to pre-train
an encoder using a large amount of unlabeled data. The pre-trained encoder is
like an "operating system" of the AI ecosystem. Specifically, the encoder can
be used as a feature extractor for many downstream tasks with little or no
labeled training data. Existing studies on self-supervised learning mainly
focused on pre-training a better encoder to improve its performance on
downstream tasks in non-adversarial settings, leaving its security and privacy
in adversarial settings largely unexplored. A security or privacy flaw in a
pre-trained encoder therefore becomes a single point of failure for the entire
AI ecosystem. In
this book chapter, we discuss 10 basic security and privacy problems for the
pre-trained encoders in self-supervised learning, including six confidentiality
problems, three integrity problems, and one availability problem. For each
problem, we discuss potential opportunities and challenges. We hope our book
chapter will inspire future research on the security and privacy of
self-supervised learning.
Comment: A book chapter.
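The "feature extractor" pattern underlying this single-point-of-failure argument is, concretely, linear probing: freeze the pre-trained encoder and train only a small head per downstream task. A minimal PyTorch sketch, with all names and hyperparameters as placeholders:

```python
import torch
import torch.nn as nn

def fit_downstream_head(encoder, loader, feat_dim, num_classes, epochs=10):
    """Linear probing: the pre-trained encoder stays frozen and only a
    small linear classifier is trained on the (scarce) downstream labels."""
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False  # the encoder is shared, not re-trained

    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for inputs, labels in loader:
            with torch.no_grad():
                feats = encoder(inputs)  # frozen feature extraction
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```

Because every downstream task reuses the same frozen encoder, a backdoored or privacy-leaking encoder compromises all of them at once.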
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, also
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code from the victim users,
facilitating the use of social engineering techniques to infect their machines.
Research has shown that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings.
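As a rough illustration of the kind of learning-based detector at the center of this arms race, the sketch below scores PDFs by counts of structural keywords, in the spirit of the feature-based detectors the paper surveys; the keyword list and model are illustrative, not the paper's system:

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative structural keywords commonly abused by malicious PDFs.
KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
            b"/Launch", b"/EmbeddedFile", b"/ObjStm"]

def pdf_features(raw_bytes):
    """Represent a PDF by the occurrence count of each keyword."""
    return [raw_bytes.count(kw) for kw in KEYWORDS]

def train_detector(pdf_blobs, labels):
    """Fit a classifier on keyword-count features. An evasion attack in
    this arms race perturbs exactly such observable features (e.g., by
    injecting benign-looking objects) while preserving the payload."""
    X = [pdf_features(blob) for blob in pdf_blobs]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, labels)
    return clf
```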
Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning
The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this field.
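As a concrete instance of the threat being systematized, the sketch below implements the simplest poisoning strategy, random label flipping, assuming integer class labels and an attacker who controls a fraction of the training labels; the function and its parameters are illustrative:

```python
import numpy as np

def label_flip_poison(y_train, flip_fraction=0.1, num_classes=10, seed=0):
    """Availability poisoning in its simplest form: flip the labels of a
    random fraction of the training set to degrade test-time accuracy.
    Only the data is manipulated; the training pipeline is untouched."""
    rng = np.random.default_rng(seed)
    y_poisoned = np.asarray(y_train).copy()
    idx = rng.choice(len(y_poisoned),
                     size=int(flip_fraction * len(y_poisoned)),
                     replace=False)
    # Shift each chosen label by a nonzero offset so it always changes class.
    offsets = rng.integers(1, num_classes, size=len(idx))
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % num_classes
    return y_poisoned
```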