11 research outputs found
Query-Efficient Black-Box Attack by Active Learning
Deep neural networks (DNNs), as popular machine learning models, have been found to be
vulnerable to adversarial attacks. Such an attack constructs adversarial examples
by adding small perturbations to the raw input; the perturbed inputs appear
unmodified to human eyes but are misclassified by a well-trained classifier. In this
paper, we focus on the black-box attack setting, where attackers have almost no
access to the underlying models. To conduct a black-box attack, a popular
approach aims to train a substitute model based on the information queried from
the target DNN. The substitute model can then be attacked using existing
white-box attack approaches, and the generated adversarial examples will be
used to attack the target DNN. Despite its encouraging results, this approach
suffers from poor query efficiency, i.e., attackers usually need to query a
huge amount of times to collect enough information for training an accurate
substitute model. To this end, we first utilize state-of-the-art white-box
attack methods to generate samples for querying, and then introduce an active
learning strategy to significantly reduce the number of queries needed.
Besides, we also propose a diversity criterion to avoid the sampling bias. Our
extensive experimental results on MNIST and CIFAR-10 show that the proposed
method can greatly reduce the number of queries required while preserving the attack
success rate and obtaining an accurate substitute model that is highly similar to
the target oracle. Comment: 9 pages
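For context, a minimal sketch of the substitute-model loop described above, assuming a hypothetical `oracle_predict` black-box labeling function and an unlabeled pool of inputs; the scikit-learn classifier and the margin-based uncertainty criterion are stand-ins, and the paper's white-box sample generation and diversity criterion are omitted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_substitute(oracle_predict, pool, n_rounds=5, batch=100, seed_size=200):
    """Active-learning substitute training: label a small seed set via the
    oracle, then in each round query only the pool samples on which the
    current substitute is least certain (smallest top-2 probability margin)."""
    rng = np.random.default_rng(0)
    unlabeled = list(rng.permutation(len(pool)))
    seed, unlabeled = unlabeled[:seed_size], unlabeled[seed_size:]
    X, y = pool[seed], oracle_predict(pool[seed])        # initial oracle queries
    sub = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300)
    for _ in range(n_rounds):
        sub.fit(X, y)
        probs = np.sort(sub.predict_proba(pool[unlabeled]), axis=1)
        margin = probs[:, -1] - probs[:, -2]             # small margin = uncertain
        pick = np.argsort(margin)[:batch]
        chosen = [unlabeled[i] for i in pick]
        X = np.vstack([X, pool[chosen]])
        y = np.concatenate([y, oracle_predict(pool[chosen])])
        unlabeled = [i for i in unlabeled if i not in set(chosen)]
    return sub
```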
A geometry-inspired decision-based attack
Deep neural networks have recently achieved tremendous success in image
classification. Recent studies have however shown that they are easily misled
into incorrect classification decisions by adversarial examples. Adversaries
can even craft attacks by querying the model in black-box settings, where no
information about the model is released except its final decision. Such
decision-based attacks usually require a large number of queries, while real-world image
recognition systems might actually restrict the number of queries. In this
paper, we propose qFool, a novel decision-based attack algorithm that can
generate adversarial examples using a small number of queries. The qFool method
can drastically reduce the number of queries compared to previous
decision-based attacks while reaching the same quality of adversarial examples.
We also enhance our method by constraining adversarial perturbations in
low-frequency subspace, which can make qFool even more computationally
efficient. Altogether, we manage to fool commercial image recognition systems
with a small number of queries, which demonstrates the actual effectiveness of
our new algorithm in practice.
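As an aside, the decision-based setting referred to above can be illustrated by the boundary binary search shared by most such attacks; the sketch below is a generic building block, not the qFool algorithm itself, and `query_fn` (returning only the top-1 label) is a hypothetical interface.

```python
import numpy as np

def is_adversarial(query_fn, x, true_label):
    # query_fn returns only the top-1 label (the "final decision" setting).
    return query_fn(x) != true_label

def boundary_binary_search(query_fn, x_orig, x_adv, true_label, steps=20):
    """Move an already-adversarial point as close to the original image as
    possible along the straight line between them, using only label queries."""
    lo, hi = 0.0, 1.0                          # hi stays on the adversarial side
    for _ in range(steps):
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_orig + mid * x_adv
        if is_adversarial(query_fn, x_mid, true_label):
            hi = mid                           # still adversarial: move closer
        else:
            lo = mid
    return (1 - hi) * x_orig + hi * x_adv
```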
Projection & Probability-Driven Black-Box Attack
Generating adversarial examples in a black-box setting remains a significant
challenge with vast practical application prospects. In particular, existing
black-box attacks suffer from the need for excessive queries, as it is
non-trivial to find an appropriate direction to optimize in the
high-dimensional space. In this paper, we propose Projection &
Probability-driven Black-box Attack (PPBA) to tackle this problem by reducing
the solution space and providing better optimization. For reducing the solution
space, we first model the adversarial perturbation optimization problem as a
process of recovering frequency-sparse perturbations with compressed sensing,
under the setting that random noise in the low-frequency space is more likely
to be adversarial. We then propose a simple method to construct a low-frequency
constrained sensing matrix, which works as a plug-and-play projection matrix to
reduce the dimensionality. Such a sensing matrix is shown to be flexible enough
to be integrated into existing methods like NES and Bandits. For better
optimization, we perform a random walk with a probability-driven strategy,
which utilizes all queries over the whole process to make full use of the
sensing matrix under a smaller query budget. Extensive experiments show that our
method requires at most 24% fewer queries with a higher attack success rate
compared with state-of-the-art approaches. Finally, the attack method is
evaluated on the real-world online service, i.e., Google Cloud Vision API,
which further demonstrates its practical potential. Comment: CVPR 2020
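The low-frequency projection idea can be illustrated with a DCT-like basis; the construction below is a hand-rolled sketch rather than the paper's sensing matrix, and the 32x32 image size and 8x8 frequency cutoff are arbitrary choices.

```python
import numpy as np

def low_freq_sensing_matrix(h, w, k):
    """Columns are the k*k lowest-frequency 2D cosine (DCT-like) basis images,
    flattened; multiplying a k*k-dimensional code by this matrix yields a
    low-frequency perturbation in the full h*w pixel space."""
    ys, xs = np.arange(h), np.arange(w)
    cols = []
    for u in range(k):
        for v in range(k):
            basis = np.outer(np.cos(np.pi * (ys + 0.5) * u / h),
                             np.cos(np.pi * (xs + 0.5) * v / w))
            cols.append(basis.ravel() / np.linalg.norm(basis))
    return np.stack(cols, axis=1)              # shape (h*w, k*k)

# A random code in the reduced space projects to a smooth, low-frequency
# perturbation that can be added to the image before querying the model.
A = low_freq_sensing_matrix(32, 32, 8)
delta = (A @ np.random.randn(64)).reshape(32, 32)
```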
Improving the Robustness of Deep Neural Networks via Adversarial Training with Triplet Loss
Recent studies have highlighted that deep neural networks (DNNs) are
vulnerable to adversarial examples. In this paper, we improve the robustness of
DNNs by utilizing techniques of Distance Metric Learning. Specifically, we
incorporate Triplet Loss, one of the most popular Distance Metric Learning
methods, into the framework of adversarial training. Our proposed algorithm,
Adversarial Training with Triplet Loss (ATL), substitutes the adversarial
example against the current model for the anchor of triplet loss to effectively
smooth the classification boundary. Furthermore, we propose an ensemble version
of ATL, which aggregates different attack methods and model structures for
better defense effects. Our empirical studies verify that the proposed approach
can significantly improve the robustness of DNNs without sacrificing accuracy.
Finally, we demonstrate that our specially designed triplet loss can also be
used as a regularization term to enhance other defense methods.
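A rough sketch of how a triplet term with an adversarial anchor could be added to standard adversarial training, using one-step FGSM as the inner attack; `embed` (a feature extractor, e.g. the penultimate layer) and the `lam`/`margin` hyperparameters are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    # One-step FGSM adversarial example (stand-in for any white-box attack);
    # assumes inputs are scaled to [0, 1].
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def atl_step(model, embed, x, y, x_neg, optimizer, margin=1.0, lam=0.5):
    """One training step where the triplet anchor is the adversarial example,
    the positive is the clean input of the same class, and the negative is an
    input from a different class."""
    x_adv = fgsm(model, x, y)
    triplet = F.triplet_margin_loss(embed(x_adv), embed(x), embed(x_neg),
                                    margin=margin)
    loss = F.cross_entropy(model(x_adv), y) + lam * triplet
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```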
Stealing Black-Box Functionality Using The Deep Neural Tree Architecture
This paper makes a substantial step towards cloning the functionality of
black-box models by introducing a machine learning (ML) architecture named Deep
Neural Trees (DNTs). This new architecture can learn to separate different
tasks of the black-box model, and clone its task-specific behavior. We propose
to train the DNT using an active learning algorithm to obtain faster and more
sample-efficient training. In contrast to prior work, we study a complex
"victim" black-box model based solely on input-output interactions, while at
the same time the attacker and the victim model may have completely different
internal architectures. The attacker is an ML-based algorithm, whereas the victim
is a generally unknown module, such as a multi-purpose digital chip, complex
analog circuit, mechanical system, software logic or a hybrid of these. The
trained DNT module can not only function as the attacked module but also
provide some level of explainability for the cloned model due to the tree-like
nature of the proposed architecture. Comment: 8 pages, 7 figures, 1 table
Active Sentence Learning by Adversarial Uncertainty Sampling in Discrete Space
In this paper, we focus on reducing the labeled data size for sentence
learning. We argue that real-time uncertainty sampling in active learning is
time-consuming, while delayed uncertainty sampling may lead to ineffective
sampling. We propose adversarial uncertainty sampling in discrete
space, in which sentences are mapped into the encoding space of a popular
pre-trained language model. Our proposed approach works in real time and is more
efficient than traditional uncertainty sampling. Experimental results on five
datasets show that our proposed approach outperforms strong baselines and can
achieve better uncertainty sampling effectiveness with acceptable running time. Comment: 10 pages, 3 figures, 4 tables
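For reference, classical uncertainty sampling over a fixed sentence-encoding space looks like the sketch below; the adversarial variant proposed above is not reproduced here, and `encode` (a pre-trained sentence encoder) plus the entropy criterion are placeholder choices.

```python
import numpy as np

def predictive_entropy(probs):
    # Higher entropy => the current classifier is less certain about the sample.
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_for_labeling(classifier, encode, unlabeled_sentences, budget=100):
    """Uncertainty sampling in a fixed pre-trained encoding space: encode the
    unlabeled pool once, score it with the current classifier, and return the
    `budget` most uncertain sentences for human labeling."""
    X = np.stack([encode(s) for s in unlabeled_sentences])
    scores = predictive_entropy(classifier.predict_proba(X))
    top = np.argsort(scores)[::-1][:budget]
    return [unlabeled_sentences[i] for i in top]
```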
Extraction of Complex DNN Models: Real Threat or Boogeyman?
Recently, machine learning (ML) has introduced advanced solutions to many
domains. Since ML models provide a business advantage to model owners, protecting
the intellectual property of ML models has emerged as an important consideration.
Confidentiality of ML models can be protected by exposing them to clients only
via prediction APIs. However, model extraction attacks can steal the
functionality of ML models using the information leaked to clients through the
results returned via the API. In this work, we question whether model
extraction is a serious threat to complex, real-life ML models. We evaluate the
current state-of-the-art model extraction attack (Knockoff nets) against
complex models. We reproduce and confirm the results in the original paper. But
we also show that the performance of this attack can be limited by several
factors, including ML model architecture and the granularity of API response.
Furthermore, we introduce a defense based on distinguishing queries used for
Knockoff nets from benign queries. Despite the limitations of the Knockoff
nets, we show that a more realistic adversary can effectively steal complex ML
models and evade known defenses. Comment: 16 pages, 1 figure, accepted for publication in the AAAI-20 Workshop on
Engineering Dependable and Secure Machine Learning Systems (AAAI-EDSMLS 2020).
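The sketch below only illustrates the general idea of separating extraction-attack queries from benign ones with a detector over query features; the paper's actual detection mechanism may differ, and `benign_features`/`attack_features` are hypothetical inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_query_detector(benign_features, attack_features):
    """Train a binary detector that separates benign client queries from the
    out-of-distribution queries typically issued by an extraction attack.
    Features could be, e.g., embeddings of the queried inputs."""
    X = np.vstack([benign_features, attack_features])
    y = np.concatenate([np.zeros(len(benign_features)),
                        np.ones(len(attack_features))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# At serving time, flag (or degrade responses to) suspicious queries, e.g.:
# detector = fit_query_detector(benign_feats, knockoff_feats)
# suspicious = detector.predict_proba(new_query_feats)[:, 1] > 0.9
```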
Making targeted black-box evasion attacks effective and efficient
We investigate how an adversary can optimally use its query budget for
targeted evasion attacks against deep neural networks in a black-box setting.
We formalize the problem setting and systematically evaluate what benefits the
adversary can gain by using substitute models. We show that there is an
exploration-exploitation tradeoff in that query efficiency comes at the cost of
effectiveness. We present two new attack strategies for using substitute models
and show that they are as effective as previous query-only techniques but
require significantly fewer queries, by up to three orders of magnitude. We
also show that an agile adversary capable of switching through different attack
techniques can achieve Pareto-optimal efficiency. We demonstrate our attack
against Google Cloud Vision, showing that black-box attacks against real-world
prediction APIs are significantly easier than previously thought (requiring
approximately 500 queries instead of approximately 20,000 as in previous
works). Comment: 12 pages, 10 figures
Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries
We study adversarial examples in a black-box setting where the adversary only
has API access to the target model and each query is expensive. Prior work on
black-box adversarial examples follows one of two main strategies: (1) transfer
attacks use white-box attacks on local models to find candidate adversarial
examples that transfer to the target model, and (2) optimization-based attacks
use queries to the target model and apply optimization techniques to search for
adversarial examples. We propose hybrid attacks that combine both strategies,
using candidate adversarial examples from local models as starting points for
optimization-based attacks and using labels learned in optimization-based
attacks to tune local models for finding transfer candidates. We empirically
demonstrate on the MNIST, CIFAR10, and ImageNet datasets that our hybrid attack
strategy reduces cost and improves success rates. We also introduce a seed
prioritization strategy which enables attackers to focus their resources on the
most promising seeds. Combining hybrid attacks with our seed prioritization
strategy enables batch attacks that can reliably find adversarial examples with
only a handful of queries. Comment: USENIX Security 2020 camera-ready version, Code available at:
https://github.com/suyeecav/Hybrid-Attac
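A minimal sketch of the hybrid, seed-prioritized batch loop outlined above; `local_attack`, `local_loss`, and `query_attack` are hypothetical callables standing in for the white-box attack on local models, a transfer-likelihood score, and the query-based optimization attack, and the feedback loop that tunes local models with labels learned from queries is omitted.

```python
import numpy as np

def hybrid_batch_attack(seeds, local_attack, local_loss, query_attack, budget):
    """Attack a batch of seeds: craft candidates on local models first, spend
    target-model queries on the most promising seeds, and stop when the
    query budget is exhausted."""
    candidates = [local_attack(x) for x in seeds]            # no target queries
    order = np.argsort([local_loss(c) for c in candidates])  # low loss first
    results, used = {}, 0
    for i in order:
        if used >= budget:
            break
        adv, n_queries = query_attack(seeds[i], start=candidates[i])
        used += n_queries
        if adv is not None:
            results[i] = adv
    return results, used
```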
Security and Privacy for Artificial Intelligence: Opportunities and Challenges
The increased adoption of Artificial Intelligence (AI) presents an
opportunity to solve many socio-economic and environmental challenges; however,
this cannot happen without securing AI-enabled technologies. In recent years,
AI models have been shown to be vulnerable to advanced and sophisticated hacking
techniques.
This challenge has motivated concerted research efforts into adversarial AI,
with the aim of developing robust machine and deep learning models that are
resilient to different types of adversarial scenarios. In this paper, we
present a holistic cyber security review that demonstrates adversarial attacks
against AI applications, including aspects such as adversarial knowledge and
capabilities, as well as existing methods for generating adversarial examples
and existing cyber defence models. We explain mathematical AI models,
especially new variants of reinforcement and federated learning, to demonstrate
how attack vectors would exploit vulnerabilities of AI models. We also propose
a systematic framework for demonstrating attack techniques against AI
applications and review several cyber defences that would protect AI
applications against those attacks. We also highlight the importance of
understanding adversaries' goals and capabilities, especially in light of
recent attacks against industry applications, in order to develop adaptive
defences that can secure AI applications. Finally, we describe the main
challenges and future research directions in the domain of security and
privacy of AI technologies.