On the Design of Black-box Adversarial Examples by Leveraging Gradient-free Optimization and Operator Splitting Method
Robust machine learning is currently one of the most prominent research topics, with the potential to help shape a future of advanced AI platforms that perform well not only in average cases but also in worst-case or adversarial situations.
Despite the long-term vision, however, existing studies on black-box
adversarial attacks are still restricted to very specific threat-model settings (e.g., a single distortion metric and restrictive assumptions about the target model's feedback to queries) and/or suffer from prohibitively high query
complexity. To push for further advances in this field, we introduce a general
framework based on an operator splitting method, the alternating direction method of multipliers (ADMM), to devise efficient, robust black-box attacks that
work with various distortion metrics and feedback settings without incurring
high query complexity. Due to the black-box nature of the threat model, the
proposed ADMM solution framework is integrated with zeroth-order (ZO)
optimization and Bayesian optimization (BO), and thus is applicable to the
gradient-free regime. This results in two new black-box adversarial attack
generation methods, ZO-ADMM and BO-ADMM. Our empirical evaluations on image classification datasets show that the proposed approaches have much lower function query complexities than state-of-the-art attack methods while achieving very competitive attack success rates.
Comment: Accepted by ICCV 2019.
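Since the threat model only permits queries, the gradient-free ingredient behind a framework like ZO-ADMM is a gradient estimate built purely from loss evaluations. The following minimal Python sketch shows the standard two-point random-direction zeroth-order estimator that such methods plug into their updates; the function names and hyper-parameters here are illustrative assumptions, not taken from the paper.

    import numpy as np

    def zo_gradient(loss_fn, x, mu=0.01, num_samples=20):
        # Two-point random-direction zeroth-order gradient estimate:
        # only function values of loss_fn are used, never true gradients.
        d = x.size
        fx = loss_fn(x)
        grad = np.zeros(d)
        for _ in range(num_samples):
            u = np.random.randn(d)
            u /= np.linalg.norm(u)  # unit direction on the sphere
            grad += d * (loss_fn(x + mu * u) - fx) / mu * u
        return grad / num_samples

Each estimate costs num_samples + 1 model queries, which is why keeping the number of such estimates low is central to query-efficient black-box attacks.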
Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent
Despite the great achievements of modern deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt first-order gradient descent, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings.
In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method for designing adversarial attacks, which combines zeroth-order gradient estimation, catering to the black-box attack scenario, with second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD achieves significantly lower model query complexity than state-of-the-art attack methods.
Comment: Accepted by AAAI 2020.
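To give intuition for how natural-gradient preconditioning can sit on top of such zeroth-order estimates, here is a deliberately simplified sketch reusing the zo_gradient helper above; the diagonal Fisher proxy and all names are assumptions for illustration, not the authors' implementation.

    def zo_ngd_step(loss_fn, x, lr=0.02, mu=0.01, num_samples=20, damping=1e-3):
        # One illustrative zeroth-order natural-gradient step.
        g = zo_gradient(loss_fn, x, mu, num_samples)  # estimator sketched above
        fisher_diag = g * g + damping                 # crude diagonal Fisher proxy
        return x - lr * g / fisher_diag               # preconditioned update

The point of the preconditioning is that the update direction is the estimated gradient rescaled by (an approximation of) the inverse Fisher information, which can take fewer, better-scaled steps than plain gradient descent.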
You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle
Deep learning achieves state-of-the-art results in many tasks in computer
vision and natural language processing. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which raises serious robustness concerns. Adversarial training, typically
formulated as a robust optimization problem, is an effective way of improving
the robustness of deep networks. A major drawback of existing adversarial
training algorithms is the computational overhead of the generation of
adversarial examples, typically far greater than that of the network training.
This makes the overall computational cost of adversarial training prohibitive. In this paper, we show that adversarial training can be cast as a discrete-time differential game. By analyzing Pontryagin's Maximal Principle (PMP) for this problem, we observe that the adversary update is only
coupled with the parameters of the first layer of the network. This inspires us
to restrict most of the forward and back propagation within the first layer of
the network during adversary updates. This effectively reduces the total number of full forward and backward propagations to only one for each group of adversary updates. We therefore refer to this algorithm as YOPO (You Only Propagate Once). Numerical experiments demonstrate that YOPO can achieve comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of the projected gradient descent (PGD) algorithm. Our code is available at https://github.com/a1600012888/YOPO-You-Only-Propagate-Once.
Comment: Accepted as a conference paper at NeurIPS 2019.
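The PMP observation translates into a concrete training-loop trick: run one full forward/backward pass to obtain the loss slope at the first layer's output, then refine the perturbation with cheap passes through the first layer only. Below is a simplified PyTorch-style sketch; the layer split, the surrogate objective, and the 2/255 and 8/255 budgets are illustrative assumptions, and the official code linked above is authoritative.

    import torch

    def yopo_adversary_updates(first_layer, rest_of_net, criterion,
                               x, y, delta, eta=2/255, eps=8/255, m_steps=5):
        # One full pass: p = d(loss)/d(first-layer output), computed once.
        f_out = first_layer(x + delta)
        f_out.retain_grad()
        criterion(rest_of_net(f_out), y).backward()
        p = f_out.grad.detach()  # (zero parameter grads outside as needed)

        # m cheap adversary updates that propagate through the first layer only.
        for _ in range(m_steps):
            delta = delta.detach().requires_grad_(True)
            surrogate = (p * first_layer(x + delta)).sum()
            surrogate.backward()
            delta = (delta + eta * delta.grad.sign()).clamp(-eps, eps)
        return delta.detach()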
SoK: Pitfalls in Evaluating Black-Box Attacks
Numerous works study black-box attacks on image classifiers. However, these works make differing assumptions about the adversary's knowledge, and the current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning the axes of feedback granularity, access to interactive queries, and the quality and quantity of the auxiliary data available to the attacker.
Our new taxonomy yields three key insights. 1) Despite the extensive literature, numerous under-explored threat spaces exist and cannot be trivially addressed by adapting techniques from well-explored settings. We demonstrate this by establishing a new state of the art in the less-studied setting of access to top-k confidence scores, adapting techniques from the well-explored setting of access to the complete confidence vector; the result still falls short in the more restrictive setting where only the predicted label is available, highlighting the need for more research. 2) Identifying the threat model of each attack uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by strengthening an initially weaker baseline (under interactive query access) via surrogate models, effectively overturning claims in the respective paper. 3) Our taxonomy reveals
interactions between attacker knowledge that connect well to related areas,
such as model inversion and extraction attacks. We discuss how advances in
other areas can enable potentially stronger black-box attacks. Finally, we
emphasize the need for a more realistic assessment of attack success by
factoring in local attack runtime. This reveals that certain attacks can achieve notably higher success rates, and that attacks should be evaluated in diverse and harder settings with better selection criteria.
Semantically Controllable Generation of Physical Scenes with Explicit Knowledge
Deep Generative Models (DGMs) are known for their superior capability in
generating realistic data. Extending purely data-driven approaches, recent
specialized DGMs can satisfy additional controllability requirements, such as embedding a traffic sign in a driving scene, by manipulating patterns \textit{implicitly} at the neuron or feature level. In this paper, we introduce
a novel method to incorporate domain knowledge \textit{explicitly} in the
generation process to achieve semantically controllable scene generation. We
categorize our knowledge into two types to be consistent with the composition
of natural scenes, where the first type represents the property of objects and
the second type represents the relationship among objects. We then propose a
tree-structured generative model to learn complex scene representations, whose nodes and edges naturally correspond to the two types of knowledge, respectively. Knowledge can be explicitly integrated to enable semantically
controllable scene generation by imposing semantic rules on properties of nodes
and edges in the tree structure. We construct a synthetic example to illustrate
the controllability and explainability of our method in a clean setting. We
further extend the synthetic example to realistic autonomous vehicle driving
environments and conduct extensive experiments showing that our method efficiently identifies adversarial traffic scenes against different state-of-the-art 3D point cloud segmentation models while satisfying the traffic rules specified as explicit knowledge.
Comment: 14 pages, 6 figures. Under review.
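As a toy illustration of how the two knowledge types might be encoded, the sketch below represents objects as tree nodes whose attributes carry the first knowledge type (object properties) and whose parent-child edges carry the second (object relations), checked against explicit rules; all field names and the numeric threshold are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class SceneNode:
        category: str                                  # object property, e.g. "traffic_sign"
        position: tuple                                # (x, y) scene coordinates
        children: list = field(default_factory=list)   # edges encode relations

    def satisfies_rules(node, rules):
        # Recursively check explicit semantic rules on every parent-child edge.
        for child in node.children:
            if not all(rule(node, child) for rule in rules):
                return False
            if not satisfies_rules(child, rules):
                return False
        return True

    # Hypothetical rule: a traffic sign must sit within 5 units of its parent road.
    near_parent_road = lambda parent, child: (
        child.category != "traffic_sign"
        or abs(child.position[0] - parent.position[0]) < 5
    )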