Single Image Backdoor Inversion via Robust Smoothed Classifiers
Backdoor inversion, a central step in many backdoor defenses, is a
reverse-engineering process to recover the hidden backdoor trigger inserted
into a machine learning model. Existing approaches tackle this problem by
searching for a backdoor pattern that is able to flip a set of clean images
into the target class, yet how large this support set needs to be is rarely
investigated. In this work, we present a new approach for backdoor
inversion, which is able to recover the hidden backdoor with as few as a single
image. Inspired by recent advances in adversarial robustness, our method
SmoothInv starts from a single clean image, and then performs projected
gradient descent towards the target class on a robust smoothed version of the
original backdoored classifier. We find that backdoor patterns emerge naturally
from such an optimization process. Compared to existing backdoor inversion
methods, SmoothInv introduces a minimal set of optimization variables and does not
require complex regularization schemes. We perform a comprehensive quantitative
and qualitative study on backdoored classifiers obtained from existing backdoor
attacks. We demonstrate that SmoothInv consistently recovers successful
backdoors from single images: for backdoored ImageNet classifiers, our
reconstructed backdoors have close to 100% attack success rates. We also show
that they maintain high fidelity to the underlying true backdoors. Last, we
propose and analyze two countermeasures to our approach and show that SmoothInv
remains robust in the face of an adaptive attacker. Our code is available at
https://github.com/locuslab/smoothinv.
Comment: CVPR 2023. v2: improved writing
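As a rough illustration of the optimization at the heart of this abstract, the sketch below runs projected gradient descent toward a target class on a toy softmax classifier. The linear model, Gaussian smoothing parameters, and L2 budget `eps` are all illustrative assumptions, not the paper's actual architecture or hyperparameters; the real method differentiates through a robust smoothed deep classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-class linear softmax model standing in for the backdoored
# classifier (assumption: the real method uses a deep network).
W = rng.normal(size=(3, 10))

def smoothed_logits(x, sigma=0.25, n=64):
    # Monte Carlo estimate of the randomized-smoothing classifier's logits:
    # average the logits over Gaussian perturbations of the input.
    noise = rng.normal(scale=sigma, size=(n, x.size))
    return (W @ (x + noise).T).mean(axis=1)

def pgd_towards_target(x0, target, steps=100, lr=0.1, eps=1.0):
    """Projected gradient ascent on the smoothed target-class log-probability."""
    delta = np.zeros_like(x0)
    for _ in range(steps):
        z = smoothed_logits(x0 + delta)
        p = np.exp(z - z.max())
        p /= p.sum()
        # For this linear toy model, the gradient of log p(target) w.r.t. the
        # input is W[target] - p @ W; a real model would need autograd here.
        grad = W[target] - p @ W
        delta += lr * grad
        # Project back onto the L2 ball of radius eps (the trigger budget).
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta *= eps / norm
    return delta
```

Starting from a single clean input `x0`, the recovered perturbation `delta` plays the role of the reconstructed backdoor pattern in this toy setting.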
Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness
Adversarial robustness, which mainly comprises sensitivity-based robustness
and spatial robustness, plays an integral part in robust generalization. In
this paper, we endeavor to design strategies to achieve universal adversarial
robustness. To this end, we first investigate the less-studied spatial
robustness and then integrate existing spatial robustness methods by
incorporating both local and global spatial vulnerability into one spatial
attack and adversarial training. Based on this exploration, we further present
a comprehensive relationship among natural accuracy, sensitivity-based
robustness, and different forms of spatial robustness, supported by strong
evidence from the
perspective of robust representation. More importantly, in order to balance
the mutual impacts of these different forms of robustness within one unified
framework, we incorporate the \textit{Pareto criterion} into the adversarial
robustness analysis,
yielding a novel strategy called \textit{Pareto Adversarial Training} towards
universal robustness. The resulting Pareto front, the set of optimal solutions,
provides the optimal trade-offs among natural accuracy and the different forms
of adversarial robustness, shedding light on solutions towards universal
robustness in the future. To the best of our knowledge, we are the first to
consider universal adversarial robustness via multi-objective optimization.
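The Pareto-front idea can be illustrated with a deliberately tiny example: two competing quadratic losses stand in for the natural-accuracy and adversarial-robustness objectives (an illustrative assumption; the paper's actual objectives come from adversarial training), and scanning the scalarization weight traces out the front of non-dominated trade-offs.

```python
import numpy as np

# Two toy competing objectives standing in for, e.g., a natural-accuracy
# loss and an adversarial-robustness loss (illustrative assumption).
def f1(x):
    return (x - 1.0) ** 2

def f2(x):
    return (x + 1.0) ** 2

def pareto_front(n_weights=11):
    """Trace the Pareto front by minimizing weighted sums w*f1 + (1-w)*f2."""
    front = []
    for w in np.linspace(0.0, 1.0, n_weights):
        # For equal-curvature quadratics, the minimizer of the weighted sum
        # is the convex combination of the individual minimizers (1 and -1).
        x_star = w * 1.0 + (1.0 - w) * (-1.0)
        front.append((f1(x_star), f2(x_star)))
    return front
```

Each point on the returned front is optimal for some weighting: lowering one loss necessarily raises the other, which mirrors the trade-off the abstract describes among natural accuracy and the different robustness notions.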
A Simple and Effective Pruning Approach for Large Language Models
As their size increases, Large Language Models (LLMs) are natural candidates
for network pruning methods: approaches that drop a subset of network weights
while striving to preserve performance. Existing methods, however, require
either retraining, which is rarely affordable for billion-scale LLMs, or
solving a weight reconstruction problem reliant on second-order information,
which may also be computationally expensive. In this paper, we introduce a
novel, straightforward yet effective pruning method, termed Wanda (Pruning by
Weights and activations), designed to induce sparsity in pretrained LLMs.
Motivated by the recent observation of emergent large magnitude features in
LLMs, our approach prunes weights with the smallest magnitudes multiplied by the
corresponding input activations, on a per-output basis. Notably, Wanda requires
no retraining or weight update, and the pruned LLM can be used as is. We
conduct a thorough evaluation of our method on LLaMA across various language
benchmarks. Wanda significantly outperforms the established baseline of
magnitude pruning and competes favorably against recent methods involving
intensive weight update. Code is available at
https://github.com/locuslab/wanda.
Comment: Technical Report
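The pruning metric the abstract describes, weight magnitude multiplied by the corresponding input activation norm, compared within each output row, fits in a few lines of numpy. The function name `wanda_prune` and the calibration-set shape are assumptions for illustration; the released code operates layer by layer on real LLM activations.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.5):
    """Zero out, per output row, the weights with the smallest score
    |W_ij| * ||X_j||_2 (hypothetical helper, not the released API)."""
    act_norm = np.linalg.norm(X, axis=0)     # L2 norm of each input feature
    score = np.abs(W) * act_norm             # (n_out, n_in) pruning metric
    k = int(round(W.shape[1] * sparsity))    # weights to drop per output row
    drop = np.argsort(score, axis=1)[:, :k]  # indices of the k smallest scores
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, drop, 0.0, axis=1)
    return W_pruned
```

Because the score needs only the weights and a small batch of input activations, no retraining or weight update is involved, matching the abstract's claim that the pruned model can be used as is.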