SoK: Analyzing Adversarial Examples: A Framework to Study Adversary Knowledge
Adversarial examples are malicious inputs to machine learning models that trigger a misclassification. This type of attack has been studied for close to a decade, yet we find that adversary knowledge at attack time remains understudied and unformalized. This has yielded a complex space of attack research with hard-to-compare threat models and attacks. We focus on the image classification domain and provide a theoretical framework, inspired by work in order theory, to study adversary knowledge. We present an adversarial example game, inspired by cryptographic games, to standardize attacks. We survey recent attacks in the image classification domain and classify their adversaries' knowledge in our framework. From this systematization, we compile results that both confirm existing beliefs about adversary knowledge, such as the potency of information about the attacked model, and allow us to derive new conclusions on the difficulty of the white-box and transferable threat models: for example, transferable attacks might not be as difficult as previously thought.
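To make the game structure concrete, here is a minimal Python sketch of one way such a challenger–adversary game could look. The `Challenger` interface, the L-infinity budget `eps`, and the `knowledge` dictionary are illustrative assumptions for exposition, not the paper's exact formalization.

```python
import numpy as np

# Illustrative sketch of a cryptographic-style adversarial example game.
# All interfaces below (Challenger, the eps budget, the knowledge dict)
# are assumptions for exposition, not the paper's formalization.

class Challenger:
    def __init__(self, model, dataset, eps):
        self.model = model      # target classifier: x -> predicted label
        self.dataset = dataset  # list of (x, y) pairs
        self.eps = eps          # allowed L-infinity perturbation budget

    def sample_challenge(self, rng):
        # Draw a correctly classified input as the challenge point.
        while True:
            x, y = self.dataset[rng.integers(len(self.dataset))]
            if self.model(x) == y:
                return x, y

    def judge(self, x, x_adv, y):
        # The adversary wins iff the perturbation stays within budget
        # and the model's prediction changes.
        within_budget = np.max(np.abs(x_adv - x)) <= self.eps
        return within_budget and self.model(x_adv) != y

def play_game(challenger, adversary, knowledge, seed=0):
    # 'knowledge' encodes what the adversary is given (e.g. gradients,
    # logits, a surrogate model), mirroring the idea of ordering threat
    # models by how much the adversary knows about the attacked model.
    rng = np.random.default_rng(seed)
    x, y = challenger.sample_challenge(rng)
    x_adv = adversary(x, knowledge)
    return challenger.judge(x, x_adv, y)
```

Varying the contents of `knowledge` while keeping the rest of the game fixed is what lets differently informed attacks be compared on equal footing.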
Leveraging Optimization for Adaptive Attacks on Image Watermarks
Untrustworthy users can misuse image generators to synthesize high-quality
deepfakes and engage in unethical activities. Watermarking deters misuse by
marking generated content with a hidden message, enabling its detection using a
secret watermarking key. A core security property of watermarking is
robustness, which states that an attacker can only evade detection by
substantially degrading image quality. Assessing robustness requires designing
an adaptive attack for the specific watermarking algorithm. When evaluating
watermarking algorithms and their (adaptive) attacks, it is challenging to
determine whether an adaptive attack is optimal, i.e., the best possible
attack. We solve this problem by defining an objective function and then approaching adaptive attacks as an optimization problem. The core idea of our
adaptive attacks is to replicate secret watermarking keys locally by creating
surrogate keys that are differentiable and can be used to optimize the attack's
parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods with no visible degradation in image quality. Optimizing our attacks is efficient, requiring less than 1 GPU hour to reduce the detection accuracy to 6.3% or less. Our findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
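As a rough illustration of the surrogate-key idea, the PyTorch sketch below optimizes a perturbation against a locally trained, differentiable stand-in for the secret decoder. The names `surrogate_decode`, `wm_bits`, the loss weighting `lam`, and the step budget are hypothetical choices for this sketch, not the paper's implementation.

```python
import torch

def remove_watermark(x_wm, surrogate_decode, wm_bits,
                     steps=200, lr=1e-2, lam=0.1):
    """Sketch: optimize a perturbation that evades a surrogate decoder.

    x_wm            : watermarked image tensor (1, C, H, W), values in [0, 1]
    surrogate_decode: differentiable stand-in for the secret watermark
                      decoder, mapping an image to per-bit logits
    wm_bits         : message the surrogate decodes from x_wm (0/1 tensor)
    """
    delta = torch.zeros_like(x_wm, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x_wm + delta).clamp(0.0, 1.0)
        logits = surrogate_decode(x_adv)
        # Push each decoded bit toward the opposite of the watermark
        # message, so detection via bit accuracy fails...
        evade = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, 1.0 - wm_bits.float())
        # ...while penalizing visible distortion of the image.
        fidelity = torch.mean((x_adv - x_wm) ** 2)
        loss = evade + lam * fidelity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x_wm + delta).detach().clamp(0.0, 1.0)
```

Because the surrogate decoder is differentiable, the whole attack reduces to ordinary gradient descent, which is what makes the optimization cheap enough to run in under a GPU hour.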
Appendicularian ecophysiology I: Food concentration dependent clearance rate, assimilation efficiency, growth and reproduction of Oikopleura dioica