Certifiable Black-Box Attack: Ensuring Provably Successful Attack for Adversarial Examples
Black-box adversarial attacks have shown strong potential to subvert machine
learning models. Existing black-box adversarial attacks craft the adversarial
examples by iteratively querying the target model and/or leveraging the
transferability of a local surrogate model. Whether such an attack will
succeed, however, remains unknown to the adversary when the attack is designed
empirically. In this paper, to the best of our knowledge, we take the first
step toward studying a new paradigm of adversarial attacks -- the certifiable
black-box attack, which can guarantee the attack success rate of the crafted
adversarial examples. Specifically, we adapt randomized smoothing to establish
novel theories that ensure the attack success rate of the adversarial
examples. To craft the adversarial
examples with the certifiable attack success rate (CASR) guarantee, we design
several novel techniques, including a randomized query method to query the
target model, an initialization method with smoothed self-supervised
perturbation to derive certifiable adversarial examples, and a geometric
shifting method to reduce the perturbation size of the certifiable adversarial
examples for better imperceptibility. We have comprehensively evaluated the
performance of the certifiable black-box attack on CIFAR10 and ImageNet
datasets against different levels of defenses. Both theoretical and
experimental results have validated the effectiveness of the proposed
certifiable attack.
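
To make the certification idea concrete, here is a minimal sketch of how a
lower bound on the attack success rate could be estimated for a candidate
adversarial example, assuming a hypothetical black-box oracle query_target
that returns the target model's predicted label for an input in [0, 1]. It
uses randomized-smoothing-style Monte Carlo queries with a Clopper-Pearson
lower confidence bound and illustrates the general recipe, not the authors'
exact procedure.

# Minimal sketch: lower-bound the attack success rate of a candidate
# adversarial example under Gaussian smoothing noise. `query_target` is a
# hypothetical black-box oracle; inputs are assumed to lie in [0, 1].
import numpy as np
from scipy.stats import beta


def certified_attack_success_rate(x_adv, true_label, query_target,
                                  sigma=0.25, n_queries=1000, alpha=0.001):
    """Return a (1 - alpha)-confidence Clopper-Pearson lower bound on the
    probability that x_adv plus Gaussian noise is misclassified."""
    successes = 0
    for _ in range(n_queries):
        noise = np.random.normal(0.0, sigma, size=x_adv.shape)
        pred = query_target(np.clip(x_adv + noise, 0.0, 1.0))
        if pred != true_label:  # any label other than the true one counts as success
            successes += 1
    if successes == 0:
        return 0.0
    # One-sided Clopper-Pearson lower confidence bound for a binomial proportion.
    return beta.ppf(alpha, successes, n_queries - successes + 1)

More queries tighten the bound at the cost of query budget, and the smoothing
level sigma fixes the noise distribution over which the success rate is
certified.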
Robust Recommender System: A Survey and Future Directions
With the rapid growth of information, recommender systems have become
integral for providing personalized suggestions and overcoming information
overload. However, their practical deployment often encounters "dirty" data,
where noise or malicious information can lead to abnormal recommendations.
Research on improving recommender systems' robustness against such dirty data
has thus gained significant attention. This survey provides a comprehensive
review of recent work on recommender systems' robustness. We first present a
taxonomy to organize current techniques for withstanding malicious attacks and
natural noise. We then explore state-of-the-art methods in each category,
including fraudster detection, adversarial training, and certifiably robust
training against malicious attacks, as well as regularization, purification,
and self-supervised learning against natural noise. Additionally, we summarize
evaluation metrics and common datasets used to assess robustness. We discuss
robustness across varying recommendation scenarios and its interplay with other
properties like accuracy, interpretability, privacy, and fairness. Finally, we
delve into open issues and future research directions in this emerging field.
Our goal is to equip readers with a holistic understanding of robust
recommender systems and to spotlight pathways for future research and
development.
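
As one concrete example of the adversarial-training techniques the survey
covers, the sketch below adds worst-case perturbations to user and item
embeddings during BPR training of a matrix-factorization recommender (in the
spirit of adversarial personalized ranking). All names and hyperparameters
here are illustrative assumptions, not code from any particular surveyed
system.

# Illustrative sketch of adversarial training for a matrix-factorization
# recommender: perturb the looked-up embeddings in the loss-increasing
# direction and train on the clean plus adversarial BPR losses.
import torch
import torch.nn.functional as F


def bpr_loss(user_e, pos_e, neg_e):
    # Bayesian Personalized Ranking: positive items should outscore negatives.
    return -F.logsigmoid((user_e * pos_e).sum(-1) - (user_e * neg_e).sum(-1)).mean()


def adversarial_training_step(user_emb, item_emb, users, pos_items, neg_items,
                              optimizer, eps=0.5, adv_weight=1.0):
    u, p, n = user_emb(users), item_emb(pos_items), item_emb(neg_items)
    clean_loss = bpr_loss(u, p, n)

    # Single-step (FGSM-style) perturbation of the embeddings within an eps-ball.
    grads = torch.autograd.grad(clean_loss, [u, p, n], retain_graph=True)
    u_adv, p_adv, n_adv = (e + eps * g / (g.norm(dim=-1, keepdim=True) + 1e-12)
                           for e, g in zip((u, p, n), grads))
    adv_loss = bpr_loss(u_adv, p_adv, n_adv)

    loss = clean_loss + adv_weight * adv_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The single-step perturbation costs roughly one extra backward pass per batch;
multi-step variants tighten the inner maximization at higher cost.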
Interpreting Adversarially Trained Convolutional Neural Networks
We attempt to interpret how adversarially trained convolutional neural
networks (AT-CNNs) recognize objects. We design systematic approaches to
interpret AT-CNNs in both qualitative and quantitative ways and compare them
with normally trained models. Surprisingly, we find that adversarial training
alleviates the texture bias of standard CNNs when trained on object recognition
tasks, and helps CNNs learn a more shape-biased representation. We validate our
hypothesis from two aspects. First, we compare the salience maps of AT-CNNs and
standard CNNs on clean images and on images under different transformations.
The comparison visually shows that the predictions of the two types of CNNs
are sensitive to dramatically different types of features. Second, to achieve
quantitative verification, we construct additional test datasets that destroy
either textures or shapes, such as style-transferred versions of the clean
data, saturated images, and patch-shuffled ones, and then evaluate the
classification
accuracy of AT-CNNs and normal CNNs on these datasets. Our findings shed some
light on why AT-CNNs are more robust than normally trained ones and contribute
to a better understanding of adversarial training over CNNs from an
interpretation perspective.
Comment: To appear in ICML 2019
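
As an illustration of the patch-shuffling transform mentioned above, the
following sketch splits an image into a k x k grid and randomly permutes the
patches, which preserves local textures while destroying global shape. It is
my own minimal rendering under the assumption of an HxWxC array whose sides
are divisible by k, not the authors' released code.

# Minimal sketch of the patch-shuffling transform: keep local texture,
# destroy global shape by randomly permuting grid patches.
import numpy as np


def patch_shuffle(image, k=4, rng=None):
    """Randomly permute the k x k grid of patches of an HxWxC image.

    Assumes H and W are divisible by k. Local texture statistics survive,
    but the global shape of the object is destroyed.
    """
    rng = np.random.default_rng() if rng is None else rng
    ph, pw = image.shape[0] // k, image.shape[1] // k
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(k) for j in range(k)]
    order = rng.permutation(k * k)          # random placement of the patches
    patches = [patches[i] for i in order]
    rows = [np.concatenate(patches[r * k:(r + 1) * k], axis=1) for r in range(k)]
    return np.concatenate(rows, axis=0)

Comparing classification accuracy on such transformed test sets is how the
paper quantifies how much each model relies on texture versus shape.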