47 research outputs found
Downstream-agnostic Adversarial Examples
Self-supervised learning usually uses a large amount of unlabeled data to
pre-train an encoder which can be used as a general-purpose feature extractor,
such that downstream users only need to perform fine-tuning operations to enjoy
the benefit of "large model". Despite this promising prospect, the security of
pre-trained encoder has not been thoroughly investigated yet, especially when
the pre-trained encoder is publicly available for commercial use.
In this paper, we propose AdvEncoder, the first framework for generating
downstream-agnostic universal adversarial examples based on the pre-trained
encoder. AdvEncoder aims to construct a universal adversarial perturbation or
patch for a set of natural images that can fool all the downstream tasks
inheriting the victim pre-trained encoder. Unlike traditional adversarial
example works, the pre-trained encoder only outputs feature vectors rather than
classification labels. Therefore, we first exploit the high frequency component
information of the image to guide the generation of adversarial examples. Then
we design a generative attack framework to construct adversarial
perturbations/patches by learning the distribution of the attack surrogate
dataset to improve their attack success rates and transferability. Our results
show that an attacker can successfully attack downstream tasks without knowing
either the pre-training dataset or the downstream dataset. We also tailor four
defenses for pre-trained encoders, the results of which further prove the
attack ability of AdvEncoder.Comment: This paper has been accepted by the International Conference on
Computer Vision (ICCV '23, October 2--6, 2023, Paris, France
Robustness and Interpretability of Neural Networks’ Predictions under Adversarial Attacks
Le reti neurali profonde (DNNs) sono potenti modelli predittivi, che superano le capacità umane in una varietà di task. Imparano sistemi decisionali complessi e flessibili dai dati a disposizione e raggiungono prestazioni eccezionali in molteplici campi di apprendimento automatico, dalle applicazioni dell'intelligenza artificiale, come il riconoscimento di immagini, parole e testi, alle scienze più tradizionali, tra cui medicina, fisica e biologia. Nonostante i risultati eccezionali, le prestazioni elevate e l’alta precisione predittiva non sono sufficienti per le applicazioni nel mondo reale, specialmente in ambienti critici per la sicurezza, dove l'utilizzo dei DNNs è fortemente limitato dalla loro natura black-box. Vi è una crescente necessità di comprendere come vengono eseguite le predizioni, fornire stime di incertezza, garantire robustezza agli attacchi avversari e prevenire comportamenti indesiderati.
Anche le migliori architetture sono vulnerabili a piccole perturbazioni nei dati di input, note come attacchi avversari: manipolazioni malevole degli input che sono percettivamente indistinguibili dai campioni originali ma sono in grado di ingannare il modello in predizioni errate. In questo lavoro, dimostriamo che tale fragilità è correlata alla geometria del manifold dei dati ed è quindi probabile che sia una caratteristica intrinseca delle predizioni dei DNNs. Questa
condizione suggerisce una possibile direzione al fine di ottenere robustezza agli attacchi: studiamo la geometria degli attacchi avversari nel limite di un numero infinito di dati e di pesi per le reti neurali Bayesiane, dimostrando che, in questo limite, sono immuni agli attacchi avversari gradient-based. Inoltre, proponiamo alcune tecniche di training per migliorare la robustezza delle architetture deterministiche. In particolare, osserviamo sperimentalmente che ensembles di reti neurali addestrati su proiezioni casuali degli input originali in spazi basso-dimensionali sono più resistenti agli attacchi.
Successivamente, ci concentriamo sul problema dell'interpretabilità delle predizioni delle reti nel contesto delle saliency-based explanations. Analizziamo la stabilità delle explanations soggette ad attacchi avversari e dimostriamo che, nel limite di un numero infinito di dati e di pesi, le interpretazioni Bayesiane sono più stabili di quelle fornite dalle reti deterministiche. Confermiamo questo comportamento in modo sperimentale nel regime di un numero finito di dati.
Infine, introduciamo il concetto di attacco avversario alle sequenze di amminoacidi per protein Language Models (LM). I modelli di Deep Learning per la predizione della struttura delle proteine, come AlphaFold2, sfruttano le architetture Transformer e il loro meccanismo di attention per catturare le proprietà strutturali e funzionali delle sequenze di amminoacidi. Nonostante l'elevata precisione delle predizioni, perturbazioni biologicamente piccole delle sequenze di input, o anche mutazioni di un singolo amminoacido, possono portare a strutture 3D sostanzialmente diverse. Al contempo, i protein LMs sono insensibili alle mutazioni che inducono misfolding o disfunzione (ad esempio le missense mutations). In particolare, le predizioni delle coordinate 3D non rivelano l'effetto di unfolding indotto da queste mutazioni. Pertanto, esiste un'evidente incoerenza tra l'importanza biologica delle mutazioni e il conseguente cambiamento nella predizione strutturale. Ispirati da questo problema, introduciamo il concetto di perturbazione avversaria delle sequenze proteiche negli embedding continui dei protein LMs. Il nostro metodo utilizza i valori di attention per rilevare le posizioni degli amminoacidi più vulnerabili nelle sequenze di input. Le mutazioni avversarie sono biologicamente diverse dalle sequenze di riferimento e sono in grado di alterare in modo significativo le strutture 3D.Deep Neural Networks (DNNs) are powerful predictive models, exceeding human capabilities in a variety of tasks. They learn complex and flexible decision systems from the available data and achieve exceptional performances in multiple machine learning fields, spanning from applications in artificial intelligence, such as image, speech and text recognition, to the more traditional sciences, including medicine, physics and biology. Despite the outstanding achievements, high performance and high predictive accuracy are not sufficient for real-world applications, especially in safety-critical settings, where the usage of DNNs is severely limited by their black-box nature. There is an increasing need to understand how predictions are performed, to provide uncertainty estimates, to guarantee robustness to malicious attacks and to prevent unwanted behaviours.
State-of-the-art DNNs are vulnerable to small perturbations in the input data, known as adversarial attacks: maliciously crafted manipulations of the inputs that are perceptually indistinguishable from the original samples but are capable of fooling the model into incorrect predictions. In this work, we prove that such brittleness is related to the geometry of the data manifold and is therefore likely to be an intrinsic feature of DNNs’ predictions. This negative
condition suggests a possible direction to overcome such limitation: we study the geometry of adversarial attacks in the large-data, overparameterized limit for Bayesian Neural Networks and prove that, in this limit, they are immune to gradient-based adversarial attacks. Furthermore, we propose some training techniques to improve the adversarial robustness of deterministic architectures. In particular, we experimentally observe that ensembles of NNs trained on random projections of the original inputs into lower dimensional spaces are more resilient to the attacks.
Next, we focus on the problem of interpretability of NNs’ predictions in the setting of saliency-based explanations. We analyze the stability of the explanations under adversarial attacks on the inputs and we prove that, in the large-data and overparameterized limit, Bayesian interpretations are more stable than those provided by deterministic networks. We validate this behaviour in multiple experimental settings in the finite data regime.
Finally, we introduce the concept of adversarial perturbations of amino acid sequences for protein Language Models (LMs). Deep Learning models for protein structure prediction, such as AlphaFold2, leverage Transformer architectures and their attention mechanism to capture structural and functional properties of amino acid sequences. Despite the high accuracy of predictions, biologically small perturbations of the input sequences, or even single point mutations, can lead to substantially different 3d structures. On the other hand, protein language models are insensitive to mutations that induce misfolding or dysfunction (e.g. missense mutations). Precisely, predictions of the 3d coordinates do not reveal the structure-disruptive effect of these mutations. Therefore, there is an evident inconsistency between the biological importance of mutations and the resulting change in structural prediction. Inspired by this problem, we introduce the concept of adversarial perturbation of protein sequences in continuous embedding spaces of protein language models. Our method relies on attention scores to detect the most vulnerable amino acid positions in the input sequences. Adversarial mutations are biologically diverse from their references and are able to significantly alter the resulting 3D structures
Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey
Adversarial attacks and defenses in machine learning and deep neural network
have been gaining significant attention due to the rapidly growing applications
of deep learning in the Internet and relevant scenarios. This survey provides a
comprehensive overview of the recent advancements in the field of adversarial
attack and defense techniques, with a focus on deep neural network-based
classification models. Specifically, we conduct a comprehensive classification
of recent adversarial attack methods and state-of-the-art adversarial defense
techniques based on attack principles, and present them in visually appealing
tables and tree diagrams. This is based on a rigorous evaluation of the
existing works, including an analysis of their strengths and limitations. We
also categorize the methods into counter-attack detection and robustness
enhancement, with a specific focus on regularization-based methods for
enhancing robustness. New avenues of attack are also explored, including
search-based, decision-based, drop-based, and physical-world attacks, and a
hierarchical classification of the latest defense methods is provided,
highlighting the challenges of balancing training costs with performance,
maintaining clean accuracy, overcoming the effect of gradient masking, and
ensuring method transferability. At last, the lessons learned and open
challenges are summarized with future research opportunities recommended.Comment: 46 pages, 21 figure
Safe and Robust Watermark Injection with a Single OoD Image
Training a high-performance deep neural network requires large amounts of
data and computational resources. Protecting the intellectual property (IP) and
commercial ownership of a deep model is challenging yet increasingly crucial. A
major stream of watermarking strategies implants verifiable backdoor triggers
by poisoning training samples, but these are often unrealistic due to data
privacy and safety concerns and are vulnerable to minor model changes such as
fine-tuning. To overcome these challenges, we propose a safe and robust
backdoor-based watermark injection technique that leverages the diverse
knowledge from a single out-of-distribution (OoD) image, which serves as a
secret key for IP verification. The independence of training data makes it
agnostic to third-party promises of IP security. We induce robustness via
random perturbation of model parameters during watermark injection to defend
against common watermark removal attacks, including fine-tuning, pruning, and
model extraction. Our experimental results demonstrate that the proposed
watermarking approach is not only time- and sample-efficient without training
data, but also robust against the watermark removal attacks above
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Deep neural networks (DNNs) are vulnerable to a class of attacks called
"backdoor attacks", which create an association between a backdoor trigger and
a target label the attacker is interested in exploiting. A backdoored DNN
performs well on clean test images, yet persistently predicts an
attacker-defined label for any sample in the presence of the backdoor trigger.
Although backdoor attacks have been extensively studied in the image domain,
there are very few works that explore such attacks in the video domain, and
they tend to conclude that image backdoor attacks are less effective in the
video domain. In this work, we revisit the traditional backdoor threat model
and incorporate additional video-related aspects to that model. We show that
poisoned-label image backdoor attacks could be extended temporally in two ways,
statically and dynamically, leading to highly effective attacks in the video
domain. In addition, we explore natural video backdoors to highlight the
seriousness of this vulnerability in the video domain. And, for the first time,
we study multi-modal (audiovisual) backdoor attacks against video action
recognition models, where we show that attacking a single modality is enough
for achieving a high attack success rate
Distilling Cognitive Backdoor Patterns within an Image
This paper proposes a simple method to distill and detect backdoor patterns
within an image: \emph{Cognitive Distillation} (CD). The idea is to extract the
"minimal essence" from an input image responsible for the model's prediction.
CD optimizes an input mask to extract a small pattern from the input image that
can lead to the same model output (i.e., logits or deep features). The
extracted pattern can help understand the cognitive mechanism of a model on
clean vs. backdoor images and is thus called a \emph{Cognitive Pattern} (CP).
Using CD and the distilled CPs, we uncover an interesting phenomenon of
backdoor attacks: despite the various forms and sizes of trigger patterns used
by different attacks, the CPs of backdoor samples are all surprisingly and
suspiciously small. One thus can leverage the learned mask to detect and remove
backdoor examples from poisoned training datasets. We conduct extensive
experiments to show that CD can robustly detect a wide range of advanced
backdoor attacks. We also show that CD can potentially be applied to help
detect potential biases from face datasets. Code is available at
\url{https://github.com/HanxunH/CognitiveDistillation}.Comment: ICLR202
Recommended from our members
Robust Machine Learning by Integrating Context
Intelligent software has the potential to transform our society. It is becoming the building block for many systems in the real world. However, despite the excellent performance of machine learning models on benchmarks, state-of-the-art methods like neural networks often fail once they encounter realistic settings. Since neural networks often learn correlations without reasoning with the right signals and knowledge, they fail when facing shifting distributions, unforeseen corruptions, and worst-case scenarios. Since neural networks are black-box models, they are not interpretable or trusted by the user. We need to build robust models for machine learning to be confidently and responsibly deployed in the most critical applications and systems.
In this dissertation, I introduce our robust machine learning systems advancements by tightly integrating context into algorithms. The context has two aspects: the intrinsic structure of natural data, and the extrinsic structure from domain knowledge. Both are crucial: By capitalizing on the intrinsic structure in natural data, my work has shown that we can create robust machine learning systems, even in the worst case, an analytical result that also enjoys strong empirical gains.
Through integrating external knowledge, such as the association between tasks and causal structure, my framework can instruct models to use the right signals for inference, enabling new opportunities for controllable and interpretable models.
This thesis consists of three parts. In the first part, I aim to cover three works that use the intrinsic structure as a constraint to achieve robust inference. I present our framework that performs test-time optimization to respect the natural constraint, which is captured by self-supervised tasks. I illustrate that test-time optimization improves out-of-distribution generalization and adversarial robustness. Besides the inference algorithm, I show that intrinsic structure through discrete representations also improves out-of-distribution robustness.
In the second part of the thesis, I then detail my work using external domain knowledge. I first introduce using causal structure from external domain knowledge to improve domain generalization robustness. I then show how the association of multiple tasks and regularization objectives helps robustness.
In the final part of this dissertation, I show three works on trustworthy and reliable foundation models, a general-purpose model that will be the foundation for many AI applications. I show a framework that uses context to secure, interpret, and control foundation models
Deep Intellectual Property: A Survey
With the widespread application in industrial manufacturing and commercial
services, well-trained deep neural networks (DNNs) are becoming increasingly
valuable and crucial assets due to the tremendous training cost and excellent
generalization performance. These trained models can be utilized by users
without much expert knowledge benefiting from the emerging ''Machine Learning
as a Service'' (MLaaS) paradigm. However, this paradigm also exposes the
expensive models to various potential threats like model stealing and abuse. As
an urgent requirement to defend against these threats, Deep Intellectual
Property (DeepIP), to protect private training data, painstakingly-tuned
hyperparameters, or costly learned model weights, has been the consensus of
both industry and academia. To this end, numerous approaches have been proposed
to achieve this goal in recent years, especially to prevent or discover model
stealing and unauthorized redistribution. Given this period of rapid evolution,
the goal of this paper is to provide a comprehensive survey of the recent
achievements in this field. More than 190 research contributions are included
in this survey, covering many aspects of Deep IP Protection:
challenges/threats, invasive solutions (watermarking), non-invasive solutions
(fingerprinting), evaluation metrics, and performance. We finish the survey by
identifying promising directions for future research.Comment: 38 pages, 12 figure
Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks
Model stealing attacks have become a serious concern for deep learning
models, where an attacker can steal a trained model by querying its black-box
API. This can lead to intellectual property theft and other security and
privacy risks. The current state-of-the-art defenses against model stealing
attacks suggest adding perturbations to the prediction probabilities. However,
they suffer from heavy computations and make impracticable assumptions about
the adversary. They often require the training of auxiliary models. This can be
time-consuming and resource-intensive which hinders the deployment of these
defenses in real-world applications. In this paper, we propose a simple yet
effective and efficient defense alternative. We introduce a heuristic approach
to perturb the output probabilities. The proposed defense can be easily
integrated into models without additional training. We show that our defense is
effective in defending against three state-of-the-art stealing attacks. We
evaluate our approach on large and quantized (i.e., compressed) Convolutional
Neural Networks (CNNs) trained on several vision datasets. Our technique
outperforms the state-of-the-art defenses with a faster inference
latency without requiring any additional model and with a low impact on the
model's performance. We validate that our defense is also effective for
quantized CNNs targeting edge devices.Comment: Accepted for publication at 2023 International Conference on Machine
Learning and Applications (ICMLA
Launching a Robust Backdoor Attack under Capability Constrained Scenarios
As deep neural networks continue to be used in critical domains, concerns
over their security have emerged. Deep learning models are vulnerable to
backdoor attacks due to the lack of transparency. A poisoned backdoor model may
perform normally in routine environments, but exhibit malicious behavior when
the input contains a trigger. Current research on backdoor attacks focuses on
improving the stealthiness of triggers, and most approaches require strong
attacker capabilities, such as knowledge of the model structure or control over
the training process. These attacks are impractical since in most cases the
attacker's capabilities are limited. Additionally, the issue of model
robustness has not received adequate attention. For instance, model
distillation is commonly used to streamline model size as the number of
parameters grows exponentially, and most of previous backdoor attacks failed
after model distillation; the image augmentation operations can destroy the
trigger and thus disable the backdoor. This study explores the implementation
of black-box backdoor attacks within capability constraints. An attacker can
carry out such attacks by acting as either an image annotator or an image
provider, without involvement in the training process or knowledge of the
target model's structure. Through the design of a backdoor trigger, our attack
remains effective after model distillation and image augmentation, making it
more threatening and practical. Our experimental results demonstrate that our
method achieves a high attack success rate in black-box scenarios and evades
state-of-the-art backdoor defenses.Comment: 9 pages, 6 figure