Hardening RGB-D Object Recognition Systems against Adversarial Patch Attacks
RGB-D object recognition systems improve their predictive performance by fusing color and depth information, outperforming neural network architectures that rely solely on color. While RGB-D systems are expected to be more robust to adversarial examples than RGB-only systems, they have also been proven highly vulnerable, and their robustness remains comparably low even when the adversarial examples are generated by altering only the colors of the original images. Several works have highlighted the vulnerability of RGB-D systems; however, technical explanations for this weakness are still lacking. Hence, in our work, we bridge this gap by investigating the learned deep representations of RGB-D systems, discovering that color features make the function learned by the network more complex and, thus, more sensitive to small perturbations. To mitigate this problem, we propose a defense based on a detection mechanism that makes RGB-D systems more robust against adversarial examples. We empirically show that this defense improves the performance of RGB-D systems against adversarial examples even when they are crafted ad hoc to circumvent the detection mechanism, and that it is also more effective than adversarial training.
Comment: Accepted for publication in the Information Sciences journal.
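As a rough illustration of the sensitivity finding above, one could compare the average input-gradient norm of models trained with and without color features: a larger norm indicates a locally more complex, and hence more perturbation-sensitive, learned function. Below is a minimal sketch of such a probe; model, loader, and loss_fn are placeholders, and this is not the paper's actual analysis code.

    import torch

    def avg_input_gradient_norm(model, loader, loss_fn):
        # Average L2 norm of the loss gradient w.r.t. the input over a
        # dataset; larger values suggest higher sensitivity to small
        # input perturbations.
        norms = []
        for x, y in loader:
            x = x.clone().requires_grad_(True)
            loss = loss_fn(model(x), y)
            grad, = torch.autograd.grad(loss, x)
            norms.append(grad.flatten(1).norm(dim=1).mean().item())
        return sum(norms) / len(norms)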
Increasing the Confidence of Deep Neural Networks by Coverage Analysis
The great performance of machine learning algorithms and deep neural networks
in several perception and control tasks is pushing the industry to adopt such
technologies in safety-critical applications, such as autonomous robots and
self-driving vehicles. At present, however, several issues need to be solved to
make deep learning methods more trustworthy, predictable, safe, and secure
against adversarial attacks. Although several methods have been proposed to
improve the trustworthiness of deep neural networks, most of them are tailored
for specific classes of adversarial examples, hence failing to detect other
corner cases or unsafe inputs that heavily deviate from the training samples.
This paper presents a lightweight monitoring architecture based on coverage
paradigms to enhance the model robustness against different unsafe inputs. In
particular, four coverage analysis methods are proposed and tested in the
architecture for evaluating multiple detection logics. Experimental results
show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs, while introducing limited extra execution time and memory overhead.
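The four coverage methods are not detailed in the abstract; as a purely illustrative example of one coverage-style monitor, the sketch below records per-neuron activation ranges on training data and flags test inputs whose activations fall out of range too often. All names and the tolerance value are assumptions, not the paper's method.

    import numpy as np

    class RangeCoverageMonitor:
        # Records per-neuron activation ranges seen on training data and
        # flags inputs whose activations escape those ranges.
        def __init__(self, tolerance=0.05):
            self.low = None
            self.high = None
            self.tolerance = tolerance  # max fraction of out-of-range neurons

        def fit(self, train_activations):
            # train_activations: (n_samples, n_neurons) from a chosen layer
            self.low = train_activations.min(axis=0)
            self.high = train_activations.max(axis=0)

        def is_unsafe(self, activation):
            # activation: (n_neurons,) for a single test input
            out_of_range = (activation < self.low) | (activation > self.high)
            return out_of_range.mean() > self.tolerance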
Two Heads are Better than One: Towards Better Adversarial Robustness by Combining Transduction and Rejection
Both transduction and rejection have emerged as important techniques for
defending against adversarial perturbations. A recent work by Tramèr showed that, in the rejection-only case (no transduction), a strong rejection solution can be turned into a strong (but computationally inefficient) non-rejection solution. This detector-to-classifier reduction has mostly been applied to give evidence that certain claims of strong selective-model solutions are themselves susceptible to attack, leaving the benefits of rejection unclear.
recent work by Goldwasser et al. showed that rejection combined with
transduction can give provable guarantees (for certain problems) that cannot be
achieved otherwise. Nevertheless, under recent strong adversarial attacks
(GMSA, which has been shown to be much more effective than AutoAttack against
transduction), Goldwasser et al.'s work was shown to have low performance in a
practical deep-learning setting. In this paper, we take a step towards
realizing the promise of transduction+rejection in more realistic scenarios.
Theoretically, we show that a novel application of Tramèr's classifier-to-detector technique in the transductive setting can give significantly improved sample complexity for robust generalization. While our theoretical construction is computationally inefficient, it guides us to identify an efficient transductive algorithm to learn a selective model. Extensive experiments using state-of-the-art attacks (AutoAttack, GMSA) show that our solutions provide significantly better robust accuracy.
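For readers unfamiliar with the term, a selective model pairs a classifier with a selection function that may abstain. The sketch below is a generic illustration using a fixed softmax-confidence threshold; the paper instead learns the selective model, so treat the threshold and the rule itself as assumptions.

    import numpy as np

    def selective_predict(logits, threshold=0.9):
        # Predict the argmax class only when the softmax confidence
        # exceeds the threshold; otherwise abstain (return -1).
        z = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
        conf = probs.max(axis=-1)
        preds = probs.argmax(axis=-1)
        return np.where(conf >= threshold, preds, -1)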
Machine Learning with a Reject Option: A survey
Machine learning models always make a prediction, even when it is likely to
be inaccurate. This behavior should be avoided in many decision support
applications, where mistakes can have severe consequences. Although already studied as early as 1970, machine learning with rejection has recently gained renewed interest. This subfield enables machine learning models to abstain from making a prediction when they are likely to make a mistake.
This survey aims to provide an overview of machine learning with rejection.
We introduce the conditions leading to two types of rejection, ambiguity and
novelty rejection, which we carefully formalize. Moreover, we review and
categorize strategies to evaluate a model's predictive and rejective quality.
Additionally, we define the existing architectures for models with rejection
and describe the standard techniques for learning such models. Finally, we
provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.
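To make the two rejection types concrete, here is a minimal sketch: ambiguity rejection abstains near the decision boundary, while novelty rejection abstains far from the training data. The margin and radius thresholds are illustrative assumptions, not values from the survey.

    import numpy as np

    def ambiguity_reject(probs, margin=0.1):
        # Abstain when the top two class probabilities are too close,
        # i.e., the input lies near a decision boundary.
        top2 = np.sort(probs)[::-1][:2]
        return (top2[0] - top2[1]) < margin

    def novelty_reject(x, train_X, radius=3.0):
        # Abstain when the input is far from every training sample,
        # i.e., it does not resemble the training distribution.
        dists = np.linalg.norm(train_X - x, axis=1)
        return dists.min() > radius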
MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors
Accepted to appear in the Proceedings of the 2022 European Conference on Machine Learning and Data Mining (ECML-PKDD), 19-23 September, Grenoble, France.
Detection of adversarial examples has been a hot topic in recent years due to its importance for safely deploying machine learning algorithms in critical applications. However, detection methods are generally validated by assuming a single implicitly known attack strategy, which does not necessarily account for real-life threats. Indeed, this can lead to an overoptimistic assessment of the detectors' performance and may induce some bias in the comparison between competing detection schemes. To overcome this limitation, we propose a novel multi-armed framework, called MEAD, for evaluating detectors based on several attack strategies. Among them, we make use of three new objectives to generate attacks. The proposed performance metric is based on the worst-case scenario: detection is successful if and only if all the different attacks are correctly recognized. Empirically, we show the effectiveness of our approach. Moreover, the poor performance obtained by state-of-the-art detectors opens an exciting new line of research.
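A minimal sketch of the worst-case criterion described above: an input counts as correctly handled only if the detector flags the adversarial example produced by every attack strategy. Here detector and attack_fns are assumed callables, not MEAD's actual interfaces.

    def mead_worst_case(detector, attack_fns, x, y):
        # Detection succeeds if and only if ALL attack strategies are
        # correctly recognized on this input.
        return all(detector(attack(x, y)) for attack in attack_fns)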
Evaluating Adversarial Robustness of Detection-based Defenses against Adversarial Examples
Machine Learning algorithms provide astonishing performance in a wide range of tasks, including sensitive and critical applications. On the other hand, it has been shown that they are vulnerable to adversarial attacks, a set of techniques that violate the integrity, confidentiality, or availability of such systems. In particular, one of the most studied phenomena concerns adversarial examples, i.e., input samples that are carefully manipulated to alter the model's output. In the last decade, the research community has put strong effort into this field, proposing new evasion attacks and methods to defend against them.
With this thesis, we propose different approaches that can be applied to Deep Neural Networks to detect and reject adversarial examples that present an anomalous distribution with respect to training data.
The first leverages domain knowledge about the relationships among the considered classes, integrated through a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem. Within this setting, the classifier is able to reject samples that violate the domain-knowledge constraints. This approach can be applied in both single-label and multi-label classification settings.
The second is a Deep Neural Rejection (DNR) mechanism that detects adversarial examples by rejecting samples exhibiting anomalous feature representations at different network layers. To this end, we exploit RBF SVM classifiers, which provide decreasing confidence values as samples move away from the training data distribution.
Despite technical differences, this approach shares a common backbone structure with other proposed methods, which we formalize in a unifying framework. As all of them require comparing input samples against a large number of reference prototypes, possibly at different representation layers, they suffer from the same drawback: high computational overhead and memory usage, which makes these approaches unusable in real applications. To overcome this limitation, we introduce FADER (Fast Adversarial Example Rejection), a technique for speeding up detection-based methods by employing RBF networks as detectors: by fixing the number of required prototypes, their runtime complexity can be controlled.
All proposed methods are evaluated in both black-box and white-box settings, i.e., against an attacker unaware of the defense mechanism, and against an attacker who knows the defense and adapts the attack algorithm to bypass it, respectively.
Our experimental evaluation shows that the proposed methods increase the robustness of the defended models and help detect adversarial examples effectively, especially when the attacker does not know the underlying detection system.
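To illustrate how fixing the number of prototypes bounds the detector's runtime cost, here is a minimal RBF-network-style sketch; prototype selection, gamma, and the rejection threshold are assumptions for illustration, not FADER's actual training procedure.

    import numpy as np

    class RBFDetector:
        def __init__(self, prototypes, gamma=1.0, threshold=0.5):
            self.prototypes = prototypes  # (k, d): k fixed in advance
            self.gamma = gamma
            self.threshold = threshold

        def score(self, x):
            # Similarity to the nearest prototype; low scores indicate
            # samples far from the training distribution.
            d2 = ((self.prototypes - x) ** 2).sum(axis=1)
            return np.exp(-self.gamma * d2).max()

        def reject(self, x):
            return self.score(x) < self.threshold

Since the detector touches exactly k prototypes per query, its cost is O(k·d) regardless of the training set size, which is the point of fixing k.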
Deep neural rejection against adversarial examples
Despite the impressive performance reported by deep neural networks in different application domains, they remain largely vulnerable to adversarial examples, i.e., input samples that are carefully perturbed to cause misclassification at test time. In this work, we propose a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers. With respect to competing approaches, our method does not require generating adversarial examples at training time, and it is less computationally demanding. To properly evaluate our method, we define an adaptive white-box attack that is aware of the defense mechanism and aims to bypass it. Under this worst-case setting, we empirically show that our approach outperforms previously proposed methods that detect adversarial examples by analyzing only the feature representation provided by the output network layer.
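A minimal sketch of the rejection rule described above, assuming per-layer detectors (e.g., RBF-kernel SVMs on intermediate representations) have already been trained and output confidence scores that decrease away from the training data. The averaging rule and threshold are illustrative assumptions rather than the paper's exact formulation.

    import numpy as np

    def dnr_reject(layer_scores, threshold=0.0):
        # layer_scores: per-layer detector confidences for one input;
        # reject when the combined score drops below the threshold.
        combined = np.mean(layer_scores)
        return combined < threshold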