
    Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning

    Full text link
    It is now widely known that deep neural networks are highly vulnerable to adversarial attacks, and many defense algorithms have been proposed to mitigate this vulnerability. To improve adversarial robustness, recent works enhance feature representations by imposing more direct supervision on discriminative features. However, existing approaches lack an understanding of how to learn adversarially robust feature representations. In this paper, we propose a novel training framework called Robust Proxy Learning, in which the model explicitly learns robust feature representations with robust proxies. To this end, we first demonstrate that class-representative robust features can be generated by adding class-wise robust perturbations. We then use these class-representative features as robust proxies. With the class-wise robust features, the model explicitly learns adversarially robust features through the proposed robust proxy learning framework. Through extensive experiments, we verify that robust features can be generated manually and that the proposed learning framework increases the robustness of DNNs.
    Comment: Accepted at IEEE Transactions on Information Forensics and Security (TIFS)
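    A minimal PyTorch sketch of the idea described in the abstract, under assumed details (this is not the authors' exact algorithm, and `model.features` is a hypothetical feature-extractor head): a shared, class-wise perturbation that strengthens the class evidence is crafted for a batch from one class, the mean perturbed feature serves as that class's robust proxy, and natural features are pulled toward their proxy with a proxy-based loss.

```python
import torch
import torch.nn.functional as F

def make_robust_proxy(model, images, label, eps=8/255, alpha=2/255, steps=10):
    """images: a batch from a single class; returns one proxy feature vector."""
    targets = torch.full((len(images),), label, device=images.device)
    delta = torch.zeros_like(images[:1], requires_grad=True)   # one shared, class-wise perturbation
    for _ in range(steps):
        loss = F.cross_entropy(model(images + delta), targets)  # strengthen class evidence
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    with torch.no_grad():
        feats = model.features(images + delta)                  # assumed feature-extractor head
    return F.normalize(feats.mean(dim=0), dim=0)                # class-representative robust proxy

def proxy_loss(features, labels, proxies, tau=0.1):
    """Pull each natural feature toward its class proxy (proxies: [num_classes, d])."""
    sims = F.normalize(features, dim=1) @ F.normalize(proxies, dim=1).t()
    return F.cross_entropy(sims / tau, labels)
```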

    Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training

    Full text link
    Adversarial training is a standard method for training deep neural networks to be robust to adversarial perturbations. Similar to the surprising clean generalization ability in the standard deep learning setting, neural networks trained by adversarial training also generalize well on unseen clean data. However, in contrast with clean generalization, while adversarial training is able to achieve low robust training error, a significant robust generalization gap remains, which prompts us to explore what mechanism leads to both clean generalization and robust overfitting (CGRO) during the learning process. In this paper, we provide a theoretical understanding of the CGRO phenomenon in adversarial training. First, we propose a theoretical framework of adversarial training in which we analyze the feature learning process to explain how adversarial training leads the network learner to the CGRO regime. Specifically, we prove that, under our patch-structured dataset, the CNN model provably partially learns the true feature but exactly memorizes the spurious features from training adversarial examples, which results in clean generalization and robust overfitting. Under more general data assumptions, we then show the efficiency of the CGRO classifier from the perspective of representation complexity. On the empirical side, to verify our theoretical analysis on real-world vision datasets, we investigate the dynamics of the loss landscape during training. Moreover, inspired by our experiments, we prove a robust generalization bound based on the global flatness of the loss landscape, which may be of independent interest.
    Comment: 27 pages, comments welcome
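    To make the terms in the abstract concrete, here is a standard PGD adversarial-training sketch (hyperparameters and the model/loader names are placeholders, not taken from the paper): the robust training error is the error on PGD-perturbed training data, and the robust generalization gap is its difference from the robust test error.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x + (x_adv.detach() + alpha * grad.sign() - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def robust_error(model, loader, device):
    wrong = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        wrong += (pred != y).sum().item()
        total += y.numel()
    return wrong / total

def train_epoch(model, loader, optimizer, device):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)                 # inner maximization
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()     # outer minimization
        optimizer.step()

# robust_gap = robust_error(model, test_loader, device) - robust_error(model, train_loader, device)
```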

    Adversarial Momentum-Contrastive Pre-Training for Robust Feature Extraction

    Full text link
    Recently proposed adversarial self-supervised learning methods usually require large batches and long training schedules to extract robust features, which is impractical for real-world applications. In this paper, we present a novel adversarial momentum-contrastive learning approach that leverages two memory banks to track invariant features across different mini-batches. These memory banks can be efficiently incorporated into each iteration and help the network learn more robust feature representations with smaller batches and far fewer epochs. Furthermore, after fine-tuning on classification tasks, the proposed approach meets or exceeds the performance of some state-of-the-art supervised baselines on real-world datasets. Our code is available at https://github.com/MTandHJ/amoc.
    Comment: 16 pages; 6 figures; preprint
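    A rough MoCo-style sketch of the mechanism the abstract describes, with assumed details (not the authors' exact method, which uses two banks): an adversarial query view is contrasted against keys from a momentum encoder, with negatives drawn from a memory bank that persists across mini-batches.

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    """A fixed-size queue of past key features, reused as negatives across mini-batches."""
    def __init__(self, dim=128, size=4096):
        self.bank = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys):                              # keys: [B, dim], L2-normalized
        b = keys.size(0)
        idx = torch.arange(self.ptr, self.ptr + b) % self.bank.size(0)
        self.bank[idx] = keys.detach().cpu()
        self.ptr = (self.ptr + b) % self.bank.size(0)

def info_nce(query, key, bank, tau=0.2):
    """Contrast an (adversarial) query view against its key and the bank negatives."""
    q, k = F.normalize(query, dim=1), F.normalize(key, dim=1)
    pos = (q * k).sum(dim=1, keepdim=True)                # [B, 1]
    neg = q @ bank.bank.to(q.device).t()                  # [B, size]
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Slowly move the key encoder toward the query encoder."""
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.mul_(m).add_(pq, alpha=1 - m)
```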

    Evaluating the Robustness of Off-Road Autonomous Driving Segmentation against Adversarial Attacks: A Dataset-Centric analysis

    Full text link
    This study investigates the vulnerability of semantic segmentation models to adversarial input perturbations in the domain of off-road autonomous driving. Despite good performance under generic conditions, state-of-the-art classifiers are often susceptible to even small perturbations, ultimately resulting in inaccurate predictions with high confidence. Prior research has focused on making models more robust by modifying the architecture and training with noisy input images, but has not explored the influence of datasets on adversarial attacks. Our study addresses this gap by examining the impact of non-robust features in off-road datasets and comparing the effects of adversarial attacks on different segmentation network architectures. To enable this, a robust dataset consisting of only robust features is created, and the networks are trained on this robustified dataset. We present both qualitative and quantitative analyses of our findings, which have important implications for improving the robustness of machine learning models in off-road autonomous driving applications. Additionally, this work contributes to the safe navigation of the autonomous robot Unimog U5023 in rough, unstructured off-road environments by evaluating the robustness of segmentation outputs. The code is publicly available at https://github.com/rohtkumar/adversarial_attacks_on_segmentation
    Comment: 8 pages
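    An illustrative sketch of the kind of robustness evaluation the abstract refers to (the model, data loader, and class count are placeholders; the attack and metric choices are assumptions, not the paper's exact protocol): attack a segmentation network with a simple FGSM perturbation on the per-pixel loss and compare mean IoU before and after.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=4/255):
    """One-step attack on per-pixel cross-entropy (logits: [B,C,H,W], labels: [B,H,W])."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y, ignore_index=255)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def mean_iou(model, loader, num_classes, attack=None, device="cuda"):
    inter = torch.zeros(num_classes, device=device)
    union = torch.zeros(num_classes, device=device)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack is not None:
            with torch.enable_grad():                  # attack needs gradients
                x = attack(model, x, y)
        pred = model(x).argmax(dim=1)
        for c in range(num_classes):
            p, t = pred == c, y == c
            inter[c] += (p & t).sum().float()
            union[c] += (p | t).sum().float()
    return (inter / union.clamp(min=1)).mean().item()

# robustness_drop = mean_iou(model, val_loader, num_classes) \
#                 - mean_iou(model, val_loader, num_classes, attack=fgsm)
```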

    Robust Ranking Explanations

    Full text link
    Robust explanations of machine learning models are critical for establishing human trust in the models. Due to limited cognitive capacity, most humans can only interpret the top few salient features. It is therefore critical to make the top salient features robust to adversarial attacks, especially against the more vulnerable gradient-based explanations. Existing defenses measure robustness using $\ell_p$-norms, which have weaker protection power. We define explanation thickness for measuring the ranking stability of salient features, and derive tractable surrogate bounds on the thickness to design the R2ET algorithm, which efficiently maximizes the thickness and anchors the top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
    Comment: Accepted to IMLH (Interpretable ML in Healthcare) workshop at ICML 2023. arXiv admin note: substantial text overlap with arXiv:2212.1410
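    A simplified sketch of the ranking-stability intuition behind the abstract (an assumed surrogate written for illustration, not the paper's exact thickness definition or R2ET objective): compute a gradient-based saliency map and penalize small gaps between the k-th largest salient feature and the features just below it, which tends to anchor the top-k ranking against small perturbations.

```python
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    """Gradient-based saliency of the target-class score w.r.t. the input."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y[:, None]).sum()
    grad, = torch.autograd.grad(score, x, create_graph=True)   # keep graph so the penalty is trainable
    return grad.abs().flatten(start_dim=1)                     # [B, num_features]

def ranking_gap_penalty(sal, k=10, margin=0.05):
    """Penalize a small gap between the k-th and (k+1)-th largest saliency values."""
    top, _ = sal.topk(k + 1, dim=1)
    gap = top[:, k - 1] - top[:, k]
    return F.relu(margin - gap).mean()

def training_loss(model, x, y, lam=1.0):
    return F.cross_entropy(model(x), y) + lam * ranking_gap_penalty(saliency(model, x, y))
```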