92 research outputs found

    Machine Learning in Adversarial Environments

    Machine learning, especially deep neural networks (DNNs), has achieved great success in a variety of applications. Unlike classical algorithms, which can be formally analyzed, neural network-based learning algorithms are less well understood. This lack of understanding, whether through formal methods or empirical observation, leaves potential vulnerabilities that adversaries could exploit. It also hinders the deployment and adoption of learning methods in security-critical systems. Recent works have demonstrated that DNNs are vulnerable to carefully crafted adversarial perturbations. We refer to data instances with added adversarial perturbations as “adversarial examples”. Such adversarial examples can mislead DNNs into producing adversary-selected results. Furthermore, they can cause a DNN system to misbehave in unexpected and potentially dangerous ways. In this thesis, we study the security of current DNNs from the viewpoints of both attack and defense. First, we explore the space of attacks against DNNs at test time. We revisit the integrity of the Lp regime and propose a new and rigorous threat model for adversarial examples. Based on this new threat model, we present a technique to generate adversarial examples in the digital space. Second, we study the physical consequences of adversarial examples in the 3D and physical spaces. We first study the vulnerabilities of various vision systems by simulating the photo-taking process with a physical renderer. To further explore the physical consequences in the real world, we select the safety-critical application of autonomous driving as the target system and study the vulnerability of the LiDAR perception module. These studies show the potentially severe consequences of adversarial examples and raise awareness of their risks. Last but not least, we develop solutions to defend against adversarial examples. We propose a consistency-check-based method that detects adversarial examples by leveraging properties of either the learning model or the data. We show two examples: one in the segmentation task (leveraging the learning model) and one on video data (leveraging the data).
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162944/1/xiaocw_1.pd

    MeshAdv: Adversarial Meshes for Visual Recognition

    Highly expressive models such as deep neural networks (DNNs) have been widely applied to various applications. However, recent studies show that DNNs are vulnerable to adversarial examples, which are carefully crafted inputs that aim to mislead predictions. Most of these studies have focused on perturbations added to image pixels, a manipulation that is not physically realistic. Some works have tried to overcome this limitation by attaching printable 2D patches or painting patterns onto surfaces, but such attacks can potentially be defended against because the 3D shape features remain intact. In this paper, we propose meshAdv to generate "adversarial 3D meshes" from objects that have rich shape features but minimal textural variation. To manipulate the shape or texture of the objects, we use a differentiable renderer to compute accurate shading on the shape and propagate the gradient. Extensive experiments show that the generated 3D meshes are effective in attacking both classifiers and object detectors. We evaluate the attack under different viewpoints. In addition, we design a pipeline to perform a black-box attack on a photorealistic renderer with unknown rendering parameters.
    Comment: Published in IEEE CVPR201

    DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

    With the advancement of Large Language Models (LLMs), significant progress has been made in code generation, enabling LLMs to transform natural language into programming code. These Code LLMs have been widely adopted by large numbers of users and organizations. However, the generated code can hide a serious danger: fatal vulnerabilities. While some LLM providers have attempted to address this issue by aligning with human guidance, these efforts fall short of making Code LLMs practical and robust. Without a deep understanding of how LLMs perform under practical worst cases, applying them to real-world applications is concerning. In this paper, we address two critical questions: Are existing Code LLMs immune to generating vulnerable code? If not, what is the maximum possible severity of this issue in practical deployment scenarios? We introduce DeceptPrompt, a novel algorithm that generates adversarial natural language instructions that drive Code LLMs to generate functionally correct code with vulnerabilities. DeceptPrompt uses a systematic evolution-based algorithm with a fine-grained loss design. Its unique advantage is that it finds natural prefixes/suffixes with completely benign and non-directional semantic meaning that are nevertheless highly effective at inducing Code LLMs to generate vulnerable code. This feature enables almost-worst-case red-teaming of these LLMs in a realistic scenario, where users interact in natural language. Our extensive experiments and analyses of DeceptPrompt not only validate the effectiveness of our approach but also shed light on a major weakness of LLMs in the code generation task. When the optimized prefix/suffix is applied, the attack success rate (ASR) improves by 50% on average compared with applying no prefix/suffix.

    Generating Adversarial Examples with Adversarial Networks

    Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs into producing adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them more efficiently and with high perceptual quality requires further research. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. Once the AdvGAN generator is trained, it can generate adversarial perturbations efficiently for any instance, which could potentially accelerate adversarial training as a defense. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models achieve a high attack success rate under state-of-the-art defenses compared to other attacks. Our attack placed first, with 92.76% accuracy, on a public MNIST black-box attack challenge.
    Comment: Accepted to IJCAI201
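    To make the idea of a small-magnitude adversarial perturbation concrete, the following is a minimal sketch on a toy linear classifier. It is not AdvGAN and not any method from the papers listed here: it uses a plain sign-of-gradient step bounded in the L-infinity norm, and the weights, input, and the helper names `score` and `linf_perturb` are all hypothetical values chosen for illustration.

```python
# Toy illustration only: a sign-gradient perturbation on a linear classifier.
# All names and numbers below are hypothetical, not taken from any listed paper.

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear decision score; the predicted class is 1 when the score is positive."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def linf_perturb(w, x, eps, toward_positive):
    """Shift every feature by exactly eps in the direction that moves the score.

    For a linear model the gradient of the score with respect to x is w,
    so stepping along sign(w) changes the score by eps * sum(|w_i|).
    """
    direction = 1 if toward_positive else -1
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]

w = [2.0, -1.0, 0.5]   # assumed classifier weights
b = -0.25              # assumed bias
x = [0.5, 0.25, 0.5]   # clean input, classified as class 1

x_adv = linf_perturb(w, x, eps=0.25, toward_positive=False)
max_change = max(abs(a - c) for a, c in zip(x_adv, x))

print("clean score:", score(w, b, x))            # positive, so class 1
print("adversarial score:", score(w, b, x_adv))  # negative, so class 0
print("largest per-feature change:", max_change)
```

    In this toy setting no feature moves by more than `eps`, yet the predicted class flips. Attacks on DNNs replace the closed-form gradient used here with one obtained by backpropagation or, as in the abstract above, with a trained generator.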