    Knowledge Distillation with Adversarial Samples Supporting Decision Boundary

    Many recent works on knowledge distillation have provided ways to transfer the knowledge of a trained network to improve the learning process of a new one, but finding a good technique for knowledge distillation is still an open problem. In this paper, we provide a new perspective based on the decision boundary, which is one of the most important components of a classifier. The generalization performance of a classifier is closely related to the adequacy of its decision boundary, so a good classifier has a good decision boundary. Therefore, transferring information closely related to the decision boundary is a promising approach to knowledge distillation. To realize this goal, we utilize an adversarial attack to discover samples supporting a decision boundary. Based on this idea, to transfer more accurate information about the decision boundary, the proposed algorithm trains a student classifier on the adversarial samples supporting the decision boundary. Experiments show that the proposed method indeed improves knowledge distillation and achieves state-of-the-art performance. Comment: Accepted to AAAI 201
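
The idea of using an attack to find a sample close to a decision boundary can be illustrated with a minimal sketch, assuming a toy linear two-class teacher. The weights `W`, biases `B`, step size `eta`, and the function `boundary_sample` are hypothetical and not taken from the paper; the paper's actual attack and distillation loss are not reproduced here.

```python
# Toy linear "teacher": logit_k(x) = W[k] . x + B[k]  (hypothetical values).
W = [[1.0, 0.5], [-0.5, 1.0]]
B = [0.0, 0.1]


def logits(x):
    """Return the teacher's logits for input x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(W, B)]


def boundary_sample(x, target, eta=0.01, max_steps=10_000):
    """Take small gradient steps until the prediction flips to `target`.

    Returns the perturbed sample and its final logit gap (base - target).
    With a small step size the sample stops just across the boundary.
    """
    z = logits(x)
    base = max(range(len(z)), key=z.__getitem__)
    if base == target:
        raise ValueError("target must differ from the predicted class")
    # For a linear model the gradient of (logit_base - logit_target) is constant.
    grad = [wb - wt for wb, wt in zip(W[base], W[target])]
    x = list(x)
    for _ in range(max_steps):
        z = logits(x)
        if z[base] - z[target] < 0:  # crossed the decision boundary
            break
        x = [x_i - eta * g_i for x_i, g_i in zip(x, grad)]
    z = logits(x)
    return x, z[base] - z[target]


x_adv, gap = boundary_sample([2.0, 0.0], target=1)
```

In a distillation setting, a student would then be trained to match the teacher's outputs on samples such as `x_adv`; that training step is omitted from this sketch.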

    Formalizing evasion attacks against machine learning security detectors

    Recent work has shown that adversarial examples can bypass machine learning-based threat detectors relying on static analysis by applying minimal perturbations. To preserve malicious functionality, previous attacks either apply trivial manipulations (e.g., padding), potentially limiting their effectiveness, or require running computationally demanding validation steps to discard adversarial variants that do not correctly execute in sandbox environments. While machine learning systems for detecting SQL injections have been proposed in the literature, no attacks have been tested against the proposed solutions to assess the effectiveness and robustness of these methods. In this thesis, we overcome these limitations by developing RAMEn, a unifying framework that (i) can express attacks for different domains, (ii) generalizes previous attacks against machine learning models, and (iii) uses functions that preserve the functionality of manipulated objects. We provide new attacks for both Windows malware and SQL injection detection scenarios by exploiting the format used for representing these objects. To show the efficacy of RAMEn, we provide experimental results of our strategies in both white-box and black-box settings. The white-box attacks against Windows malware detectors show that modifying only 2% of the target's input size suffices to evade detection with ease. To further speed up the black-box attacks, we overcome the issues mentioned above by presenting a novel family of black-box attacks, implemented in an algorithm called GAMMA, that are both query-efficient and functionality-preserving: they rely on the injection of benign content, which will never be executed, either at the end of the malicious file or within newly created sections. We also evaluate whether GAMMA transfers to other commercial antivirus solutions, and surprisingly find that it can evade many commercial antivirus engines.
For evading SQLi detectors, we create WAF-A-MoLE, a mutational fuzzer that exploits random mutations of the input samples, keeping alive only the most promising ones. WAF-A-MoLE is capable of defeating detectors built with different architectures by using the novel practical manipulations we have proposed. To facilitate reproducibility and future work, we open-source our framework and corresponding attack implementations. We conclude by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge from subject-matter experts naturally into the learning process.
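
The mutational fuzzing loop described above (mutate, score, keep only the most promising candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the WAF-A-MoLE implementation: the detector `toy_score`, the keyword and signature lists, the two mutation operators, and all parameter names are hypothetical. The mutations are assumed to preserve query semantics for payloads without string literals.

```python
import heapq
import random
import re

# Hypothetical keyword list and toy detector; neither comes from WAF-A-MoLE.
KEYWORDS = {"select", "union", "from", "where", "or", "and"}
SIGNATURES = (" union ", " select ", " from ")


def toy_score(payload):
    """Fraction of case-sensitive signatures found (1.0 = surely flagged)."""
    return sum(sig in payload for sig in SIGNATURES) / len(SIGNATURES)


def swap_keyword_case(payload, rng):
    """Randomly re-case one SQL keyword; SQL keywords are case-insensitive."""
    words = [m for m in re.finditer(r"[A-Za-z]+", payload)
             if m.group().lower() in KEYWORDS]
    if not words:
        return payload
    m = rng.choice(words)
    recased = "".join(rng.choice((c.lower(), c.upper())) for c in m.group())
    return payload[:m.start()] + recased + payload[m.end():]


def space_to_comment(payload, rng):
    """Replace one space with an inline comment acting as whitespace."""
    spaces = [i for i, c in enumerate(payload) if c == " "]
    if not spaces:
        return payload
    i = rng.choice(spaces)
    return payload[:i] + "/**/" + payload[i + 1:]


MUTATIONS = (swap_keyword_case, space_to_comment)


def fuzz(payload, score, threshold, max_rounds=500, keep=8, seed=0):
    """Return the (score, payload) pair with the lowest score found."""
    rng = random.Random(seed)
    population = [(score(payload), payload)]
    for _ in range(max_rounds):
        if population[0][0] < threshold:  # best candidate evades the detector
            break
        parent = rng.choice(population)[1]
        children = [mutate(parent, rng) for mutate in MUTATIONS]
        population.extend((score(child), child) for child in children)
        # Keep alive only the `keep` most promising (lowest-scoring) candidates.
        population = heapq.nsmallest(keep, set(population))
    return population[0]


best_score, best_payload = fuzz("1 union select name from users",
                                toy_score, threshold=0.5)
```

Each call to `score` counts as one query to the detector, so `max_rounds` bounds the query budget at roughly two queries per round.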

    Generative Methods, Meta-learning, and Meta-heuristics for Robust Cyber Defense

    Cyberspace is the digital communications network that supports the internet of battlefield things (IoBT), the model by which defense-centric sensors, computers, actuators and humans are digitally connected. A secure IoBT infrastructure facilitates real-time implementation of the observe, orient, decide, act (OODA) loop across distributed subsystems. Successful hacking efforts by cyber criminals and strategic adversaries suggest that cyber systems such as the IoBT are not secure. Three lines of effort demonstrate a path towards a more robust IoBT. First, a baseline data set of enterprise cyber network traffic was collected and modelled with generative methods, allowing the generation of realistic, synthetic cyber data. Next, adversarial examples of cyber packets were algorithmically crafted to fool network intrusion detection systems while maintaining packet functionality. Finally, a framework is presented that uses meta-learning to combine the predictive power of various weak models. This resulted in a meta-model that outperforms all baseline classifiers with respect to both overall packet classification accuracy and adversarial example detection rate. The National Defense Strategy underscores cybersecurity as an imperative to defend the homeland and maintain a military advantage in the information age. This research provides both academic perspective and applied techniques to further the cybersecurity posture of the Department of Defense into the information age.
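
One simple way to combine weak models into a stronger one is a learned weighted vote. The sketch below uses that technique as a stand-in; the abstract does not specify the thesis's meta-learning framework, so the weak models, the toy data, and the function names are all hypothetical. For brevity the weights are fitted and evaluated on the same toy data, which a real evaluation would avoid.

```python
import math
from itertools import product

# Three hypothetical weak detectors, each looking at a single binary feature.
WEAK_MODELS = [lambda x, j=j: x[j] for j in range(3)]

# Toy data: the label is the majority of three binary features.
DATA = [(x, int(sum(x) >= 2)) for x in product((0, 1), repeat=3)]


def accuracy(model, data):
    """Fraction of (x, y) pairs that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def fit_vote_weights(models, data, eps=1e-6):
    """Weight each model by the log-odds of its accuracy on the data."""
    weights = []
    for model in models:
        acc = min(max(accuracy(model, data), eps), 1 - eps)
        weights.append(math.log(acc / (1 - acc)))
    return weights


def meta_model(models, weights):
    """Build a classifier that takes a weighted vote over the weak models."""
    def predict(x):
        vote = sum(w * (1 if m(x) == 1 else -1)
                   for m, w in zip(models, weights))
        return 1 if vote > 0 else 0
    return predict


weights = fit_vote_weights(WEAK_MODELS, DATA)
combined = meta_model(WEAK_MODELS, weights)
```

On this toy data each weak model is right on 6 of the 8 inputs, while the weighted vote recovers the majority rule and labels all 8 correctly.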

    Tools for responsible decision-making in machine learning

    Machine learning algorithms are increasingly used by decision-making systems that affect individual lives in a wide variety of ways. Consequently, in recent years concerns have been raised about the social and ethical implications of using such algorithms. Particular concerns include issues surrounding privacy, fairness, and transparency in decision systems. This dissertation introduces new tools and measures for improving the social desirability of data-driven decision systems, and consists of two main parts. The first part provides a useful tool for an important class of decision-making algorithms: collaborative filtering in recommender systems. In particular, it introduces the idea of improving socially relevant properties of a recommender system by augmenting the input with additional training data, an approach which is inspired by prior work on data poisoning attacks and adapts them to generate 'antidote data' for social good. We provide an algorithmic framework for this strategy and show that it can efficiently improve the polarization and fairness metrics of factorization-based recommender systems. In the second part, we focus on fairness notions that incorporate data inputs used by decision systems. In particular, we draw attention to 'data minimization', an existing principle in data protection regulations that restricts a system to using the minimal information that is necessary for performing the task at hand. First, we propose an operationalization for this principle that is based on classification accuracy, and we show how a natural dependence of accuracy on data inputs can be expressed as a trade-off between fair-inputs and fair-outputs. Next, we address the problem of auditing black-box prediction models for data minimization compliance.
For this problem, we suggest a metric for data minimization that is based on model instability under simple imputations, and we extend its applicability from a finite sample model to a distributional setting by introducing a probabilistic data minimization guarantee. Finally, assuming limited system queries, we formulate the problem of allocating a query budget to simple imputations for investigating model instability as a multi-armed bandit problem, for which we design efficient exploration strategies.
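
The bandit formulation above can be illustrated with a minimal sketch in which each arm is a feature to impute and the reward is 1 when the imputation changes the model's prediction. UCB1, a standard bandit strategy, is swapped in here for illustration; the abstract does not describe the dissertation's own exploration strategies. The model `toy_model`, the data, the constant `impute_value`, and the function `ucb1_audit` are hypothetical.

```python
import math
import random


def toy_model(x):
    """Hypothetical black-box classifier that only uses feature 0."""
    return int(x[0] > 0.5)


# Hypothetical audit samples with three features each.
DATA = [(0.9, 0.2, 0.4), (0.8, 0.7, 0.1), (0.7, 0.3, 0.9), (0.1, 0.6, 0.5)]


def ucb1_audit(model, data, n_features, budget, impute_value=0.0, seed=0):
    """Spend `budget` imputation rounds across features using UCB1.

    Each round queries the model on a sample and on its imputed copy.
    Returns per-feature pull counts and estimated instability (flip rate).
    """
    if budget < n_features:
        raise ValueError("budget must cover at least one round per feature")
    rng = random.Random(seed)
    pulls = [0] * n_features
    flips = [0] * n_features
    for t in range(1, budget + 1):
        if t <= n_features:
            arm = t - 1  # try every arm once before using the UCB rule
        else:
            arm = max(
                range(n_features),
                key=lambda j: flips[j] / pulls[j]
                + math.sqrt(2 * math.log(t) / pulls[j]),
            )
        x = list(rng.choice(data))
        original = model(x)
        x[arm] = impute_value  # simple constant imputation of one feature
        flips[arm] += int(model(x) != original)
        pulls[arm] += 1
    return pulls, [f / p for f, p in zip(flips, pulls)]


pulls, instability = ucb1_audit(toy_model, DATA, n_features=3, budget=200)
```

Features whose imputation never changes the prediction (here features 1 and 2) end up with an estimated instability of zero and receive few rounds, which suggests the model does not need them.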