1,140 research outputs found

    Tracking Cyber Adversaries with Adaptive Indicators of Compromise

    Full text link
    A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naive solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naive solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas, Nevada, US

    Tracking Cyber Adversaries with Adaptive Indicators of Compromise

    Full text link
    A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naive solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naive solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas, Nevada, US

    Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks

    Full text link
    Malware still constitutes a major threat in the cybersecurity landscape, also due to the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code to the victim users, facilitating the use of social engineering techniques to infect their machines. Research showed that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of such an arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware, and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how such findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings

    Formalizing evasion attacks against machine learning security detectors

    Get PDF
    Recent work has shown that adversarial examples can bypass machine learning-based threat detectors relying on static analysis by applying minimal perturbations. To preserve malicious functionality, previous attacks either apply trivial manipulations (e.g. padding), potentially limiting their effectiveness, or require running computationally-demanding validation steps to discard adversarial variants that do not correctly execute in sandbox environments. While machine learning systems for detecting SQL injections have been proposed in the literature, no attacks have been tested against the proposed solutions to assess the effectiveness and robustness of these methods. In this thesis, we overcome these limitations by developing RAMEn, a unifying framework that (i) can express attacks for different domains, (ii) generalizes previous attacks against machine learning models, and (iii) uses functions that preserve the functionality of manipulated objects. We provide new attacks for both Windows malware and SQL injection detection scenarios by exploiting the format used for representing these objects. To show the efficacy of RAMEn, we provide experimental results of our strategies in both white-box and black-box settings. The white-box attacks against Windows malware detectors show that it takes only the 2% of the input size of the target to evade detection with ease. To further speed up the black-box attacks, we overcome the issues mentioned before by presenting a novel family of black-box attacks that are both query-efficient and functionality-preserving, as they rely on the injection of benign content, which will never be executed, either at the end of the malicious file, or within some newly-created sections, encoded in an algorithm called GAMMA. We also evaluate whether GAMMA transfers to other commercial antivirus solutions, and surprisingly find that it can evade many commercial antivirus engines. For evading SQLi detectors, we create WAF-A-MoLE, a mutational fuzzer that that exploits random mutations of the input samples, keeping alive only the most promising ones. WAF-A-MoLE is capable of defeating detectors built with different architectures by using the novel practical manipulations we have proposed. To facilitate reproducibility and future work, we open-source our framework and corresponding attack implementations. We conclude by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts naturally into the learning process

    MDEA: Malware Detection with Evolutionary Adversarial Learning

    Full text link
    Malware detection have used machine learning to detect malware in programs. These applications take in raw or processed binary data to neural network models to classify as benign or malicious files. Even though this approach has proven effective against dynamic changes, such as encrypting, obfuscating and packing techniques, it is vulnerable to specific evasion attacks where that small changes in the input data cause misclassification at test time. This paper proposes a new approach: MDEA, an Adversarial Malware Detection model uses evolutionary optimization to create attack samples to make the network robust against evasion attacks. By retraining the model with the evolved malware samples, its performance improves a significant margin.Comment: 8 pages, 6 figure

    A NEAT Approach to Malware Classification

    Get PDF
    Current malware detection software often relies on machine learning, which is seen as an improvement over signature-based techniques. Problems with a machine learning based approach can arise when malware writers modify their code with the intent to evade detection. This leads to a cat and mouse situation where new models must constantly be trained to detect new malware variants. In this research, we experiment with genetic algorithms as a means of evolving machine learning models to detect malware. Genetic algorithms, which simulate natural selection, provide a way for models to adapt to continuous changes in a malware families, and thereby improve detection rates. Specifically, we use the Neuro-Evolution of Augmenting Topologies (NEAT) algorithm to optimize machine learning classifiers based on decision trees and neural networks. We compare the performance of our NEAT approach to standard models, including random forest and support vector machines

    An Adaptive Feature Centric XG Boost Ensemble Classifier Model for Improved Malware Detection and Classification

    Get PDF
    Machine learning (ML) is often used to solve the problem of malware detection and classification and various machine learning approaches are adapted to the problem of malware classification; still  acquiring poor performance by the way of feature selection, and classification. To manage the issue, an efficient Adaptive Feature Centric XG Boost Ensemble Learner Classifier “AFC-XG Boost” novel algorithm is presented in this paper. The proposed model has been designed to handle varying data sets of malware detection obtained from Kaggle data set. The model turns the process of XG Boost classifier in several stages to optimize the performance. At preprocessing stage, the data set given has been noise removed, normalized and tamper removed using Feature Base Optimizer “FBO” algorithm. The FBO would normalize the data points as well as performs noise removal according to the feature values and their base information. Similarly, the performance of standard XG Boost has been optimized by adapting Feature selection using Class Based Principle Component Analysis “CBPCA” algorithm, which performs feature selection according to the fitness of any feature for different classes. Based on the selected features, the method generates regression tree for each feature considered. Based on the generated trees, the method performs classification by computing Tree Level Ensemble Similarity “TLES” and Class Level Ensemble Similarity “CLES”. Using both method computes the value of Class Match Similarity “CMS” based on which the malware has been classified. The proposed approach achieves 97% accuracy in malware detection and classification with the less time complexity of 34 seconds for 75000 sample
    • …
    corecore