6 research outputs found

    The arms race: adversarial search defeats entropy used to detect malware

    Malware creators have been getting their way for too long. String-based similarity measures can leverage ground truth in a scalable way and can operate at a level of abstraction that is difficult to combat from the code level. At the string level, information theory, and specifically entropy, plays an important role in detecting patterns altered by concealment strategies such as polymorphism or encryption. Controlling the entropy levels in different parts of a disk-resident executable allows an analyst to detect malware, or a black hat to evade detection. This paper embodies these two perspectives in two scalable entropy-based tools: EnTS and EEE. EnTS, the detection tool, demonstrates the effectiveness of detecting entropy patterns, achieving 100% precision with 82% accuracy and outperforming VirusTotal in accuracy on combined Kaggle and VirusShare malware. EEE, the evasion tool, demonstrates the effectiveness of entropy as a concealment strategy by attacking binary-based state-of-the-art detectors: it learns their detection patterns in up to 8 generations of its search process and raises their false negative rate from the 0–9% range to the 90–98.7% range.
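    The signal these tools operate on can be sketched as a per-chunk Shannon entropy series over a binary. The snippet below is a minimal illustration of that idea only, not the EnTS or EEE implementation; the chunk size and the example file path are arbitrary assumptions.

        import math
        from collections import Counter

        def chunk_entropy(data: bytes) -> float:
            """Shannon entropy of one chunk, in bits per byte (0.0 to 8.0)."""
            if not data:
                return 0.0
            total = len(data)
            return -sum((c / total) * math.log2(c / total)
                        for c in Counter(data).values())

        def entropy_series(path: str, chunk_size: int = 4096) -> list:
            """Entropy of each fixed-size chunk of a file, in file order.
            Long stretches close to 8 bits/byte typically correspond to packed
            or encrypted regions, the kind of pattern an entropy-based
            detector looks for."""
            series = []
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    series.append(chunk_entropy(chunk))
            return series

        # Hypothetical usage:
        # print(entropy_series("suspect.exe")[:10])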

    Picking on the family: disrupting android malware triage by forcing misclassification

    Machine learning classification algorithms are widely applied to malware analysis problems because of their proven ability to learn from examples and to perform relatively well with little human input. Use cases include labelling malicious samples by family during the triage of suspected malware. However, automated algorithms are vulnerable to attacks: an attacker can carefully manipulate a sample to force the algorithm to produce a particular output. In this paper we discuss one such attack on Android malware classifiers. We design and implement a prototype tool, called IagoDroid, that takes as input a malware sample and a target family, and modifies the sample so that it is classified as belonging to that family while preserving its original semantics. Our technique relies on a search process that generates variants of the original sample without modifying their semantics. We tested IagoDroid against RevealDroid, a recent, open-source Android malware classifier based on a variety of static features. IagoDroid successfully forces misclassification for 28 of the 29 representative malware families present in the DREBIN dataset. Remarkably, it does so by modifying just a single feature of the original malware. On average, it finds the first evasive sample in the first search iteration and converges to a 100% evasive population within 4 iterations. Finally, we introduce RevealDroid*, a more robust classifier that implements several techniques proposed in other adversarial learning domains. Our experiments suggest that RevealDroid* can correctly detect up to 99% of the variants generated by IagoDroid.
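    The search process can be approximated with a very small sketch: mutate one static feature at a time and keep the first variant that the classifier assigns to the target family. The classifier object (assumed to expose a scikit-learn-style predict method), the candidate feature values, and the iteration budget are illustrative assumptions; the actual tool additionally guarantees that the modified sample preserves its original semantics.

        import random

        def force_family(classifier, features, target_family,
                         candidate_values=(0, 1), max_iterations=100, seed=0):
            """Search for a feature vector, differing from `features` in a
            single position, that `classifier` labels as `target_family`.
            Assumes a scikit-learn-style predict([vector]) -> [label] API."""
            rng = random.Random(seed)
            for _ in range(max_iterations):
                idx = rng.randrange(len(features))      # pick one feature to perturb
                for value in candidate_values:          # try each candidate value
                    variant = list(features)
                    variant[idx] = value
                    if classifier.predict([variant])[0] == target_family:
                        return variant, idx             # evasive variant found
            return None, None                           # no variant found within budget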

    Adversarial Classification: An Ensemble-based approach

    No full text
    Spam has been studied and dealt with extensively in the email, web and, recently, the blog domain. Recent work has addressed the non-stationarity of the data using ensemble-based approaches. Adversarial classification has been handled by retraining base classifiers using labeled samples obtained from the ensemble. However, frequent retraining is expensive. What is needed is a way to dynamically determine when the classifiers should be retrained, and to retrain only those classifiers that are performing poorly. We show how mutual agreement between classifiers can be used to reduce retraining time, measure runtime performance, and keep track of the weakest-performing classifier. We back our research with experimental results on real-life data from blogs as a special case of spam.
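    The mutual-agreement idea can be illustrated as follows: score each base classifier by how often it agrees with the ensemble's majority vote on recent unlabeled data, and schedule retraining only for those whose agreement falls below a threshold. The per-sample predict interface and the 0.7 threshold are illustrative assumptions, not the paper's exact procedure.

        from collections import Counter

        def agreement_scores(classifiers, samples):
            """Fraction of samples on which each classifier agrees with the
            ensemble's majority vote (assumes each classifier exposes a
            per-sample predict(x) -> label method)."""
            votes = [[clf.predict(x) for clf in classifiers] for x in samples]
            majorities = [Counter(v).most_common(1)[0][0] for v in votes]
            return [sum(v[i] == m for v, m in zip(votes, majorities)) / len(samples)
                    for i in range(len(classifiers))]

        def classifiers_to_retrain(classifiers, samples, threshold=0.7):
            """Indices of the classifiers whose agreement drops below the
            (illustrative) threshold; these are the weak members worth
            retraining, instead of retraining the whole ensemble."""
            return [i for i, score in enumerate(agreement_scores(classifiers, samples))
                    if score < threshold]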
