Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries
Recent work has shown that deep-learning algorithms for malware detection are also susceptible to adversarial examples, i.e., carefully-crafted perturbations to input malware that enable misleading classification. Although this has called their suitability for this task into question, it is not yet clear why such algorithms are so easily fooled in this particular application domain. In this work, we take a first step toward tackling this issue by leveraging explainable machine-learning algorithms developed to interpret the black-box decisions of deep neural networks. In particular, we use an explainability technique known as feature attribution to identify the most influential input features contributing to each decision, and adapt it to provide meaningful explanations for the classification of malware binaries. In this case, we find that a recently-proposed convolutional neural network does not learn any meaningful characteristic for malware detection from the data and text sections of executable files, but rather tends to discriminate between benign and malware samples based on characteristics found in the file header. Based on this finding, we propose a novel attack algorithm that generates adversarial malware binaries by changing only a few tens of bytes in the file header. Compared to other state-of-the-art attack algorithms, our attack does not require injecting any padding bytes at the end of the file, and it is much more efficient, as it requires manipulating far fewer bytes.
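To make the feature-attribution idea concrete, here is a minimal sketch of gradient-times-input attribution on a toy logistic scorer. This is an illustrative stand-in, not the paper's actual method or model: the linear model, weights, and feature vector are all hypothetical.

```python
import numpy as np

def attribute(weights, x):
    """Gradient-times-input attribution for a logistic scorer f(x) = sigmoid(w . x).

    The attribution of feature i is x_i * df/dx_i = x_i * w_i * f(x) * (1 - f(x)),
    so features the model ignores (w_i = 0) receive exactly zero attribution.
    """
    s = 1.0 / (1.0 + np.exp(-weights @ x))
    return x * weights * s * (1.0 - s)

# Toy example: 4 "byte features"; the model weights only the first two,
# mimicking a detector that bases its decision on a small part of the file
# (e.g. the header) while ignoring the rest.
w = np.array([2.0, -1.0, 0.0, 0.0])
x = np.array([1.0, 1.0, 1.0, 1.0])
attr = attribute(w, x)
```

Aggregating such per-feature scores over the regions of an executable (header, text, data) is one way to reveal which region actually drives the classifier's decisions.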
Analysis of Android malware detection techniques: a systematic review
The emergence of Android mobile phones, and their rapid growth in complexity and popularity, has drawn proportionately destructive attention from the world of cyber-attack. The Android platform faces threats from many attack angles, such as DoS, botnets, phishing, social engineering, and malware. Among these threats, malware attacks on Android phones have become a daily occurrence. This is because Android has millions of users, high computational ability, popularity, and other attractive attributes. These factors lead cybercriminals (especially malware writers) to target Android for financial gain, political interest, and revenge. This calls for effective techniques that can detect malicious applications on Android devices. The aim of this paper is to provide a systematic review of the malware detection techniques used for Android devices. The results show that most detection techniques are not very effective against zero-day malware and variants that deploy obfuscation to evade detection. The critical appraisal of the study identifies limitations in the detection techniques that need improvement for better detection.
Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection
Recent works within machine learning have been tackling inputs of
ever-increasing size, with cybersecurity presenting sequence classification
problems of particularly extreme lengths. In the case of Windows executable
malware detection, inputs may exceed 100 MB, which corresponds to a time
series with T = 100,000,000 steps. To date, the closest approach to handling
such a task is MalConv, a convolutional neural network capable of processing up
to T = 2,000,000 steps. The memory of CNNs has prevented
further application of CNNs to malware. In this work, we develop a new approach
to temporal max pooling that makes the required memory invariant to the
sequence length T. This makes MalConv 116x more memory efficient, and
up to 25.8x faster to train on its original dataset, while removing the
input length restrictions to MalConv. We re-invest these gains into improving
the MalConv architecture by developing a new Global Channel Gating design,
giving us an attention mechanism capable of learning feature interactions
across 100 million time steps in an efficient manner, a capability lacked by
the original MalConv CNN. Our implementation can be found at
https://github.com/NeuromorphicComputationResearchProgram/MalConv2
Comment: To appear in AAAI 2021
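The core idea behind memory-invariant temporal max pooling can be sketched in a few lines: stream the sequence through in fixed-size chunks and keep only a running per-channel maximum, so peak memory depends on the chunk size and channel count rather than the sequence length T. The byte-embedding table and chunk size below are toy assumptions, not the MalConv2 implementation.

```python
import numpy as np

def streaming_temporal_max(byte_stream, embed, chunk_size=4096):
    """Global temporal max pooling computed one chunk at a time.

    Peak memory is O(chunk_size * channels), independent of the
    total sequence length T = len(byte_stream).
    """
    running_max = None
    for start in range(0, len(byte_stream), chunk_size):
        chunk = byte_stream[start:start + chunk_size]
        feats = embed[chunk]               # (chunk_len, channels)
        chunk_max = feats.max(axis=0)      # (channels,)
        running_max = (chunk_max if running_max is None
                       else np.maximum(running_max, chunk_max))
    return running_max

# Toy check: streaming pooling over a long byte sequence matches pooling
# computed with the full embedded sequence held in memory at once.
rng = np.random.default_rng(0)
embed = rng.normal(size=(256, 8))          # hypothetical 8-channel byte embedding
seq = rng.integers(0, 256, size=100_000)
pooled = streaming_temporal_max(seq, embed)
```

Because max pooling is associative, the chunked result is exact, not an approximation; the same trick extends to pooling over convolutional activations rather than raw embeddings.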
Formalizing evasion attacks against machine learning security detectors
Recent work has shown that adversarial examples can bypass machine learning-based threat detectors relying on static analysis by applying minimal perturbations.
To preserve malicious functionality, previous attacks either apply trivial manipulations (e.g. padding), potentially limiting their effectiveness, or require running computationally-demanding validation steps to discard adversarial variants that do not correctly execute in sandbox environments.
While machine learning systems for detecting SQL injections have been proposed in the literature, no attacks have been tested against the proposed solutions to assess the effectiveness and robustness of these methods.
In this thesis, we overcome these limitations by developing RAMEn, a unifying framework that (i) can express attacks for different domains, (ii) generalizes previous attacks against machine learning models, and (iii) uses functions that preserve the functionality of manipulated objects.
We provide new attacks for both Windows malware and SQL injection detection scenarios by exploiting the format used for representing these objects.
To show the efficacy of RAMEn, we provide experimental results of our strategies in both white-box and black-box settings.
The white-box attacks against Windows malware detectors show that perturbing only 2% of the target's input size is enough to evade detection with ease.
To further speed up the black-box attacks, we overcome the issues mentioned above by presenting a novel family of black-box attacks that are both query-efficient and functionality-preserving: they rely on injecting benign content, which is never executed, either at the end of the malicious file or within newly-created sections. This strategy is encoded in an algorithm called GAMMA.
We also evaluate whether GAMMA transfers to other commercial antivirus solutions, and surprisingly find that it can evade many commercial antivirus engines.
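As a concrete illustration of the functionality-preserving injection idea, here is a toy sketch of appending benign content as overlay padding. The helper names and the random selection are assumptions for illustration; a GAMMA-style attack would instead optimize which benign sections to inject, and how much, against the detector's score in a black-box loop.

```python
import random

def inject_benign_payload(malware_bytes, benign_sections, budget):
    """Append a subset of benign-file content after the end of the file.

    Bytes past the PE's declared sections (the "overlay") are never mapped
    or executed at runtime, so the malware's functionality is preserved
    while the byte-level features seen by a static detector change.
    """
    k = min(budget, len(benign_sections))
    payload = b"".join(random.sample(benign_sections, k=k))
    return malware_bytes + payload

# Toy usage with placeholder bytes standing in for a real PE file
# and sections extracted from benign binaries.
adv = inject_benign_payload(b"MZ\x90\x00", [b"AAAA", b"BBBB"], budget=2)
```

In a real attack the choice of payload is the optimization variable: the attacker queries the detector, keeps injections that lower the maliciousness score, and stops once the sample is classified as benign or the query budget is exhausted.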
For evading SQLi detectors, we create WAF-A-MoLE, a mutational fuzzer that exploits random mutations of the input samples, keeping alive only the most promising ones.
WAF-A-MoLE is capable of defeating detectors built with different architectures by using the novel practical manipulations we have proposed.
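The mutational-fuzzing loop described above can be sketched as follows. The mutation operators and the scoring function here are simplified toy stand-ins, not WAF-A-MoLE's actual operators: the idea is only that each rewrite preserves the SQL payload's semantics while the loop keeps the variants the detector scores as least malicious.

```python
import random

def mutate(payload):
    """Apply one random (toy) semantics-preserving SQL rewrite."""
    ops = [
        lambda s: s.replace(" ", "/**/", 1),   # inline comment as whitespace
        lambda s: "".join(random.choice([c.lower(), c.upper()]) for c in s),
        lambda s: s + " -- " + str(random.randint(0, 9)),  # junk trailing comment
    ]
    return random.choice(ops)(payload)

def fuzz(seed, score, rounds=200, pool_size=8):
    """Evolve variants of `seed`, keeping the pool_size lowest-scoring ones.

    `score` is the detector's maliciousness estimate (lower = more likely
    to slip past the WAF); the pool retains only the most promising variants.
    """
    pool = [seed]
    for _ in range(rounds):
        pool.append(mutate(random.choice(pool)))
        pool = sorted(pool, key=score)[:pool_size]
    return pool[0]

# Toy run against a stand-in "detector" that just counts spaces.
random.seed(1)
best = fuzz("' OR 1=1", score=lambda s: s.count(" "))
```

Swapping the stand-in score for a trained classifier's probability output turns this loop into a black-box evasion test for that classifier.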
To facilitate reproducibility and future work, we open-source our framework and corresponding attack implementations.
We conclude by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies that naturally embed domain knowledge from subject-matter experts into the learning process.