12 research outputs found
Simple Substitution Distance and Metamorphic Detection
To evade signature-based detection, metamorphic viruses transform their code before infecting a new system. Software similarity measures are potentially useful as a means of detecting metamorphic malware. We can compare a given file to a known sample of malware and compute their similarity; if they are sufficiently similar, we classify the file as malware of the same family. The goal of this project is to analyze an opcode-based software similarity measure inspired by simple substitution cipher cryptanalysis.
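The compare-and-threshold idea in this abstract can be illustrated with a minimal sketch. It does not implement the simple substitution distance itself; as a stand-in it uses cosine similarity over opcode digram counts. The function names and the 0.8 threshold are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def digram_counts(opcodes):
    """Count adjacent opcode pairs in a list of mnemonics."""
    return Counter(zip(opcodes, opcodes[1:]))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_family(candidate, reference, threshold=0.8):
    """Return (decision, score); the threshold is a hypothetical value."""
    score = cosine_similarity(digram_counts(candidate), digram_counts(reference))
    return score >= threshold, score
```

In practice the threshold would have to be tuned on labeled samples of the family in question.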
Metamorphic Malware Detection Based on Support Vector Machine Classification of Malware Sub-Signatures
Achieving accurate and efficient metamorphic malware detection remains a challenge. Metamorphic malware is able to mutate and alter its code structure in each infection, while some vital functionality and code segments remain unchanged. We exploit these unchanged features to detect metamorphic malware using a Support Vector Machine (SVM) classifier. n-gram features are extracted directly from sample malware binaries to avoid disassembly, and are then masked with the n-grams extracted from Snort signatures. This masking considerably reduces the number of selected n-gram features. Our method is capable of accurately detecting metamorphic malware with ~99% accuracy and a low false positive rate. The proposed method is also superior to commercially available antivirus products in detecting metamorphic malware.
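A minimal sketch of the feature extraction step might look as follows. It assumes that "masking" means keeping only those byte n-grams that also occur in the signature set; the abstract does not spell this out, so this reading, the function names, and the choice of n are all assumptions. Training the SVM is not shown.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window in a raw binary."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def signature_vocabulary(signatures, n: int = 4):
    """Sorted list of all n-grams that occur in any signature byte string."""
    vocab = set()
    for sig in signatures:
        vocab.update(byte_ngrams(sig, n))
    return sorted(vocab)


def masked_feature_vector(sample: bytes, vocabulary, n: int = 4):
    """Counts of the sample's n-grams, restricted to the signature vocabulary."""
    counts = byte_ngrams(sample, n)
    return [counts[g] for g in vocabulary]
```

The resulting fixed-length vectors could then be passed to any SVM implementation. Restricting the vocabulary to signature n-grams is what keeps the feature count small in this sketch.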
Effective methods to detect metamorphic malware: A systematic review
The code of succeeding generations of metamorphic malware is routinely rewritten to
remain stealthy and undetected within infected environments. This characteristic is
maintained by means of encryption and decryption methods, obfuscation through
garbage code insertion, code transformation and registry modification which makes
detection very challenging. The main objective of this study is to contribute an
evidence-based narrative demonstrating the effectiveness of recent proposals. Sixteen
primary studies were included in this analysis based on a pre-defined protocol. The
majority of the reviewed detection methods used Opcode, Control Flow Graph (CFG)
and API Call Graph. Key challenges facing the detection of metamorphic malware
include code obfuscation, lack of dynamic capabilities to analyse code and application
difficulty. Methods were further analysed on the basis of their approach, limitation,
empirical evidence and key parameters such as dataset, Detection Rate (DR) and
False Positive Rate (FPR).
Similarity-based Android Malware Detection Using Hamming Distance of Static Binary Features
In this paper, we develop four malware detection methods that use Hamming
distance to measure similarity between samples: first nearest neighbors
(FNN), all nearest neighbors (ANN), weighted all nearest neighbors (WANN), and
k-medoid based nearest neighbors (KMNN). In our proposed methods, we can
trigger an alarm if we detect that an Android app is malicious. Hence, our
solutions help us to avoid the spread of detected malware on a broader scale.
We provide a detailed description of the proposed detection methods and related
algorithms. We include an extensive analysis to assess the suitability of our
proposed similarity-based detection methods. In this way, we perform our
experiments on three datasets, including benign and malware Android apps like
Drebin, Contagio, and Genome. Thus, to corroborate the actual effectiveness of
our classifier, we carry out performance comparisons with some state-of-the-art
classification and malware detection algorithms, namely Mixed and Separated
solutions, the program dissimilarity measure based on entropy (PDME) and the
FalDroid algorithms. We run our experiments on different types of features:
API, intent, and permission features on these three datasets. The results
confirm that the accuracy rates of the proposed algorithms are more than 90% and in
some cases (i.e., considering API features) are more than 99%, and are
comparable with existing state-of-the-art solutions.
Comment: 20 pages, 8 figures, 11 tables, FGCS Elsevier journal
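As a rough illustration of similarity-based detection with Hamming distance, the sketch below labels a sample with the class of its single closest labeled neighbor. It is a generic nearest-neighbor rule over binary feature vectors, not the paper's FNN, ANN, WANN, or KMNN algorithms; the function names and data layout are assumptions.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_label(sample, labeled):
    """Return (label, distance) of the closest labeled vector.

    `labeled` is a non-empty list of (vector, label) pairs.
    """
    best_vector, best_label = min(
        labeled, key=lambda item: hamming_distance(sample, item[0])
    )
    return best_label, hamming_distance(sample, best_vector)
```

Each vector position would stand for one static feature, such as whether a given permission or API call is present in the app.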
An enhanced performance model for metamorphic computer virus classification and detection
Metamorphic computer viruses employ various code mutation techniques to change their code into new generations. These generations have similar behavior and functionality, yet most commercial antivirus products cannot detect them because their solutions depend on a signature database and use string signature-based detection methods, which metamorphism techniques can evade. The purpose of this study is to develop a performance model for computer virus classification and detection. The model examines portable executable files in order to classify and detect metamorphic computer viruses. A Hidden Markov Model implemented on portable executable files was employed to classify and detect the metamorphic viruses. The proposed model, which produces common virus statistical patterns, was evaluated by comparing its results with previous related work and well-known commercial antivirus products. This was done by investigating metamorphic computer viruses and their features, as well as existing classification and detection methods. Specifically, the model was applied to the binary format of portable executable files and was able to classify whether the files belonged to a virus family. In addition, the performance of the model, practically implemented and tested, was evaluated based on detection rate and overall accuracy. The findings indicate that the proposed model is able to classify and detect metamorphic virus variants in portable executable file format with a high average detection rate of 99.7%. The implementation of the model proved useful and applicable for antivirus programs.
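One common way to use a Hidden Markov Model as a detector is to score a file's symbol sequence by its log-likelihood under a model trained on a virus family. The sketch below shows only that scoring step, using the scaled forward algorithm. The parameter layout (`start`, `trans`, `emit`) is an assumption for illustration, and nothing here reflects the study's actual model or its training procedure.

```python
from math import log


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | model).

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0:
            return float("-inf")  # sequence is impossible under this model
        total += log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total
```

Dividing the result by the sequence length gives a per-symbol score that can be compared against a threshold, so that files of different sizes are scored on the same scale.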
Machine learning classification for advanced malware detection
This introductory document discusses topics related to malware detection via the application
of machine learning algorithms. It is intended as a supplement to the published work
submitted (a complete list of which can be found in Table 1) and outlines the motivation
behind the experiments.
The document begins with the following sections:
• Section 2 presents a preliminary discussion of the research methodology employed.
• Section 3 presents the background analysis of malware detection in general, and the
use of machine learning.
• Section 4 provides a brief introduction of the most common machine learning
algorithms in current use.
The remaining sections present the main body of the experimental work, which leads to the
conclusions in Section 10.
• Section 5 analyzes different initialization strategies for machine learning models, with
a view to ensuring that the most effective training and testing strategy is employed.
Following this, a purely dynamic approach is proposed, which results in perfect
classification of the samples against benign files, and therefore provides a baseline
against which the performance of subsequent static approaches can be compared.
• Section 6 introduces the static-based tests, beginning with the challenging problem of
zero-day detection samples, i.e. malware samples for which not enough data has been
gathered yet to train the machine learning models.
• Section 7 describes the testing of several different approaches to static malware
detection. During these tests, the effectiveness of these algorithms is analyzed and
compared with other means of classification.
• Section 8 proposes and compares techniques to boost the detection accuracy by
combining the scores obtained from other detection algorithms, with a view to
improving static classification scores and thus reach the perfect detection obtained
with dynamic features.
• Section 9 tests the effectiveness of generic malware models by assessing the detection
effectiveness of a generic malware model trained on several different families. The
experiments are intended to introduce a more realistic scenario where a single,
comprehensive, machine learning model is used to detect several families. This
section shows the difficulty of building a single model to detect several malware families.