408 research outputs found

    From Malware Samples to Fractal Images: A New Paradigm for Classification. (Version 2.0, Previous version paper name: Have you ever seen malware?)

    Full text link
    To date, a large number of research papers have been written on the classification of malware, its identification, classification into different families and the distinction between malware and goodware. These works have been based on captured malware samples and have attempted to analyse malware and goodware using various techniques, including techniques from the field of artificial intelligence. For example, neural networks have played a significant role in these classification methods. Some of this work also deals with analysing malware using its visualisation. These works usually convert malware samples capturing the structure of malware into image structures, which are then the object of image processing. In this paper, we propose a very unconventional and novel approach to malware visualisation based on dynamic behaviour analysis, with the idea that the images, which are visually very interesting, are then used to classify malware concerning goodware. Our approach opens an extensive topic for future discussion and provides many new directions for research in malware analysis and classification, as discussed in conclusion. The results of the presented experiments are based on a database of 6 589 997 goodware, 827 853 potentially unwanted applications and 4 174 203 malware samples provided by ESET and selected experimental data (images, generating polynomial formulas and software generating images) are available on GitHub for interested readers. Thus, this paper is not a comprehensive compact study that reports the results obtained from comparative experiments but rather attempts to show a new direction in the field of visualisation with possible applications in malware analysis.Comment: This paper is under review; the section describing conversion from malware structure to fractal figure is temporarily erased here to protect our idea. It will be replaced by a full version when accepte

    Analysis and evaluation of SafeDroid v2.0, a framework for detecting malicious Android applications

    Get PDF
    Android smartphones have become a vital component of the daily routine of millions of people, running a plethora of applications available in the official and alternative marketplaces. Although there are many security mechanisms to scan and filter malicious applications, malware is still able to reach the devices of many end-users. In this paper, we introduce the SafeDroid v2.0 framework, that is a flexible, robust, and versatile open-source solution for statically analysing Android applications, based on machine learning techniques. The main goal of our work, besides the automated production of fully sufficient prediction and classification models in terms of maximum accuracy scores and minimum negative errors, is to offer an out-of-the-box framework that can be employed by the Android security researchers to efficiently experiment to find effective solutions: the SafeDroid v2.0 framework makes it possible to test many different combinations of machine learning classifiers, with a high degree of freedom and flexibility in the choice of features to consider, such as dataset balance and dataset selection. The framework also provides a server, for generating experiment reports, and an Android application, for the verification of the produced models in real-life scenarios. An extensive campaign of experiments is also presented to show how it is possible to efficiently find competitive solutions: the results of our experiments confirm that SafeDroid v2.0 can reach very good performances, even with highly unbalanced dataset inputs and always with a very limited overhead

    Neural malware detection

    Get PDF
    At the heart of today’s malware problem lies theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem with the assumptions that a sufficiently large number of training samples exist and that the training set is independent and identically distributed. However, the lack of semantic features combined with the models under these wrong assumptions result largely in overfitting with many false positives against real world samples, resulting in systems being left vulnerable to various adversarial attacks. A key observation is that modern malware authors write a script that automatically generates an arbitrarily large number of diverse samples that share similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this paradigm of economic malware manufacturing model, the samples within a campaign are likely to share coherent semantic characteristics. This opens up a possibility of one-to-many detection. Therefore, it is crucial to capture this non-linear metamorphic pattern unique to the campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including generative static malware outbreak detection model, generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. Generative adversarial autoencoder (AAE) over convolutional network with global average pooling is introduced as a fundamental deep learning framework for malware detection, which captures highly complex non-linear metamorphism through translation invariancy and local variation insensitivity. Generative Adversarial Network (GAN) used as a part of the framework enables oneshot training where semantically isomorphic malware campaigns are identified by a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no approach has been found to this challenging training objective against the malware distribution that consists of a large number of very sparse groups artificially driven by arms race between attackers and defenders. In addition, we propose a novel method that extracts instruction cognitive representation from uninterpreted raw binary executables, which can be used for oneto- many malware detection via one-shot training against frequency spectrum of the Transformer’s encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks that mostly use random perturbation against raw binaries. Comprehensive performance analyses including mathematical formulations and experimental evaluations are provided, with the proposed deep learning framework for malware detection exhibiting a superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments here artificially formed sparse distributions arise at the cyber battle fronts.Doctor of Philosoph

    A Real-Time and Adaptive-Learning Malware Detection Method Based on API-Pair Graph

    Get PDF
    The detection of malware have developed for many years, and the appearance of new machine learning and deep learning techniques have improved the effect of detectors. However, most of current researches have focused on the general features of malware and ignored the development of the malware themselves, so that the features could be useless with the time passed as well as the advance of malware techniques. Besides, the detection methods based on machine learning are mainly static detection and analysis, while the study of real-time detection of malware is relatively rare. In this article, we proposed a new model that could detect malware real-time in principle and learn new features adaptively. Firstly, a new data structure of API-Pair was adopted, and the constructed data was trained with Maximum Entropy model, which could satisfy the goal of weighting and adaptive learning. Then a clustering was practised to filter relatively unrelated and confusing features. Moreover, a detector based on Lont Short Term Memory Network (LSTM) was devised to achieve the goal of real-time detection. Finally, a series of experiments were designed to verify our method. The experimental results showed that our model could obtain the highest accuracy of 99.07% in general tests and keep the accuracies above 97% with the development of malware; the results also proved the feasibility of our model in real-time detection through the simulation experiment, and robustness against a typical adversarial attack

    Image-based malware classification: A space filling curve approach

    Get PDF
    Anti-virus (AV) software is effective at distinguishing between benign and malicious programs yet lack the ability to effectively classify malware into their respective family classes. AV vendors receive considerably large volumes of malicious programs daily and so classification is crucial to quickly identify variants of existing malware that would otherwise have to be manually examined. This paper proposes a novel method of visualizing and classifying malware using Space-Filling Curves (SFC\u27s) in order to improve the limitations of AV tools. The classification models produced were evaluated on previously unseen samples and showed promising results, with precision, recall and accuracy scores of 82%, 80% and 83% respectively. Furthermore, a comparative assessment with previous research and current AV technologies revealed that the method presented her was robust, outperforming most commercial and open-source AV scanner software programs

    MDFRCNN: Malware Detection using Faster Region Proposals Convolution Neural Network

    Get PDF
    Technological advancement of smart devices has opened up a new trend: Internet of Everything (IoE), where all devices are connected to the web. Large scale networking benefits the community by increasing connectivity and giving control of physical devices. On the other hand, there exists an increased ‘Threat’ of an ‘Attack’. Attackers are targeting these devices, as it may provide an easier ‘backdoor entry to the users’ network’.MALicious softWARE (MalWare) is a major threat to user security. Fast and accurate detection of malware attacks are the sine qua non of IoE, where large scale networking is involved. The paper proposes use of a visualization technique where the disassembled malware code is converted into gray images, as well as use of Image Similarity based Statistical Parameters (ISSP) such as Normalized Cross correlation (NCC), Average difference (AD), Maximum difference (MaxD), Singular Structural Similarity Index Module (SSIM), Laplacian Mean Square Error (LMSE), MSE and PSNR. A vector consisting of gray image with statistical parameters is trained using a Faster Region proposals Convolution Neural Network (F-RCNN) classifier. The experiment results are promising as the proposed method includes ISSP with F-RCNN training. Overall training time of learning the semantics of higher-level malicious behaviors is less. Identification of malware (testing phase) is also performed in less time. The fusion of image and statistical parameter enhances system performance with greater accuracy. The benchmark database from Microsoft Malware Classification challenge has been used to analyze system performance, which is available on the Kaggle website. An overall average classification accuracy of 98.12% is achieved by the proposed method

    Image-based malware classification hybrid framework based on space-filling curves

    Get PDF
    There exists a never-ending “arms race” between malware analysts and adversarial malicious code developers as malevolent programs evolve and countermeasures are developed to detect and eradicate them. Malware has become more complex in its intent and capabilities over time, which has prompted the need for constant improvement in detection and defence methods. Of particular concern are the anti-analysis obfuscation techniques, such as packing and encryption, that are employed by malware developers to evade detection and thwart the analysis process. In such cases, malware is generally impervious to basic analysis methods and so analysts must use more invasive techniques to extract signatures for classification, which are inevitably not scalable due to their complexity. In this article, we present a hybrid framework for malware classification designed to overcome the challenges incurred by current approaches. The framework incorporates novel static and dynamic malware analysis methods, where static malware executables and dynamic process memory dumps are converted to images mapped through space-filling curves, from which visual features are extracted for classification. The framework is less invasive than traditional analysis methods in that there is no reverse engineering required, nor does it suffer from the obfuscation limitations of static analysis. On a dataset of 13,599 obfuscated and non-obfuscated malware samples from 23 families, the framework outperformed both static and dynamic standalone methods with precision, recall and accuracy scores of 97.6%, 97.6% and 97.6% respectively

    Analysis of Feature Categories for Malware Visualization

    Get PDF
    It is important to know which features are more effective for certain visualization types. Furthermore, selecting an appropriate visualization tool plays a key role in descriptive, diagnostic, predictive and prescriptive analytics. Moreover, analyzing the activities of malicious scripts or codes is dependent on the extracted features. In this paper, the authors focused on reviewing and classifying the most common extracted features that have been used for malware visualization based on specified categories. This study examines the features categories and its usefulness for effective malware visualization. Additionally, it focuses on the common extracted features that have been used in the malware visualization domain. Therefore, the conducted literature review finding revealed that the features could be categorized into four main categories, namely, static, dynamic, hybrid, and application metadata. The contribution of this research paper is about feature selection for illustrating which features are effective with which visualization tools for malware visualization
    • 

    corecore