Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning
We propose a novel method to detect and visualize malware through image classification. The executable binaries are represented as grayscale images obtained from the count of N-grams (N=2) of bytes in the Discrete Cosine Transform (DCT) domain, and a neural network is trained for malware detection. A shallow neural network is trained for classification, and its accuracy is compared with deep-network architectures such as ResNet that are trained using transfer learning. Neither disassembly nor behavioral analysis of malware is required for these methods. Motivated by the visual similarity of these images for different malware families, we compare our deep neural network models with standard image features like GIST descriptors to evaluate the performance. A joint feature measure is proposed to combine different features using error analysis to get an accurate ensemble model for improved classification performance. A new dataset called MaleX, which contains around 1 million malware and benign Windows executable samples, is created for large-scale malware detection and classification experiments. Experimental results are promising, with 96% binary classification accuracy on MaleX. The proposed model also generalizes well on larger unseen malware samples, and the results compare favorably with state-of-the-art static analysis-based malware detection algorithms.
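The byte-bigram image described in this abstract can be illustrated with a minimal sketch. It assumes the image is a 256x256 matrix of counts of adjacent byte pairs, followed by an unnormalized 2D DCT-II; the function names (`bigram_counts`, `dct_1d`, `dct_2d`) are hypothetical, and the normalization and scaling used in the paper are not stated in the abstract.

```python
import math

def bigram_counts(data: bytes):
    """Return a 256x256 matrix; entry [a][b] counts byte a followed by byte b."""
    counts = [[0] * 256 for _ in range(256)]
    for a, b in zip(data, data[1:]):
        counts[a][b] += 1
    return counts

def dct_1d(vec):
    """Unnormalized DCT-II of a sequence (naive O(n^2) version)."""
    n = len(vec)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(vec))
        for k in range(n)
    ]

def dct_2d(matrix):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in matrix]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[j] holds transformed column j; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]

# Small demonstration on a toy input (assumed data, not from the paper).
counts = bigram_counts(b"ABAB")
small = dct_2d([[1, 1], [1, 1]])
```

The naive transform is slow on a full 256x256 matrix; in practice a library routine such as `scipy.fft.dctn` would likely be used instead.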
MalGrid: Visualization Of Binary Features In Large Malware Corpora
The number of malware is constantly on the rise. Though most new malware are
modifications of existing ones, their sheer number is quite overwhelming. In
this paper, we present a novel system to visualize and map millions of malware
to points in a 2-dimensional (2D) spatial grid. This enables visualizing
relationships within large malware datasets that can be used to develop triage
solutions to screen different malware rapidly and provide situational
awareness. Our approach links two visualizations within an interactive display.
Our first view is a spatial point-based visualization of similarity among the
samples based on a reduced dimensional projection of binary feature
representations of malware. Our second spatial grid-based view provides a
better insight into similarities and differences between selected malware
samples in terms of the binary-based visual representations they share. We also
provide a case study where the effect of packing on the malware data is
correlated with the complexity of the packing algorithm.
Comment: Submitted version - MILCOM 2022 IEEE Military Communications
Conference. The high-quality images in this paper can be found on GitHub
(https://github.com/Mayachitra-Inc/MalGrid)
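As a rough illustration of mapping samples to a 2D spatial grid, the sketch below bins already-projected 2D points into grid cells. It assumes the dimensionality reduction has been done elsewhere; the function name `assign_to_grid` and the uniform binning scheme are hypothetical and are not taken from the MalGrid paper.

```python
from collections import defaultdict

def assign_to_grid(points, grid_size=4):
    """Group 2D points by the (row, col) cell of a grid_size x grid_size grid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1.0 when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # Clamp so that the maximum coordinate lands in the last cell.
        col = min(int((x - x_min) / x_span * grid_size), grid_size - 1)
        row = min(int((y - y_min) / y_span * grid_size), grid_size - 1)
        cells[(row, col)].append(idx)
    return dict(cells)

# Toy projected coordinates (assumed data).
cells = assign_to_grid([(0, 0), (1, 1), (0.1, 0.05), (0.9, 0.95)], grid_size=2)
```

Samples that fall into the same cell can then be inspected together, which is one simple way to support triage over a large corpus.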
Disarming visualization-based approaches in malware detection systems
Visualization-based approaches have recently been used in conjunction with signature-based techniques to detect variants of malware files. Indeed, modifying a few bytes of an executable file is sufficient to change its signature and thus to elude a signature-based detector. In this paper, we design a GAN-based architecture that allows an attacker to generate variants of a malware in which the malware patterns found by visualization-based approaches are hidden, thus producing a new version of the malware that is detected by neither signature-based nor visualization-based techniques. The experiments carried out on a well-known malware dataset show a success rate of 100% in generating new variants of malware files that are not detected by the state-of-the-art visualization-based technique.
Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation
Android has become the leading mobile ecosystem because of its accessibility and adaptability. It has also become the primary target of widespread malicious apps. This situation calls for the immediate implementation of an effective malware detection system. In this study, an explainable malware detection system was proposed using transfer learning and malware visual features. For effective malware detection, our technique leverages both textual and visual features. First, a pre-trained model called the Bidirectional Encoder Representations from Transformers (BERT) model was designed to extract the trained textual features. Second, the malware-to-image conversion algorithm was proposed to transform the network byte streams into a visual representation. In addition, the FAST (Features from Accelerated Segment Test) extractor and BRIEF (Binary Robust Independent Elementary Features) descriptor were used to efficiently extract and mark important features. Third, the trained and texture features were combined and balanced using the Synthetic Minority Over-Sampling (SMOTE) method; then, a CNN was used to mine the deep features. The balanced features were then input into the ensemble model for efficient malware classification and detection. The proposed method was analyzed extensively using two public datasets, CICMalDroid 2020 and CIC-InvesAndMal2019. To explain and validate the proposed methodology, an interpretable artificial intelligence (AI) experiment was conducted.
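The SMOTE balancing step mentioned above can be sketched as follows. This is a minimal illustration of the core idea only (interpolating between a minority sample and one of its nearest minority neighbours); the function name `smote` and its parameters are hypothetical, and the paper's actual configuration is not given in the abstract.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class vectors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with three feature vectors (assumed data).
new_points = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=5)
```

For real experiments, a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn package is usually preferable.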
Cyber-threat detection system using a hybrid approach of transfer learning and multi-model image representation
Currently, Android apps are easily targeted by malicious network traffic because of their constant network access. These threats have the potential to steal vital information and disrupt the commerce, social system, and banking markets. In this paper, we present a malware detection system based on word2vec-based transfer learning and multi-model image representation. The proposed method combines the textual and texture features of network traffic to leverage the advantages of both types. Initially, the transfer learning method is used to extract trained vocab from network traffic. Then, the malware-to-image algorithm visualizes network bytes for visual analysis of data traffic. Next, the texture features are extracted from malware images using a combination of the scale-invariant feature transform (SIFT) and oriented FAST and rotated BRIEF (ORB). Moreover, a convolutional neural network (CNN) is designed to extract deep features from a set of trained vocab and texture features. Finally, an ensemble model is designed to classify and detect malware based on the combination of textual and texture features. The proposed method is tested using two standard datasets, CIC-AAGM2017 and CICMalDroid 2020, which comprise a total of 10.2K malware and 3.2K benign samples. Furthermore, an explainable AI experiment is performed to interpret the proposed approach.
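A common way to turn a byte stream into a grayscale image is to treat each byte as one pixel and wrap the stream at a fixed width. The sketch below assumes that scheme; the function name `bytes_to_image`, the default width, and the zero-padding of the last row are hypothetical choices, since the abstract does not specify the paper's malware-to-image algorithm.

```python
def bytes_to_image(data: bytes, width: int = 16):
    """Reshape a byte stream into rows of `width` pixel values (0-255)."""
    rows = -(-len(data) // width)  # ceiling division
    # Zero-pad so that the final row is complete.
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]

# Toy byte stream (assumed data).
image = bytes_to_image(b"\x01\x02\x03\x04\x05", width=2)
```

The resulting 2D list can be passed to an image library or a texture-feature extractor; the choice of width changes the visual texture, so it would need to be fixed across samples.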
From Malware Samples to Fractal Images: A New Paradigm for Classification. (Version 2.0, Previous version paper name: Have you ever seen malware?)
To date, a large number of research papers have been written on the
identification of malware, its classification into different families,
and the distinction between malware and goodware. These works have
been based on captured malware samples and have attempted to analyse malware
and goodware using various techniques, including techniques from the field of
artificial intelligence. For example, neural networks have played a significant
role in these classification methods. Some of this work also deals with
analysing malware using its visualisation. These works usually convert malware
samples capturing the structure of malware into image structures, which are
then the object of image processing. In this paper, we propose a very
unconventional and novel approach to malware visualisation based on dynamic
behaviour analysis, with the idea that the images, which are visually very
interesting, are then used to distinguish malware from goodware. Our
approach opens an extensive topic for future discussion and provides many new
directions for research in malware analysis and classification, as discussed in
the conclusion. The results of the presented experiments are based on a database of
6 589 997 goodware, 827 853 potentially unwanted applications and 4 174 203
malware samples provided by ESET and selected experimental data (images,
generating polynomial formulas and software generating images) are available on
GitHub for interested readers. Thus, this paper is not a comprehensive compact
study that reports the results obtained from comparative experiments but rather
attempts to show a new direction in the field of visualisation with possible
applications in malware analysis.
Comment: This paper is under review; the section describing conversion from
malware structure to fractal figure is temporarily erased here to protect our
idea. It will be replaced by a full version when accepted.