Search CORE

566 research outputs found

Cyber-threat detection system using a hybrid approach of transfer learning and multi-model image representation

Author: Cheng X.
Cheng X.
Mostarda L.
Mostarda L.
Naeem M.
Naeem M.
Rho S.
Rho S.
Ullah F.
Ullah F.
Ullah S.
Ullah S.
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2022
Field of study

Currently, Android apps are easily targeted by malicious network traffic because of their constant network access. These threats have the potential to steal vital information and disrupt the commerce, social system, and banking markets. In this paper, we present a malware detection system based on word2vec-based transfer learning and multi-model image representation. The proposed method combines the textual and texture features of network traffic to leverage the advantages of both types. Initially, the transfer learning method is used to extract trained vocab from network traffic. Then, the malware-to-image algorithm visualizes network bytes for visual analysis of data traffic. Next, the texture features are extracted from malware images using a combination of scale-invariant feature transforms (SIFTs) and oriented fast and rotated brief transforms (ORBs). Moreover, a convolutional neural network (CNN) is designed to extract deep features from a set of trained vocab and texture features. Finally, an ensemble model is designed to classify and detect malware based on the combination of textual and texture features. The proposed method is tested using two standard datasets, CIC-AAGM2017 and CICMalDroid 2020, which comprise a total of 10.2K malware and 3.2K benign samples. Furthermore, an explainable AI experiment is performed to interpret the proposed approach

Middlesex University Research Repository

Malicious code detection in android : the role of sequence characteristics and disassembling methods

Author: Acartürk Cengiz
Balikcioglu Pinar G.
Kucuk Ozge A.
Sirlanci Melih
Turkmen Ramazan K.
Ulukapi Bulut
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2023
Field of study

The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, which resulted in a significant number of benign and malicious applications available on various online markets. Since the signature-based methods fall short for detecting malicious software effectively considering the vast number of applications, machine learning techniques in this field have also become widespread. In this context, stating the acquired accuracy values in the contingency tables in malware detection studies has become a popular and efficient method and enabled researchers to evaluate their methodologies comparatively. In this study, we wanted to investigate and emphasize the factors that may affect the accuracy values of the models managed by researchers, particularly the disassembly method and the input data characteristics. Firstly, we developed a model that tackles the malware detection problem from a Natural Language Processing (NLP) perspective using Long Short-Term Memory (LSTM). Then, we experimented with different base units (instruction, basic block, method, and class) and representations of source code obtained from three commonly used disassembling tools (JEB, IDA, and Apktool) and examined the results. Our findings exhibit that the disassembly method and different input representations affect the model results. More specifically, the datasets collected by the Apktool achieved better results compared to the other two disassemblers

Jagiellonian Univeristy Repository

Effectiveness of Opcode ngrams for Detection of Multi Family Android Malware

Author: Canfora Gerardo
De Lorenzo Andrea
Medvet Eric
Mercaldo Francesco
Visaggio Corrado Aaron
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

With the wide diffusion of smartphones and their usage in a plethora of processes and activities, these devices have been handling an increasing variety of sensitive resources. Attackers are hence producing a large number of malware applications for Android (the most spread mobile platform), often by slightly modifying existing applications, which results in malware being organized in families. Some works in the literature showed that opcodes are informative for detecting malware, not only in the Android platform. In this paper, we investigate if frequencies of ngrams of opcodes are effective in detecting Android malware and if there is some significant malware family for which they are more or less effective. To this end, we designed a method based on state-of-the-art classifiers applied to frequencies of opcodes ngrams. Then, we experimentally evaluated it on a recent dataset composed of 11120 applications, 5560 of which are malware belonging to several different families. Results show that an accuracy of 97% can be obtained on the average, whereas perfect detection rate is achieved for more than one malware family

Archivio istituzionale della ricerca - Università di Trieste

Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation

Author: Alomari Abdullah
Alshahrani Mohammed Mujib
Alsirhani Amjad
Naeem Hamad
Shah Syed Aziz
Ullah Farhan
Publication venue: 'MDPI AG'
Publication date: 01/09/2022
Field of study

Android has become the leading mobile ecosystem because of its accessibility and adaptability. It has also become the primary target of widespread malicious apps. This situation needs the immediate implementation of an effective malware detection system. In this study, an explainable malware detection system was proposed using transfer learning and malware visual features. For effective malware detection, our technique leverages both textual and visual features. First, a pre-trained model called the Bidirectional Encoder Representations from Transformers (BERT) model was designed to extract the trained textual features. Second, the malware-to-image conversion algorithm was proposed to transform the network byte streams into a visual representation. In addition, the FAST (Features from Accelerated Segment Test) extractor and BRIEF (Binary Robust Independent Elementary Features) descriptor were used to efficiently extract and mark important features. Third, the trained and texture features were combined and balanced using the Synthetic Minority Over-Sampling (SMOTE) method; then, the CNN network was used to mine the deep features. The balanced features were then input into the ensemble model for efficient malware classification and detection. The proposed method was analyzed extensively using two public datasets, CICMalDroid 2020 and CIC-InvesAndMal2019. To explain and validate the proposed methodology, an interpretable artificial intelligence (AI) experiment was conducted

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Coventry University Pure Portal

Improving malware detection with neuroevolution : a study with the semantic learning machine

Author: Teixeira Mário José Santos
Publication venue
Publication date: 01/07/2019
Field of study

Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business IntelligenceMachine learning has become more attractive over the years due to its remarkable adaptation and problem-solving abilities. Algorithms compete amongst each other to claim the best possible results for every problem, being one of the most valued characteristics their generalization ability. A recently proposed methodology of Genetic Programming (GP), called Geometric Semantic Genetic Programming (GSGP), has seen its popularity rise over the last few years, achieving great results compared to other state-of-the-art algorithms, due to its remarkable feature of inducing a fitness landscape with no local optima solutions. To any supervised learning problem, where a metric is used as an error function, GSGP’s landscape will be unimodal, therefore allowing for genetic algorithms to behave much more efficiently and effectively. Inspired by GSGP’s features, Gonçalves developed a new mutation operator to be applied to the Neural Networks (NN) domain, creating the Semantic Learning Machine (SLM). Despite GSGP’s good results already proven, there are still research opportunities for improvement, that need to be performed to empirically prove GSGP as a state-of-the-art framework. In this case, the study focused on applying SLM to NNs with multiple hidden layers and compare its outputs to a very popular algorithm, Multilayer Perceptron (MLP), on a considerably large classification dataset about Android malware. Findings proved that SLM, sharing common parametrization with MLP, in order to have a fair comparison, is able to outperform it, with statistical significance

Repositório da Universidade Nova de Lisboa

Experimental comparison of features and classifiers for Android malware detection

Author: Barandiaran I.
Chen K.
Huang C.-Y.
Paszke A.
Witten I. H.
Yuan Z.
Zhang H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020
Field of study