377 research outputs found
Malware detection based on call graph similarities
Machine learning-powered malware detection systems have become a necessity in fighting the rising volume of malware. Malware authors create ever more sophisticated programs to overcome constantly improving antivirus engines. Windows OS remains the most targeted system, and the malicious payload commonly comes in the Portable Executable (PE) file format. PE files can be analyzed with static analysis methods, which are suitable for processing large amounts of data. Many engines disassemble binaries and study the code, which carries valuable insight into binary behavior. The assembly code is divided into functions, and the relations between functions form a Function Call Graph (FCG). The FCG has been studied in the literature, and its structure has been used to find similarities between files. Recently, Graph Neural Networks (GNNs) have been adapted to work on FCGs and are claimed to perform well. In this work, we study and compare different GNN models and their architectures. After selecting the best GNN model, we compare it with a non-structural model to verify whether the FCG structure improves classification models. We perform our empirical study on a large dataset of more than 5 million PE files.
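To make the idea of learning on a function call graph concrete, the following is a minimal sketch of one round of neighbourhood aggregation followed by mean pooling. It is not the authors' model: the function names, the toy graph, the two-dimensional node features, and the choice of an undirected graph with mean aggregation are all assumptions made for illustration.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def message_pass(graph, features):
    """One aggregation round: average each node's vector with its neighbours'."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in sorted(graph[node])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def graph_embedding(features):
    """Mean-pool all node vectors into one fixed-size vector for the file."""
    vectors = list(features.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical toy FCG; the per-function features are invented numbers.
calls = [("main", "decrypt"), ("main", "connect"), ("decrypt", "write_file")]
features = {
    "main": [4.0, 1.0],
    "decrypt": [2.0, 0.0],
    "connect": [0.0, 1.0],
    "write_file": [2.0, 2.0],
}
graph = build_call_graph(calls)
hidden = message_pass(graph, features)
embedding = graph_embedding(hidden)
```

A non-structural baseline of the kind the abstract mentions would correspond to calling `graph_embedding(features)` directly, skipping `message_pass`, so that the call relations never influence the file-level vector.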
DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection
The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link.
Android malware has continued to grow in volume and complexity, posing significant threats to the security of mobile devices and the services they enable. This has prompted increasing interest in employing machine learning to improve Android malware detection. In this paper, we present a novel classifier fusion approach based on a multilevel architecture that enables effective combination of machine learning algorithms for improved accuracy. The framework (called DroidFusion) generates a model by training base classifiers at a lower level and then applies a set of ranking-based algorithms on their predictive accuracies at the higher level in order to derive a final classifier. The induced multilevel DroidFusion model can then be utilized as an improved accuracy predictor for Android malware detection. We present experimental results on four separate datasets to demonstrate the effectiveness of our proposed approach. Furthermore, we demonstrate that the DroidFusion method can also effectively enable the fusion of ensemble learning algorithms for improved accuracy. Finally, we show that the prediction accuracy of DroidFusion, despite only utilizing a computational approach in the higher level, can outperform stacked generalization, a well-known classifier fusion method that employs a meta-classifier approach in its higher level.
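The abstract does not spell out DroidFusion's ranking-based algorithms, so the sketch below is only a simple stand-in for the general idea of deriving a fused decision from base-classifier accuracies without training a meta-classifier. The classifier names, the accuracy figures, and the rank-as-weight rule are assumptions for illustration.

```python
def rank_weights(accuracies):
    """Weight each base classifier by its accuracy rank (worst = 1, best = n)."""
    ordered = sorted(accuracies, key=accuracies.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def fuse(predictions, weights):
    """Weighted vote over binary labels (1 = malware, 0 = benign)."""
    total = sum(weights.values())
    score = sum(weights[name] * label for name, label in predictions.items())
    # Predict malware only when strictly more than half the weight agrees.
    return 1 if score * 2 > total else 0


# Hypothetical validation accuracies for three base classifiers.
accuracies = {"tree": 0.91, "bayes": 0.85, "svm": 0.95}
weights = rank_weights(accuracies)

verdict_a = fuse({"tree": 0, "bayes": 1, "svm": 1}, weights)
verdict_b = fuse({"tree": 1, "bayes": 1, "svm": 0}, weights)
```

Here the two best-ranked classifiers disagree in both cases, and the outcome follows the higher-ranked one: purely computational fusion of this kind needs no second training stage, which is the contrast with stacked generalization drawn in the abstract.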
Detecting Malware with an Ensemble Method Based on Deep Neural Network
Malware detection plays a crucial role in computer security. Recent research mainly uses machine learning-based methods that rely heavily on domain knowledge to manually extract malicious features. In this paper, we propose MalNet, a novel malware detection method that learns features automatically from the raw data. Concretely, we first generate a grayscale image from a malware file while extracting its opcode sequences with the decompilation tool IDA. MalNet then uses CNN and LSTM networks to learn from the grayscale image and the opcode sequence, respectively, and combines them in a stacking ensemble for malware classification. We perform experiments on more than 40,000 samples, including 20,650 benign files collected from online software providers and 21,736 malware samples provided by Microsoft. The evaluation shows that MalNet achieves 99.88% validation accuracy for malware detection. In addition, we run a malware family classification experiment on 9 malware families to compare MalNet with related work; MalNet outperforms most related work with 99.36% detection accuracy and achieves a considerable speed-up in detection efficiency compared with two state-of-the-art results on the Microsoft malware dataset.
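The grayscale-image step can be illustrated with a short sketch that maps each byte of a file to one pixel intensity. The abstract does not give MalNet's image dimensions or padding rule, so the `width` parameter, the zero padding, and the function name are assumptions.

```python
def bytes_to_grayscale(data, width):
    """Reshape raw bytes into rows of `width` pixels (0-255), zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width
    padded = bytes(data) + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# First bytes of a typical PE header ("MZ" magic), used here as toy input.
image = bytes_to_grayscale(b"MZ\x90\x00\x03", width=4)
```

The resulting nested list has two rows of four pixels; a real pipeline would typically resize such an image to a fixed shape before feeding it to a CNN.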
TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS
With the exponential growth of network-based applications globally, there has been a transformation in organizations' business models. Furthermore, cost reduction of both computational devices and the internet has led people to become more technology dependent. Consequently, due to inordinate use of computer networks, new risks have emerged. Therefore, the process of improving the speed and accuracy of security mechanisms has become crucial. Although abundant new security tools have been developed, the rapid growth of malicious activities continues to be a pressing issue, as their ever-evolving attacks continue to create severe threats to network security. Classical security techniques, for instance firewalls, are used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such intrusive network activities. Machine Learning is one of the practical approaches to intrusion detection that learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature. Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. There exist a number of such datasets, such as DARPA, KDDCUP, and NSL-KDD, that have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. Based on several studies, many such datasets are outdated and unreliable to use. Furthermore, some of these datasets suffer from a lack of traffic diversity and volume, do not cover the variety of attacks, have anonymized packet information and payload that cannot reflect current trends, or lack a feature set and metadata. This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensions, namely feature selection, sensitivity to hyper-parameter selection, and class imbalance problems, that are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal traffic and different classes of attacks, and reflects current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation summarizing the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issue of detection accuracy and false alarm rate in intrusion detection systems.
Static malware detection Using Stacked BiLSTM and GPT-2
In recent years, cyber threats and malicious software attacks have escalated on various platforms. Therefore, it has become essential to develop automated machine learning methods for defending against malware. In the present study, we propose stacked bidirectional long short-term memory (Stacked BiLSTM) and generative pre-trained transformer based (GPT-2) deep learning language models for detecting malicious code. We developed language models using assembly instructions extracted from .text sections of malicious and benign Portable Executable (PE) files. We treated each instruction as a sentence and each .text section as a document. We also labeled each sentence and document as benign or malicious, according to the file source. We created three datasets from those sentences and documents. The first dataset, composed of documents, was fed into a Document Level Analysis Model (DLAM) based on Stacked BiLSTM. The second dataset, composed of sentences, was used in Sentence Level Analysis Models (SLAMs) based on Stacked BiLSTM and DistilBERT, Domain Specific Language Model GPT-2 (DSLM-GPT2), and General Language Model GPT-2 (GLM-GPT2). Lastly, we merged all assembly instructions without labels to create the third dataset, which we used to train a custom pre-trained model. We then compared malware detection performance. The results showed that the pre-trained model improved the DSLM-GPT2 and GLM-GPT2 detection performance. The experiments showed that the DLAM, the SLAM based on DistilBERT, the DSLM-GPT2, and the GLM-GPT2 achieved 98.3%, 70.4%, 86.0%, and 76.2% F1 scores, respectively.
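The three datasets described in this abstract can be sketched as follows, assuming each file has already been disassembled into a list of instruction strings from its .text section. The function name, the newline used to join instructions into a document, and the toy instructions are assumptions; the abstract does not describe the actual preprocessing.

```python
def build_datasets(files):
    """Split labeled .text sections into document, sentence, and unlabeled datasets.

    `files` is a list of (instructions, label) pairs, where `instructions`
    is the list of assembly instructions from one .text section.
    """
    documents, sentences, unlabeled = [], [], []
    for instructions, label in files:
        # One document per .text section, labeled by the file source.
        documents.append(("\n".join(instructions), label))
        for instruction in instructions:
            # One sentence per instruction, inheriting the file's label.
            sentences.append((instruction, label))
            # Label-free copy for language-model pre-training.
            unlabeled.append(instruction)
    return documents, sentences, unlabeled


# Hypothetical toy input: two tiny .text sections.
files = [
    (["push ebp", "mov ebp, esp"], "benign"),
    (["xor eax, eax", "call eax"], "malicious"),
]
documents, sentences, unlabeled = build_datasets(files)
```

Note that every sentence inherits the label of its file, so benign-looking instructions from malicious files are labeled malicious, which may help explain why sentence-level models score lower than the document-level model in the reported results.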
Investigating Android permissions and intents for malware detection
Today’s smartphones are used for a wider range of activities. This extended range of functionality has also brought new security threats. Android has been the favorite target of cyber criminals. Malicious parties use highly stealthy techniques to perform targeted operations, which are hard to detect with conventional signature-based and behaviour-based approaches. Additionally, the limited resources of mobile devices are inadequate for extensive malware detection tasks. Rapidly emerging Android malware merits a robust and effective malware detection solution.
In this thesis, we present PIndroid, a novel Permissions and Intents based framework for identifying Android malware apps. To the best of the author’s knowledge, PIndroid is the first solution that uses a combination of permissions and intents supplemented with ensemble methods for malware detection. It overcomes the drawbacks of some of the existing malware detection methods. Our goal is to provide mobile users with an effective malware detection and prevention solution, keeping in view the limited resources of mobile devices and the versatility of malware behavior. Our detection engine classifies apps against certain distinguishing combinations of permissions and intents. We conducted a comparative study of different machine learning algorithms against several performance measures to demonstrate their relative advantages. The proposed approach, when applied to 1,745 real-world applications, provides more than 99% accuracy (which is the best reported to date). Empirical results suggest that the proposed framework is effective in detecting malware apps, including obfuscated ones.
In this thesis, we also present AndroPIn, an Android-based malware detection algorithm using Permissions and Intents. It is designed with the methodology proposed in PInDroid. AndroPIn overcomes the limitation of stealthy techniques used by malware by exploiting the usage pattern of permissions and intents. These features, which play a major role in sharing user data and device resources, cannot be obfuscated or altered. These vital features are well suited for resource-constrained smartphones. Experimental evaluation on a corpus of real-world malware and benign apps demonstrates that the proposed algorithm can effectively detect malicious apps and is resilient to common obfuscation methods.
Besides PInDroid and AndroPIn, this thesis consists of three additional studies, which supplement the proposed methodology. The first study investigates whether there is any correlation between permissions and intents that can be exploited to detect malware apps. For this, a statistical significance test is applied to the correlation between permissions and intents. We found statistical evidence of a strong correlation between permissions and intents which could be exploited to detect malware applications.
The second study is conducted to investigate if the performance of classifiers can further be improved with ensemble learning methods. We applied different ensemble methods such as bagging, boosting and stacking. The experiments with ensemble methods yielded much improved results.
The third study investigates whether the permissions and intents based system can be used to detect the ever-challenging colluding apps. Application collusion is an emerging threat to Android-based devices. We discuss the current state of research on app collusion and open challenges to the detection of colluding apps. We compare existing approaches and present an integrated approach that can be used to detect malicious app collusion.
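As an illustration of the kind of input a permissions and intents based detector could work with, the sketch below encodes one app as a binary presence vector over a fixed vocabulary. It is not the thesis's feature pipeline: the function name and the three-entry vocabulary are assumptions, although the permission and intent strings themselves are standard Android identifiers.

```python
def feature_vector(app_permissions, app_intents, vocabulary):
    """Return a 0/1 vector marking which vocabulary entries the app declares."""
    present = set(app_permissions) | set(app_intents)
    return [1 if name in present else 0 for name in vocabulary]


# Hypothetical vocabulary mixing permissions and intent actions.
vocabulary = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.intent.action.BOOT_COMPLETED",
]

vector = feature_vector(
    app_permissions=["android.permission.SEND_SMS"],
    app_intents=["android.intent.action.BOOT_COMPLETED"],
    vocabulary=vocabulary,
)
```

Vectors of this form could then be passed to any of the classifiers or ensemble methods (bagging, boosting, stacking) that the thesis compares.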
AI-based algorithm for intrusion detection on a real Dataset
[Abstract]: In this project, novel machine learning proposals are presented for building a Network Intrusion Detection System (NIDS). To this end, a state-of-the-art dataset for cyclostationary NIDS was used, together with a previously proposed standard methodology for comparing the results of different models on the same dataset. Extensive research was carried out for this project on the datasets available for NIDS, and the evolution and functioning of IDSs are described.
Finally, experiments were run with outlier detectors, ensemble methods, deep learning, and conventional classifiers for comparison with previously published results on the same dataset and with the same methodology. The findings reveal that the ensemble methods improved on the results of prior research, with Extreme Gradient Boosting being the best approach.
Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2022/202