Enhancing Efficiency and Privacy in Memory-Based Malware Classification through Feature Selection
Malware poses a significant security risk to individuals, organizations, and
critical infrastructure by compromising systems and data. Leveraging memory
dumps that offer snapshots of computer memory can aid the analysis and
detection of malicious content, including malware. To improve the efficacy and
address privacy concerns in malware classification systems, feature selection
can play a critical role as it is capable of identifying the most relevant
features, thus minimizing the amount of data fed to classifiers. In this
study, we employ three feature selection approaches to identify significant
features from memory content and use them with a diverse set of classifiers to
enhance the performance and privacy of the classification task. Comprehensive
experiments are conducted across three levels of malware classification tasks:
i) binary-level benign or malware classification, ii) malware type
classification (including Trojan horse, ransomware, and spyware), and iii)
malware family classification within each family (with varying numbers of
classes). Results demonstrate that the feature selection strategy,
incorporating mutual information and other methods, enhances classifier
performance for all tasks. Notably, selecting only 25% and 50% of input
features using Mutual Information and then employing the Random Forest
classifier yields the best results. Our findings reinforce the importance of
feature selection for malware classification and provide valuable insights for
identifying appropriate approaches. By advancing the effectiveness and privacy
of malware classification systems, this research contributes to safeguarding
against security threats posed by malicious software.
Comment: Accepted in IEEE ICMLA-2023 Conference
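The selection step described in this abstract (rank features by mutual information with the class label, keep the top 25% or 50%) can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: it assumes discrete feature values, uses a made-up toy dataset, and the names `mutual_information` and `select_top_fraction` are hypothetical. The Random Forest classifier that the abstract pairs with the selected features is not included.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (p_x[x] * p_y[y]))
    return mi


def select_top_fraction(rows, labels, fraction):
    """Return (sorted indices of the kept features, MI score of every feature)."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    scores = [mutual_information(col, labels) for col in columns]
    k = max(1, round(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data (assumption): 8 samples, 4 discrete features, binary label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [
    # f0 copies the label, f1 is constant, f2 is independent, f3 is noisy
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]

keep_25, scores = select_top_fraction(rows, labels, 0.25)
keep_50, _ = select_top_fraction(rows, labels, 0.50)
print("kept at 25%:", keep_25)
print("kept at 50%:", keep_50)
```

In this toy setup the perfectly informative feature is ranked first and the constant and independent features score zero, which is the behaviour a mutual-information filter is expected to show before the reduced feature set is handed to a classifier.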
Performance of Malware Classification on Machine Learning using Feature Selection
The exponential growth of malware poses a significant threat to daily life, which relies heavily on computers running all kinds of software. Malware writers produce new variants, new infection methods, and increasingly obfuscated malware by using techniques such as packing and encryption. Classifying and detecting malicious software is therefore both important and a major challenge for cyber security research. Given the increasing rate of false alarms, accurate classification and detection of malware is a pressing need. In this research, samples from eight malware families are classified by family. Four feature selection algorithms are applied to select the best features for this multiclass classification problem and are compared, and the top 100 features found by these algorithms are selected for performance evaluation. Five machine learning algorithms are compared to find the best model. The frequency distribution of the features is then obtained from the feature ranking of the best model. The study concludes that the frequency distribution of every character of the API call sequence can be used to classify malware families.
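The closing claim of this abstract, that the per-character frequency distribution of an API call sequence can serve as a classification feature, can be illustrated with a short sketch. It assumes that each API call has already been encoded as a single character; that encoding, the alphabet, and the function name `char_frequency` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_frequency(api_sequence, alphabet):
    """Relative frequency of each alphabet character in an encoded API call sequence."""
    counts = Counter(api_sequence)
    total = sum(counts[ch] for ch in alphabet)
    if total == 0:
        # Empty sequence, or no character from the alphabet: all-zero vector
        return [0.0] * len(alphabet)
    return [counts[ch] / total for ch in alphabet]


# Hypothetical encoding: one letter per API call.
alphabet = "ABCD"
sample = "AABAC"
features = char_frequency(sample, alphabet)
print(features)
```

Each sample then becomes a fixed-length vector (one entry per character in the alphabet), regardless of how long its API call sequence is, which is what makes the representation usable as input to standard classifiers.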
On Leveraging Next-Generation Deep Learning Techniques for IoT Malware Classification, Family Attribution and Lineage Analysis
Recent years have witnessed the emergence of new and more sophisticated malware targeting insecure Internet of Things (IoT) devices, as part of orchestrated large-scale botnets. Moreover, the public release of the source code of popular malware families such as Mirai [1] has spawned diverse variants, making it harder to disambiguate their ownership, lineage, and correct label. Such a rapidly evolving landscape also makes it harder to deploy and generalize effective learning models against retired, updated, and/or new threat campaigns. To mitigate such threats, there is an utmost need for effective IoT malware detection, classification and family attribution, which provide essential steps towards initiating attack mitigation/prevention countermeasures, as well as understanding the evolutionary trajectories and tangled relationships of IoT malware. This is particularly challenging due to the lack of fine-grained empirical data about IoT malware, the diverse architectures of IoT-targeted devices, and the massive code reuse between IoT malware families.
To address these challenges, in this thesis, we leverage the general lack of obfuscation in IoT malware to extract and combine static features from multi-modal views of the executable binaries (e.g., images, strings, assembly instructions), along with Deep Learning (DL) architectures for effective IoT malware classification and family attribution. Additionally, we aim to address concept drift and the limitations of inter-family classification due to the evolutionary nature of IoT malware, by detecting in-class evolving IoT malware variants and interpreting the meaning behind their mutations. To this end, we perform the following to achieve our objectives:
First, we analyze 70,000 IoT malware samples collected by a specialized IoT honeypot and popular malware repositories in the past 3 years. Consequently, we utilize features extracted from strings- and image-based representations of IoT malware to implement a multi-level DL architecture that fuses the learned features from each sub-component (i.e., images, strings) through a neural network classifier. Our in-depth experiments with four prominent IoT malware families highlight the significant accuracy of the proposed approach (99.78%), which outperforms conventional single-level classifiers, by relying on different representations of the target IoT malware binaries that do not require expensive feature extraction. Additionally, we utilize our IoT-tailored approach for labeling unknown malware samples, while identifying new malware strains.
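The two static views mentioned above (an image-based and a strings-based representation of a binary) are cheap to derive from raw bytes. The sketch below shows one common way to do so, assuming a fixed row width, zero padding, and a minimum printable-string length of four bytes; these choices and the helper names `bytes_to_image` and `extract_strings` are illustrative assumptions, not the thesis's actual preprocessing, and the DL model that fuses the two views is not shown.

```python
import re


def bytes_to_image(data, width):
    """Lay raw bytes out as rows of grayscale pixel values (0-255), zero-padded."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded_len = -(-len(data) // width) * width  # round up to a full row
    padded = data + bytes(padded_len - len(data))
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII bytes of at least min_len characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Made-up 15-byte blob standing in for an executable.
blob = b"\x7fELF\x00\x01/bin/sh\x00\xff"
image = bytes_to_image(blob, width=4)
strings = extract_strings(blob)
print(image)
print(strings)
```

Neither view requires disassembly or dynamic execution, which is consistent with the abstract's point that these representations avoid expensive feature extraction.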
Second, we seek to identify when the classifier shows signs of aging, by which it fails to effectively recognize new variants and adapt to potential changes in the data. Thus, we introduce a robust and effective method that uses contrastive learning and attentive Transformer models to learn and compare semantically meaningful representations of IoT malware binaries and codes without the need for expensive target labels. We find that the evolution of IoT binaries can be used as an augmentation strategy to learn effective representations to contrast (dis)similar variant pairs. We discuss the impact and findings of our analysis and present several evaluation studies to highlight the tangled relationships of IoT malware, as well as the efficiency of our contrastively learned fine-grained feature vectors in preserving semantics and reducing out-of-vocabulary size in cross-architecture IoT malware binaries.
We conclude this thesis by summarizing our findings and discussing research gaps that pave the way for future work.