9 research outputs found

    Genetic boosting classification for malware detection

    Get PDF
    In the last few years virus writers have made use of new obfuscation techniques with the aim of hindering malware in order to difficult their detection by Anti-Virus engines. Strategies to reverse this trend involve executing potentially malicious programs and monitor the actions they perform in runtime, what is known as dynamic analysis. In this paper we present a method able to reach a high accuracy rate without using this kind of analysis. Instead we use a static analysis approach, which discards those samples that cannot be classified with enough certainty and need, certainly, a dynamic analysis. The K-means clustering algorithm has been used to group samples into regions according to their features. Then a boosting process, guided by a genetic algorithm, is executed in each region that are evaluated using a test dataset discarding those regions which do not reach a minimum accuracy threshold

    GRASE: Granulometry Analysis with Semi Eager Classifier to Detect Malware

    Get PDF
    Technological advancement in communication leading to 5G, motivates everyone to get connected to the internet including ‘Devices’, a technology named Web of Things (WoT). The community benefits from this large-scale network which allows monitoring and controlling of physical devices. But many times, it costs the security as MALicious softWARE (MalWare) developers try to invade the network, as for them, these devices are like a ‘backdoor’ providing them easy ‘entry’. To stop invaders from entering the network, identifying malware and its variants is of great significance for cyberspace. Traditional methods of malware detection like static and dynamic ones, detect the malware but lack against new techniques used by malware developers like obfuscation, polymorphism and encryption. A machine learning approach to detect malware, where the classifier is trained with handcrafted features, is not potent against these techniques and asks for efforts to put in for the feature engineering. The paper proposes a malware classification using a visualization methodology wherein the disassembled malware code is transformed into grey images. It presents the efficacy of Granulometry texture analysis technique for improving malware classification. Furthermore, a Semi Eager (SemiE) classifier, which is a combination of eager learning and lazy learning technique, is used to get robust classification of malware families. The outcome of the experiment is promising since the proposed technique requires less training time to learn the semantics of higher-level malicious behaviours. Identifying the malware (testing phase) is also done faster. A benchmark database like malimg and Microsoft Malware Classification challenge (BIG-2015) has been utilized to analyse the performance of the system. An overall average classification accuracy of 99.03 and 99.11% is achieved, respectively

    MDFRCNN: Malware Detection using Faster Region Proposals Convolution Neural Network

    Get PDF
    Technological advancement of smart devices has opened up a new trend: Internet of Everything (IoE), where all devices are connected to the web. Large scale networking benefits the community by increasing connectivity and giving control of physical devices. On the other hand, there exists an increased ‘Threat’ of an ‘Attack’. Attackers are targeting these devices, as it may provide an easier ‘backdoor entry to the users’ network’.MALicious softWARE (MalWare) is a major threat to user security. Fast and accurate detection of malware attacks are the sine qua non of IoE, where large scale networking is involved. The paper proposes use of a visualization technique where the disassembled malware code is converted into gray images, as well as use of Image Similarity based Statistical Parameters (ISSP) such as Normalized Cross correlation (NCC), Average difference (AD), Maximum difference (MaxD), Singular Structural Similarity Index Module (SSIM), Laplacian Mean Square Error (LMSE), MSE and PSNR. A vector consisting of gray image with statistical parameters is trained using a Faster Region proposals Convolution Neural Network (F-RCNN) classifier. The experiment results are promising as the proposed method includes ISSP with F-RCNN training. Overall training time of learning the semantics of higher-level malicious behaviors is less. Identification of malware (testing phase) is also performed in less time. The fusion of image and statistical parameter enhances system performance with greater accuracy. The benchmark database from Microsoft Malware Classification challenge has been used to analyze system performance, which is available on the Kaggle website. An overall average classification accuracy of 98.12% is achieved by the proposed method

    An ensemble-based anomaly-behavioural crypto-ransomware pre-encryption detection model

    Get PDF
    Crypto-ransomware is a malware that leverages cryptography to encrypt files for extortion purposes. Even after neutralizing such attacks, the targeted files remain encrypted. This irreversible effect on the target is what distinguishes crypto-ransomware attacks from traditional malware. Thus, it is imperative to detect such attacks during pre-encryption phase. However, existing crypto-ransomware early detection solutions are not effective due to inaccurate definition of the pre-encryption phase boundaries, insufficient data at that phase and the misuse-based approach that the solutions employ, which is not suitable to detect new (zero-day) attacks. Consequently, those solutions suffer from low detection accuracy and high false alarms. Therefore, this research addressed these issues and developed an Ensemble-Based Anomaly-Behavioural Pre-encryption Detection Model (EABDM) to overcome data insufficiency and improve detection accuracy of known and novel crypto-ransomware attacks. In this research, three phases were used in the development of EABDM. In the first phase, a Dynamic Pre-encryption Boundary Definition and Features Extraction (DPBD-FE) scheme was developed by incorporating Rocchio feedback and vector space model to build a pre-encryption boundary vector. Then, an improved term frequency-inverse document frequency technique was utilized to extract the features from runtime data generated during the pre-encryption phase of crypto-ransomware attacks’ lifecycle. In the second phase, a Maximum of Minimum-Based Enhanced Mutual Information Feature Selection (MM-EMIFS) technique was used to select the informative features set, and prevent overfitting caused by high dimensional data. The MM-EMIFS utilized the developed Redundancy Coefficient Gradual Upweighting (RCGU) technique to overcome data insufficiency during pre-encryption phase and improve feature’s significance estimation. In the final phase, an improved technique called incremental bagging (iBagging) built incremental data subsets for anomaly and behavioural-based detection ensembles. The enhanced semi-random subspace selection (ESRS) technique was then utilized to build noise-free and diverse subspaces for each of these incremental data subsets. Based on the subspaces, the base classifiers were trained for each ensemble. Both ensembles employed the majority voting to combine the decisions of the base classifiers. After that, the decision of the anomaly ensemble was combined into behavioural ensemble, which gave the final decision. The experimental evaluation showed that, DPBD-FE scheme reduced the ratio of crypto-ransomware samples whose pre-encryption boundaries were missed from 18% to 8% as compared to existing works. Additionally, the features selected by MM-EMIFS technique improved the detection accuracy from 89% to 96% as compared to existing techniques. Likewise, on average, the EABDM model increased detection accuracy from 85% to 97.88% and reduced the false positive alarms from 12% to 1% in comparison to existing early detection models. These results demonstrated the ability of the EABDM to improve the detection accuracy of crypto-ransomware attacks early and before the encryption takes place to protect files from being held to ransom

    Tree-Based Classifier Ensembles for PE Malware Analysis: A Performance Revisit

    Get PDF
    Given their escalating number and variety, combating malware is becoming increasingly strenuous. Machine learning techniques are often used in the literature to automatically discover the models and patterns behind such challenges and create solutions that can maintain the rapid pace at which malware evolves. This article compares various tree-based ensemble learning methods that have been proposed in the analysis of PE malware. A tree-based ensemble is an unconventional learning paradigm that constructs and combines a collection of base learners (e.g., decision trees), as opposed to the conventional learning paradigm, which aims to construct individual learners from training data. Several tree-based ensemble techniques, such as random forest, XGBoost, CatBoost, GBM, and LightGBM, are taken into consideration and are appraised using different performance measures, such as accuracy, MCC, precision, recall, AUC, and F1. In addition, the experiment includes many public datasets, such as BODMAS, Kaggle, and CIC-MalMem-2022, to demonstrate the generalizability of the classifiers in a variety of contexts. Based on the test findings, all tree-based ensembles performed well, and performance differences between algorithms are not statistically significant, particularly when their respective hyperparameters are appropriately configured. The proposed tree-based ensemble techniques also outperformed other, similar PE malware detectors that have been published in recent years

    Malware detection and classification based on extraction of API sequences

    No full text
    corecore