    Malware detection issues, challenges, and future directions: A survey

    The evolution of recent malicious software, alongside the rising use of digital services, has increased the likelihood of data corruption, information theft, and other cybercrimes carried out through malware attacks. Malicious software must therefore be detected before it affects a large number of computers. Researchers have recently proposed many malware detection solutions; however, several challenges prevent these solutions from effectively detecting many types of malware, especially zero-day attacks, owing to obfuscation and evasion techniques and to the diversity of malicious behavior driven by the rapid rate at which new malware and malware variants are produced every day. Several review papers have explored the issues and challenges of malware detection from various viewpoints, but there is a lack of an in-depth review article that associates each analysis and detection approach with the data types it operates on. Such an association is important for the research community because it helps determine the most suitable mitigation approach. In addition, current survey articles stop at a generic taxonomy of detection approaches. Moreover, some review papers classify feature extraction methods only as static, dynamic, or hybrid according to the analysis approach used, neglecting a taxonomy of feature representation methods, which is essential when developing a malware detection model. This survey bridges these gaps by providing a comprehensive state-of-the-art review of research on malware detection models. It introduces a feature representation taxonomy in addition to a deeper taxonomy of malware analysis and detection approaches, and links each approach with the data types it most commonly uses. Feature extraction methods are presented according to the techniques employed rather than the analysis approach. The survey ends with a discussion of open challenges and future research directions.
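    To make the survey's distinction between feature extraction and feature representation concrete, here is a minimal, hedged sketch (not taken from the survey): one widely used representation turns each sample's extracted API-call sequence into n-gram count vectors that a detection model can consume. The call names and traces below are invented for illustration.

    ```python
    # Illustrative only: represent extracted API-call traces as n-gram counts.
    from sklearn.feature_extraction.text import CountVectorizer

    # Each "document" is the space-joined API-call trace of one sample
    # (hypothetical traces, not real malware data).
    traces = [
        "CreateFile WriteFile CloseHandle",
        "RegOpenKey RegSetValue CloseHandle",
    ]

    # Unigrams and bigrams of calls; keep case, split on whitespace.
    vec = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+", lowercase=False)
    X = vec.fit_transform(traces)
    print(X.shape)  # (samples, distinct call n-grams)
    ```

    The resulting matrix is one possible input to the detection models the survey taxonomises; other representations (graphs, images, embeddings) follow the same extract-then-represent pattern.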

    Getting to the root of the problem: A detailed comparison of kernel and user level data for dynamic malware analysis

    Dynamic malware analysis is fast gaining popularity over static analysis because it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis it is common practice to capture the system calls a program makes in order to better understand the behaviour of malware. Several techniques exist for capturing system calls, the most popular of which is a user-level hook. To study the effects of collecting system calls at different privilege levels and viewpoints, we collected data at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver, and then tested the performance of several state-of-the-art machine learning classifiers on the data. Random Forest was the best-performing classifier, with an accuracy of 95.2% on the kernel-driver data and 94.0% on the user-level data. Combining user- and kernel-level data gave the best classification results, with Random Forest reaching 96.0% accuracy; this may seem intuitive but had not previously been demonstrated empirically. Additionally, we observed that machine learning algorithms trained on user-level data tended to use the anti-debug/anti-VM features of malware to distinguish it from benignware, whereas algorithms trained on data from our kernel driver seemed to use differences in the general behaviour of the system to make their predictions, which explains why the two data sources complement each other so well. Our results show that the privilege level at which data is captured affects a classifier's ability to detect malware, with kernel-level data providing more utility than user-level data for malware classification. Despite this, user-level tools are more established than kernel-level tools, suggesting that more research effort should be directed at the kernel level. In short, this paper provides the first objective, evidence-based comparison of user-level and kernel-level data for the purposes of malware classification.
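    The paper's combined-view result can be sketched in miniature. The following is a hedged illustration, not the authors' code: user-level and kernel-level system-call count vectors are fused by simple concatenation before training a Random Forest, mirroring the combination the paper reports. All data here is synthetic.

    ```python
    # Illustrative sketch of feature-level fusion of user- and kernel-level
    # system-call counts for malware classification (synthetic data).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Toy dataset: rows are processes, columns are per-system-call counts as
    # seen from a user-level hook and from a kernel driver, respectively.
    n_samples, n_user_feats, n_kernel_feats = 200, 30, 30
    X_user = rng.integers(0, 50, size=(n_samples, n_user_feats))
    X_kernel = rng.integers(0, 50, size=(n_samples, n_kernel_feats))
    y = rng.integers(0, 2, size=n_samples)  # 0 = benignware, 1 = malware

    # Combined view: concatenate the two feature vectors per sample.
    X_combined = np.hstack([X_user, X_kernel])

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_combined, y)
    print(clf.score(X_combined, y))  # training accuracy on the toy data
    ```

    In practice one would evaluate on a held-out test set rather than training accuracy; the point here is only the shape of the fusion, not the numbers.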

    Comparing the utility of user-level and kernel-level data for dynamic malware analysis

    Dynamic malware analysis is fast gaining popularity over static analysis because it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis, it is common practice to capture the system calls a program makes in order to better understand the behaviour of malware. System calls are captured by hooking certain structures in the operating system, and the available hooking techniques broadly fall into two categories: those that run at user level and those that run at kernel level. User-level hooks are currently more popular despite there being no evidence that they are better suited to detecting malware. Much of the literature on dynamic malware analysis focuses on the data analysis method rather than the data capture method. This thesis, on the other hand, seeks to ascertain whether the level at which data is captured affects a detector's ability to identify malware. This matters because if the data captured by the most commonly used hooking method is sub-optimal, the machine learning classifier can only go so far. To study the effects of collecting system calls at different privilege levels and viewpoints, data for all experiments in this thesis was collected at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver. The experiments showed kernel-level data to be marginally better than user-level data for detecting malware. Further analysis revealed that the malware behaviour used to differentiate it from benignware differed depending on the data given to the classifiers. When trained on user-level data, classifiers used the evasive features of malware to differentiate it from benignware; these are the very features that malware uses to avoid detection. When trained on kernel-level data, the classifiers preferred to use the general behaviour of malware to differentiate it from benignware.
The implications of this were witnessed when the classifiers trained on user-level and kernel-level data were made to classify malware that had been stripped of its evasive properties: classifiers trained on user-level data could not detect malware that possessed only malicious attributes, while classifiers trained on kernel-level data were unable to detect malware that did not exhibit the amount of general activity they expected. This research highlights the importance of giving careful consideration to the hooking methodology employed to collect data, since it affects not only the classification results but also a classifier's understanding of malware.
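    The thesis's finding that classifiers latch onto different behaviours depending on the data source can be probed, in principle, by inspecting feature importances. Below is a hedged toy sketch (not the thesis's code): the labels are constructed so that one anti-debug-style feature carries the signal, and the trained Random Forest's importances reveal that reliance. The feature names and data are invented.

    ```python
    # Illustrative sketch: inspect which (hypothetical) system-call features a
    # trained classifier relies on to separate malware from benignware.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(1)
    feature_names = ["IsDebuggerPresent", "CreateFile", "RegOpenKey", "WriteFile"]

    # Synthetic call counts; labels depend only on the anti-debug feature, so
    # the classifier should demonstrably pick it up.
    X = rng.integers(0, 20, size=(300, len(feature_names))).astype(float)
    y = (X[:, 0] > 10).astype(int)

    clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)
    ranked = sorted(zip(clf.feature_importances_, feature_names), reverse=True)
    print(ranked[0][1])  # the anti-debug feature should rank first here
    ```

    Applied to real user-level versus kernel-level traces, this kind of inspection is one way to surface the evasive-versus-general-behaviour split the thesis describes.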