    Comparing the utility of user-level and kernel-level data for dynamic malware analysis

    Dynamic malware analysis is fast gaining popularity over static analysis since it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis, it is common practice to capture the system calls that are made to better understand the behaviour of malware. System calls are captured by hooking certain structures in the operating system. There are several hooking techniques that broadly fall into two categories: those that run at user level and those that run at kernel level. User-level hooks are currently more popular despite there being no evidence that they are better suited to detecting malware. Much of the literature surrounding dynamic malware analysis focuses on the data analysis method rather than the data capturing method. This thesis, on the other hand, seeks to ascertain whether the level at which data is captured affects the ability of a detector to identify malware. This is important because if the data captured by the most commonly used hooking method is sub-optimal, the machine learning classifier can only go so far. To study the effects of collecting system calls at different privilege levels and viewpoints, data was collected at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver for all experiments in this thesis. The experiments showed kernel-level data to be marginally better for detecting malware than user-level data. Further analysis revealed that the malware behaviour the classifiers relied on to differentiate it depended on the data they were given. When trained on user-level data, classifiers used the evasive features of malware to differentiate it from benignware. These are the very features that malware uses to avoid detection. When trained on kernel-level data, the classifiers preferred to use the general behaviour of malware to differentiate it from benignware.
The implications of this were witnessed when the classifiers trained on user-level and kernel-level data were made to classify malware that had been stripped of its evasive properties. Classifiers trained on user-level data could not detect malware that only possessed malicious attributes, while classifiers trained on kernel-level data were unable to detect malware that did not exhibit the amount of general activity they expected in malware. This research highlights the importance of giving careful consideration to the hooking methodology employed to collect data, since it affects not only the classification results but also a classifier’s understanding of malware.