
    Exploiting And Estimating Malware Using Feature Impact Derived From API Call Sequence Learning

    Malware poses a serious threat, and protecting systems from existing malware and its new variants is a continuous process that requires new approaches to malware detection. In this process, malware samples are first analyzed to understand their behavior, and statistical methods for malware detection are then defined accordingly. Approaches to understanding the behavior of malicious executables are broadly classified into static and dynamic analysis. Static analysis can identify only existing types of malware, and code obfuscation has made it difficult to identify variants of existing malware. To counter code obfuscation, dynamic analysis is prioritized over static analysis: the samples are run in an emulated environment to understand their intent. Because there is an acute need for a more precise and accurate approach to malware detection, this paper contributes in that direction by proposing a novel measure that estimates malware by exploiting the malicious intent of executables. It is a machine learning approach in which knowledge is acquired from existing malicious executables and then used to estimate new variants of existing malware. The proposed statistical approach can be used to improve scalability, accuracy and robustness, and it also defends against zero-day exploits.
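    The abstract does not define its feature-impact measure, so the following is only a minimal sketch of the general idea under stated assumptions: API call traces are represented as bigrams, and a feature's "impact" is assumed to be the share of malicious traces containing it minus the share of benign traces containing it. The functions `api_ngrams`, `feature_impact` and `estimate_trace` are hypothetical names, and the traces in the usage example are illustrative, not data from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def api_ngrams(calls: Sequence[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of API calls in a single trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def feature_impact(malicious: List[Sequence[str]],
                   benign: List[Sequence[str]],
                   n: int = 2) -> Dict[NGram, float]:
    """Hypothetical impact measure (not the paper's): fraction of malicious
    traces containing an n-gram minus fraction of benign traces containing it.
    Both lists must be non-empty."""
    mal_sets = [set(api_ngrams(t, n)) for t in malicious]
    ben_sets = [set(api_ngrams(t, n)) for t in benign]
    vocab = set().union(*mal_sets, *ben_sets)
    impact: Dict[NGram, float] = {}
    for gram in vocab:
        p_mal = sum(gram in s for s in mal_sets) / len(mal_sets)
        p_ben = sum(gram in s for s in ben_sets) / len(ben_sets)
        impact[gram] = p_mal - p_ben
    return impact


def estimate_trace(calls: Sequence[str], impact: Dict[NGram, float], n: int = 2) -> float:
    """Average impact of the distinct n-grams in an unseen trace; n-grams not
    seen during training contribute 0. Positive scores lean malicious."""
    grams = set(api_ngrams(calls, n))
    if not grams:
        return 0.0
    return sum(impact.get(g, 0.0) for g in grams) / len(grams)


# Illustrative traces only; not data from the paper.
malicious = [["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"],
             ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"]]
benign = [["CreateFileW", "ReadFile", "CloseHandle"],
          ["OpenProcess", "ReadFile", "CloseHandle"]]
impact = feature_impact(malicious, benign)
print(estimate_trace(["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"], impact))  # 0.75
print(estimate_trace(["CreateFileW", "ReadFile", "CloseHandle"], impact))  # -0.75
```

    Scoring by the average over distinct n-grams keeps long and short traces on the same scale; a real system would more likely feed such features into a trained classifier.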

    Application of Adversarial Attacks on Malware Detection Models

    Malware detection is vital because it keeps a computer safe from malicious software that puts users at risk. A large number of new variants of malicious software are introduced every day, at an increasing rate. To guarantee the security of computer systems, major advances have therefore been made in malware detection, and one such approach is to use machine learning. Although machine learning is very powerful, it is vulnerable to adversarial attacks. In this project, we attempt to apply adversarial attacks to malware detection models. To perform these attacks, fake samples generated by a Generative Adversarial Network (GAN) are combined with the actual data and given to a machine learning model for malware detection. We also experiment with the percentage of fake malware samples included and observe how the model behaves for each input. The novelty of this project lies in the use of adversarial samples generated from the word embeddings produced by our generative algorithms.
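    The abstract does not say how the percentage of fake samples is defined. The sketch below assumes it is the share of fake samples in the final training set and shows only the mixing step; the GAN that would produce `fake_malware` is out of scope, `mix_fake_samples` is a hypothetical helper, and the toy vectors are placeholders rather than real features.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mix_fake_samples(real: Sequence[Tuple[T, int]],
                     fake_malware: Sequence[T],
                     fraction: float,
                     seed: int = 0) -> List[Tuple[T, int, bool]]:
    """Return the real (sample, label) pairs plus enough generated malware
    samples that fakes make up `fraction` of the result. Each entry is
    (sample, label, is_fake); fakes are labelled 1 (malware)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Solve n_fake / (len(real) + n_fake) = fraction for n_fake.
    n_fake = round(fraction * len(real) / (1.0 - fraction))
    if n_fake > len(fake_malware):
        raise ValueError("not enough generated samples for this fraction")
    rng = random.Random(seed)
    mixed = [(x, y, False) for x, y in real]
    mixed += [(x, 1, True) for x in rng.sample(list(fake_malware), n_fake)]
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins: real samples are (feature_vector, label); fakes are vectors.
real = [([i, i + 1], i % 2) for i in range(8)]
fake_pool = [[100 + i, 0] for i in range(10)]
for frac in (0.0, 0.2, 0.5):
    data = mix_fake_samples(real, fake_pool, frac)
    print(frac, len(data), sum(is_fake for _, _, is_fake in data))
```

    Keeping the `is_fake` flag makes it possible to measure, for each percentage, how the detector treats the generated samples separately from the real ones.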

    Getting to the root of the problem: A detailed comparison of kernel and user level data for dynamic malware analysis

    Dynamic malware analysis is fast gaining popularity over static analysis because it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis, it is common practice to capture the system calls a sample makes in order to better understand its behaviour. There are several techniques for capturing system calls, the most popular of which is a user-level hook. To study the effects of collecting system calls at different privilege levels and from different viewpoints, we collected data at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver. We then tested the performance of several state-of-the-art machine learning classifiers on the data. Random Forest was the best-performing classifier, with an accuracy of 95.2% on the kernel-driver data and 94.0% on the user-level data. Combining user- and kernel-level data gave the best classification results, with an accuracy of 96.0% for Random Forest. This may seem intuitive, but it had not previously been demonstrated empirically. Additionally, we observed that machine learning algorithms trained on user-level data tended to use the anti-debug/anti-VM features in malware to distinguish it from benignware, whereas algorithms trained on data from our kernel driver seemed to use differences in the general behaviour of the system to make their predictions, which explains why the two complement each other so well. Our results show that the privilege level at which data is captured affects a classifier's ability to detect malware, with kernel-level data providing more utility than user-level data for malware classification. Despite this, more established tools exist for the user level than for the kernel level, suggesting that more research effort should be directed at the kernel level. In short, this paper provides the first objective, evidence-based comparison of user- and kernel-level data for the purposes of malware classification.
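    The abstract reports that combining user- and kernel-level data gave the best Random Forest accuracy but does not say how the two views were merged. One simple option, assumed here, is feature-level concatenation: count calls per view, prefix each feature with its view so the two never collide, and build one vector per sample. The helpers `view_features`, `combine_views` and `vectorize` are hypothetical, the call names are illustrative, and the resulting matrix could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def view_features(calls: Iterable[str], view: str) -> Dict[str, int]:
    """Count how often each call occurs, namespaced by view ("user"/"kernel")."""
    return {f"{view}:{name}": count for name, count in Counter(calls).items()}


def combine_views(user_calls: Iterable[str], kernel_calls: Iterable[str]) -> Dict[str, int]:
    """Concatenate the user-level and kernel-level feature sets of one sample."""
    combined = view_features(user_calls, "user")
    combined.update(view_features(kernel_calls, "kernel"))  # prefixes prevent overwrites
    return combined


def vectorize(samples: List[Dict[str, int]]) -> Tuple[List[str], List[List[int]]]:
    """Build a shared, sorted feature vocabulary and a dense count matrix."""
    vocab = sorted(set().union(*samples))
    return vocab, [[s.get(f, 0) for f in vocab] for s in samples]


# Illustrative traces for two samples (not data from the paper).
samples = [
    combine_views(["NtCreateFile", "NtCreateFile", "IsDebuggerPresent"],
                  ["NtCreateFile", "NtWriteFile"]),
    combine_views(["NtReadFile"], ["NtReadFile", "NtClose"]),
]
vocab, X = vectorize(samples)
print(vocab)
print(X)
```

    Prefixing by view keeps the same call observed at both levels as two separate features, so a classifier can weight the user-level signal (for example, an anti-debug check) independently of the kernel-level one.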