10 research outputs found

    Comparison of Deep Learning and the Classical Machine Learning Algorithm for the Malware Detection

    Full text link
    Recently, Deep Learning has been showing promising results in various Artificial Intelligence applications like image recognition, natural language processing, language modeling, neural machine translation, etc. Although, in general, it is computationally more expensive as compared to classical machine learning techniques, their results are found to be more effective in some cases. Therefore, in this paper, we investigated and compared one of the Deep Learning Architecture called Deep Neural Network (DNN) with the classical Random Forest (RF) machine learning algorithm for the malware classification. We studied the performance of the classical RF and DNN with 2, 4 & 7 layers architectures with the four different feature sets, and found that irrespective of the features inputs, the classical RF accuracy outperforms the DNN.Comment: 11 Pages, 1 figur

    DRLDO A Novel DRL based De obfuscation System for Defence Against Metamorphic Malware

    Get PDF
    In this paper, we propose a novel mechanism to normalise metamorphic and obfuscated malware down at the opcode level and hence create an advanced metamorphic malware de-obfuscation and defence system. We name this system as DRLDO, for deep reinforcement learning based de-obfuscator. With the inclusion of the DRLDO as a sub-component, an existing Intrusion Detection System could be augmented with defensive capabilities against ‘zero-day’ attack from obfuscated and metamorphic variants of existing malware. This gains importance, not only because there exists no system till date that use advance DRL to intelligently and automatically normalise obfuscation down even to the opcode level, but also because the DRLDO system does not mandate any changes to the existing IDS. The DRLDO system does not even mandate the IDS’ classifier to be retrained with any new dataset containing obfuscated samples. Hence DRLDO could be easily retrofitted into any existing IDS deployment. We designed, developed, and conducted experiments on the system to evaluate the same against multiple-simultaneous attacks from obfuscations generated from malware samples from a standardised dataset that contain multiple generations of malware. Experimental results prove that DRLDO was able to successfully make the otherwise undetectable obfuscated variants of the malware detectable by an existing pre-trained malware classifier. The detection probability was raised well above the cut-off mark to 0.6 for the classifier to detect the obfuscated malware unambiguously. Further, the de-obfuscated variants generated by DRLDO achieved a very high correlation (of ≈ 0.99) with the base malware. This observation validates that the DRLDO system is actually learning to de-obfuscate and not exploiting a trivial trick

    Making Large Language Models Better Data Creators

    Full text link
    Although large language models (LLMs) have advanced the state-of-the-art in NLP significantly, deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security. As such, trainable models are still the preferred option in some cases. However, these models still require human-labeled data for optimal performance, which is expensive and time-consuming to obtain. In order to address this issue, several techniques to reduce human effort involve labeling or generating data using LLMs. Although these methods are effective for certain applications, in practice they encounter difficulties in real-world scenarios. Labeling data requires careful data selection, while generating data necessitates task-specific prompt engineering. In this paper, we propose a unified data creation pipeline that requires only a single formatting example, and which is applicable to a broad range of tasks, including traditionally problematic ones with semantically devoid label spaces. In our experiments we demonstrate that instruction-following LLMs are highly cost-effective data creators, and that models trained with these data exhibit performance better than those trained with human-labeled data (by up to 17.5%) on out-of-distribution evaluation, while maintaining comparable performance on in-distribution tasks. These results have important implications for the robustness of NLP systems deployed in the real-world.Comment: Accepted to EMNLP 2023 main conference. 12 pages, 5 figures, 6 tables. Code is available at https://github.com/microsoft/llm-data-creatio

    Deep reinforcement learning: frontiers of artificial intelligence

    No full text

    AI in Finance: A Review

    No full text
    corecore