
    Obfuscated computer virus detection using machine learning algorithm

    Computer virus attacks are becoming increasingly advanced. New obfuscated viruses generate a new form of the virus automatically with every iteration and download. These constantly evolving viruses pose a significant threat to the information security of computer users, organizations, and even governments. However, the signature-based detection technique used by conventional anti-virus software fails to identify them because signatures are unavailable. This research proposes an alternative to the traditional signature-based detection method and investigates the use of machine learning for obfuscated computer virus detection. In this work, text strings extracted from virus program code are used as features to build a classifier model that can correctly classify obfuscated virus files. The text string feature is used because it is informative and potentially requires only a small amount of memory. Results show that unknown files can be correctly classified with 99.5% accuracy using an SMO classifier model. It is therefore believed that current computer virus defenses can be strengthened through a machine learning approach.
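
    As an illustrative, hedged sketch of the pipeline described above: extract printable strings from sample files, treat their presence as features, and train an SVM (a stand-in for Weka's SMO; scikit-learn's SVC uses an SMO-style solver). The directory names and the five-character string threshold below are assumptions, not details from the paper.

    import re
    from pathlib import Path

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    STRING_RE = re.compile(rb"[ -~]{5,}")  # runs of 5+ printable ASCII bytes

    def extract_strings(path: Path) -> str:
        # Pull printable string runs out of a binary, like the 'strings' utility.
        data = path.read_bytes()
        return " ".join(m.group().decode("ascii", "ignore") for m in STRING_RE.finditer(data))

    def build_dataset(benign_dir: str, virus_dir: str):
        docs, labels = [], []
        for label, folder in ((0, benign_dir), (1, virus_dir)):
            for f in Path(folder).iterdir():
                docs.append(extract_strings(f))
                labels.append(label)
        return docs, labels

    if __name__ == "__main__":
        docs, labels = build_dataset("samples/benign", "samples/obfuscated_virus")
        X_train, X_test, y_train, y_test = train_test_split(docs, labels, test_size=0.3, random_state=0)
        vec = CountVectorizer(binary=True)  # presence/absence of each string token
        clf = SVC(kernel="linear").fit(vec.fit_transform(X_train), y_train)
        print("accuracy:", accuracy_score(y_test, clf.predict(vec.transform(X_test))))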

    Code Obfuscation and Virus Detection

    Typically, computer viruses and other malware are detected by searching for a string of bits found in the virus or malware. Such a string can be viewed as a “fingerprint” of the virus. These “fingerprints” are not generally unique; however, they can be used to make rapid malware scanning feasible. This fingerprint is often called a signature, and the technique of detecting viruses using signatures is known as signature-based detection [8]. Today, virus writers often camouflage their viruses using code obfuscation techniques in an effort to defeat signature-based detection schemes. So-called metamorphic viruses are viruses in which each instance has the same functionality but differs in its internal structure. Metamorphic viruses differ from polymorphic viruses in the method they use to hide their signature: while polymorphic viruses primarily rely on encryption for signature obfuscation, metamorphic viruses hide their signature by “mutating” their own code [3]. The paper [1] provides a rigorous proof that metamorphic viruses can bypass any signature-based detection, provided the code obfuscation has been done carefully according to a set of specified rules. Specifically, according to [1], if dead code is added and the control flow is changed sufficiently by inserting jump statements, the virus cannot be detected. In this project we first developed a code obfuscation engine conforming to the rules in [1]. We then used this engine to create metamorphic variants of a seed virus (created with the PS-MPK virus creation kit [15]) and demonstrated the validity of the assertion in [1] about metamorphic viruses and signature-based detectors. In the second phase of the project we validated another theory, advanced in [2], that machine learning based methods, specifically ones based on Hidden Markov Models (HMMs), can detect metamorphic viruses. In other words, we show that a collection of metamorphic viruses which are (provably) undetectable via signature detection techniques can nevertheless be detected using an HMM approach.
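
    The two obfuscations the cited rules depend on, dead-code insertion and control-flow change via jump statements, can be sketched on a toy instruction listing. The x86-style mnemonics and the seed listing below are illustrative assumptions, not the project's actual engine.

    import random

    DEAD_CODE = ["nop", "mov eax, eax", "add ebx, 0", "xchg ecx, ecx"]

    def insert_dead_code(code, rate=0.4, rng=random):
        # Sprinkle do-nothing instructions between the original ones.
        out = []
        for ins in code:
            out.append(ins)
            if rng.random() < rate:
                out.append(rng.choice(DEAD_CODE))
        return out

    def scramble_control_flow(code, rng=random):
        # Split the code into blocks, shuffle their layout, and rejoin them with
        # labels and jumps so execution order is preserved while the byte-level
        # layout changes.
        blocks = [code[i:i + 2] for i in range(0, len(code), 2)]
        order = list(range(len(blocks)))
        rng.shuffle(order)
        out = ["jmp L0"]
        for pos in order:
            out.append(f"L{pos}:")
            out.extend(blocks[pos])
            out.append(f"jmp L{pos + 1}" if pos + 1 < len(blocks) else "jmp Lend")
        out.append("Lend:")
        return out

    if __name__ == "__main__":
        seed = ["mov eax, 4", "mov ebx, 1", "lea ecx, [msg]", "mov edx, len", "int 0x80"]
        print("\n".join(scramble_control_flow(insert_dead_code(seed))))

    Each run yields a variant with the same execution order but a different byte layout, which is why fixed-string signatures fail while statistical models over opcode sequences (such as the HMM approach in [2]) can still recognize the family.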

    Perfect is the enemy of test oracle

    Automation of test oracles is one of the most challenging facets of software testing, yet it remains less addressed than automated test input generation. Test oracles rely on a ground truth that can distinguish between correct and buggy behavior to determine whether a test fails (detects a bug) or passes. What makes the oracle problem challenging and undecidable is the assumption that the ground truth must know the exact expected, correct, or buggy behavior. However, we argue that one can still build an accurate oracle without knowing the exact correct or buggy behavior, only how the two might differ. This paper presents SEER, a learning-based approach that, in the absence of test assertions or other types of oracle, can determine whether a unit test passes or fails on a given method under test (MUT). To build the ground truth, SEER jointly embeds unit tests and the implementations of MUTs into a unified vector space, such that the neural representations of tests are similar to those of the MUTs they pass on and dissimilar to those of the MUTs they fail on. A classifier built on top of this vector representation serves as the oracle, generating a "fail" label when the test inputs detect a bug in the MUT and a "pass" label otherwise. Our extensive experiments applying SEER to more than 5K unit tests from a diverse set of open-source Java projects show that the produced oracle is (1) effective in predicting fail or pass labels, achieving an overall accuracy, precision, recall, and F1 measure of 93%, 86%, 94%, and 90%, (2) generalizable, predicting labels for the unit tests of projects that were not in the training or validation set with a negligible performance drop, and (3) efficient, detecting the existence of bugs in only 6.5 milliseconds on average. Comment: Published in ESEC/FSE 202
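
    A minimal sketch of the oracle idea, assuming a stand-in encoder: embed the unit test and the method under test (MUT) into one vector space and label the pair from how close the embeddings land. The trigram-hashing encoder and the fixed threshold below are placeholders for SEER's learned encoder and trained classifier, not its real model.

    import numpy as np

    def encode(source_code: str, dim: int = 64) -> np.ndarray:
        # Placeholder embedding: hash character trigrams into a fixed-size unit vector.
        vec = np.zeros(dim)
        for i in range(len(source_code) - 2):
            vec[hash(source_code[i:i + 3]) % dim] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    def oracle(test_src: str, mut_src: str, threshold: float = 0.5) -> str:
        # Label a (test, MUT) pair "pass" if their embeddings are similar enough.
        similarity = float(encode(test_src) @ encode(mut_src))  # cosine: vectors are unit length
        return "pass" if similarity >= threshold else "fail"

    if __name__ == "__main__":
        mut = "int add(int a, int b) { return a + b; }"
        test = "assertEquals(5, add(2, 3));"
        print(oracle(test, mut))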

    A Framework for Automated Similarity Analysis of Malware

    Malware, a category of software that includes viruses, worms, and other malicious programs, is developed by hackers to damage, disrupt, or perform other harmful actions on data, computer systems, and networks. Malware analysis, an indispensable part of the work of IT security specialists, aims to gain an in-depth understanding of malware code. Manual analysis of malware is a very costly and time-consuming process. As more malware variants are evolved by hackers, who occasionally use a copy-paste-modify programming style to accelerate the generation of large numbers of malware samples, the effort spent analyzing similar pieces of malicious code has grown dramatically. One approach to remedy this situation is to automatically perform similarity analysis on malware samples and identify the functions they share, in order to minimize duplicated effort in analyzing the similar code of malware variants. In this thesis, we present a framework to match cloned functions across a large collection of malware samples. First, the instructions of the functions to be analyzed are extracted from the disassembled malware binary code and then normalized (see the sketch below). We propose a new similarity metric and use it to determine the pairwise similarity among malware samples based on the calculated similarity of their functions. The developed tool also includes an API class recognizer designed to identify probable malicious operations that malware functions can perform. Furthermore, it allows us to visualize the relationships among functions inside malware code and locate similar functions importing the same API class. We evaluate this framework on three malware datasets, including metamorphic viruses created by malware generation tools, real-life malware variants in the wild, and two well-known botnet trojans. The experimental results confirm that the proposed framework is effective in detecting similar malware code.
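
    The clone-matching step can be sketched as follows, with Jaccard similarity over normalized instruction 3-grams standing in for the thesis's own metric; the normalization rules and thresholds below are illustrative assumptions.

    import re

    REG = re.compile(r"\b(e?[abcd]x|e?[sd]i|e?[sb]p|r\d+)\b")
    IMM = re.compile(r"\b0x[0-9a-fA-F]+\b|\b\d+\b")

    def normalize(instruction: str) -> str:
        # Replace registers and immediates with placeholders: "mov eax, 4" -> "mov REG, IMM".
        return IMM.sub("IMM", REG.sub("REG", instruction.lower()))

    def ngrams(instructions, n=3):
        norm = [normalize(i) for i in instructions]
        return {tuple(norm[i:i + n]) for i in range(len(norm) - n + 1)}

    def function_similarity(func_a, func_b, n=3) -> float:
        # Set-based similarity between two functions given as instruction lists.
        a, b = ngrams(func_a, n), ngrams(func_b, n)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    def sample_similarity(sample_a, sample_b, match_threshold=0.7) -> float:
        # Fraction of functions in sample_a that have a near-clone in sample_b.
        if not sample_a:
            return 0.0
        matched = sum(
            1 for fa in sample_a
            if any(function_similarity(fa, fb) >= match_threshold for fb in sample_b)
        )
        return matched / len(sample_a)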

    A Malware Analysis and Artifact Capture Tool

    Malware authors attempt to obfuscate and hide their execution objectives in their program’s static and dynamic states. This paper provides a novel approach to aid analysis by introducing a malware analysis tool that is quicker to set up and use than other existing tools. The tool intercepts and captures malware artifacts while providing dynamic control of process flow. Capturing malware artifacts allows an analyst to understand malware behavior and obfuscation techniques more quickly and comprehensively, and doing so interactively allows multiple code paths to be explored. The faster malware can be analyzed, the sooner the systems and data it has compromised can be identified and its infection stopped. This research proposes an instantiation of an interactive malware analysis and artifact capture tool.
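
    A crude, hedged sketch of artifact capture under stated assumptions: run a sample with its working directory redirected to a scratch folder, then record the files it dropped along with its output. The paper's tool intercepts behavior in-process and steers execution interactively; this stand-in only snapshots side effects, and the example command is a harmless placeholder rather than real malware.

    import hashlib
    import subprocess
    import tempfile
    from pathlib import Path

    def run_and_capture(sample_cmd, timeout=60):
        # Execute the sample in an empty scratch directory and collect artifacts.
        workdir = Path(tempfile.mkdtemp(prefix="artifacts_"))
        proc = subprocess.run(sample_cmd, cwd=workdir, capture_output=True, timeout=timeout)
        artifacts = []
        for path in workdir.rglob("*"):
            if path.is_file():
                artifacts.append({
                    "path": str(path),
                    "size": path.stat().st_size,
                    "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                })
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "returncode": proc.returncode, "artifacts": artifacts}

    if __name__ == "__main__":
        report = run_and_capture(["python", "-c", "open('dropped.bin', 'wb').write(b'x')"])
        for a in report["artifacts"]:
            print(a["path"], a["sha256"][:16])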

    Software Testing with Large Language Model: Survey, Landscape, and Vision

    Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making it an area ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 52 relevant studies that have used LLMs for software testing, from both the software testing and LLM perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative. It also analyzes the commonly used LLMs, the types of prompt engineering employed, and the techniques that accompany these LLMs. It then summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in the area, highlighting potential avenues for exploration and identifying gaps in our current understanding of the use of LLMs in software testing. Comment: 20 pages, 11 figures
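
    As a small, hedged illustration of the most common task the survey identifies, test case preparation: wrap the method under test in a prompt and hand it to an LLM. The template wording is an assumption, and llm_complete is a placeholder for whatever chat-completion client is actually used, not a real library call.

    TEST_GEN_TEMPLATE = (
        "You are a Java testing assistant.\n"
        "Write a JUnit 5 test class for the method below.\n"
        "Cover normal inputs, boundary values, and expected exceptions.\n\n"
        "{method_source}\n"
    )

    def build_test_prompt(method_source: str) -> str:
        # Fill the prompt template with the method under test.
        return TEST_GEN_TEMPLATE.format(method_source=method_source)

    def llm_complete(prompt: str) -> str:
        # Placeholder: swap in a real LLM client here.
        raise NotImplementedError

    if __name__ == "__main__":
        mut = "public int divide(int a, int b) { return a / b; }"
        prompt = build_test_prompt(mut)
        print(prompt)                             # inspect the generated prompt
        # generated_test = llm_complete(prompt)   # call the model once a client is wired in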