3 research outputs found

    Towards a PHP webshell taxonomy using deobfuscation-assisted similarity analysis

    No full text
    The abundance of PHP-based Remote Access Trojans (or web shells) found in the wild has led malware researchers to develop systems capable of tracking and analysing these shells. In the past, such shells were ably classified using signature matching, a process that is currently unable to cope with the sheer volume and variety of web-based malware in circulation. Although a large percentage of newly-created webshell software incorporates portions of code derived from seminal shells such as c99 and r57, they are able to disguise this by making extensive use of obfuscation techniques intended to frustrate any attempts to dissect or reverse engineer the code. This paper presents an approach to shell classification and analysis (based on similarity to a body of known malware) in an attempt to create a comprehensive taxonomy of PHP-based web shells. Several different measures of similarity were used in conjunction with clustering algorithms and visualisation techniques in order to achieve this. Furthermore, an auxiliary component capable of syntactically deobfuscating PHP code is described. This was employed to reverse idiomatic obfuscation constructs used by software authors. It was found that this deobfuscation dramatically increased the observed levels of similarity by exposing additional code for analysis

    Application of artificial intelligence for detecting derived viruses.

    Get PDF
    Master of Science in Computer Science. University of KwaZulu-Natal, Durban 2017.A lot of new viruses are being created each and every day. However, some of these viruses are not completely new per se. Most of the supposedly ‘new’ viruses are not necessarily created from scratch with completely new (something novel that has never been seen before) mechanisms. For example, some of these viruses just change their forms and come up with new signatures to avoid detection. Hence, such viruses cannot be argued to be new. This research refers to such as derived viruses. Just like new viruses, we argue that derived viruses are hard to detect with current scanning-detection methods. Many virus detection methods exist in the literature, but very few address the detection of derived viruses. Hence, the ultimate research question that this study aims to answer is; how might we improve the detection rate of derived computer viruses? The proposed system integrates a mutation engine together with a neural network to detect derived viruses. Derived viruses come from existing viruses that change their forms. They do so by adding some irrelevant instructions that will not alter the intended purpose of the virus. A mutation engine is used to group existing virus signatures based on their similarities. The engine then creates derivatives of groups of signatures. This is done up until the third generation (of mutations). The existing virus signatures and the created derivatives are both used to train the neural network. The derived signatures that are not used for the training are used to determine the effectiveness of the neural network. Ten experiments were conducted on each of the three derived virus generations. The first generation showed the highest derived virus detection rate compared to the other two generations. The second generation also showed a slightly higher detection rate than the third generation which has the least detection rate. Experimental results show that the proposed model can detect derived viruses with an average accuracy detection rate of 80% (This includes a 91% success rate on first generation, 83% success rate on second generation and 65% success rate on third generation). The results further show that the correlation between the original virus signature and its derivatives decreases with the generations. This means that after many generations of a virus changing form, its variants will no longer look like the original. Instead the variants look like a completely new virus even though the variants and the original virus will always have the same behaviour and operational characteristics with similar effects

    Detecting derivative malware samples using deobfuscation-assisted similarity analysis

    Get PDF
    The overwhelming popularity of PHP as a hosting platform has made it the language of choice for developers of Remote Access Trojans (RATs or web shells) and other malicious software. These shells are typically used to compromise and monetise web platforms by providing the attacker with basic remote access to the system, including _le transfer, command execution, network reconnaissance, and database connectivity. Once infected, compromised systems can be used to defraud users by hosting phishing sites, performing Distributed Denial of Service attacks, or serving as anonymous platforms for sending spam or other malfeasance. The vast majority of these threats are largely derivative, incorporating core capabilities found in more established RATs such as c99 and r57. Authors of malicious software routinely produce new shell variants by modifying the behaviours of these ubiquitous RATs, either to add desired functionality or to avoid detection by signature-based detection systems. Once these modified shells are eventually identified (or additional functionality is required), the process of shell adaptation begins again. The end result of this iterative process is a web of separate but related shell variants, many of which are at least partially derived from one of the more popular and influential RATs. In response to the problem outlined above, the author set out to design and implement a system capable of circumventing common obfuscation techniques and identifying derivative malware samples in a given collection. To begin with, a decoder component was developed to syntactically deobfuscate and normalise PHP code by detecting and reversing idiomatic obfuscation constructs, and to apply uniform formatting conventions to all system inputs. A unified malware analysis framework, called Viper, was then extended to create a modular similarity analysis system comprised of individual feature extraction modules, modules responsible for batch processing, a matrix module for comparing sample features, and two visualisation modules capable of generating visual representations of shell similarity. The principal conclusion of the research was that the deobfuscation performed by the decoder component prior to analysis dramatically improved the observed levels of similarity between test samples. This in turn allowed the modular similarity analysis system to identify derivative clusters (or families) within a large collection of shells more accurately. Techniques for isolating and re-rendering these clusters were also developed and demonstrated to be effective at increasing the amount of detail available for evaluating the relative magnitudes of the relationships within each cluster
    corecore