8 research outputs found

    Detecting derivative malware samples using deobfuscation-assisted similarity analysis

    Get PDF
    The overwhelming popularity of PHP as a hosting platform has made it the language of choice for developers of Remote Access Trojans (RATs or web shells) and other malicious software. These shells are typically used to compromise and monetise web platforms by providing the attacker with basic remote access to the system, including _le transfer, command execution, network reconnaissance, and database connectivity. Once infected, compromised systems can be used to defraud users by hosting phishing sites, performing Distributed Denial of Service attacks, or serving as anonymous platforms for sending spam or other malfeasance. The vast majority of these threats are largely derivative, incorporating core capabilities found in more established RATs such as c99 and r57. Authors of malicious software routinely produce new shell variants by modifying the behaviours of these ubiquitous RATs, either to add desired functionality or to avoid detection by signature-based detection systems. Once these modified shells are eventually identified (or additional functionality is required), the process of shell adaptation begins again. The end result of this iterative process is a web of separate but related shell variants, many of which are at least partially derived from one of the more popular and influential RATs. In response to the problem outlined above, the author set out to design and implement a system capable of circumventing common obfuscation techniques and identifying derivative malware samples in a given collection. To begin with, a decoder component was developed to syntactically deobfuscate and normalise PHP code by detecting and reversing idiomatic obfuscation constructs, and to apply uniform formatting conventions to all system inputs. A unified malware analysis framework, called Viper, was then extended to create a modular similarity analysis system comprised of individual feature extraction modules, modules responsible for batch processing, a matrix module for comparing sample features, and two visualisation modules capable of generating visual representations of shell similarity. The principal conclusion of the research was that the deobfuscation performed by the decoder component prior to analysis dramatically improved the observed levels of similarity between test samples. This in turn allowed the modular similarity analysis system to identify derivative clusters (or families) within a large collection of shells more accurately. Techniques for isolating and re-rendering these clusters were also developed and demonstrated to be effective at increasing the amount of detail available for evaluating the relative magnitudes of the relationships within each cluster

    Backup To The Rescue: Automated Forensic Techniques For Advanced Website-Targeting Cyber Attacks

    Get PDF
    The last decade has seen a significant rise in non-technical users gaining a web presence, often via the easy-to-use functionalities of Content Management Systems (CMS). In fact, over 60% of the world’s websites run on CMSs. Unfortunately, this huge user population has made CMS-based websites a high-profile target for hackers. Worse still, the vast majority of the website hosting industry has shifted to a “backup and restore” model of security, which relies on error-prone AV scanners to prompt non-technical users to roll back to a pre-infection nightly snapshot. My cyber forensics research directly addresses this emergent problem by developing next-generation techniques for the investigation of advanced cyber crimes. Driven by economic incentives, attackers abuse the trust in this economy: selling malware on legitimate marketplaces, pirating popular website plugins, and infecting websites post-deployment. Furthermore, attackers are exploiting these websites at scale by carelessly dropping thousands of obfuscated and packed malicious files on the webserver. This is counter-intuitive since attackers are assumed to be stealthy. Despite the rise in web attacks, efficiently locating and accurately analyzing the malware dropped on compromised webservers has remained an open research challenge. This dissertation posits that the already collected webserver nightly backup snapshots contain all required information to enable automated and scalable detection of website compromises. This dissertation presents a web attack forensics framework that leverages program analysis to automatically understand the webserver’s nightly backup snapshots. This will enable the recovery of temporal phases of a webserver compromise and its origin within the website supply chain.Ph.D

    Dissection of Modern Malicious Software

    Get PDF
    The exponential growth of the number of malicious software samples, known by malware in the specialized literature, constitutes nowadays one of the major concerns of cyber-security professionals. The objectives of the creators of this type of malware are varied, and the means used to achieve them are getting increasingly sophisticated. The increase of the computation and storage resources, as well as the globalization have been contributing to this growth, and fueling an entire industry dedicated to developing, selling and improving systems or solutions for securing, recovering, mitigating and preventing malware related incidents. The success of these systems typically depends of detailed analysis, often performed by humans, of malware samples captured in the wild. This analysis includes the search for patterns or anomalous behaviors that may be used as signatures to identify or counter-attack these threats. This Master of Science (Ms.C.) dissertation addresses problems related with dissecting and analyzing malware. The main objectives of the underlying work were to study and understand the techniques used by this type of software nowadays, as well as the methods that are used by specialists on that analysis, so as to conduct a detailed investigation and produce structured documentation for at least one modern malware sample. The work was mostly focused in malware developed for the Operating Systems (OSs) of the Microsoft Windows family for desktops. After a brief study of the state of the art, the dissertation presents the classifications applied to malware, which can be found in the technical literature on the area, elaborated mainly by an industry community or seller of a security product. The structuring of the categories is nonetheless the result of an effort to unify or complete different classifications. The families of some of the most popular or detected malware samples are also presented herein, initially in a tabular form and, subsequently, via a genealogical tree, with some of the variants of each previously described family. This tree provides an interesting perspective over malware and is one of the contributions of this programme. Within the context of the description of functionalities and behavior of malware, some advanced techniques, with which modern specimens of this type of software are equipped to ease their propagation and execution, while hindering their detection, are then discussed with more detail. The discussion evolves to the presentation of the concepts related to the detection and defense against modern malware, along with a small introduction to the main subject of this work. The analysis and dissection of two samples of malware is then the subject of the final chapters of the dissertation. A basic static analysis is performed to the malware known as Stuxnet, while the Trojan Banker known as Tinba/zuzy is subdued to both basic and advanced dynamic analysis. The results of this part of the work emphasize difficulties associated with these tasks and the sophistication and dangerous level of samples under investigation.O crescimento exponencial do número de amostras de software malicioso, conhecido na gíria informática como malware, constitui atualmente uma das maiores preocupações dos profissionais de cibersegurança. São vários os objetivos dos criadores deste tipo de software e a forma cada vez mais sofisticada como os mesmos são alcançados. O aumento da computação e capacidade de armazenamento, bem como a globalização, têm contribuído para este crescimento, e têm alimentado toda uma indústria dedicada ao desenvolvimento, venda e melhoramento de sistemas ou soluções de segurança, recuperação, mitigação e prevenção de incidentes relacionados com malware. O sucesso destes sistemas depende normalmente da análise detalhada, feita muitas vezes por humanos, de peças de malware capturadas no seu ambiente de atuação. Esta análise compreende a procura de padrões ou de comportamentos anómalos que possam servir de assinatura para identificar ou contra-atacar essas ameaças. Esta dissertação aborda a problemática da análise e dissecação de malware. O trabalho que lhe está subjacente tinha como objetivos estudar e compreender as técnicas utilizadas por este tipo de software hoje em dia, bem como as que são utilizadas por especialistas nessa análise, de forma a conduzir uma investigação detalhada e a produzir documentação estruturada sobre pelo menos uma amostra de malware moderna. O trabalho focou-se, sobretudo, em malware desenvolvido para os sistemas operativos da família Microsoft Windows para computadores de secretária. Após um breve estudo ao estado da arte, a dissertação apresenta as classificações de malware encontradas na literatura técnica da especialidade, principalmente usada pela indústria, resultante de um esforço de unificação das mesmas. São também apresentadas algumas das famílias de malware mais detetadas da atualidade, inicialmente através de uma tabela e, posteriormente, através de uma árvore geneológica, com algumas das variantes de cada uma das famílias descritas previamente. Esta árvore fornece uma perspetiva interessante sobre malware e constitui uma das contribuições deste programa de mestrado. Ainda no âmbito da descrição de funcionalidades e comportamentos do malware, são expostas, com algum detalhe, algumas técnicas avançadas com as quais os programas maliciosos mais modernos são por vezes munidos com o intuito a facilitar a sua propagação e execução, dificultando a sua deteção. A descrição evolui para a apresentação dos conceitos adjacentes à deteção e combate ao malware moderno, assim como para uma pequena introdução ao tema principal deste trabalho. A análise e dissecação de duas amostras de malware moderno surgem nos capítulos finais da dissertação. Ao malware conhecido por Stuxnet é feita a análise básica estática, enquanto que ao Trojan Banker Tinba/zusy é feita e demonstrada a análise dinâmica básica e avançada. Os resultados desta parte são demonstrativos do grau de sofisticação e perigosidade destas amostras e das dificuldades associadas a estas tarefas

    On Leveraging Next-Generation Deep Learning Techniques for IoT Malware Classification, Family Attribution and Lineage Analysis

    Get PDF
    Recent years have witnessed the emergence of new and more sophisticated malware targeting insecure Internet of Things (IoT) devices, as part of orchestrated large-scale botnets. Moreover, the public release of the source code of popular malware families such as Mirai [1] has spawned diverse variants, making it harder to disambiguate their ownership, lineage, and correct label. Such a rapidly evolving landscape makes it also harder to deploy and generalize effective learning models against retired, updated, and/or new threat campaigns. To mitigate such threat, there is an utmost need for effective IoT malware detection, classification and family attribution, which provide essential steps towards initiating attack mitigation/prevention countermeasures, as well as understanding the evolutionary trajectories and tangled relationships of IoT malware. This is particularly challenging due to the lack of fine-grained empirical data about IoT malware, the diverse architectures of IoT-targeted devices, and the massive code reuse between IoT malware families. To address these challenges, in this thesis, we leverage the general lack of obfuscation in IoT malware to extract and combine static features from multi-modal views of the executable binaries (e.g., images, strings, assembly instructions), along with Deep Learning (DL) architectures for effective IoT malware classification and family attribution. Additionally, we aim to address concept drift and the limitations of inter-family classification due to the evolutionary nature of IoT malware, by detecting in-class evolving IoT malware variants and interpreting the meaning behind their mutations. To this end, we perform the following to achieve our objectives: First, we analyze 70,000 IoT malware samples collected by a specialized IoT honeypot and popular malware repositories in the past 3 years. Consequently, we utilize features extracted from strings- and image-based representations of IoT malware to implement a multi-level DL architecture that fuses the learned features from each sub-component (i.e, images, strings) through a neural network classifier. Our in-depth experiments with four prominent IoT malware families highlight the significant accuracy of the proposed approach (99.78%), which outperforms conventional single-level classifiers, by relying on different representations of the target IoT malware binaries that do not require expensive feature extraction. Additionally, we utilize our IoT-tailored approach for labeling unknown malware samples, while identifying new malware strains. Second, we seek to identify when the classifier shows signs of aging, by which it fails to effectively recognize new variants and adapt to potential changes in the data. Thus, we introduce a robust and effective method that uses contrastive learning and attentive Transformer models to learn and compare semantically meaningful representations of IoT malware binaries and codes without the need for expensive target labels. We find that the evolution of IoT binaries can be used as an augmentation strategy to learn effective representations to contrast (dis)similar variant pairs. We discuss the impact and findings of our analysis and present several evaluation studies to highlight the tangled relationships of IoT malware, as well as the efficiency of our contrastively learned fine-grained feature vectors in preserving semantics and reducing out-of-vocabulary size in cross-architecture IoT malware binaries. We conclude this thesis by summarizing our findings and discussing research gaps that lay the way for future work

    Optimum parameter machine learning classification and prediction of Internet of Things (IoT) malwares using static malware analysis techniques

    Get PDF
    Application of machine learning in the field of malware analysis is not a new concept, there have been lots of researches done on the classification of malware in android and windows environments. However, when it comes to malware analysis in the internet of things (IoT), it still requires work to be done. IoT was not designed to keeping security/privacy under consideration. Therefore, this area is full of research challenges. This study seeks to evaluate important machine learning classifiers like Support Vector Machines, Neural Network, Random Forest, Decision Trees, Naive Bayes, Bayesian Network, etc. and proposes a framework to utilize static feature extraction and selection processes highlight issues like over-fitting and generalization of classifiers to get an optimized algorithm with better performance. For background study, we used systematic literature review to find out research gaps in IoT, presented malware as a big challenge for IoT and the reasons for applying malware analysis targeting IoT devices and finally perform classification on malware dataset. The classification process used was applied on three different datasets containing file header, program header and section headers as features. Preliminary results show the accuracy of over 90% on file header, program header, and section headers. The scope of this document just discusses these results as initial results and still require some issues to be addressed which may effect on the performance measures

    Towards a sandbox for the deobfuscation and dissection of PHP malware

    No full text
    The creation and proliferation of PHP-based Remote Access Trojans (or web shells) used in both the compromise and post exploitation of web platforms has fuelled research into automated methods of dissecting and analysing these shells. Current malware tools disguise themselves by making use of obfuscation techniques designed to frustrate any efforts to dissect or reverse engineer the code. Advanced code engineering can even cause malware to behave differently if it detects that it is not running on the system for which it was originally targeted. To combat these defensive techniques, this paper presents a sandbox-based environment that aims to accurately mimic a vulnerable host and is capable of semi-automatic semantic dissection and syntactic deobfuscation of PHP code

    Towards a Sandbox for the Deobfuscation and Dissection of PHP Malware: A Literature Survey

    No full text
    Abstract The creation and proliferation of Remote Access Trojans (or web shells) capable of compromising web platforms has fuelled research into automated methods of dissecting and analysing these shells. In the past, such shells were ably detected using signature matching, a process that is currently unable to cope with the sheer volume and variety of web-based malware in circulation. This survey describes and evaluates some of the notable solutions that have been proposed to address the twin problems of code deobfuscation and dissection with the aim of identifying viable and automatable analysis techniques
    corecore