24 research outputs found

    WOPR: A Dynamic Cybersecurity Detection and Response Framework

    Get PDF
    Malware authors develop software to exploit the flaws in any platform and application which suffers a vulnerability in its defenses, be it through unpatched known attack vectors or zero-day attacks for which there is no current solution. It is the responsibility of cybersecurity personnel to monitor, detect, respond to and protect against such incidents that could affect their organization. Unfortunately, the low number of skilled, available cybersecurity professionals in the job market means that many positions go unfilled and cybersecurity threats are unknowingly allowed to negatively affect many enterprises.The demand for a greater cybersecurity posture has led several organizations to de- velop automated threat analysis tools which can be operated by less-skilled infor- mation security analysts and response teams. However, the diverse needs and organizational factors of most businesses presents a challenge for a “one size fits all” cybersecurity solution. Organizations in different industries may not have the same regulatory and standards compliance concerns due to processing different forms and classifications of data. As a result, many common security solutions are ill equipped to accurately model cybersecurity threats as they relate to each unique organization.We propose WOPR, a framework for automated static and dynamic analysis of software to identify malware threats, classify the nature of those threats, and deliver an appropriate automated incident response. Additionally, WOPR provides the end user the ability to adjust threat models to fit the risks relevant to an organization, allowing for bespoke automated cybersecurity threat management. Finally, WOPR presents a departure from traditional signature-based detection found in anti-virus and intrusion detection systems through learning system-level behavior and matching system calls with malicious behavior

    Integrating Multiple Data Views for Improved Malware Analysis

    Get PDF
    Malicious software (malware) has become a prominent fixture in computing. There have been many methods developed over the years to combat the spread of malware, but these methods have inevitably been met with countermeasures. For instance, signature-based malware detection gave rise to polymorphic viruses. This arms race\u27 will undoubtedly continue for the foreseeable future as the incentives to develop novel malware continue to outweigh the costs. In this dissertation, I describe analysis frameworks for three important problems related to malware: classification, clustering, and phylogenetic reconstruction. The important component of my methods is that they all take into account multiple views of malware. Typically, analysis has been performed in either the static domain (e.g. the byte information of the executable) or the dynamic domain (e.g. system call traces). This dissertation develops frameworks that can easily incorporate well-studied views from both domains, as well as any new views that may become popular in the future. The only restriction that must be met is that a positive semidefinite similarity (kernel) matrix must be defined on the view, a restriction that is easily met in practice. While the classification problem can be solved with well known multiple kernel learning techniques, the clustering and phylogenetic problems required the development of novel machine learning methods, which I present in this dissertation. It is important to note that although these methods were developed in the context of the malware problem, they are applicable to a wide variety of domains

    On Leveraging Next-Generation Deep Learning Techniques for IoT Malware Classification, Family Attribution and Lineage Analysis

    Get PDF
    Recent years have witnessed the emergence of new and more sophisticated malware targeting insecure Internet of Things (IoT) devices, as part of orchestrated large-scale botnets. Moreover, the public release of the source code of popular malware families such as Mirai [1] has spawned diverse variants, making it harder to disambiguate their ownership, lineage, and correct label. Such a rapidly evolving landscape makes it also harder to deploy and generalize effective learning models against retired, updated, and/or new threat campaigns. To mitigate such threat, there is an utmost need for effective IoT malware detection, classification and family attribution, which provide essential steps towards initiating attack mitigation/prevention countermeasures, as well as understanding the evolutionary trajectories and tangled relationships of IoT malware. This is particularly challenging due to the lack of fine-grained empirical data about IoT malware, the diverse architectures of IoT-targeted devices, and the massive code reuse between IoT malware families. To address these challenges, in this thesis, we leverage the general lack of obfuscation in IoT malware to extract and combine static features from multi-modal views of the executable binaries (e.g., images, strings, assembly instructions), along with Deep Learning (DL) architectures for effective IoT malware classification and family attribution. Additionally, we aim to address concept drift and the limitations of inter-family classification due to the evolutionary nature of IoT malware, by detecting in-class evolving IoT malware variants and interpreting the meaning behind their mutations. To this end, we perform the following to achieve our objectives: First, we analyze 70,000 IoT malware samples collected by a specialized IoT honeypot and popular malware repositories in the past 3 years. Consequently, we utilize features extracted from strings- and image-based representations of IoT malware to implement a multi-level DL architecture that fuses the learned features from each sub-component (i.e, images, strings) through a neural network classifier. Our in-depth experiments with four prominent IoT malware families highlight the significant accuracy of the proposed approach (99.78%), which outperforms conventional single-level classifiers, by relying on different representations of the target IoT malware binaries that do not require expensive feature extraction. Additionally, we utilize our IoT-tailored approach for labeling unknown malware samples, while identifying new malware strains. Second, we seek to identify when the classifier shows signs of aging, by which it fails to effectively recognize new variants and adapt to potential changes in the data. Thus, we introduce a robust and effective method that uses contrastive learning and attentive Transformer models to learn and compare semantically meaningful representations of IoT malware binaries and codes without the need for expensive target labels. We find that the evolution of IoT binaries can be used as an augmentation strategy to learn effective representations to contrast (dis)similar variant pairs. We discuss the impact and findings of our analysis and present several evaluation studies to highlight the tangled relationships of IoT malware, as well as the efficiency of our contrastively learned fine-grained feature vectors in preserving semantics and reducing out-of-vocabulary size in cross-architecture IoT malware binaries. We conclude this thesis by summarizing our findings and discussing research gaps that lay the way for future work

    Software similarity and classification

    Full text link
    This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities

    Automatic Configuration of Programmable Logic Controller Emulators

    Get PDF
    Programmable logic controllers (PLCs), which are used to control much of the world\u27s critical infrastructures, are highly vulnerable and exposed to the Internet. Many efforts have been undertaken to develop decoys, or honeypots, of these devices in order to characterize, attribute, or prevent attacks against Industrial Control Systems (ICS) networks. Unfortunately, since ICS devices typically are proprietary and unique, one emulation solution for a particular vendor\u27s model will not likely work on other devices. Many previous efforts have manually developed ICS honeypots, but it is a very time intensive process. Thus, a scalable solution is needed in order to automatically configure PLC emulators. The ScriptGenE Framework presented in this thesis leverages several techniques used in reverse engineering protocols in order to automatically configure PLC emulators using network traces. The accuracy, flexibility, and efficiency of the ScriptGenE Framework is tested in three fully automated experiments

    A study of malicious software on the macOS operating system

    Get PDF
    Much of the published malware research begins with a common refrain: the cost, quantum and complexity of threats are increasing, and research and practice should prioritise efforts to automate and reduce times to detect and prevent malware, while improving the consistency of categories and taxonomies applied to modern malware. Existing work related to malware targeting Apple's macOS platform has not been spared this approach, although limited research has been conducted on the true nature of threats faced by users of the operating system. While macOS focused research available consistently notes an increase in macOS users, devices and ultimately in threats, an opportunity exists to understand the real nature of threats faced by macOS users and suggest potential avenues for future work. This research provides a view of the current state of macOS malware by analysing and exploring a dataset of malware detections on macOS endpoints captured over a period of eleven months by an anti-malware software vendor. The dataset is augmented with malware information provided by the widely used Virus. Total service, as well as the application of prior automated malware categorisation work, AVClass to categorise and SSDeep to cluster and report on observed data. With Windows and Android platforms frequently in the spotlight as targets for highly disruptive malware like botnets, ransomware and cryptominers, research and intuition seem to suggest the threat of malware on this increasingly popular platform should be growing and evolving accordingly. Findings suggests that the direction and nature of growth and evolution may not be entirely as clear as industry reports suggest. Adware and Potentially Unwanted Applications (PUAs) make up the vast majority of the detected threats, with remote access trojans (RATs), ransomware and cryptocurrency miners comprising a relatively small proportion of the detected malware. This provides a number of avenues for potential future work to compare and contrast with research on other platforms, as well as identification of key factors that may influence its growth in the future

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    Get PDF
    In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated
    corecore