166 research outputs found

    Image-based malware classification: A space filling curve approach

    Get PDF
    Anti-virus (AV) software is effective at distinguishing between benign and malicious programs yet lack the ability to effectively classify malware into their respective family classes. AV vendors receive considerably large volumes of malicious programs daily and so classification is crucial to quickly identify variants of existing malware that would otherwise have to be manually examined. This paper proposes a novel method of visualizing and classifying malware using Space-Filling Curves (SFC\u27s) in order to improve the limitations of AV tools. The classification models produced were evaluated on previously unseen samples and showed promising results, with precision, recall and accuracy scores of 82%, 80% and 83% respectively. Furthermore, a comparative assessment with previous research and current AV technologies revealed that the method presented her was robust, outperforming most commercial and open-source AV scanner software programs

    DroidSieve:Fast and Accurate Classification of Obfuscated Android Malware

    Get PDF
    With more than two million applications, Android marketplaces require automatic and scalable methods to efficiently vet apps for the absence of malicious threats. Recent techniques have successfully relied on the extraction of lightweight syntactic features suitable for machine learning classification, but despite their promising results, the very nature of such features suggest they would unlikely-on their own-be suitable for detecting obfuscated Android malware. To address this challenge, we propose DroidSieve, an Android malware classifier based on static analysis that is fast, accurate, and resilient to obfuscation. For a given app, DroidSieve first decides whether the app is malicious and, if so, classifies it as belonging to a family of related malware. DroidSieve exploits obfuscation-invariant features and artifacts introduced by obfuscation mechanisms used in malware. At the same time, these purely static features are designed for processing at scale and can be extracted quickly. For malware detection, we achieve up to 99.82% accuracy with zero false positives; for family identification of obfuscated malware, we achieve 99.26% accuracy at a fraction of the computational cost of state-of-The-Art techniques

    Multi-level analysis of Malware using Machine Learning

    Get PDF
    Multi-level analysis of Malware using Machine Learnin

    Program Similarity Analysis for Malware Classification and its Pitfalls

    Get PDF
    Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice\u2019s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least, easy to retrieve), most approaches employ semantic \u201cabstractions\u201d, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A. which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state of the art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be done using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory, to classify the samples into their respective families. To overcome the usual obstacle of deep learning of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis to allow proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process

    DDoS-Capable IoT Malwares: comparative analysis and Mirai Investigation

    Get PDF
    The Internet of Things (IoT) revolution has not only carried the astonishing promise to interconnect a whole generation of traditionally “dumb” devices, but also brought to the Internet the menace of billions of badly protected and easily hackable objects. Not surprisingly, this sudden flooding of fresh and insecure devices fueled older threats, such as Distributed Denial of Service (DDoS) attacks. In this paper, we first propose an updated and comprehensive taxonomy of DDoS attacks, together with a number of examples on how this classification maps to real-world attacks. Then, we outline the current situation of DDoS-enabled malwares in IoT networks, highlighting how recent data support our concerns about the growing in popularity of these malwares. Finally, we give a detailed analysis of the general framework and the operating principles of Mirai, the most disruptive DDoS-capable IoT malware seen so far

    EtherAnnotate: a transparent malware analysis tool for integrating dynamic and static examination

    Get PDF
    Software security researchers commonly reverse engineer and analyze current malicious software (malware) to determine what the latest techniques malicious attackers are utilizing and how to protect computer systems from attack. The most common analysis methods involve examining how the program behaves during execution and interpreting its machine-level instructions. However, modern malicious applications use advanced anti-debugger, anti-virtualization, and code packing techniques to obfuscate the malware\u27s true activities and divert security analysts. Malware analysts currently do not have a simple method for tracing malicious code activity at the instruction-level in a highly undetectable environment. There also lacks a simple method for combining actual run-time register and memory values with statically disassembled code. Combining statically disassembled code with the run-time values found in the memory and registers being accessed would create a new level of analysis possible by combining key aspects of static analysis with dynamic analysis. This thesis presents EtherAnnotate, a new extension to the Xen Ether virtualization framework and the IDA Pro disassembler to aid in the task of malicious software analysis. This new extension consists of two separate components - an enhanced instruction tracer and a graphical annotation and visualization plug-in for IDA Pro. The specialized instruction tracer places a malware binary into a virtualized environment and records the contents of all processor general register values that occur during its execution. The annotation plug-in for IDA Pro interprets the output of the instruction tracer and adds line comments of the register values in addition to visualizing code coverage of all disassembled instructions that were executed during the malware\u27s execution. These two tools can be combined to provide a new level of introspection for advanced malware that was not available with the previous state-of-the-art analysis tools --Abstract, page iii
    • …
    corecore