58 research outputs found

    Probabilistic Naming of Functions in Stripped Binaries

    Get PDF
    Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool which combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact; an approach based on natural language processing of name components; and using Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art

    CyberGuarder: a virtualization security assurance architecture for green cloud computing

    Get PDF
    Cloud Computing, Green Computing, Virtualization, Virtual Security Appliance, Security Isolation

    Malware classification using self organising feature maps and machine activity data

    Get PDF
    In this article we use machine activity metrics to automatically distinguish between malicious and trusted portable executable software samples. The motivation stems from the growth of cyber attacks using techniques that have been employed to surreptitiously deploy Advanced Persistent Threats (APTs). APTs are becoming more sophisticated and able to obfuscate much of their identifiable features through encryption, custom code bases and in-memory execution. Our hypothesis is that we can produce a high degree of accuracy in distinguishing malicious from trusted samples using Machine Learning with features derived from the inescapable footprint left behind on a computer system during execution. This includes CPU, RAM, Swap use and network traffic at a count level of bytes and packets. These features are continuous and allow us to be more flexible with the classification of samples than discrete features such as API calls (which can also be obfuscated) that form the main feature of the extant literature. We use these continuous data and develop a novel classification method using Self Organizing Feature Maps to reduce over fitting during training through the ability to create unsupervised clusters of similar ‘behaviour’ that are subsequently used as features for classification, rather than using the raw data. We compare our method to a set of machine classification methods that have been applied in previous research and demonstrate an increase of between 7.24% and 25.68% in classification accuracy using our method and an unseen dataset over the range of other machine classification methods that have been applied in previous research

    Exploiting tactics, techniques, and procedures for malware detection

    Get PDF
    There has been a meteoric rise in the use of malware to perpetrate cybercrime and more generally, serve the interests of malicious actors. As a result, malware has evolved both in terms of its sheer variety and sophistication. There is hence a need for developing effective malware detection systems to counter this surge. Typically, most such systems nowadays are purely data-driven - they utilise Machine Learning (ML) based approaches which rely on large volumes of data, to spot patterns, detect anomalies, and thus detect malware. In this thesis, we propose a methodology for malware detection on networks that combines human domain knowledge with conventional malware detection approaches to more effectively identify, reason about, and be resilient to malware. Specifically, we use domain knowledge in the form of the Tactics, Techniques, and Procedures (TTPs) described in the MITRE ATT\&CK ontology of adversarial behaviour to build Network Intrusion Detection Systems (NIDS). Through the course of our research, we design and evaluate the first such NIDS that can effectively exploit TTPs for the purpose of malware detection. We then attempt to expand the scope of usability of these TTPs to systems other than our specialised NIDS, and develop a methodology that lets any generic ML-based NIDS exploit these TTPs as model features. We further expand and generalise our approach by modelling it as a multi-label classification problem, which enables us to: (i) detect malware more precisely on the basis of individual TTPs, and (ii) identify the malicious usage of uncommon or rarely-used TTPs. Throughout all our experiments, we rigorously evaluate all our systems on several metrics using large datasets of real-world malware and benign samples. We empirically demonstrate the usefulness of TTPs in the malware detection process, the benefits of a TTP-based approach in reasoning about malware and responding to various challenging conditions, and the overall robustness of our systems to adversarial attack. As a consequence, we establish and improve the state-of-the-art when it comes to detecting network-based malware using TTP-based information. This thesis overall represents a step forward in building automated systems that combine purely-data driven approaches with human expertise in the field of malware analysis

    A Dynamic Self-Structuring Neural Network

    Get PDF
    Creating a neural network based classification model is commonly accomplished using the trial and error technique. However, the trial and error structuring method have several difficulties such as time and availability of experts. In this article, an algorithm that simplifies structuring neural network classification models has been proposed. The algorithm aims at creating a large enough structure to learn models from the training dataset that can be generalised well on the testing dataset. Our algorithm dynamically tunes the structure parameters during the training phase aiming to derive accurate non-overfitting classifiers. The proposed algorithm has been applied to phishing websites classification problem and it shows competitive results with respect to various evaluation measures such as Harmonic Mean (F1-score), precision, accuracy, etc

    Effective, Efficient, and Robust Packing Detection and Classification

    Get PDF
    International audiencePacking is a widespread tool to prevent static malware detection and analysis. Detecting and classifying the packer used by a given malware sample is fundamental to being able to unpack and study the malware, whether manually or automatically. Existing literature on packing detection and classification has focused on effectiveness, but does not consider the efficiency required to be part of a practical malware-analysis workflow. This paper studies how to train packing detection and classification algorithms based on machine learning to be both highly effective and efficient. Initially, we create ground truths by labeling more than 280,000 samples with three different techniques. Then we perform feature selection considering the contribution and computation cost of features. Then we iterate over more than 1,500 combinations of features, scenarios, and algorithms to determine which algorithms are the most effective and efficient, finding that a reduction of 1-2% effectiveness can increase efficiency by 17-44 times. Then, we test how the best algorithms perform against malware collected after the training data to assess them against new packing techniques and versions, finding a large impact of the ground truth used on algorithm robustness. Finally, we perform an economic analysis and find simple algorithms with small feature sets to be more economical than complex algorithms with large feature sets based on uptime/training time ratio

    Cyber Anomaly Detection: Using Tabulated Vectors and Embedded Analytics for Efficient Data Mining

    Get PDF
    Firewalls, especially at large organizations, process high velocity internet traffic and flag suspicious events and activities. Flagged events can be benign, such as misconfigured routers, or malignant, such as a hacker trying to gain access to a specific computer. Confounding this is that flagged events are not always obvious in their danger and the high velocity nature of the problem. Current work in firewall log analysis is manual intensive and involves manpower hours to find events to investigate. This is predominantly achieved by manually sorting firewall and intrusion detection/prevention system log data. This work aims to improve the ability of analysts to find events for cyber forensics analysis. A tabulated vector approach is proposed to create meaningful state vectors from time-oriented blocks. Multivariate and graphical analysis is then used to analyze state vectors in human–machine collaborative interface. Statistical tools, such as the Mahalanobis distance, factor analysis, and histogram matrices, are employed for outlier detection. This research also introduces the breakdown distance heuristic as a decomposition of the Mahalanobis distance, by indicating which variables contributed most to its value. This work further explores the application of the tabulated vector approach methodology on collected firewall logs. Lastly, the analytic methodologies employed are integrated into embedded analytic tools so that cyber analysts on the front-line can efficiently deploy the anomaly detection capabilities

    Automated Approach to Intrusion Detection in VM-based Dynamic Execution Environment

    Get PDF
    Because virtual computing platforms are dynamically changing, it is difficult to build high-quality intrusion detection system. In this paper, we present an automated approach to intrusions detection in order to maintain sufficient performance and reduce dependence on execution environment. We discuss a hidden Markov model strategy for abnormality detection using frequent system call sequences, letting us identify attacks and intrusions automatically and efficiently. We also propose an automated mining algorithm, named AGAS, to generate frequent system call sequences. In our approach, the detection performance is adaptively tuned according to the execution state every period. To improve performance, the period value is also under self-adjustment

    Secure peer-to-peer networks for trusted collaboration

    Full text link
    • 

    corecore