72 research outputs found

    Neural malware detection

    Get PDF
    At the heart of today’s malware problem lies the theoretically infinite diversity created by metamorphism. The majority of conventional machine learning techniques tackle the problem under the assumptions that a sufficiently large number of training samples exists and that the training set is independent and identically distributed. However, the lack of semantic features, combined with models built on these flawed assumptions, largely results in overfitting and many false positives against real-world samples, leaving systems vulnerable to various adversarial attacks. A key observation is that modern malware authors write a script that automatically generates an arbitrarily large number of diverse samples sharing similar characteristics in program logic, which is a very cost-effective way to evade detection with minimum effort. Given that many malware campaigns follow this economic model of malware manufacturing, the samples within a campaign are likely to share coherent semantic characteristics. This opens up the possibility of one-to-many detection. It is therefore crucial to capture the non-linear metamorphic pattern unique to a campaign in order to detect these seemingly diverse but identically rooted variants. To address these issues, this dissertation proposes novel deep learning models, including a generative static malware outbreak detection model, a generative dynamic malware detection model using spatio-temporal isomorphic dynamic features, and instruction-cognitive malware detection. A comparative study on metamorphic threats is also conducted as part of the thesis. A generative adversarial autoencoder (AAE) over a convolutional network with global average pooling is introduced as the fundamental deep learning framework for malware detection; it captures highly complex non-linear metamorphism through translation invariance and insensitivity to local variation. The Generative Adversarial Network (GAN) used as part of the framework enables one-shot training, where semantically isomorphic malware campaigns are identified from a single malware instance sampled from the very initial outbreak. This is a major innovation because, to the best of our knowledge, no prior approach addresses this challenging training objective against a malware distribution consisting of a large number of very sparse groups artificially driven by the arms race between attackers and defenders. In addition, we propose a novel method that extracts an instruction-cognitive representation from uninterpreted raw binary executables, which can be used for one-to-many malware detection via one-shot training against the frequency spectrum of the Transformer’s encoded latent representation. The method works regardless of the presence of diverse malware variations while remaining resilient to adversarial attacks, which mostly use random perturbation against raw binaries. Comprehensive performance analyses, including mathematical formulations and experimental evaluations, are provided, with the proposed deep learning framework for malware detection exhibiting superior performance over conventional machine learning methods. The methods proposed in this thesis are applicable to a variety of threat environments where artificially formed sparse distributions arise at the cyber battle fronts.
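
    To make the architectural idea above more concrete, the snippet below is a minimal sketch of a convolutional adversarial autoencoder with global average pooling. The layer sizes, the 64x64 single-channel input (e.g., a binary rendered as an image), and the Gaussian latent prior are assumptions for illustration only, not the dissertation's actual model.

```python
# Illustrative sketch of an adversarial autoencoder (AAE) over a convolutional
# encoder with global average pooling. Layer sizes, the 64x64 "malware image"
# input, and the Gaussian prior are assumptions for demonstration only.
import torch
import torch.nn as nn

LATENT = 64

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),        # 64 -> 32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),       # 32 -> 16
            nn.Conv2d(64, LATENT, 3, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
        )
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling -> translation invariance

    def forward(self, x):
        return self.gap(self.features(x)).flatten(1)  # (batch, LATENT)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, z):
        return self.net(z)

class LatentDiscriminator(nn.Module):
    """Adversarially pushes encoded latents toward a chosen prior (here N(0, I))."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, z):
        return self.net(z)

# One training-style step on a dummy batch of grayscale "malware images".
enc, dec, disc = Encoder(), Decoder(), LatentDiscriminator()
x = torch.rand(4, 1, 64, 64)
z = enc(x)
recon_loss = nn.functional.mse_loss(dec(z), x)   # reconstruction objective
real_logits = disc(torch.randn_like(z))          # prior samples ("real" latents)
fake_logits = disc(z)                            # encoded latents ("fake" latents)
print(recon_loss.item(), real_logits.shape, fake_logits.shape)
```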

    Proceedings of the First NASA Formal Methods Symposium

    Get PDF
    Topics covered include: Model Checking - My 27-Year Quest to Overcome the State Explosion Problem; Applying Formal Methods to NASA Projects: Transition from Research to Practice; TLA+: Whence, Wherefore, and Whither; Formal Methods Applications in Air Transportation; Theorem Proving in Intel Hardware Design; Building a Formal Model of a Human-Interactive System: Insights into the Integration of Formal Methods and Human Factors Engineering; Model Checking for Autonomic Systems Specified with ASSL; A Game-Theoretic Approach to Branching Time Abstract-Check-Refine Process; Software Model Checking Without Source Code; Generalized Abstract Symbolic Summaries; A Comparative Study of Randomized Constraint Solvers for Random-Symbolic Testing; Component-Oriented Behavior Extraction for Autonomic System Design; Automated Verification of Design Patterns with LePUS3; A Module Language for Typing by Contracts; From Goal-Oriented Requirements to Event-B Specifications; Introduction of Virtualization Technology to Multi-Process Model Checking; Comparing Techniques for Certified Static Analysis; Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder; jFuzz: A Concolic Whitebox Fuzzer for Java; Machine-Checkable Timed CSP; Stochastic Formal Correctness of Numerical Algorithms; Deductive Verification of Cryptographic Software; Coloured Petri Net Refinement Specification and Correctness Proof with Coq; Modeling Guidelines for Code Generation in the Railway Signaling Context; Tactical Synthesis Of Efficient Global Search Algorithms; Towards Co-Engineering Communicating Autonomous Cyber-Physical Systems; and Formal Methods for Automated Diagnosis of Autosub 6000

    THE SCALABLE AND ACCOUNTABLE BINARY CODE SEARCH AND ITS APPLICATIONS

    Get PDF
    The past decade has witnessed an explosion of applications and devices. This big-data era challenges existing security technologies: new analysis techniques should be scalable enough to handle codebases at “big data” scale; they should become smart and proactive by using the data to understand what the vulnerable points are and where they are located; and effective protection should be provided for the dissemination and analysis of data involving sensitive information on an unprecedented scale. In this dissertation, I argue that code search techniques can boost existing security analysis techniques (vulnerability identification and memory analysis) in terms of scalability and accuracy. To demonstrate these benefits, I address two issues of code search through code analysis: scalability and accountability. I further demonstrate the benefit of code search by applying it to scalable vulnerability identification [57] and to cross-version memory analysis problems [55, 56]. Firstly, I address the scalability problem of code search by learning “higher-level” semantic features from code [57]. Instead of conducting fine-grained testing on a single device or program, it becomes much more crucial to achieve quick vulnerability scanning across devices and programs at a “big data” scale. However, discovering vulnerabilities in “big code” is like finding a needle in a haystack, even when dealing with known vulnerabilities. This new challenge demands a scalable code search approach. To this end, I leverage successful techniques from image search in the computer vision community and propose a novel code encoding method for scalable vulnerability search in binary code. The evaluation results show that this approach can achieve comparable or even better accuracy and efficiency than the baseline techniques. Secondly, I tackle the accountability issues left open by the vulnerability search problem by designing vulnerability-oriented raw features [58]. Similar code does not always represent a similar vulnerability, so feature engineering for code search should focus on semantic-level features rather than syntactic ones. I propose to extract conditional formulas as higher-level semantic features from raw binary code to conduct the code search. A conditional formula explicitly captures two cardinal factors of a vulnerability: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, binary code search on conditional formulas produces significantly higher accuracy and provides meaningful evidence for human analysts to further examine the search results. The evaluation results show that this approach can further improve the search accuracy of existing bug search techniques with very reasonable performance overhead. Finally, I demonstrate the potential of the code search technique in the memory analysis field and apply it to address the cross-version issue in memory forensics [55, 56]. Memory analysis techniques for COTS software usually rely on so-called “data structure profiles” for their binaries. Constructing such profiles requires expert knowledge about the internal workings of a specific software version, and it remains a cumbersome manual effort most of the time. I propose to leverage the code search technique to enable a notion named “cross-version memory analysis”, which can update a profile for a new version of a software package by transferring knowledge from the model that has already been trained on its old version. The evaluation results show that the code-search-based approach advances existing memory analysis methods by reducing manual effort while maintaining reasonable accuracy. With the help of collaborators, I further developed two plugins for the Volatility memory forensic framework [2], and show that each of the two plugins can construct a localized profile to perform specific memory forensic tasks on the same memory dump, without the need for manual effort in creating the corresponding profile
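
    To illustrate the image-search-inspired encoding idea described above, the following toy sketch hashes per-function feature vectors with random hyperplanes so that similar functions land in the same bucket and can be retrieved without scanning the whole corpus. The 16-dimensional feature vector and the hashing scheme are assumptions for demonstration, not the dissertation's actual encoding.

```python
# Toy illustration of scalable code search via feature encoding and
# locality-sensitive hashing. Features and hashing are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def encode_function(raw_features: np.ndarray, planes: np.ndarray) -> int:
    """Project a per-function feature vector onto random hyperplanes and
    pack the signs into a compact hash (similar vectors -> similar hashes)."""
    bits = (planes @ raw_features) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

# Pretend each binary function is summarized by 16 numeric features
# (e.g., instruction-category counts, call-graph statistics, ...).
DIM, BITS = 16, 12
planes = rng.standard_normal((BITS, DIM))

corpus = {f"func_{i}": rng.standard_normal(DIM) for i in range(10_000)}
index = {}
for name, vec in corpus.items():
    index.setdefault(encode_function(vec, planes), []).append(name)

# Searching for functions similar to a known-vulnerable one only touches the
# bucket with the same hash instead of the whole corpus.
query = corpus["func_42"] + 0.01 * rng.standard_normal(DIM)
candidates = index.get(encode_function(query, planes), [])
print(len(candidates), "candidate functions to verify further")
```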

    Practical Systems For Strengthening And Weakening Binary Analysis Frameworks

    Get PDF
    Binary analysis detects software vulnerabilities. Cutting-edge analysis techniques can quickly and automatically explore the internals of a program and report any discovered problems, so developers commonly use various analysis techniques as part of their software development process. Unfortunately, this also means that such techniques, and the automated nature of binary analysis methods, are appealing to adversaries who are looking for zero-day vulnerabilities. In this thesis, binary analysis is considered a double-edged sword, depending on the user’s purpose. To deliver the benefits of binary analysis only to credible users such as developers or testers, this thesis presents practical systems that strengthen binary analysis for trusted parties while weakening its power against untrusted groups. To achieve these goals, the thesis advances binary analysis in two directions: 1) a protection technique against fuzz testing and 2) a new binary analysis system that expands the applicability of current binary analysis techniques. The mitigation approach helps developers protect released software from attackers who can apply fuzzing techniques. The new binary analysis frameworks, on the other hand, provide a set of solutions to address the challenges that COTS binary fuzzing faces.

    Fingerprinting Vulnerabilities in Intelligent Electronic Device Firmware

    Get PDF
    Modern smart grid deployments rely heavily on the advanced capabilities that Intelligent Electronic Devices (IEDs) provide. However, these devices’ firmware often contains critical vulnerabilities that, if exploited, could have a large impact on national economic security and national safety. As such, a scalable, domain-specific approach is required to assess the security of IED firmware. To address this lack of an appropriate methodology, we present a scalable vulnerable-function identification framework. It is specifically designed to analyze IED firmware and binaries that employ the ARM CPU architecture. Its core functionality revolves around a multi-stage detection methodology designed to resolve the lack of specialization that limits other general-purpose approaches. This is achieved by compiling an extensive database of IED-specific vulnerabilities and domain-specific firmware against which the framework is evaluated. The analysis approach is composed of three stages that leverage syntactic, semantic, structural and statistical function features to identify vulnerabilities: it (i) first filters out dissimilar functions based on a group of heterogeneous features, (ii) then further filters out dissimilar functions based on their execution paths, and (iii) finally identifies candidate functions based on fuzzy graph matching. To validate the methodology’s capabilities, it is implemented as a binary analysis framework entitled BinArm. The resulting algorithm is then put through a rigorous set of evaluations that demonstrate its capabilities, including the ability to identify vulnerabilities within a given IED firmware image with a total accuracy of 0.92
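
    The three-stage pipeline described above can be pictured roughly as follows. The thresholds, the coarse features, and the Jaccard-over-CFG-edges stand-in for fuzzy graph matching are illustrative assumptions, not BinArm's actual algorithm.

```python
# Simplified sketch of a three-stage filter: coarse feature filter ->
# execution-path filter -> graph-similarity ranking. All parameters are
# illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Function:
    name: str
    features: tuple                              # e.g. (instructions, basic blocks, calls)
    paths: set = field(default_factory=set)      # hashes of execution paths
    edges: set = field(default_factory=set)      # CFG edges as (src_block, dst_block)

def stage1(candidates, vuln, tol=0.2):
    """Keep functions whose coarse features are within a relative tolerance."""
    def close(a, b):
        return all(abs(x - y) <= tol * max(y, 1) for x, y in zip(a, b))
    return [f for f in candidates if close(f.features, vuln.features)]

def stage2(candidates, vuln, min_overlap=0.5):
    """Keep functions sharing a minimum fraction of execution-path hashes."""
    return [f for f in candidates
            if len(f.paths & vuln.paths) >= min_overlap * max(len(vuln.paths), 1)]

def stage3(candidates, vuln):
    """Rank remaining functions by Jaccard similarity over CFG edges,
    a crude stand-in for fuzzy graph matching."""
    def score(f):
        union = f.edges | vuln.edges
        return len(f.edges & vuln.edges) / len(union) if union else 0.0
    return sorted(((score(f), f.name) for f in candidates), reverse=True)

vuln = Function("vulnerable_ref", (120, 14, 6), {1, 2, 3, 4}, {(0, 1), (1, 2), (2, 3)})
firmware = [
    Function("f_a", (118, 13, 6), {1, 2, 3}, {(0, 1), (1, 2), (2, 3)}),
    Function("f_b", (300, 40, 2), {9}, {(5, 6)}),
]
print(stage3(stage2(stage1(firmware, vuln), vuln), vuln))
```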

    Fuzzware: Using Precise MMIO Modeling for Effective Firmware Fuzzing

    Get PDF
    As embedded devices become more pervasive in our everyday lives, they turn into an attractive target for adversaries. Despite their high value and large attack surface, applying automated testing techniques such as fuzzing to such devices is not straightforward. As fuzz testing firmware on constrained embedded devices is inefficient, state-of-the-art approaches instead opt to run the firmware in an emulator (through a process called re-hosting). However, existing approaches either use coarse-grained static models of hardware behavior or require manual effort to re-host the firmware. We propose a novel combination of lightweight program analysis, re-hosting, and fuzz testing to tackle these challenges. We present the design and implementation of Fuzzware, a software-only system to fuzz test unmodified monolithic firmware in a scalable way. By determining how hardware-generated values are actually used by the firmware logic, Fuzzware can automatically generate models that help focus the fuzzing process on mutating the inputs that matter, which drastically improves its effectiveness. We evaluate our approach on synthetic and real-world targets comprising a total of 19 hardware platforms and 77 firmware images. Compared to state-of-the-art work, Fuzzware achieves up to 3.25 times the code coverage, and our modeling approach reduces the size of the input space by up to 95.5%. The synthetic samples contain 66 unit tests for various hardware interactions, and we find that our approach is the first generic re-hosting solution to automatically pass all of them. Fuzzware discovered 15 completely new bugs, including bugs in targets previously analyzed by other works; a total of 12 CVEs were assigned
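
    As a rough illustration of why modeling MMIO accesses shrinks the input space, the sketch below assumes an analysis has already determined that the firmware only uses two bits of a 32-bit status register; the fuzzer then only needs to supply those two bits per access. The mask and helper functions are hypothetical and are not Fuzzware's implementation.

```python
# Illustration of a "used bits" MMIO model: if firmware only inspects a few
# bits of a hardware register, only those bits need to come from the fuzzer.
# The used-bit mask is an assumed analysis result (the paper derives models
# via symbolic execution of the access site).
def bits_needed(used_mask: int) -> int:
    """Number of fuzzer-provided bits required for one MMIO read."""
    return bin(used_mask & 0xFFFFFFFF).count("1")

def expand(fuzz_bits: int, used_mask: int) -> int:
    """Scatter the fuzzer's compact input back into the bit positions
    the firmware actually reads."""
    value, consumed = 0, 0
    for pos in range(32):
        if used_mask & (1 << pos):
            value |= ((fuzz_bits >> consumed) & 1) << pos
            consumed += 1
    return value

# Example: a status register where the firmware only checks bits 0 and 7.
USED_MASK = 0b1000_0001
print(bits_needed(USED_MASK))          # 2 bits instead of 32 per access
print(hex(expand(0b11, USED_MASK)))    # 0x81 fed to the firmware's MMIO read
```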

    Improving Memory Forensics Through Emulation and Program Analysis

    Get PDF
    Memory forensics is an important tool in the hands of investigators. However, determining whether a computer is infected with malicious software is time consuming, even for experts. Tasks that require manual reverse engineering of code or data structures create a significant bottleneck in the investigative workflow. Through the application of emulation software and symbolic execution, these strains have been greatly lessened, allowing for faster and more thorough investigation. Furthermore, these efforts have reduced the barrier to forensic investigation, so that reasonable conclusions can be drawn even by non-expert investigators. Whereas Volatility previously allowed for the detection of malicious hooks and injected code only with a prohibitively high false positive rate, the techniques presented in this work achieve a much lower false positive rate automatically, and yield more detailed information when manual analysis is required. The second contribution of this work is to improve the reliability of memory forensic tools. As it currently stands, if some component of the operating system or language runtime has been updated, verifying that these changes do not affect the correctness of investigative tools involves a large reverse engineering effort, and significant domain knowledge, on the part of whoever maintains the tool. Through modifications of the techniques used in the hook analysis, this burden can be lessened or eliminated by comparing the last known functionality to the new functionality. This allows the tool to be updated quickly and effectively, so that investigations can proceed without issue
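
    As a loose illustration of the kind of check a hook-detection pass performs, the sketch below flags import entries whose resolved addresses fall outside every known module range. The module map and entries are made-up inputs; this is not the actual Volatility plugin logic, which would pull both from the memory image.

```python
# Minimal sketch: flag function pointers that resolve outside the address
# ranges of any loaded module (a common symptom of inline hooks or injected
# code). Addresses and ranges are fabricated for illustration.
MODULE_RANGES = {
    "ntdll.dll":    (0x7FF800000000, 0x7FF8001B0000),
    "kernel32.dll": (0x7FF800400000, 0x7FF8004C0000),
}

def owning_module(addr: int):
    """Return the module whose range contains addr, or None if unbacked."""
    for name, (start, end) in MODULE_RANGES.items():
        if start <= addr < end:
            return name
    return None

def check_import_entries(entries):
    """entries: list of (importing_module, imported_symbol, resolved_address)."""
    findings = []
    for module, symbol, addr in entries:
        if owning_module(addr) is None:
            findings.append((module, symbol, hex(addr), "points into unbacked memory"))
    return findings

suspect = [
    ("app.exe", "ntdll!NtCreateFile", 0x7FF800010000),  # inside ntdll: fine
    ("app.exe", "kernel32!CreateFileW", 0x1F0000),       # injected region: flagged
]
print(check_import_entries(suspect))
```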