56 research outputs found

    A Human-Centric Approach For Binary Code Decompilation

    Get PDF
    Many security techniques have been developed both in academia and industry to analyze source code, including methods to discover bugs, apply taint tracking, or find vulnerabilities. These source-based techniques leverage the wealth of high-level abstractions available in the source code to achieve good precision and efficiency. Unfortunately, these methods cannot be applied directly on binary code which lacks such abstractions. In security, there are many scenarios where analysts only have access to the compiled version of a program. When compiled, all high-level abstractions, such as variables, types, and functions, are removed from the final version of the program that security analysts have access to. This dissertation investigates novel methods to recover abstractions from binary code. First, a novel pattern-independent control flow structuring algorithm is presented to recover high-level control-flow abstractions from binary code. Unlike existing structural analysis algorithms which produce unstructured code with many goto statements, our algorithm produces fully-structured goto-free decompiled code. We implemented this algorithm in a decompiler called DREAM. Second, we develop three categories of code optimizations in order to simplify the decompiled code and increase readability. These categories are expression simplification, control-flow simplification and semantics-aware naming. We have implemented our usability extensions on top of DREAM and call this extended version DREAM++. We conducted the first user study to evaluate the quality of decompilers for malware analysis. We have chosen malware since it represents one of the most challenging cases for binary code analysis. The study included six reverse engineering tasks of real malware samples that we obtained from independent malware experts. We evaluated three decompilers: the leading industry decompiler Hex-Rays and both versions of our decompiler DREAM and DREAM++. The results of our study show that our improved decompiler DREAM++ produced significantly more understandable code that outperforms both Hex-Rays and DREAM. Using DREAM++participants solved 3 times more tasks than when using Hex-Rays and 2 times more tasks than when using DREAM. Moreover, participants rated DREAM++ significantly higher than the competition

    Decompiler For Pseudo Code Generation

    Get PDF
    Decompiling is an area of interest for researchers in the field of software reverse engineering. When the source code from a high-level programming language is compiled, it loses a great deal of information, including code structure, syntax, and punctuation.The purpose of this research is to develop an algorithm that can efficiently decompile assembly language into pseudo C code. There are tools available that claim to extract high-level code from an executable file, but the results of these tools tend to be inaccurate and unreadable.Our proposed algorithm can decompile assembly code to recover many basic high-level programming structures, including if/else, loops, switches, and math instructions. The approach adopted here is different from that of existing tools. Our algorithm performs three passes through the assembly code, and includes a virtual execution of each assembly instruction. We also construct a dependency graph and incidence list to aid in the decompilation

    Decompiling x86 Deep Neural Network Executables

    Full text link
    Due to their widespread use on heterogeneous hardware devices, deep learning (DL) models are compiled into executables by DL compilers to fully leverage low-level hardware primitives. This approach allows DL computations to be undertaken at low cost across a variety of computing platforms, including CPUs, GPUs, and various hardware accelerators. We present BTD (Bin to DNN), a decompiler for deep neural network (DNN) executables. BTD takes DNN executables and outputs full model specifications, including types of DNN operators, network topology, dimensions, and parameters that are (nearly) identical to those of the input models. BTD delivers a practical framework to process DNN executables compiled by different DL compilers and with full optimizations enabled on x86 platforms. It employs learning-based techniques to infer DNN operators, dynamic analysis to reveal network architectures, and symbolic execution to facilitate inferring dimensions and parameters of DNN operators. Our evaluation reveals that BTD enables accurate recovery of full specifications of complex DNNs with millions of parameters (e.g., ResNet). The recovered DNN specifications can be re-compiled into a new DNN executable exhibiting identical behavior to the input executable. We show that BTD can boost two representative attacks, adversarial example generation and knowledge stealing, against DNN executables. We also demonstrate cross-architecture legacy code reuse using BTD, and envision BTD being used for other critical downstream tasks like DNN security hardening and patching.Comment: The extended version of a paper to appear in the Proceedings of the 32nd USENIX Security Symposium, 2023, (USENIX Security '23), 25 page

    Enhancing Usability of Malware Analysis Pipelines With Reverse Engineering

    Get PDF
    Lots of work has been done on analyzing software distributed in binary form. This is a challenging problem because of the relatively unstructured nature of binaries. To recover high-level structure, various attempts have included static and dynamic analysis. However, human inspection is often required, as high-level structure is compiled away. Recent success in this area includes work on variable-name recovery, vulnerability discovery, class recovery for object-oriented languages. We are interested in building a pipeline for user to analyze malware. In this thesis we tackle two problems central to malware analysis pipelines. The first is D3RE, an interactive querying tool that allows users to analyze binaries interactively by writing declarative rules and visualizing their results projected onto a binary. The second is Assmeblage, a tool which automatically scrapes GitHub for C and C++ repositories and builds these repositories automatically using different compilation settings to produce a variety of configurations. These two tools will enable users to get enough data to do analysis as well for them to do interactive analysis. Finally, we present future work demonstrating a possible visualization combining d3re and Ghidra along with some specific questions for future user studies

    Software similarity and classification

    Full text link
    This thesis analyses software programs in the context of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities
    • …
    corecore