
    A jump-target identification method for multi-architecture static binary translation

    Static binary translation is a technique that allows an executable program for a given architecture to be translated for a different one, with less overhead than emulators and dynamic binary translators. The main downside of the static approach lies in the absence of runtime information, which is available in other solutions. In particular, one of the key issues is distinguishing data from code in the program and, more specifically, detecting basic block start addresses (jump targets). The presence of indirect jump instructions whose target is not immediately evident, in particular due to C switch statements, makes the recovery of jump targets a challenging task. In this paper, we present an effective technique for jump-target identification composed of an initial step of global data harvesting followed by two novel analyses: the Simple Expression Tracker and the Offset Shifted Range Analysis (OSRA). Both analyses work on a Static Single Assignment (SSA) intermediate representation and are iterated multiple times until they provide no additional information. In particular, OSRA is a data-flow analysis modeled after the typical code generated for switch statements. It tracks each SSA value in terms of an offset, a scaling factor, and another SSA value that is bounded between a lower and an upper bound (e.g., b = 10 + 4 · x, with 8 ≤ x ≤ 10). To validate the effectiveness of the proposed technique, we employ revamb, an in-house tool for binary translation leveraging QEMU and the LLVM compiler framework. Our experimental results show that we are able to run the coreutils test suite on ARM, MIPS and x86-64 without significant failures due to unidentified jump targets.
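
The form that OSRA tracks, according to the abstract above (an offset, a scaling factor, and a bounded SSA value), can be illustrated with a small sketch. This is not revamb's implementation: the class `BoundedValue` and its methods are hypothetical names chosen for illustration, and the sketch assumes integer bounds that are inclusive on both ends. It only shows how a value of the form offset + factor · x with lower ≤ x ≤ upper yields a finite set of candidate values, using the abstract's own example b = 10 + 4 · x with 8 ≤ x ≤ 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedValue:
    """Hypothetical record: value = offset + factor * x, with lower <= x <= upper."""
    offset: int
    factor: int
    lower: int   # inclusive lower bound on x (assumption)
    upper: int   # inclusive upper bound on x (assumption)

    def candidates(self) -> list[int]:
        """Enumerate every concrete value the tracked expression can take."""
        return [self.offset + self.factor * x
                for x in range(self.lower, self.upper + 1)]

    def add_constant(self, c: int) -> "BoundedValue":
        """Adding a constant only shifts the offset; the bounds on x are unchanged."""
        return BoundedValue(self.offset + c, self.factor, self.lower, self.upper)

    def scale(self, k: int) -> "BoundedValue":
        """Multiplying by a constant scales both the offset and the factor."""
        return BoundedValue(self.offset * k, self.factor * k, self.lower, self.upper)


# The example from the abstract: b = 10 + 4 * x, with 8 <= x <= 10.
b = BoundedValue(offset=10, factor=4, lower=8, upper=10)
print(b.candidates())  # [42, 46, 50]
```

In a switch-style pattern, a bounded set like this would correspond to the handful of table entries an indirect jump can read, which is what makes the candidate jump targets enumerable without runtime information.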

    A Human-Centric Approach For Binary Code Decompilation

    Many security techniques have been developed both in academia and industry to analyze source code, including methods to discover bugs, apply taint tracking, or find vulnerabilities. These source-based techniques leverage the wealth of high-level abstractions available in the source code to achieve good precision and efficiency. Unfortunately, these methods cannot be applied directly to binary code, which lacks such abstractions. In security, there are many scenarios where analysts only have access to the compiled version of a program. When compiled, all high-level abstractions, such as variables, types, and functions, are removed from the final version of the program that security analysts have access to. This dissertation investigates novel methods to recover abstractions from binary code. First, a novel pattern-independent control-flow structuring algorithm is presented to recover high-level control-flow abstractions from binary code. Unlike existing structural analysis algorithms, which produce unstructured code with many goto statements, our algorithm produces fully structured, goto-free decompiled code. We implemented this algorithm in a decompiler called DREAM. Second, we develop three categories of code optimizations in order to simplify the decompiled code and increase readability. These categories are expression simplification, control-flow simplification and semantics-aware naming. We have implemented our usability extensions on top of DREAM and call this extended version DREAM++. We conducted the first user study to evaluate the quality of decompilers for malware analysis. We have chosen malware since it represents one of the most challenging cases for binary code analysis. The study included six reverse engineering tasks of real malware samples that we obtained from independent malware experts. We evaluated three decompilers: the leading industry decompiler Hex-Rays and both versions of our decompiler, DREAM and DREAM++. The results of our study show that our improved decompiler DREAM++ produced significantly more understandable code that outperforms both Hex-Rays and DREAM. Using DREAM++, participants solved 3 times more tasks than when using Hex-Rays and 2 times more tasks than when using DREAM. Moreover, participants rated DREAM++ significantly higher than the competition.