43 research outputs found

    Devil is Virtual: Reversing Virtual Inheritance in C++ Binaries

    Complexities that arise from the implementation of object-oriented concepts in C++, such as virtual dispatch and dynamic type casting, have attracted the attention of attackers and defenders alike. Binary-level defenses depend on full and precise recovery of the class inheritance tree of a given program. While current solutions focus on recovering single and multiple inheritance from the binary, they are oblivious to virtual inheritance. Conventional wisdom among binary-level defenses is that virtual inheritance is uncommon and/or that support for single and multiple inheritance provides implicit support for virtual inheritance. In this paper, we show neither to be true. Specifically, (1) we present an efficient technique to detect virtual inheritance in C++ binaries and show through a study that virtual inheritance can be found in a non-negligible number (more than 10% on Linux and 12.5% on Windows) of real-world C++ programs, including MySQL and libstdc++. (2) We show that failure to handle virtual inheritance introduces both false positives and false negatives in the hierarchy tree. These false positives and negatives either introduce attack surface when the recovered hierarchy is used to enforce CFI policies, or make the hierarchy difficult to understand when it is needed for program understanding (e.g., during decompilation). (3) We present a solution to recover virtual inheritance from COTS binaries. We recover a maximum of 95% and 95.5% (GCC -O0) and a minimum of 77.5% and 73.8% (Clang -O2) of virtual and intermediate bases respectively in the virtual inheritance tree. Comment: Accepted at CCS20; this is a technical report version.
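
    As a concrete illustration of the construct the paper targets, the classic diamond below uses virtual inheritance so that Derived holds a single Base subobject; under the Itanium C++ ABI this materializes as virtual-base offsets in the vtable group, exactly the kind of binary artifact such an analysis must recognize. The class names are illustrative, not taken from the paper.

        #include <iostream>

        // Classic diamond: Left and Right inherit Base virtually, so Derived
        // contains exactly one Base subobject. Under the Itanium C++ ABI the
        // compiler emits virtual-base offsets (and a VTT) to locate that
        // shared subobject at run time.
        struct Base { int b; virtual ~Base() {} };
        struct Left : virtual Base { int l; };
        struct Right : virtual Base { int r; };
        struct Derived : Left, Right { int d; };

        int main() {
            Derived d{};
            // Upcasting to a virtual base goes through the stored vbase
            // offset, not a fixed compile-time displacement.
            Base* bp = &d;
            std::cout << sizeof(Derived) << ' ' << static_cast<void*>(bp) << '\n';
        }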

    A Human-Centric Approach For Binary Code Decompilation

    Many security techniques have been developed, both in academia and industry, to analyze source code, including methods to discover bugs, apply taint tracking, or find vulnerabilities. These source-based techniques leverage the wealth of high-level abstractions available in the source code to achieve good precision and efficiency. Unfortunately, these methods cannot be applied directly to binary code, which lacks such abstractions. In security, there are many scenarios where analysts only have access to the compiled version of a program. When compiled, all high-level abstractions, such as variables, types, and functions, are removed from the final version of the program that security analysts have access to. This dissertation investigates novel methods to recover abstractions from binary code. First, a novel pattern-independent control-flow structuring algorithm is presented to recover high-level control-flow abstractions from binary code. Unlike existing structural analysis algorithms, which produce unstructured code with many goto statements, our algorithm produces fully structured, goto-free decompiled code. We implemented this algorithm in a decompiler called DREAM. Second, we develop three categories of code optimizations to simplify the decompiled code and increase readability: expression simplification, control-flow simplification, and semantics-aware naming. We have implemented our usability extensions on top of DREAM and call this extended version DREAM++. We conducted the first user study to evaluate the quality of decompilers for malware analysis. We chose malware since it represents one of the most challenging cases for binary code analysis. The study included six reverse engineering tasks on real malware samples that we obtained from independent malware experts. We evaluated three decompilers: the leading industry decompiler Hex-Rays and both versions of our decompiler, DREAM and DREAM++. The results of our study show that our improved decompiler DREAM++ produced significantly more understandable code, outperforming both Hex-Rays and DREAM. Using DREAM++, participants solved 3 times more tasks than when using Hex-Rays and 2 times more tasks than when using DREAM. Moreover, participants rated DREAM++ significantly higher than the competition.
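
    To make the structuring goal concrete, the sketch below (hypothetical decompiler output, not code produced by DREAM itself) contrasts the goto-laden shape typical of classic structural analysis with the goto-free equivalent a pattern-independent structurer can derive from each block's reaching conditions.

        // Typical shape of classic structural-analysis output for a region
        // it fails to match against a known pattern:
        int f_goto(int a, int b) {
            if (a < b) goto L1;
            b = b - a;
            goto L2;
        L1:
            a = a - b;
        L2:
            return a + b;
        }

        // Goto-free equivalent: the reaching condition of each block
        // (here, a < b and its negation) becomes an if/else guard.
        int f_structured(int a, int b) {
            if (a < b)
                a = a - b;
            else
                b = b - a;
            return a + b;
        }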

    Generic Reverse Compilation to Recognize Specific Behavior

    This thesis is aimed at the recognition of specific behavior by generic reverse compilation. Generic reverse compilation is a process that transforms executables from different architectures and object file formats into the same high-level language; the process is implemented by the Lissom Decompiler tool. For the purpose of behavior recognition, the thesis introduces the Language for Decompilation (LfD), a simple imperative language suitable for comparison. The specific behavior is given by a known executable (e.g., malware), and recognition is performed by finding the ratio of similarity with another, unknown executable. This ratio of similarity is calculated by the LfDComparator tool, which processes two sources in LfD and decides on their similarity.
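
    The abstract does not specify the metric LfDComparator uses, so the sketch below should be read as one plausible similarity ratio only: the longest common subsequence of two statement sequences, normalized by the length of the longer sequence.

        #include <algorithm>
        #include <iostream>
        #include <string>
        #include <vector>

        // Similarity ratio of two statement sequences: LCS length divided
        // by the length of the longer sequence (1.0 = identical order and
        // content, 0.0 = nothing in common).
        double similarity(const std::vector<std::string>& a,
                          const std::vector<std::string>& b) {
            std::vector<std::vector<int>> dp(a.size() + 1,
                                             std::vector<int>(b.size() + 1, 0));
            for (size_t i = 1; i <= a.size(); ++i)
                for (size_t j = 1; j <= b.size(); ++j)
                    dp[i][j] = (a[i - 1] == b[j - 1])
                                   ? dp[i - 1][j - 1] + 1
                                   : std::max(dp[i - 1][j], dp[i][j - 1]);
            size_t longer = std::max(a.size(), b.size());
            return longer ? double(dp[a.size()][b.size()]) / longer : 1.0;
        }

        int main() {
            std::vector<std::string> known   = {"x = 0", "call f", "ret x"};
            std::vector<std::string> unknown = {"x = 0", "y = 1", "call f", "ret x"};
            std::cout << similarity(known, unknown) << '\n';  // prints 0.75
        }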

    Decompilation of Selected C++ Constructions

    This bachelor's thesis deals with the reconstruction of a hierarchy of classes and their virtual methods from programs created in C++. The aim of the work is to extend the decompiler developed as part of the Lissom project with an analysis of these constructions for various compilers. The reconstruction is performed by detecting Run-Time Type Information (RTTI) and virtual tables. The first part of the thesis describes reverse engineering as a discipline and introduces the Lissom project and its decompiler. The following section covers the C++ language, its structures, and the possibilities for their decompilation. The final part deals with the design, implementation, and testing of the recognition of RTTI and virtual tables.
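
    A minimal example of the artifacts such detection looks for (illustrative class names; GCC/Clang-style Itanium ABI assumed): any class with a virtual function receives a vtable, and with RTTI enabled that vtable references a type_info object whose mangled name is stored in the binary's read-only data.

        #include <iostream>
        #include <typeinfo>

        // A polymorphic hierarchy: both classes get vtables, and the slot
        // just before each vtable's function pointers (Itanium ABI) points
        // to the class's std::type_info record.
        struct Shape { virtual ~Shape() {} };
        struct Circle : Shape {};

        int main() {
            Shape* s = new Circle;
            // The mangled name printed here (e.g. "6Circle" with GCC) is
            // exactly the kind of string an RTTI scan finds in .rodata.
            std::cout << typeid(*s).name() << '\n';
            delete s;
        }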

    On Matching Binary to Source Code

    Reverse engineering of executable binary programs has diverse applications in computer security and forensics, and often involves identifying parts of code that are reused from third-party software projects. Identification of code clones by comparing and fingerprinting low-level binaries has been explored in various pieces of work as an effective approach for accelerating the reverse engineering process. Binary clone detection across different environments and computing platforms poses significant challenges, and reasoning about sequences of low-level machine instructions is a tedious and time-consuming process. For these reasons, the ability to match reused functions to their source code is highly advantageous, despite being rarely explored to date. In this thesis, we systematically assess the feasibility of automatic binary-to-source matching to aid the reverse engineering process. We highlight the challenges, elaborate on the shortcomings of existing proposals, and design a new approach that addresses the challenges while delivering more extensive and detailed results in a fully automated fashion. By evaluating our approach, we show that it is generally capable of uniquely matching over 50% of reused functions in a binary to their source code in a source database with over 500,000 functions, while narrowing down over 75% of reused functions to at most five candidates in most cases. Finally, we investigate and discuss the limitations and provide directions for future work.
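
    As a sketch of the general idea only (not the matching algorithm developed in the thesis), one simple binary-to-source fingerprint normalizes away identifiers and literals, which rarely survive compilation unchanged, and hashes the remaining token shape.

        #include <cctype>
        #include <cstdint>
        #include <iostream>
        #include <string>
        #include <vector>

        // FNV-1a hash over a normalized token stream: identifiers become
        // "ID" and numeric literals become "LIT", so two functions with the
        // same expression shape collide even if names and constants differ.
        uint64_t fingerprint(const std::vector<std::string>& tokens) {
            uint64_t h = 1469598103934665603ULL;
            for (const auto& t : tokens) {
                std::string norm = t;
                if (!t.empty() && std::isalpha((unsigned char)t[0]))
                    norm = "ID";
                else if (!t.empty() && std::isdigit((unsigned char)t[0]))
                    norm = "LIT";
                for (char c : norm) {
                    h ^= (uint8_t)c;
                    h *= 1099511628211ULL;
                }
            }
            return h;
        }

        int main() {
            // "a + 1" and "count + 42" normalize to the same shape.
            std::cout << (fingerprint({"a", "+", "1"}) ==
                          fingerprint({"count", "+", "42"}))
                      << '\n';  // prints 1
        }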

    Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code

    Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate, and output buffer shapes. We abstract from a forest of concrete dependency trees, which contain absolute memory addresses, to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses, and finally solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop, giving a 75% performance improvement; four kernels from IrfanView, leading to a 4.97× speedup; and one stencil from the miniGMG multigrid benchmark, netting a 4.25× improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving a 1.12× speedup without affecting the user experience. Funding: United States Dept. of Energy (Awards DE-SC0005288 and DE-SC0008923); Defense Advanced Research Projects Agency (Agreement FA8759-14-2-0009); MIT Energy Initiative (Fellowship).
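
    For readers unfamiliar with the target DSL, the snippet below shows the kind of Halide code such lifting produces for a simple three-tap blur. It illustrates Halide's algorithm/schedule split, which is what allows a lifted kernel to be re-optimized for new hardware; it is not actual Helium output.

        #include "Halide.h"
        using namespace Halide;

        int main() {
            // Algorithm: a pure, order-independent definition of the stencil,
            // the form Helium's symbolic dependency trees map onto.
            ImageParam in(Int(32), 1);
            Func blur("blur");
            Var x("x");
            blur(x) = (in(x - 1) + in(x) + in(x + 1)) / 3;

            // Schedule: hardware-specific optimization is specified separately
            // and can be retuned without touching the algorithm above.
            blur.vectorize(x, 8);
            return 0;
        }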

    Reconstruction of Data Types for Decompilation

    This thesis describes methods for the reconstruction of data types during decompilation. It defines the concept of reverse engineering and introduces the decompiler developed by the Lissom project, for whose needs this work was created. Existing methods for reconstructing both simple and complex data types are presented, with detailed explanations of approaches based on data-flow analysis and on the analysis of memory-operation offsets. The core of the thesis is the design of a new technique for reconstructing simple and complex data types, suitable for deployment in the retargetable decompiler of the Lissom project. The basic principles of the new design are explained, together with its implementation and the related changes in the decompiler and its intermediate language. The resulting solution is subjected to a series of tests, and the conclusion discusses the achieved results, shortcomings, and directions for further work.
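
    A minimal sketch of the memory-offset approach described above (hypothetical types, not the thesis's implementation): record every access observed through one base pointer as an offset/width pair, then read the map back as a candidate structure layout.

        #include <algorithm>
        #include <cstdint>
        #include <iostream>
        #include <map>

        // Offset-based layout inference: each observed memory access through
        // a common base pointer contributes (offset, width); the sorted map
        // is then printed as a candidate struct definition.
        struct LayoutInfer {
            std::map<uint32_t, uint32_t> fields;  // offset -> widest access

            void observe(uint32_t offset, uint32_t width) {
                uint32_t& w = fields[offset];
                w = std::max(w, width);
            }

            void print() const {
                std::cout << "struct recovered {\n";
                for (const auto& [off, w] : fields)
                    std::cout << "    /* +" << off << " */ uint" << w * 8
                              << "_t field_" << off << ";\n";
                std::cout << "};\n";
            }
        };

        int main() {
            LayoutInfer li;
            li.observe(0, 4);   // e.g. mov eax, [rdi]
            li.observe(8, 8);   // e.g. mov rax, [rdi+8]
            li.observe(16, 1);  // e.g. mov al,  [rdi+16]
            li.print();
        }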

    Knowledge and innovation in intellectual property : the case of computer program copyright

    Information economics is used to develop a model of technological innovation which is applied to the case of computer program copyright. A critical outline of the neo-classical economic perspective of innovation and Arrow's concerns regarding appropriability of information is provided. This perspective justifies intellectual property institutions as a correction of market failure and as a "reward for invention". The same literature marginalises countervailing arguments including monopoly distortions, alternative sources of innovator reward and the potential for anti-competitive strategies. Information economics provides a distinct and preferred perspective in the analysis of technological development and in the role of intellectual property in the promotion of innovation. The conception of information as a resource, rather than as a commodity, implies that information is part of a shared technological capital, whose indivisibilities should be exploited for social benefit. The information perspective conceives innovation as a messy, evolutionary and interactive process involving many participants, and a cycle of innovation characterised by incremental improvements, imitation and learning strategies, and technological trajectories influenced by bounded rationality. These environments will also generate powerful network externalities. A model of innovation based on these assumptions is developed which incorporates two major distinctions. One is between tacit and codified knowledge; the other is between technology and technological artefacts. This knowledge-artefact distinction is defined in the innovation model by the concept of an information technology artefact, characterised as a physical product whose underlying means of creation is not communicated by mere possession of that product. This innovation model is reconciled to the intellectual property regimes of confidential information, patent and copyright, demonstrating the use of legal doctrines to encourage the diffusion of tacit knowledge through society. Applying the innovation model to the question of computer programs, it is argued that programs in their executable or machine code forms correspond to the concept of an IT artefact, in that possession of machine code does not imply access to the underlying source code. The process of software development and the utility of decompilation are discussed in this context, particularly the lack of isomorphic correspondence between machine code and third or higher generation source code languages. The close analogy between the software development model and the scenario of confidential information suggests a limited role for copyright of computer programs beyond a prohibition of literal copying or piracy. Arguments favouring broader protection of non-literal elements of computer programs are critically reviewed, and prescriptions for proprietary protocols, user interfaces and standards in the literature are rejected as inconsistent with the realisation of network externalities by the software industry. An information economics perspective instead recommends the encouragement of reverse engineering and imitative competition provided that developers implement their own source code solutions to invest in the diffusion of tacit programming knowledge. Decompilation should be permitted to provide a limited degree of access to internal interfaces and communications protocols. Elements of a user interface should not be protected.
Copyright regimes in the United States, Europe and Australia are assessed against the policy prescriptions generated by the application of the innovation model to computer programs. The influence of political actors and international pressures such as TRIPS are noted. It is hoped that the infusion of an information economics approach might trigger the switch in perspective needed in policy debates to preserve the integrity of the intellectual commons.