
    Generic Reverse Compilation to Recognize Specific Behavior

    This thesis focuses on the recognition of specific behavior through generic reverse compilation, a process that transforms executables from different architectures and object-file formats into the same high-level language. The process is handled by the Lissom Decompiler. For the purpose of behavior recognition, the thesis introduces the Language for Decompilation (LfD), a simple imperative language well suited to comparison. The specific behavior is given by a known executable (e.g., malware), and recognition is performed by computing a similarity ratio against another, unknown executable. This ratio is calculated by the LfDComparator tool, which processes two LfD sources and decides their similarity.
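
    The abstract does not say how LfDComparator computes its similarity ratio. A minimal sketch of one plausible approach, comparing two decompiled listings statement by statement with a sequence-matching ratio (the function name and the LfD snippets are hypothetical, not the tool's actual interface):

        import difflib

        def similarity_ratio(lfd_a: str, lfd_b: str) -> float:
            """Hypothetical stand-in for LfDComparator: compare two LfD
            listings as sequences of whitespace-normalized statements.
            A real comparator would likely also normalize identifiers."""
            stmts_a = [ln.strip() for ln in lfd_a.splitlines() if ln.strip()]
            stmts_b = [ln.strip() for ln in lfd_b.splitlines() if ln.strip()]
            # ratio() = 2*M/T, where M = matched elements, T = total elements.
            return difflib.SequenceMatcher(None, stmts_a, stmts_b).ratio()

        known   = "x = input()\nif x == 42:\n    call_home()"   # known malware listing
        unknown = "y = input()\nif y == 42:\n    call_home()"   # unknown binary listing
        print(f"similarity: {similarity_ratio(known, unknown):.2f}")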

    Decompiler Front-End Optimizations

    A decompiler is a reverse-engineering tool that translates machine code back into a higher-level programming language. This bachelor's thesis describes such a tool, focusing on the decompiler of the Lissom project. Several optimization techniques are proposed, such as static interpretation of LLVM IR code and a cache for the interpretation results. Further optimizations extend the front end with platform-dependent features such as delay-slot support and detection of memory data layout and endianness. Finally, the implemented techniques are demonstrated on generated code.
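
    Of the front-end features above, endianness detection is the easiest to illustrate. A minimal sketch that reads the standard EI_DATA byte of an ELF header (how the Lissom front end actually implements this is not described in the abstract):

        def detect_elf_endianness(path: str) -> str:
            """Return 'little' or 'big' from the ELF e_ident[EI_DATA] byte
            at offset 5: 1 = ELFDATA2LSB, 2 = ELFDATA2MSB."""
            with open(path, "rb") as f:
                ident = f.read(6)  # 4-byte magic + EI_CLASS + EI_DATA
            if ident[:4] != b"\x7fELF":
                raise ValueError("not an ELF file")
            return {1: "little", 2: "big"}.get(ident[5], "unknown")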

    C Back-End for a Decompiler

    This thesis deals with the implementation of a decompiler back-end that produces code in the C language. It provides basic information about the principles and uses of reverse engineering, both within information technology and outside of it. The main goal is to create a decompiler back-end that generates code equivalent to its input: code that can be compiled back into binary form while preserving the functionality of the source binary. The output is an implementation of C++ classes that perform this task as part of the general decompiler developed within the Lissom project.
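
    The abstract does not show how the back-end maps its intermediate representation to C. A minimal sketch of the general idea, emitting a compilable C function from a toy three-address IR (the IR format and all names are invented for illustration; the actual back-end is a set of C++ classes):

        # Hypothetical three-address IR: (op, dest, src1, src2).
        IR = [
            ("const", "t0", 10, None),
            ("const", "t1", 32, None),
            ("add",   "t2", "t0", "t1"),
            ("ret",   None, "t2", None),
        ]

        def emit_c(ir, func_name="decompiled"):
            """Translate the toy IR into an equivalent C function."""
            lines = [f"int {func_name}(void) {{"]
            for op, dest, a, b in ir:
                if op == "const":
                    lines.append(f"    int {dest} = {a};")
                elif op == "add":
                    lines.append(f"    int {dest} = {a} + {b};")
                elif op == "ret":
                    lines.append(f"    return {a};")
            lines.append("}")
            return "\n".join(lines)

        print(emit_c(IR))  # prints a C function returning 42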

The Scalable and Accountable Binary Code Search and Its Applications

    The past decade has witnessed an explosion of applications and devices. This big-data era challenges existing security technologies: new analysis techniques must scale to "big data"-sized codebases; they must become smart and proactive, using the data to understand what the vulnerable points are and where they are located; and effective protection must be provided for the dissemination and analysis of data involving sensitive information on an unprecedented scale. In this dissertation, I argue that code search techniques can boost existing security analysis techniques (vulnerability identification and memory analysis) in terms of scalability and accuracy. To demonstrate these benefits, I address two issues of code search by means of code analysis: scalability and accountability. I further demonstrate the benefit of code search by applying it to scalable vulnerability identification [57] and to cross-version memory analysis problems [55, 56].

    Firstly, I address the scalability problem of code search by learning higher-level semantic features from code [57]. Instead of conducting fine-grained testing on a single device or program, it becomes much more crucial to achieve quick vulnerability scanning across devices and programs at a "big data" scale. However, discovering vulnerabilities in "big code" is like finding a needle in a haystack, even when dealing with known vulnerabilities. This new challenge demands a scalable code search approach. To this end, I leverage successful techniques from image search in the computer vision community and propose a novel code encoding method for scalable vulnerability search in binary code. The evaluation results show that this approach achieves comparable or even better accuracy and efficiency than the baseline techniques.

    Secondly, I tackle the accountability issues left open in the vulnerability search problem by designing vulnerability-oriented raw features [58]. Similar code does not always represent a similar vulnerability, so feature engineering for code search should focus on semantic-level features rather than syntactic ones. I propose to extract conditional formulas from raw binary code as higher-level semantic features for code search. A conditional formula explicitly captures two cardinal factors of a vulnerability: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, binary code search over conditional formulas produces significantly higher accuracy and provides meaningful evidence for human analysts to further examine the search results. The evaluation results show that this approach further improves the search accuracy of existing bug search techniques with very reasonable performance overhead.

    Finally, I demonstrate the potential of code search in the field of memory analysis and apply it to the cross-version issue in memory forensics [55, 56]. Memory analysis techniques for COTS software usually rely on so-called "data structure profiles" of their binaries. Constructing such profiles requires expert knowledge of the internal workings of a specific software version, and it remains a cumbersome manual effort most of the time. I propose to leverage code search to enable a notion named "cross-version memory analysis", which can update a profile for a new version of a piece of software by transferring knowledge from a model already trained on its old version. The evaluation results show that the code-search-based approach advances existing memory analysis methods by reducing manual effort while maintaining reasonable accuracy. With the help of collaborators, I further developed two plugins for the Volatility memory forensics framework [2] and show that each plugin can construct a localized profile to perform specific memory forensic tasks on the same memory dump, without the manual effort of creating the corresponding profile.
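
    The dissertation's concrete encodings (image-search-style features, conditional formulas) are only named in the abstract. A minimal sketch of the underlying search pattern, representing each binary function as a bag-of-features count vector and ranking candidates by cosine similarity; the feature vocabulary and all names here are assumptions, not the dissertation's actual encoding:

        import numpy as np

        def encode(features: list[str], vocab: dict[str, int]) -> np.ndarray:
            """Map a function's extracted features (e.g., mnemonics,
            constants, condition predicates) to a count vector."""
            vec = np.zeros(len(vocab))
            for f in features:
                if f in vocab:
                    vec[vocab[f]] += 1
            return vec

        def search(query: np.ndarray, corpus: list[np.ndarray], top_k: int = 3):
            """Rank corpus functions by cosine similarity to the query."""
            def cos(a, b):
                denom = np.linalg.norm(a) * np.linalg.norm(b)
                return float(a @ b / denom) if denom else 0.0
            scores = [(i, cos(query, v)) for i, v in enumerate(corpus)]
            return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]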

    Malware Visualization and Similarity via Tracking Binary Execution Path

    Today, computer systems are widely used throughout society, and malicious code that takes over systems and performs malicious actions is continuously being created and developed. Such malicious code sometimes appears in new forms, but in many cases it is modified from existing malicious code. Since far too many threatening samples are generated for human analysis, studies on efficient detection, classification, and analysis are essential. There are two main ways to analyze malicious code. The first, static analysis, identifies malicious behavior by analyzing the structure of the code or specific binary patterns at the code level. The second, dynamic analysis, uses virtualization tools to build an environment in a virtual machine and executes the malicious code to observe its behavior. The method used in this paper is static analysis. Although dynamic analysis yields a great deal of information, it has the disadvantage that analysis succeeds only when the execution environment matches the one each sample expects. Because the method proposed in this paper tracks and analyzes the execution stream of the code, it performs static analysis while still offering some of the benefits of dynamic analysis.

    The core idea of this paper is to express malicious code as a 25 x 25 pixel image using 25 selected API categories. The interactions and frequencies of the APIs are rendered into a 25 x 25 pixel image based on a matrix of RGB values. When analyzing malicious code, the Euclidean distance between the generated images measures their color similarity, and the similarity of the malicious behaviors is calculated from the final Euclidean distance value. Compared with an existing similarity calculation method, the similarity calculated by the proposed method was on average 5-10% higher. The method proposed in this study spends considerable time deriving results, because it analyzes each sample, visualizes it, and then computes the similarity of the visualized samples; analyzing a huge number of malicious code samples therefore takes a long time. Follow-up studies could analyze large amounts of malware, and improvements are needed to study how accuracy varies with the size of the data set.
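
    A minimal sketch of the pipeline as stated: count transitions between the 25 API categories into a 25 x 25 matrix, render it as pixel intensities, and compare images by Euclidean distance. The category indices and the single-channel intensity mapping are simplifications; the paper uses RGB values, whose exact encoding the abstract does not specify:

        import numpy as np

        N_CATEGORIES = 25  # 25 API categories, per the paper

        def behavior_image(api_trace: list[int]) -> np.ndarray:
            """Count transitions between consecutive API categories in a
            trace, then scale the 25x25 counts to 0..255 intensities."""
            img = np.zeros((N_CATEGORIES, N_CATEGORIES))
            for a, b in zip(api_trace, api_trace[1:]):
                img[a, b] += 1
            if img.max():
                img = 255 * img / img.max()
            return img.astype(np.uint8)

        def image_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
            """Euclidean distance between images; smaller = more similar."""
            return float(np.linalg.norm(img_a.astype(float) - img_b.astype(float)))

        # Example: two API traces over category indices 0..24.
        trace_x = [0, 3, 3, 7, 24, 7]
        trace_y = [0, 3, 7, 24, 7, 7]
        print(image_distance(behavior_image(trace_x), behavior_image(trace_y)))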

    Reconstruction of Data Types for Decompilation

    This thesis describes methods for the reconstruction of data types during decompilation. It defines the concept of reverse engineering and introduces the decompiler developed within the Lissom project, for which this work was created. Existing methods for reconstructing both simple and composite data types are presented, with detailed explanations of approaches based on data-flow analysis and on the analysis of memory-operation offsets. The core of the thesis is the design of a new technique for reconstructing simple and composite data types, suitable for deployment in the retargetable decompiler environment of the Lissom project. The basic principles of the new design, its implementation, and the related changes to the decompiler and its intermediate language are described. The resulting solution is subjected to a series of tests, and the conclusion discusses the achievements, shortcomings, and directions for further work.
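
    To give a flavor of the offset-based approach mentioned above, a minimal sketch that guesses a struct layout from memory accesses observed against one base pointer (a toy heuristic, not the thesis's technique):

        def infer_struct_layout(accesses: list[tuple[int, int]]) -> list[str]:
            """Given (offset, size-in-bytes) accesses relative to a single
            base pointer, guess one field per distinct offset."""
            size_to_type = {1: "char", 2: "short", 4: "int", 8: "long long"}
            fields = []
            for offset, size in sorted(set(accesses)):
                ctype = size_to_type.get(size, f"char[{size}]")
                fields.append(f"{ctype} field_{offset:x};  /* offset 0x{offset:x} */")
            return fields

        # Accesses seen at base+0 (4 bytes), base+4 (4 bytes), base+8 (8 bytes).
        for f in infer_struct_layout([(0, 4), (4, 4), (8, 8)]):
            print(f)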