180 research outputs found

    Formal Verification of Control-flow Graph Flattening

    Code obfuscation is emerging as a key asset in security by obscurity. It aims at hiding sensitive information in programs so that they become more difficult to understand and reverse engineer. Since the results on the impossibility of perfect and universal obfuscation, many obfuscation techniques have been proposed in the literature, ranging from simple variable encoding to hiding the control flow of a program. In this paper, we formally verify in Coq an advanced code obfuscation called control-flow graph flattening, that is used in state-of-the-art program obfuscators. Our control-flow graph flattening is a program transformation operating over C programs, that is integrated into the CompCert formally verified compiler. The semantics preservation proof of our program obfuscator relies on a simulation proof performed on a realistic language, the Clight language of CompCert. The automatic extraction of our program obfuscator into OCaml yields a program with competitive results
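As a rough illustration of what control-flow graph flattening does, the sketch below rewrites a small loop so that every basic block becomes one case of a single dispatcher loop. It is not the verified CompCert transformation described in the abstract; the function names and the `state` variable are hypothetical and chosen only for this example.

```python
def sum_to(n: int) -> int:
    """Original structured version: sum of the integers 1..n."""
    total, i = 0, 1
    while i <= n:
        total += i
        i += 1
    return total


def sum_to_flat(n: int) -> int:
    """Flattened version: one dispatcher loop selects the next block."""
    total, i = 0, 1
    state = 0  # hypothetical "program counter" introduced by the flattening
    while state != 3:  # state 3 stands for the exit block
        if state == 0:  # block 0: loop test
            state = 1 if i <= n else 3
        elif state == 1:  # block 1: accumulate
            total += i
            state = 2
        elif state == 2:  # block 2: increment, then jump back to the test
            i += 1
            state = 0
    return total
```

Both functions compute the same result for every input, which is the semantics-preservation property the paper proves formally; the flattened form simply hides the loop structure behind the dispatcher.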

    Formal framework for reasoning about the precision of dynamic analysis

    Dynamic program analysis is extremely successful both in code debugging and in malicious code attacks. Fuzzing, concolic, and monkey testing are instances of the more general problem of analysing programs by dynamically executing their code with selected inputs. While static program analysis has a beautiful and well established theoretical foundation in abstract interpretation, dynamic analysis still lacks such a foundation. In this paper, we introduce a formal model for understanding the notion of precision in dynamic program analysis. It is known that in sound-by-construction static program analysis the precision amounts to completeness. In dynamic analysis, which is inherently unsound, precision boils down to a notion of coverage of execution traces with respect to what the observer (attacker or debugger) can effectively observe about the computation. We introduce a topological characterisation of the notion of coverage relatively to a given (fixed) observation for dynamic program analysis and we show how this coverage can be changed by semantic preserving code transformations. Once again, as well as in the case of static program analysis and abstract interpretation, also for dynamic analysis we can morph the precision of the analysis by transforming the code. In this context, we validate our model on well established code obfuscation and watermarking techniques. We confirm the efficiency of existing methods for preventing control-flow-graph extraction and data exploit by dynamic analysis, including a validation of the potency of fully homomorphic data encodings in code obfuscation

    Code obfuscation against abstraction refinement attacks

    Code protection technologies require anti reverse engineering transformations to obfuscate programs in such a way that tools and methods for program analysis become ineffective. We introduce the concept of model deformation inducing an effective code obfuscation against attacks performed by abstract model checking. This means complicating the model in such a way that a high number of spurious traces are generated in any formal verification of the property to disclose about the system under attack. We transform the program model in order to make the removal of spurious counterexamples by abstraction refinement maximally inefficient. Because our approach is intended to defeat the fundamental abstraction refinement strategy, we are independent from the specific attack carried out by abstract model checking. A measure of the quality of the obfuscation obtained by model deformation is given together with a corresponding best obfuscation strategy for abstract model checking based on partition refinement

    Program Similarity Analysis for Malware Classification and its Pitfalls

    Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice’s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least, easy to retrieve), most approaches employ semantic “abstractions”, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A. which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state of the art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be done using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory, to classify the samples into their respective families. To overcome the usual obstacle of deep learning of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis to allow proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process

    Evaluation Methodologies in Software Protection Research

    Man-at-the-end (MATE) attackers have full control over the system on which the attacked software runs, and try to break the confidentiality or integrity of assets embedded in the software. Both companies and malware authors want to prevent such attacks. This has driven an arms race between attackers and defenders, resulting in a plethora of different protection and analysis methods. However, it remains difficult to measure the strength of protections because MATE attackers can reach their goals in many different ways and a universally accepted evaluation methodology does not exist. This survey systematically reviews the evaluation methodologies of papers on obfuscation, a major class of protections against MATE attacks. For 572 papers, we collected 113 aspects of their evaluation methodologies, ranging from sample set types and sizes, over sample treatment, to performed measurements. We provide detailed insights into how the academic state of the art evaluates both the protections and analyses thereon. In summary, there is a clear need for better evaluation methodologies. We identify nine challenges for software protection evaluations, which represent threats to the validity, reproducibility, and interpretation of research results in the context of MATE attacks

    Maximal incompleteness as obfuscation potency

    Obfuscation is the art of making code hard to reverse engineer and understand. In this paper, we propose a formal model for specifying and understanding the strength of obfuscating transformations with respect to a given attack model. The idea is to consider the attacker as an abstract interpreter willing to extract information about the program’s semantics. In this scenario, we show that obfuscating code is making the analysis imprecise, namely making the corresponding abstract domain incomplete. It is known that completeness is a property of the abstract domain and the program to analyse. We introduce a framework for transforming abstract domains, i.e., analyses, towards incompleteness. The family of incomplete abstractions for a given program provides a characterisation of the potency of obfuscation employed in that program, i.e., its strength against the attack specified by those abstractions. We show this characterisation for known obfuscating transformations used to inhibit program slicing and automated disassembly

    Characterizing A Property-Driven Obfuscation Strategy

    In recent years, code obfuscation has attracted both researchers and software developers as a useful technique for protecting secret properties of proprietary programs. The idea of code obfuscation is to modify a program, while preserving its functionality, in order to make it more difficult to analyze. Thus, the aim of code obfuscation is to conceal certain properties to an attacker, while revealing its intended behavior. However, a general methodology for deriving an obfuscating transformation from the properties to conceal and reveal is still missing. In this work, we start to address this problem by studying the existence and the characterization of function transformers that minimally or maximally modify a program in order to reveal or conceal a certain property. Based on this general formal framework, we are able to provide a characterization of the maximal obfuscating strategy for transformations concealing a given property while revealing the desired observational behavior. To conclude, we discuss the applicability of the proposed characterization by showing how some common obfuscation techniques can be interpreted in this framework. Moreover, we show how this approach allows us to deeply understand what are the behavioral properties that these transformations conceal, and therefore protect, and which are the ones that they reveal, and therefore disclose

    JAVA DESIGN PATTERN OBFUSCATION

    Software Reverse Engineering (SRE) consists of analyzing the design and implementation of software. Typically, we assume that the executable file is available, but not the source code. SRE has many legitimate uses, including analysis of software when no source code is available, porting old software to a modern programming language, and analyzing code for security vulnerabilities. Attackers also use SRE to probe for weaknesses in closed-source software, to hack software activation mechanisms (or otherwise change the intended function of software), to cheat at games, etc. There are many tools available to aid the aspiring reverse engineer. For example, there are several tools that recover design patterns from Java byte code or source code. In this project, we develop and analyze a technique to obfuscate design patterns. We show that our technique can defeat design pattern detection tools, thereby making reverse engineering attacks more difficult

    Semantics-based software watermarking by abstract interpretation

    Software watermarking is a software protection technique used to defend the intellectual property of proprietary code. In particular, software watermarking aims at preventing software piracy by embedding a signature, i.e. an identifier reliably representing the owner, in the code. When an illegal copy is made, the owner can claim his/her identity by extracting the signature. It is important to hide the signature in the program in order to make it difficult for the attacker to detect, tamper or remove it. In this work we present a formal framework for software watermarking, based on program semantics and abstract interpretation, where attackers are modeled as abstract interpreters. In this setting we can prove that the ability to identify signatures can be modeled as a completeness property of the attackers in the abstract interpretation framework. Indeed, hiding a signature in the code corresponds to embed it as a semantic property that can be retrieved only by attackers that are complete for it. Any abstract interpreter that is not complete for the property specifying the signature cannot detect, tamper or remove it. We formalize in the proposed framework the major quality features of a software watermarking technique: secrecy, resilience, transparence and accuracy. This provides an unifying framework for interpreting both watermarking schemes and attacks, and it allows us to formally compare the quality of different watermarking techniques. Indeed, a large number of watermarking techniques exist in the literature and they are typically evaluated with respect to their secrecy, resilience, transparence and accuracy to attacks. Formally identifying the attacks for which a watermarking scheme is secret, resilient, transparent or accurate can be a complex and error-prone task, since attacks and watermarking schemes are typically defined in different settings and using different languages (e.g. program transformation vs. program analysis), complicating the task of comparing one against the others