1,037 research outputs found

    Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis

    Get PDF
    International audienceWe present a new approach that bridges binary analysis techniques with machine learning classification for the purpose of providing a static and generic evaluation technique for opaque predicates, regardless of their constructions. We use this technique as a static automated deobfuscation tool to remove the opaque predicates introduced by obfuscation mechanisms. According to our experimental results, our models have up to 98% accuracy at detecting and deob-fuscating state-of-the-art opaque predicates patterns. By contrast, the leading edge deobfuscation methods based on symbolic execution show less accuracy mostly due to the SMT solvers constraints and the lack of scalability of dynamic symbolic analyses. Our approach underlines the efficiency of hybrid symbolic analysis and machine learning techniques for a static and generic deobfuscation methodology

    Simplification of General Mixed Boolean-Arithmetic Expressions: GAMBA

    Full text link
    Malware code often resorts to various self-protection techniques to complicate analysis. One such technique is applying Mixed-Boolean Arithmetic (MBA) expressions as a way to create opaque predicates and diversify and obfuscate the data flow. In this work we aim to provide tools for the simplification of nonlinear MBA expressions in a very practical context to compete in the arms race between the generation of hard, diverse MBAs and their analysis. The proposed algorithm GAMBA employs algebraic rewriting at its core and extends SiMBA. It achieves efficient deobfuscation of MBA expressions from the most widely tested public datasets and simplifies expressions to their ground truths in most cases, surpassing peer tools

    Code deobfuscation by program synthesis-aided simplification of mixed boolean-arithmetic expressions

    Get PDF
    Treballs Finals de Grau de MatemĂ tiques, Facultat de MatemĂ tiques, Universitat de Barcelona, Any: 2020, Director: RaĂșl Roca CĂĄnovas, Antoni Benseny i Mario Reyes de los Mozos[en] This project studies the theoretical background of Mixed Boolean-Arithmetic (MBA) expressions as well as its practical applicability within the field of code obfuscation, which is a technique used both by malware threats and software protection in order to complicate the process of reverse engineering (parts of) a program. An MBA expression is composed of integer arithmetic operators, e.g. (+,−,∗)(+,-, *) and bitwise operators, e.g. (∧,√,⊕,ÂŹ).(\wedge, \vee, \oplus, \neg). MBA expressions can be leveraged to obfuscate the data-flow of code by iteratively applying rewrite rules and function identities that complicate (obfuscate) the initial expression while preserving its semantic behavior. This possibility is motivated by the fact that the combination of operators from these different fields do not interact well together: we have no rules (distributivity, factorization...) or general theory to deal with this mixing of operators. Current deobfuscation techniques to address simplification of this type of data-flow obfuscation are limited by being strongly tied to syntactic complexity. We explore novel program synthesis approaches for addressing simplification of MBA expressions by reasoning on the semantics of the obfuscated expressions instead of syntax, discussing their applicability as well as their limits. We present our own tool rr 2syntia that integrates Syntia, an open source program synthesis tool, into the reverse engineering framework radare 2 in order to retrieve the semantics of obfuscated code from its Input/Output behavior. Finally, we provide some improvement ideas and potential areas for future work to be done

    Bypassing Malware Obfuscation with Dynamic Synthesis

    Get PDF
    International audienceBlack-box synthesis is more efficient than SMT deobfuscation on predicates obfuscated with Mixed-Boolean Arithmetics

    Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning

    Get PDF
    International audienceThe ability to efficiently detect the software protections used is at a prime to facilitate the selection and application of adequate deob-fuscation techniques. We present a novel approach that combines semantic reasoning techniques with ensemble learning classification for the purpose of providing a static detection framework for obfuscation transformations. By contrast to existing work, we provide a methodology that can detect multiple layers of obfuscation, without depending on knowledge of the underlying functionality of the training-set used. We also extend our work to detect constructions of obfuscation transformations, thus providing a fine-grained methodology. To that end, we provide several studies for the best practices of the use of machine learning techniques for a scalable and efficient model. According to our experimental results and evaluations on obfuscators such as Tigress and OLLVM, our models have up to 91% accuracy on state-of-the-art obfuscation transformations. Our overall accuracies for their constructions are up to 100%
    • 

    corecore