85 research outputs found
SEEAD:A Semantic-based Approach for Automatic Binary Code De-obfuscation
Increasingly sophisticated code obfuscation techniques are quickly adopted by malware developers to escape from malware detection and to thwart the reverse engineering effort of security analysts. State-of-the-art de-obfuscation approaches rely on dynamic analysis, but face the challenge of low code coverage as not all software execution paths and behavior will be exposed at specific profiling runs. As a result, these approaches often fail to discover hidden malicious patterns. This paper introduces SEEAD, a novel and generic semantic-based de-obfuscation system. When building SEEAD, we try to rely on as few assumptions about the structure of the obfuscation tool as possible, so that the system can keep pace with the fast evolving code obfuscation techniques. To increase the code coverage, SEEAD dynamically directs the target program to execute different paths across different runs. This dynamic profiling scheme is rife with taint and control dependence analysis to reduce the search overhead, and a carefully designed protection scheme to bring the program to an error free status should any error happens during dynamic profile runs. As a result, the increased code coverage enables us to uncover hidden malicious behaviors that are not detected by traditional dynamic analysis based de-obfuscation approaches. We evaluate SEEAD on a range of benign and malicious obfuscated programs. Our experimental results show that SEEAD is able to successfully recover the original logic from obfuscated binaries
Binary Disassembly Block Coverage by Symbolic Execution vs. Recursive Descent
This research determines how appropriate symbolic execution is (given its current implementation) for binary analysis by measuring how much of an executable symbolic execution allows an analyst to reason about. Using the S2E Selective Symbolic Execution Engine with a built-in constraint solver (KLEE), this research measures the effectiveness of S2E on a sample of 27 Debian Linux binaries as compared to a traditional static disassembly tool, IDA Pro. Disassembly code coverage and path exploration is used as a metric for determining success. This research also explores the effectiveness of symbolic execution on packed or obfuscated samples of the same binaries to generate a model-based evaluation of success for techniques commonly employed by malware. Obfuscated results were much higher than expected, which lead to the discovery that S2E was not actually handling the multiple executable memory regions present in unpacker runtime code. Three recommendations are made to address the shortcomings of S2E and allow it to process obfuscated samples correctly
SEEAD:A Semantic-based Approach for Automatic Binary Code De-obfuscation
Increasingly sophisticated code obfuscation techniques are quickly adopted by malware developers to escape from malware detection and to thwart the reverse engineering effort of security analysts. State-of-the-art de-obfuscation approaches rely on dynamic analysis, but face the challenge of low code coverage as not all software execution paths and behavior will be exposed at specific profiling runs. As a result, these approaches often fail to discover hidden malicious patterns. This paper introduces SEEAD, a novel and generic semantic-based de-obfuscation system. When building SEEAD, we try to rely on as few assumptions about the structure of the obfuscation tool as possible, so that the system can keep pace with the fast evolving code obfuscation techniques. To increase the code coverage, SEEAD dynamically directs the target program to execute different paths across different runs. This dynamic profiling scheme is rife with taint and control dependence analysis to reduce the search overhead, and a carefully designed protection scheme to bring the program to an error free status should any error happens during dynamic profile runs. As a result, the increased code coverage enables us to uncover hidden malicious behaviors that are not detected by traditional dynamic analysis based de-obfuscation approaches. We evaluate SEEAD on a range of benign and malicious obfuscated programs. Our experimental results show that SEEAD is able to successfully recover the original logic from obfuscated binaries
Recommended from our members
Uncovering Features in Behaviorally Similar Programs
The detection of similar code can support many so ware engineering tasks such as program understanding and program classification. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, these approaches are unable to identify programs dynamically or statistically close to each other, which we call behaviorally similar programs. We believe the detection of behaviorally similar programs can enhance or even automate the tasks relevant to program classification. In this thesis, we will discuss our current approaches to identify programs having similar behavioral features in multiple perspectives.
We first discuss how to detect programs having similar functionality. While the definition of a program’s functionality is undecidable, we use inputs and outputs (I/Os) of programs as the proxy of their functionality. We then use I/Os of programs as a behavioral feature to detect which programs are functionally similar: two programs are functionally similar if they share similar inputs and outputs. This approach has been studied and developed in the C language to detect functionally equivalent programs having equivalent I/Os. Nevertheless, some natural problems in Object Oriented languages, such as input generation and comparisons between application-specific data types, hinder the development of this approach. We propose a new technique, in-vivo detection, which uses existing and meaningful inputs to drive applications systematically and then applies a novel similarity model considering both inputs and outputs of programs, to detect functionally similar programs. We develop the tool, HitoshiIO, based on our in-vivo detection. In the subjects that we study, HitoshiIO correctly detect 68.4% of functionally similar programs, where its false positive rate is only 16.6%.
In addition to functional I/Os of programs, we attempt to discover programs having similar execution behavior. Again, the execution behavior of a program can be undecidable, so we use instructions executed at run-time as a behavioral feature of a program. We create DyCLINK, which observes program executions and encodes them in dynamic instruction graphs. A vertex in a dynamic instruction graph is an instruction and an edge is a type of dependency between two instructions. The problem to detect which programs have similar executions can then be reduced to a problem of solving inexact graph isomorphism. We propose a link analysis based algorithm, LinkSub, which vectorizes each dynamic instruction graph by the importance of every instruction, to solve this graph isomorphism problem efficiently. In a K Nearest Neighbor (KNN) based program classification experiment, DyCLINK achieves 90 + % precision.
Because HitoshiIO and DyCLINK both rely on dynamic analysis to expose program behavior, they have better capability to locate and search for behaviorally similar programs than traditional static analysis tools. However, they suffer from some common problems of dynamic analysis, such as input generation and run-time overhead. These problems may make our approaches challenging to scale. Thus, we create the system, Macneto, which integrates static analysis with machine topic modeling and deep learning to approximate program behaviors from their binaries without truly executing programs. In our deobfuscation experiments considering two commercial obfuscators that alter lexical information and syntax in programs, Macneto achieves 90 + % precision, where the groundtruth is that the behavior of a program before and after obfuscation should be the same.
In this thesis, we offer a more extensive view of similar programs than the traditional definitions. While the traditional definitions of similar programs mostly use static features, such as syntax and lexical information, we propose to leverage the power of dynamic analysis and machine learning models to trace/collect behavioral features of pro- grams. These behavioral features of programs can then apply to detect behaviorally similar programs. We believe the techniques we invented in this thesis to detect behaviorally similar programs can improve the development of software engineering and security applications, such as code search and deobfuscation
VirtSC: Combining Virtualization Obfuscation with Self-Checksumming
Self-checksumming (SC) is a tamper-proofing technique that ensures certain
program segments (code) in memory hash to known values at runtime. SC has few
restrictions on application and hence can protect a vast majority of programs.
The code verification in SC requires computation of the expected hashes after
compilation, as the machine-code is not known before. This means the expected
hash values need to be adjusted in the binary executable, hence combining SC
with other protections is limited due to this adjustment step. However,
obfuscation protections are often necessary, as SC protections can be otherwise
easily detected and disabled via pattern matching. In this paper, we present a
layered protection using virtualization obfuscation, yielding an
architecture-agnostic SC protection that requires no post-compilation
adjustment. We evaluate the performance of our scheme using a dataset of 25
real-world programs (MiBench and 3 CLI games). Our results show that the SC
scheme induces an average overhead of 43% for a complete protection (100%
coverage). The overhead is tolerable for less CPU-intensive programs (e.g.
games) and when only parts of programs (e.g. license checking) are protected.
However, large overheads stemming from the virtualization obfuscation were
encountered
- …