19 research outputs found
Improving Function Coverage with Munch: A Hybrid Fuzzing and Directed Symbolic Execution Approach
Fuzzing and symbolic execution are popular techniques for finding
vulnerabilities and generating test-cases for programs. Fuzzing, a blackbox
method that mutates seed input values, is generally incapable of generating
diverse inputs that exercise all paths in the program. Due to the
path-explosion problem and dependence on SMT solvers, symbolic execution may
also not achieve high path coverage. A hybrid technique involving fuzzing and
symbolic execution may achieve better function coverage than fuzzing or
symbolic execution alone. In this paper, we present Munch, an open source
framework implementing two hybrid techniques based on fuzzing and symbolic
execution. We empirically show using nine large open-source programs that
overall, Munch achieves higher (in-depth) function coverage than symbolic
execution or fuzzing alone. Using metrics based on total analyses time and
number of queries issued to the SMT solver, we also show that Munch is more
efficient at achieving better function coverage.Comment: To appear at 33rd ACM/SIGAPP Symposium On Applied Computing (SAC). To
be held from 9th to 13th April, 201
A Task Analysis of Static Binary Reverse Engineering for Security
Software is ubiquitous in society, but understanding it, especially without access to source code, is both non-trivial and critical to security. A specialized group of cyber defenders conducts reverse engineering (RE) to analyze software. The expertise-driven process of software RE is not well understood, especially from the perspective of workflows and automated tools. We conducted a task analysis to explore the cognitive processes that analysts follow when using static techniques on binary code. Experienced analysts were asked to statically find a vulnerability in a small binary that could allow for unverified access to root privileges. Results show a highly iterative process with commonly used cognitive states across participants of varying expertise, but little standardization in process order and structure. A goal-centered analysis offers a different perspective about dominant RE states. We discuss implications about the nature of RE expertise and opportunities for new automation to assist analysts using static techniques
Obfuscating Java Programs by Translating Selected Portions of Bytecode to Native Libraries
Code obfuscation is a popular approach to turn program comprehension and
analysis harder, with the aim of mitigating threats related to malicious
reverse engineering and code tampering. However, programming languages that
compile to high level bytecode (e.g., Java) can be obfuscated only to a limited
extent. In fact, high level bytecode still contains high level relevant
information that an attacker might exploit.
In order to enable more resilient obfuscations, part of these programs might
be implemented with programming languages (e.g., C) that compile to low level
machine-dependent code. In fact, machine code contains and leaks less high
level information and it enables more resilient obfuscations.
In this paper, we present an approach to automatically translate critical
sections of high level Java bytecode to C code, so that more effective
obfuscations can be resorted to. Moreover, a developer can still work with a
single programming language, i.e., Java
VirtSC: Combining Virtualization Obfuscation with Self-Checksumming
Self-checksumming (SC) is a tamper-proofing technique that ensures certain
program segments (code) in memory hash to known values at runtime. SC has few
restrictions on application and hence can protect a vast majority of programs.
The code verification in SC requires computation of the expected hashes after
compilation, as the machine-code is not known before. This means the expected
hash values need to be adjusted in the binary executable, hence combining SC
with other protections is limited due to this adjustment step. However,
obfuscation protections are often necessary, as SC protections can be otherwise
easily detected and disabled via pattern matching. In this paper, we present a
layered protection using virtualization obfuscation, yielding an
architecture-agnostic SC protection that requires no post-compilation
adjustment. We evaluate the performance of our scheme using a dataset of 25
real-world programs (MiBench and 3 CLI games). Our results show that the SC
scheme induces an average overhead of 43% for a complete protection (100%
coverage). The overhead is tolerable for less CPU-intensive programs (e.g.
games) and when only parts of programs (e.g. license checking) are protected.
However, large overheads stemming from the virtualization obfuscation were
encountered
Analysis of Obfuscated Code with Program Slicing
In Man-At-The-End (MATE) attacks, software apps run on a device under full control of the attackers: they can violate the intellectual property of the app by means of malicious reverse engineering, software piracy, and software tampering. Obfuscation is a technique that is widely adopted by developers to mitigate this problem. Obfuscation increases complexity of software code, by obscuring the structure of code and data in order to thwart the reverse engineering process. However, it is possible to reverse engineer obfuscated code with time, determination and the right tools. In general, there is no accepted methodology to determine the strength of obfuscated code; however resilience is often considered a good metric as it indicates the percentage of obfuscated code that cannot be removed by automated de-obfuscation tools. We introduce a novel approach to measure the resilience of obfuscated C code using program slicing. Given a variable of interest, that might be part of a code region used to manipulate a crypto key or a license number, program slicing can mimic the attacker behaviour by trying to remove the code unrelated to that variable, acting as a new type of de-obfuscator
Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis
International audienceWe present a new approach that bridges binary analysis techniques with machine learning classification for the purpose of providing a static and generic evaluation technique for opaque predicates, regardless of their constructions. We use this technique as a static automated deobfuscation tool to remove the opaque predicates introduced by obfuscation mechanisms. According to our experimental results, our models have up to 98% accuracy at detecting and deob-fuscating state-of-the-art opaque predicates patterns. By contrast, the leading edge deobfuscation methods based on symbolic execution show less accuracy mostly due to the SMT solvers constraints and the lack of scalability of dynamic symbolic analyses. Our approach underlines the efficiency of hybrid symbolic analysis and machine learning techniques for a static and generic deobfuscation methodology
Formal framework for reasoning about the precision of dynamic analysis
Dynamic program analysis is extremely successful both in code debugging and in malicious code attacks. Fuzzing, concolic, and monkey testing are instances of the more general problem of analysing programs by dynamically executing their code with selected inputs. While static program analysis has a beautiful and well established theoretical foundation in abstract interpretation, dynamic analysis still lacks such a foundation. In this paper, we introduce a formal model for understanding the notion of precision in dynamic program analysis. It is known that in sound-by-construction static program analysis the precision amounts to completeness. In dynamic analysis, which is inherently unsound, precision boils down to a notion of coverage of execution traces with respect to what the observer (attacker or debugger) can effectively observe about the computation. We introduce a topological characterisation of the notion of coverage relatively to a given (fixed) observation for dynamic program analysis and we show how this coverage can be changed by semantic preserving code transformations. Once again, as well as in the case of static program analysis and abstract interpretation, also for dynamic analysis we can morph the precision of the analysis by transforming the code. In this context, we validate our model on well established code obfuscation and watermarking techniques. We confirm the efficiency of existing methods for preventing control-flow-graph extraction and data exploit by dynamic analysis, including a validation of the potency of fully homomorphic data encodings in code obfuscation