The Taint Rabbit: Optimizing Generic Taint Analysis with Dynamic Fast Path Generation
Generic taint analysis is a pivotal technique in software security. However,
it suffers from staggeringly high overhead. In this paper, we explore the
hypothesis that just-in-time (JIT) generation of fast paths for tracking
taint can enhance the performance. To this end, we present the Taint Rabbit,
which supports highly customizable user-defined taint policies and combines a
JIT with fast context switching. Our experimental results suggest that this
combination outperforms notable existing implementations of generic taint
analysis and bridges the performance gap to specialized trackers. For instance,
Dytan incurs an average overhead of 237x, while the Taint Rabbit achieves 1.7x
on the same set of benchmarks. This compares favorably to the 1.5x overhead
delivered by the bitwise, non-generic taint engine LibDFT.
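To make the phrase "user-defined taint policies" concrete, here is a minimal sketch of value-level taint propagation in which the rule for combining labels is a pluggable function. It is an illustration only and not the Taint Rabbit's design, which operates on binaries with a JIT; the names `Tainted`, `union_policy`, `intersection_policy`, and `add` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

Label = FrozenSet[str]                    # a taint label is a set of source names
Policy = Callable[[Label, Label], Label]  # how two operand labels combine


@dataclass(frozen=True)
class Tainted:
    """An integer value paired with its taint label (hypothetical helper type)."""
    value: int
    label: Label


def union_policy(a: Label, b: Label) -> Label:
    # Classic policy: the result is tainted by every source of either operand.
    return a | b


def intersection_policy(a: Label, b: Label) -> Label:
    # Alternative policy: keep only the sources shared by both operands.
    return a & b


def add(x: Tainted, y: Tainted, policy: Policy = union_policy) -> Tainted:
    """Add two values and propagate taint according to the chosen policy."""
    return Tainted(x.value + y.value, policy(x.label, y.label))


user_input = Tainted(41, frozenset({"network"}))
constant = Tainted(1, frozenset())

result_union = add(user_input, constant)
result_intersection = add(user_input, constant, intersection_policy)
```

Swapping the policy changes what the analysis reports without touching the instrumented operation, which is the kind of flexibility a generic engine has to pay for and a specialized bitwise tracker can avoid.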
Identifying Program Entropy Characteristics with Symbolic Execution
The security infrastructure underpinning our society relies on encryption, which relies on the correct generation and use of pseudorandom data. Unfortunately, random data is deceptively hard to generate. Implementation problems in PRNGs and the incorrect usage of generated random data in cryptographic algorithms have led to many issues, including the infamous Debian OpenSSL bug, which exposed millions of systems on the internet to potential compromise due to a mistake that limited the source of randomness during key generation to 2^15 different seeds (i.e. 15 bits of entropy). It is important to automatically identify if a given program applies a certain cryptographic algorithm or uses its random data correctly. This paper tackles the very first step of this problem by extracting an understanding of how a binary program generates or uses randomness. Specifically, we set the following problem: given a program (or a specific function), can we estimate bounds on the amount of randomness present in the program or function's output by determining bounds on the entropy of this output data? Our technique estimates upper bounds on the entropy of program output through a process of expression reinterpretation and stochastic probability estimation, related to abstract interpretation and model counting.
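As a worked illustration of the figure quoted above, and not of the paper's estimation technique: an output that is fully determined by one of 2^15 equally likely seeds can carry at most log2(2^15) = 15 bits of entropy. The helper names below are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def max_entropy_bits(num_outcomes: int) -> float:
    """Upper bound on Shannon entropy (in bits) of a source with
    num_outcomes possible values; reached only when all are equally likely."""
    return math.log2(num_outcomes)


def empirical_entropy_bits(samples: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Key generation seeded from only 2**15 possible values.
seed_bound = max_entropy_bits(2 ** 15)

# Four equally frequent outputs carry 2 bits; a constant output carries none.
uniform_four = empirical_entropy_bits([0, 1, 2, 3])
constant = empirical_entropy_bits([7, 7, 7, 7])
```

The empirical estimate only describes the samples it is given, which is one reason a static upper bound on the output's entropy is a useful complement.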
Fuzzing Symbolic Expressions
Recent years have witnessed a wide array of results in software testing,
exploring different approaches and methodologies ranging from fuzzers to
symbolic engines, with a full spectrum of instances in between such as concolic
execution and hybrid fuzzing. A key ingredient of many of these tools is
Satisfiability Modulo Theories (SMT) solvers, which are used to reason over
symbolic expressions collected during the analysis. In this paper, we
investigate whether techniques borrowed from the fuzzing domain can be applied
to check whether symbolic formulas are satisfiable in the context of concolic
and hybrid fuzzing engines, providing a viable alternative to classic SMT
solving techniques. We devise a new approximate solver, FUZZY-SAT, and show
that it is both competitive with and complementary to state-of-the-art solvers
such as Z3 with respect to handling queries generated by hybrid fuzzers.
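The general idea of answering a satisfiability query by mutating a concrete input, rather than calling an SMT solver, can be sketched as follows. This is a deliberately naive toy, not the FUZZY-SAT algorithm: it tries only single-byte mutations of a seed input, and the names `fuzz_solve` and `branch_condition` are hypothetical.

```python
from typing import Callable, Optional


def fuzz_solve(predicate: Callable[[bytes], bool], seed: bytes) -> Optional[bytes]:
    """Search for an input that satisfies predicate by mutating one byte of seed.

    Returns a satisfying input, or None if none was found. None does NOT
    prove the predicate unsatisfiable: the search is approximate.
    """
    if predicate(seed):
        return seed
    for pos in range(len(seed)):
        for value in range(256):
            candidate = seed[:pos] + bytes([value]) + seed[pos + 1:]
            if predicate(candidate):
                return candidate
    return None


def branch_condition(data: bytes) -> bool:
    # Hypothetical branch a hybrid fuzzer wants to flip.
    return (data[0] ^ 0x5A) == 0x17 and data[1] == ord("A")


solution = fuzz_solve(branch_condition, b"\x00A\x00\x00")
no_solution = fuzz_solve(lambda data: False, b"ab")
```

The seed already satisfies the second conjunct, so a single-byte change suffices here; a constraint that needs several bytes changed at once would defeat this sketch, which illustrates why an approximate solver is described as complementary to a complete one.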
WEIZZ: Automatic grey-box fuzzing for structured binary formats
Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program. Unlike prior proposals based on manually written format specifications, we propose a technique to automatically generate and mutate inputs for unknown chunk-based binary formats. We identify dependencies between input bytes and comparison instructions, and use them to assign tags that characterize the processing logic of the program. Tags become the building block for structure-aware mutations involving chunks and fields of the input. Our technique can perform comparably to structure-aware fuzzing proposals that require human assistance. Our prototype implementation WEIZZ revealed 16 unknown bugs in widely used programs.
WEIZZ: Automatic Grey-box Fuzzing for Structured Binary Formats
Fuzzing technologies have evolved at a fast pace in recent years, revealing
bugs in programs with ever increasing depth and speed. Applications working
with complex formats are however more difficult to take on, as inputs need to
meet certain format-specific characteristics to get through the initial parsing
stage and reach deeper behaviors of the program. Unlike prior proposals based
on manually written format specifications, in this paper we present a technique
to automatically generate and mutate inputs for unknown chunk-based binary
formats. We propose a technique to identify dependencies between input bytes
and comparison instructions, and later use them to assign tags that
characterize the processing logic of the program. Tags become the building
block for structure-aware mutations involving chunks and fields of the input.
We show that our technique performs comparably to structure-aware fuzzing
proposals that require human assistance. Our prototype implementation WEIZZ
revealed 16 unknown bugs in widely used programs.
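One simple way to find which input bytes feed which comparison is to flip each byte in turn and watch the comparison operands. The sketch below assumes a toy parser that reports its own operands; it is an illustration of the idea and not WEIZZ's implementation, and `parse` and `byte_dependencies` are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

Comparison = Tuple[int, int]  # (left, right) operands seen at one comparison site


def parse(data: bytes) -> List[Comparison]:
    """Hypothetical toy parser: a 2-byte magic number, then a 1-byte payload length.

    Returns the operands of each comparison it performs, standing in for
    instrumentation of comparison instructions in a real target.
    """
    magic = int.from_bytes(data[0:2], "big")
    length = data[2]
    return [(magic, 0x4D5A), (length, len(data) - 3)]


def byte_dependencies(
    run: Callable[[bytes], List[Comparison]], data: bytes
) -> Dict[int, Set[int]]:
    """Map each comparison index to the input offsets that change its operands."""
    baseline = run(data)
    deps: Dict[int, Set[int]] = {i: set() for i in range(len(baseline))}
    for offset in range(len(data)):
        mutated = bytearray(data)
        mutated[offset] ^= 0xFF  # flip every bit of one byte
        observed = run(bytes(mutated))
        for i, (before, after) in enumerate(zip(baseline, observed)):
            if before != after:
                deps[i].add(offset)
    return deps


deps = byte_dependencies(parse, b"MZ\x04abcd")
```

Here offsets 0 and 1 are attributed to the magic-number check and offset 2 to the length check, while the payload bytes influence no comparison. Groupings of this kind are the raw material from which field and chunk tags could be derived.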
Fine Grained Dataflow Tracking with Proximal Gradients
Dataflow tracking with Dynamic Taint Analysis (DTA) is an important method in
systems security with many applications, including exploit analysis, guided
fuzzing, and side-channel information leak detection. However, DTA is
fundamentally limited by the Boolean nature of taint labels, which provide no
information about the significance of detected dataflows and lead to false
positives/negatives on complex real world programs.
We introduce proximal gradient analysis (PGA), a novel, theoretically
grounded approach that can track more accurate and fine-grained dataflow
information. PGA uses proximal gradients, a generalization of gradients for
non-differentiable functions, to precisely compose gradients over
non-differentiable operations in programs. Composing gradients over programs
eliminates many of the dataflow propagation errors that occur in DTA and
provides richer information about how each measured dataflow affects a program.
We compare our prototype PGA implementation to three state of the art DTA
implementations on 7 real-world programs. Our results show that PGA can improve
the F1 accuracy of data flow tracking by up to 33% over taint tracking (20% on
average) without introducing any significant overhead (<5% on average). We
further demonstrate the effectiveness of PGA by discovering 22 bugs (20
confirmed by developers) and 2 side-channel leaks, and identifying exploitable
dataflows in 19 existing CVEs in the tested programs.
Comment: To appear in USENIX Security 202
Evaluation Methodologies in Software Protection Research
Man-at-the-end (MATE) attackers have full control over the system on which
the attacked software runs, and try to break the confidentiality or integrity
of assets embedded in the software. Both companies and malware authors want to
prevent such attacks. This has driven an arms race between attackers and
defenders, resulting in a plethora of different protection and analysis
methods. However, it remains difficult to measure the strength of protections
because MATE attackers can reach their goals in many different ways and a
universally accepted evaluation methodology does not exist. This survey
systematically reviews the evaluation methodologies of papers on obfuscation, a
major class of protections against MATE attacks. For 572 papers, we collected
113 aspects of their evaluation methodologies, ranging from sample set types
and sizes, over sample treatment, to performed measurements. We provide
detailed insights into how the academic state of the art evaluates both the
protections and analyses thereon. In summary, there is a clear need for better
evaluation methodologies. We identify nine challenges for software protection
evaluations, which represent threats to the validity, reproducibility, and
interpretation of research results in the context of MATE attacks.