Targeted Greybox Fuzzing with Static Lookahead Analysis
Automatic test generation typically aims to generate inputs that explore new
paths in the program under test in order to find bugs. Existing work has,
therefore, focused on guiding the exploration toward program parts that are
more likely to contain bugs by using an offline static analysis.
In this paper, we introduce a novel technique for targeted greybox fuzzing
using an online static analysis that guides the fuzzer toward a set of target
locations, for instance ones located in recently modified parts of the program.
This is achieved by first semantically analyzing each program path that is
explored by an input in the fuzzer's test suite. The results of this analysis
are then used to control the fuzzer's specialized power schedule, which
determines how often to fuzz inputs from the test suite. We implemented our
technique by extending a state-of-the-art, industrial fuzzer for Ethereum smart
contracts and evaluated its effectiveness on 27 real-world benchmarks. An
online analysis is particularly suitable for the domain of smart contracts
since it does not require any code instrumentation; instrumenting contracts
changes their semantics. Our experiments show that targeted fuzzing
significantly outperforms standard greybox fuzzing in reaching 83% of the
challenging target locations, with a median speed-up of up to 14x.
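To illustrate the idea of a distance-guided power schedule, here is a minimal
Python sketch, assuming the online static analysis has already annotated each
seed's path with an estimated distance to the nearest target location; the
Seed type, the inverse-distance weighting, and the fuzz_one callback are
hypothetical illustrations, not the paper's actual schedule.

    from dataclasses import dataclass

    @dataclass
    class Seed:
        data: bytes
        distance: float | None  # estimated min distance of the seed's path to
                                # any target (None: path provably cannot reach one)

    def assign_energy(seed: Seed, base_energy: int = 16,
                      max_factor: float = 32.0) -> int:
        """Assign more fuzzing energy to seeds whose executed path the
        online analysis places closer to a target location."""
        if seed.distance is None:
            return 0                                 # skip unreachable paths
        factor = max_factor / (1.0 + seed.distance)  # decays with distance
        return int(base_energy * max(1.0, factor))

    def fuzz_round(test_suite: list[Seed], fuzz_one) -> None:
        """One scheduling round: fuzz each seed in proportion to its energy."""
        for seed in test_suite:
            for _ in range(assign_energy(seed)):
                fuzz_one(seed)                       # mutate and execute once

The key design point is that the schedule concentrates mutation effort on
seeds whose paths are semantically close to a target, rather than spreading
energy uniformly as a standard greybox fuzzer would.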
Understanding Large Language Model Based Fuzz Driver Generation
Fuzz drivers are a necessary component of API fuzzing. However, automatically
generating correct and robust fuzz drivers is a difficult task. Compared to
existing approaches, LLM (Large Language Model)-based generation is a promising
direction due to its ability to operate with low requirements on consumer
programs, leverage multiple dimensions of API usage information, and generate
human-friendly output code. Nonetheless, the challenges and effectiveness of
LLM-based fuzz driver generation remain unclear.
To address this, we conducted a study on the effects, challenges, and
techniques of LLM-based fuzz driver generation. Our study involved building a
quiz with 86 fuzz driver generation questions from 30 popular C projects,
constructing precise effectiveness validation criteria for each question, and
developing a framework for semi-automated evaluation. We designed five query
strategies and evaluated 36,506 generated fuzz drivers. Furthermore, the
drivers were compared with manually written ones to obtain practical insights.
Our evaluation revealed that: (1) while the overall performance was promising
(passing 91% of questions), there were still practical challenges in filtering
out ineffective fuzz drivers for large-scale application; (2) basic strategies
achieved a decent correctness rate (53%) but struggled with complex
API-specific usage questions, where example code snippets and iterative
queries proved helpful; and (3) while LLM-generated drivers showed fuzzing
outcomes competitive with manually written ones, there was still significant
room for improvement, such as incorporating semantic oracles for logical bug
detection.
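The iterative-query finding above can be made concrete with a short Python
sketch, assuming hypothetical llm_complete and compile_and_run helpers; the
prompt wording and the repair loop are illustrative, not the study's actual
framework.

    def generate_fuzz_driver(api_doc, example_snippet, llm_complete,
                             compile_and_run, max_iters=5):
        """Iteratively query an LLM for a fuzz driver until it validates.

        llm_complete(prompt) -> str            hypothetical LLM call
        compile_and_run(code) -> error or None hypothetical validator
        """
        prompt = ("Write a libFuzzer-style fuzz driver for this API.\n"
                  "API documentation:\n" + api_doc + "\n"
                  # example snippets were found helpful for complex APIs
                  "Example usage:\n" + example_snippet + "\n")
        for _ in range(max_iters):
            driver = llm_complete(prompt)
            error = compile_and_run(driver)
            if error is None:
                return driver            # driver compiles and runs effectively
            # Iterative query: feed the failure back so the model can repair it.
            prompt += ("\nThe previous driver failed with:\n" + error +
                       "\nPlease fix it.\n")
        return None                      # unresolved after max_iters attempts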
Understanding Programs by Exploiting (Fuzzing) Test Cases
Semantic understanding of programs has attracted great attention in the
community. Inspired by recent successes of large language models (LLMs) in
natural language understanding, tremendous progress has been made by treating
programming language as another sort of natural language and training LLMs on
corpora of program code. However, programs are fundamentally different from
texts in that they are heavily structured and syntax-strict. In particular,
programs and their basic units (i.e., functions and subroutines) are designed
to exhibit a variety of behaviors and/or produce different outputs given
different inputs. The relationship between inputs and possible
outputs/behaviors characterizes a function or subroutine and profiles the
program as a whole. Therefore, we propose to incorporate this relationship
into learning to achieve a deeper semantic understanding of
programs. To obtain inputs representative enough to trigger the execution of
most parts of the code, we resort to fuzz testing and propose fuzz tuning to
boost the performance of program understanding and code representation
learning, given a pre-trained LLM. The effectiveness of the proposed method is
verified on two program understanding tasks, code clone detection and code
classification, and it outperforms the current state of the art by large
margins. Code is available at https://github.com/rabbitjy/FuzzTuning.
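A simplified Python sketch of how fuzz-tuning training data might be
assembled, assuming a hypothetical fuzz helper that collects input/output
pairs per function; the record format is an assumption for illustration, and
the linked repository contains the actual implementation.

    import json

    def build_example(source_code, io_pairs):
        """Pack a function's code together with fuzzing-derived input/output
        behavior into one training record (the format is an assumption)."""
        behavior = "\n".join("input: %r -> output: %r" % (i, o)
                             for i, o in io_pairs)
        return {"code": source_code,
                "behavior": behavior}  # the input/output relation profiles
                                       # the function, per the abstract

    def write_training_set(functions, fuzz, path="fuzz_tuning.jsonl"):
        """functions: iterable of (source_code, callable) pairs;
        fuzz(callable): hypothetical helper yielding (input, output) pairs
        collected by running a fuzzer on the callable."""
        with open(path, "w") as f:
            for source_code, fn in functions:
                record = build_example(source_code, list(fuzz(fn)))
                f.write(json.dumps(record) + "\n")

The resulting records pair each function's text with behavior observed under
fuzzer-generated inputs, which is the signal fuzz tuning adds on top of
code-only pre-training.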