Compiler correctness is crucial, as miscompilation falsifying the program
behaviors can lead to serious consequences. In the literature, fuzzing has been
extensively studied to uncover compiler defects. However, compiler fuzzing
remains challenging: Existing arts focus on black- and grey-box fuzzing, which
generates tests without sufficient understanding of internal compiler
behaviors. As such, they often fail to construct programs to exercise
conditions of intricate optimizations. Meanwhile, traditional white-box
techniques are computationally inapplicable to the giant codebase of compilers.
Recent advances demonstrate that Large Language Models (LLMs) excel in code
generation/understanding tasks and have achieved state-of-the-art performance
in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code
information remains a missing piece of research in compiler testing.
To this end, we propose WhiteFox, the first white-box compiler fuzzer using
LLMs with source-code information to test compiler optimization. WhiteFox
adopts a dual-model framework: (i) an analysis LLM examines the low-level
optimization source code and produces requirements on the high-level test
programs that can trigger the optimization; (ii) a generation LLM produces test
programs based on the summarized requirements. Additionally,
optimization-triggering tests are used as feedback to further enhance the test
generation on the fly. Our evaluation on four popular compilers shows that
WhiteFox can generate high-quality tests to exercise deep optimizations
requiring intricate conditions, practicing up to 80 more optimizations than
state-of-the-art fuzzers. To date, WhiteFox has found in total 96 bugs, with 80
confirmed as previously unknown and 51 already fixed. Beyond compiler testing,
WhiteFox can also be adapted for white-box fuzzing of other complex, real-world
software systems in general