TorchProbe: Fuzzing Dynamic Deep Learning Compilers
Static and dynamic computational graphs represent two distinct approaches to
constructing deep learning frameworks. The former prioritizes compiler-based
optimizations, while the latter focuses on programmability and
user-friendliness. The recent release of PyTorch 2.0, which supports compiling
arbitrary deep learning programs in Python, signals a new direction for deep
learning infrastructure: incorporating compiler techniques in a more dynamic
fashion and supporting dynamic language features such as control flow and
closures. Given PyTorch's seamless integration with Python, its compiler aims
to support arbitrary deep learning code written in Python.
However, the inherent dynamism of Python poses challenges to the completeness
and robustness of the compiler. While recent research has introduced fuzzing to
test deep learning compilers, there is still a lack of comprehensive analysis
on how to test dynamic features. To address this issue, we propose several code
transformations to generate test cases involving dynamic features. These
transformations preserve the program's semantics, ensuring that any discrepancy
between the transformed and original programs indicates the presence of a bug.
Through our approach, we have identified twenty previously unknown bugs in the
PyTorch compiler and its underlying tensor compiler, Triton.
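
To make the oracle concrete, here is a minimal sketch of this style of
differential testing; the specific rewrite shown (moving the computation into
a closure, one of the dynamic features mentioned above) and all names are
illustrative, not TorchProbe's actual implementation. Requires PyTorch 2.0+
for torch.compile.

# Minimal sketch of a semantics-preserving transformation plus a
# differential-testing oracle (illustrative, not TorchProbe's code).
import torch

def original(x):
    return torch.relu(x) * 2 + 1

def transformed(x):
    # Semantics-preserving rewrite: the same computation moved into a
    # closure, one of the dynamic Python features the compiler must handle.
    def inner():
        return torch.relu(x) * 2 + 1
    return inner()

x = torch.randn(8, 8)
expected = original(x)

# Oracle: the rewrite preserves semantics, so eager and compiled outputs
# of the transformed program must both match the original (up to tolerance).
assert torch.allclose(expected, transformed(x))
assert torch.allclose(expected, torch.compile(transformed)(x)), "potential compiler bug"

Because the rewrite cannot change what the program computes, any divergence
between the eager and compiled outputs points to a bug in the compiler rather
than in the generated test.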
White-box Compiler Fuzzing Empowered by Large Language Models
Compiler correctness is crucial, as miscompilations that falsify program
behavior can lead to serious consequences. In the literature, fuzzing has been
extensively studied as a way to uncover compiler defects. However, compiler
fuzzing remains challenging: existing approaches focus on black- and grey-box
fuzzing, which generate tests without sufficient understanding of internal
compiler behavior. As such, they often fail to construct programs that
exercise the conditions of intricate optimizations. Meanwhile, traditional
white-box techniques are computationally inapplicable to the giant codebases
of compilers.
Recent advances demonstrate that Large Language Models (LLMs) excel in code
generation/understanding tasks and have achieved state-of-the-art performance
in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code
information remains a missing piece of research in compiler testing.
To this end, we propose WhiteFox, the first white-box compiler fuzzer that uses
LLMs with source-code information to test compiler optimizations. WhiteFox
adopts a dual-model framework: (i) an analysis LLM examines the low-level
optimization source code and produces requirements on the high-level test
programs that can trigger the optimization; (ii) a generation LLM produces test
programs based on the summarized requirements. Additionally,
optimization-triggering tests are used as feedback to further enhance the test
generation on the fly. Our evaluation on four popular compilers shows that
WhiteFox can generate high-quality tests that exercise deep optimizations
requiring intricate conditions, triggering up to 8× more optimizations than
state-of-the-art fuzzers. To date, WhiteFox has found 96 bugs in total, with 80
confirmed as previously unknown and 51 already fixed. Beyond compiler testing,
WhiteFox can also be adapted for white-box fuzzing of other complex, real-world
software systems.
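
The dual-model framework can be pictured as a small loop. The sketch below is
a hedged outline under stated assumptions: ask_llm and triggers_optimization
are hypothetical placeholders for an LLM completion call and a trigger oracle
(e.g., checking compiler debug output), not WhiteFox's actual API.

# Hedged outline of WhiteFox's dual-model loop (placeholders, not its API).
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM completion call")

def triggers_optimization(test_program: str) -> bool:
    raise NotImplementedError("compile the test and check whether the optimization fired")

def whitefox_fuzz(optimization_source: str, rounds: int = 100) -> list[str]:
    # (i) Analysis LLM: summarize the low-level optimization source into
    # requirements on high-level test programs that can trigger it.
    requirements = ask_llm(
        "Summarize what an input program must look like to trigger this "
        "compiler optimization:\n" + optimization_source
    )
    triggering: list[str] = []  # optimization-triggering tests, reused as feedback
    for _ in range(rounds):
        # (ii) Generation LLM: produce a test from the requirements, augmented
        # on the fly with previously triggering tests as few-shot examples.
        prompt = "Requirements:\n" + requirements + "\n"
        if triggering:
            prompt += "Tests that already triggered it:\n" + "\n---\n".join(triggering[-3:])
        candidate = ask_llm(prompt + "\nWrite a new test program:")
        if triggers_optimization(candidate):
            triggering.append(candidate)
    return triggering

The feedback step is the key design choice: tests that demonstrably trigger
the optimization are fed back as few-shot examples, steering generation toward
the intricate conditions the analysis LLM summarized.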
NeuRI: Diversifying DNN Generation via Inductive Rule Inference
Deep Learning (DL) is widely used in various industries to improve
decision-making and automate processes, driven by ever-evolving DL libraries
and compilers. The correctness of DL systems is crucial for trust in DL
applications. As such, a recent wave of research has studied the automated
synthesis of test cases (i.e., DNN models and their inputs) for fuzzing DL
systems. However, existing model generators subsume only a limited number of
operators and lack the ability to model operator constraints pervasively. To
address this challenge, we propose NeuRI, a fully automated
approach for generating valid and diverse DL models composed of hundreds of
types of operators. NeuRI adopts a three-step process: (i) collecting valid and
invalid API traces from various sources; (ii) applying inductive program
synthesis over the traces to infer the constraints for constructing valid
models; and (iii) using hybrid model generation, which incorporates both
symbolic and concrete operators. Our evaluation shows that NeuRI improves
branch coverage of TensorFlow and PyTorch by 24% and 15%, respectively, over
state-of-the-art model-level fuzzers. NeuRI finds 100 new bugs for PyTorch and
TensorFlow in four months, with 81 already fixed or confirmed. Of these, 9 bugs
are labeled as high priority or as security vulnerabilities, constituting 10%
of all high-priority bugs for the period. Open-source developers regard the
error-inducing tests we reported as "high-quality" and "common in practice"
- …
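
To illustrate step (ii), the sketch below shows a toy version of inductive
constraint inference over operator traces; the traces, the predicate DSL, and
all names are invented for illustration and are not NeuRI's implementation.

# Toy sketch of inferring operator constraints from valid/invalid API traces
# (illustrative only, not NeuRI's implementation). Candidate predicates that
# are consistent with every trace are kept as inferred rules.
import math

# Recorded traces for a reshape-like operator: (input_shape, target_shape, valid?)
traces = [
    ((2, 3), (3, 2), True),
    ((2, 3), (6,), True),
    ((2, 3), (4,), False),
    ((4, 4), (2, 8), True),
    ((4, 4), (3, 5), False),
]

# A small DSL of candidate shape predicates.
candidates = {
    "same_numel": lambda i, t: math.prod(i) == math.prod(t),
    "same_rank": lambda i, t: len(i) == len(t),
}

# Inductive step: keep exactly the predicates that classify every trace correctly.
inferred = {
    name for name, pred in candidates.items()
    if all(pred(i, t) == ok for i, t, ok in traces)
}
print(inferred)  # {'same_numel'}; 'same_rank' is refuted by ((2, 3), (6,), True)

Predicates that survive every valid and invalid trace become constraints for
generating fresh, guaranteed-valid operator instances during model construction.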