ACETest: Automated Constraint Extraction for Testing Deep Learning Operators
Deep learning (DL) applications are prevalent nowadays, as they help with a
wide range of tasks. DL libraries are essential for building DL applications,
and DL operators, which compute on multi-dimensional data (tensors), are the
key building blocks of these libraries. Bugs in DL operators can therefore
have a broad impact. Testing is a practical approach to detecting bugs in DL
operators. To test DL operators effectively, test cases must pass the input
validity checks and reach the core function logic of the operators, so
extracting the input validation constraints is a prerequisite for generating
high-quality test cases. Existing techniques rely on either human effort or
the documentation of DL library APIs to extract the constraints; they cannot
extract complex constraints, and the extracted constraints may differ from
the actual code implementation.
To address the challenge, we propose ACETest, a technique to automatically
extract input validation constraints from the code to build valid yet diverse
test cases which can effectively unveil bugs in the core function logic of DL
operators. For this purpose, ACETest can automatically identify the input
validation code in DL operators, extract the related constraints and generate
test cases according to the constraints. The experimental results on popular DL
libraries, TensorFlow and PyTorch, demonstrate that ACETest can extract
constraints with higher quality than state-of-the-art (SOTA) techniques.
Moreover, ACETest is capable of extracting 96.4% more constraints and detecting
1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest
to detect 108 previously unknown bugs on TensorFlow and PyTorch, with 87 of
them confirmed by the developers. Lastly, five of the bugs were assigned
CVE IDs due to their security impact.
Comment: Accepted by ISSTA 202
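The abstract's core idea, generating test cases that satisfy the operator's extracted validation constraints so that they reach the core logic, can be sketched roughly as follows. This is an illustrative toy, not ACETest's implementation: the predicate set and the (rank, window size) parameters are hypothetical stand-ins for constraints that ACETest would extract from an operator's validation code.

```python
import random

# Hypothetical constraint set for a pooling-style operator. These
# predicates stand in for the input-validation checks a tool like
# ACETest would extract from the operator's source code.
constraints = [
    lambda rank, size: rank == 4,   # input must be a 4-D tensor
    lambda rank, size: size > 0,    # window size must be positive
    lambda rank, size: size <= 8,   # window must fit a small bound
]

def generate_case(max_tries=1000):
    """Randomly sample (rank, window_size) pairs until one satisfies
    every extracted constraint, so the case passes input validation
    and exercises the operator's core logic rather than being
    rejected early."""
    for _ in range(max_tries):
        rank = random.randint(1, 6)
        size = random.randint(-2, 10)
        if all(c(rank, size) for c in constraints):
            return rank, size
    raise RuntimeError("no satisfying input found")

rank, size = generate_case()
print(rank, size)  # e.g. 4 3: a validation-passing test input
```

Generate-and-filter sampling like this only works when constraints are easy to satisfy by chance; for the complex, code-derived constraints the paper targets, a constraint solver would typically replace the rejection loop.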
Understanding Large Language Model Based Fuzz Driver Generation
Fuzz drivers are a necessary component of API fuzzing. However, automatically
generating correct and robust fuzz drivers is a difficult task. Compared to
existing approaches, LLM-based (Large Language Model) generation is a promising
direction due to its ability to operate with low requirements on consumer
programs, leverage multiple dimensions of API usage information, and generate
human-friendly output code. Nonetheless, the challenges and effectiveness of
LLM-based fuzz driver generation remain unclear.
To address this, we conducted a study on the effects, challenges, and
techniques of LLM-based fuzz driver generation. Our study involved building a
quiz with 86 fuzz driver generation questions from 30 popular C projects,
constructing precise effectiveness validation criteria for each question, and
developing a framework for semi-automated evaluation. We designed five query
strategies and evaluated 36,506 generated fuzz drivers; the drivers were also
compared with manually written ones to obtain practical insights. Our
evaluation revealed that:
(1) while the overall performance was promising (passing 91% of questions),
there were still practical challenges in filtering out ineffective fuzz
drivers for large-scale application; (2) basic strategies achieved a decent
correctness rate (53%) but struggled with complex, API-specific usage
questions, where example code snippets and iterative queries proved helpful;
and (3) while LLM-generated drivers showed competent fuzzing outcomes compared
to manually written ones, there was still significant room for improvement,
such as incorporating semantic oracles for logical bug detection.
Comment: 17 pages, 14 figure
Enhancing the 3D printing fidelity of vat photopolymerization with machine learning-driven boundary prediction
Like many pixel-based additive manufacturing (AM) techniques, digital light processing (DLP) based vat photopolymerization faces the challenge that the square-pixel-based processing strategy can lead to zigzag edges, especially when feature sizes approach single-pixel levels. Introducing greyscale pixels has been a strategy to smoothen such edges, but it is challenging to determine which of the many permutations of projected pixels gives the optimal 3D printing performance. To address this challenge, a novel data acquisition strategy based on machine learning (ML) principles is proposed, and a training routine is implemented to reproduce the smallest shape of an intended 3D printed object. Through this approach, a chessboard patterning strategy is developed along with an automated data refining and augmentation workflow, demonstrating its efficiency and effectiveness by reducing the deviation by around 30%.
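The greyscale-pixel idea the paper builds on can be illustrated with the classical baseline it improves upon: assigning each boundary pixel a grey level equal to the fraction of its area covered by the target shape. This sketch is not the paper's ML model; the one-dimensional vertical-edge setup is an assumed simplification for illustration.

```python
def coverage_row(width: int, cut: float) -> list[float]:
    """Grey level per pixel for a vertical edge at position `cut`
    (in pixel units): 1.0 for pixels fully inside the shape, 0.0 for
    pixels fully outside, and a fractional value for the single
    boundary pixel. The fractional exposure is what softens the
    zigzag that a binary on/off pixel grid would produce."""
    row = []
    for px in range(width):
        # Width of this unit pixel lying left of the edge, clamped
        # to [0, 1]; rounding avoids floating-point noise.
        overlap = round(min(max(cut - px, 0.0), 1.0), 6)
        row.append(overlap)
    return row

print(coverage_row(5, 2.3))  # [1.0, 1.0, 0.3, 0.0, 0.0]
```

Area coverage is only a geometric heuristic: the actual cured shape also depends on light scattering and resin kinetics, which is precisely the gap the paper's data-driven boundary prediction targets.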