732 research outputs found
Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith
The correctness of compilers is instrumental in the safety and reliability of
other software systems, as bugs in compilers can produce executables that do
not reflect the intent of programmers. Such errors are difficult to identify
and debug. Random test program generators are commonly used in testing
compilers, and they have been effective in uncovering bugs. However, the
problem of guiding these test generators to produce test programs that are more
likely to find bugs remains challenging. In this paper, we use the code
snippets in the bug reports to guide the test generation. The main idea of this
work is to extract insights from the bug reports about the language features
that are more prone to inadequate implementation and using the insights to
guide the test generators. We use the GCC C compiler to evaluate the
effectiveness of this approach. In particular, we first cluster the test
programs in the GCC bugs reports based on their features. We then use the
centroids of the clusters to compute configurations for Csmith, a popular test
generator for C compilers. We evaluated this approach on eight versions of GCC
and found that our approach provides higher coverage and triggers more
miscompilation failures than the state-of-the-art test generation techniques
for GCC.Comment: The 36th ACM/SIGAPP Symposium on Applied Computing, Software
Verification and Testing Track (SAC-SVT'21
Compiler fuzzing: how much does it matter?
Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically due to triggering of the bug; and (3) whether such changes cause regression test suite failures, or whether we can manually find application inputs that trigger execution divergence due to such changes. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for at least one package, in which case only a very small part of the binary is typically affected, yet causing two failures when running the test suites of all the impacted packages. User-reported and formal verification bugs do not exhibit a higher impact, with a lower rate of triggered bugs and one test failure. The manual analysis of a selection of the syntactic changes caused by some of our bugs (fuzzer-found and non fuzzer-found) in package assembly code, shows that either these changes have no semantic impact or that they would require very specific runtime circumstances to trigger execution divergence
White-box Compiler Fuzzing Empowered by Large Language Models
Compiler correctness is crucial, as miscompilation falsifying the program
behaviors can lead to serious consequences. In the literature, fuzzing has been
extensively studied to uncover compiler defects. However, compiler fuzzing
remains challenging: Existing arts focus on black- and grey-box fuzzing, which
generates tests without sufficient understanding of internal compiler
behaviors. As such, they often fail to construct programs to exercise
conditions of intricate optimizations. Meanwhile, traditional white-box
techniques are computationally inapplicable to the giant codebase of compilers.
Recent advances demonstrate that Large Language Models (LLMs) excel in code
generation/understanding tasks and have achieved state-of-the-art performance
in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code
information remains a missing piece of research in compiler testing.
To this end, we propose WhiteFox, the first white-box compiler fuzzer using
LLMs with source-code information to test compiler optimization. WhiteFox
adopts a dual-model framework: (i) an analysis LLM examines the low-level
optimization source code and produces requirements on the high-level test
programs that can trigger the optimization; (ii) a generation LLM produces test
programs based on the summarized requirements. Additionally,
optimization-triggering tests are used as feedback to further enhance the test
generation on the fly. Our evaluation on four popular compilers shows that
WhiteFox can generate high-quality tests to exercise deep optimizations
requiring intricate conditions, practicing up to 80 more optimizations than
state-of-the-art fuzzers. To date, WhiteFox has found in total 96 bugs, with 80
confirmed as previously unknown and 51 already fixed. Beyond compiler testing,
WhiteFox can also be adapted for white-box fuzzing of other complex, real-world
software systems in general
A Study of TypingRelated Bugs in JVM compilers
Ο έλεγχος των μεταγλωττιστών είναι ένα ερευνητικό πεδίο το οποίο έχει τραβήξει το ενδιαφέρον των ερευνητών την τελευταία δεκαετία. Οι ερευνητές έχουν κυρίως επικεντρωθεί στο να βρουν σφάλματα λογισμικού που τερματίζουν τους μεταγλωττιστές, και εσφαλμένες μεταγλωττίσεις προγραμμάτων οι οποίες οφείλονται σε σφάλματα κατά της φάσης των βελτιστοποιήσεων. Παραδόξως, αυτό το αυξανόμενο σώμα εργασίας παραμελεί άλλες φάσεις του μεταγλωττιστή, με την πιο σημαντική να είναι η μπροστινή πλευρά των μεταγλωττιστών. Σε γλώσσες προγραμματισμού με στατικό σύστημα τύπων που προσφέρουν πλούσιο και εκφραστικό σύστημα τύπων και μοντέρνα χαρακτηριστικά, όπως αυτοματοποιημένα συμπεράσματα τύπων, ή ένα μείγμα από αντικειμενοστραφείς και συναρτησιακά χαρακτηριστικά, ο έλεγχος σχετικά με τους τύπους στο μπροστινό μέρος των μεταγλωττιστών είναι περίπλοκο και περιέχει αρκετά σφάλματα. Τέτοια σφάλματα μπορεί να οδηγήσουν στην αποδοχή εσφαλμένων προγραμμάτων, στην απόρριψη σωστών προγραμμάτων, και στην αναφορά παραπλανητικών σφαλμάτων και προειδοποιήσεων.
Πραγματοποιούμε την πρώτη εμπειρική ανάλυση για την κατανόηση και την κατηγοριοποίηση σφαλμάτων σχετικά με τους τύπους στους μεταγλωττιστές. Για να το κάνουμε αυτό, μελετήσαμε 320 σφάλματα που σχετίζονται με την διαχείριση τύπων (μαζί με τις διορθώσεις και τους ελέγχους τους), τα οποία τα συλλέξαμε με τυχαία δειγματοληψία από τέσσερις δημοφιλής JVM γλώσσες προγραμματισμού, την Java, την Scala, την Kotlin, και την Groovy. Αξιολογήσαμε κάθε σφάλμα με βάση διάφορες πτυχές του, συμπεριλαμβανομένου του συμπτώματος του, της αιτίας που το προκάλεσε, της λύσης του, και των χαρακτηριστικών του προγράμματος που το αποκάλυψε.
Τέλος υλοποιήσαμε ένα εργαλείο το οποίο χρησιμοποιεί τα ευρήματα μας ώστε να βρει με αυτοματοποιημένο τρόπο σφάλματα στο μπροστινό μέρος των μεταγλωττιστών της Kotlin και της Groovy.Compiler testing is a prevalent research topic that has gained much attention in the past decade. Researchers have mainly focused on detecting compiler crashes and miscompilations caused by bugs in the implementation of compiler optimizations. Surprisingly, this growing body of work neglects other compiler components, most notably the front end. In staticallytyped programming languages with rich and expressive type systems and modern features, such as type inference or a mix of objectoriented with functional programming features, the process of static typing in compiler frontends is complicated by a highdensity of bugs. Such bugs can lead to the acceptance of incorrect programs, the rejection of correct programs, and the reporting of misleading errors and warnings.
In this thesis, we undertake the first-ever effort to the best of our knowledge to empirically investigate and characterize typingrelated compiler bugs. To do so, we manually study 320 typingrelated bugs (along with their fixes and test cases) that are randomly sampled from four mainstream JVM languages, namely Java, Scala, Kotlin, and Groovy. We evaluate each bug in terms of several aspects, including their symptom, root cause, bug fix’s size, and the characteristics of the bugrevealing test cases.
Finally, we implement a tool for finding frontend compiler bugs in Groovy and Kotlin compilers by exploiting the findings of our thesis
Recommended from our members
Small scale software engineering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.In computing, the Software Crisis has arisen because software projects cannot meet their planned timescales, functional capabilities, reliability levels and budgets. This thesis reduces the general problem down to the Small Scale Software Engineering goal of improving the quality and tractability of the
designs of individual programs. It is demonstrated that the application of eight abstractions (set, sequence, hierarchy, h-reduction, integration, induction, enumeration, generation) can lead to a reduction in the size and complexity of and an increase in the quality of software designs when expressed via Dimensional Design, a new representational technique which uses the three spatial dimensions to represent set, sequence and hierarchy, whilst special symbols and axioms encode the other abstractions. Dimensional Designs are trees of symbols whose edges perceptually encode the relationships between the nodal symbols. They are easy to draw and manipulate both manually and mechanically. Details are given of real software projects already undertaken using Dimensional Design. Its tool kit, DD/ROOTS, produces high quality, machine drawn, detailed design documentation plus novel quality control information. A run time monitor records and animates execution, measures CPU time and
takes snapshots etc; all these results are represented according to Dimensional
Design principles to maintain conceptual integrity with the design. These techniques
are illustrated by the development of a non-trivial example program. Dimensional Design is axiomatised, compared to existing techniques and evaluated against the stated problem. It has advantages over existing techniques, mainly its clarity of expression and ease of manipulation of individual abstractions due to its graphical basis
- …