1,176 research outputs found
Finding and understanding bugs in C compilers
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
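One way a generator can sidestep undefined behavior, in the spirit of Csmith, is to guard every arithmetic operation with a "safe" wrapper that maps would-be-undefined operand combinations to a defined result. A minimal Python model of that idea (the fallback of returning the left operand is illustrative, not Csmith's exact scheme):

```python
# 32-bit signed int range, matching C's int on common platforms.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def safe_add(a, b):
    """Return a + b under 32-bit signed semantics; on overflow (undefined
    behavior in C), return a defined fallback instead."""
    r = a + b
    if r < INT_MIN or r > INT_MAX:
        return a  # illustrative fallback value
    return r

def safe_div(a, b):
    """C's a / b is undefined for b == 0 and for INT_MIN / -1; return a
    defined fallback there, otherwise truncate toward zero as C does."""
    if b == 0 or (a == INT_MIN and b == -1):
        return a
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q
```

Because every generated expression goes through such wrappers, any output difference between compilers must be a compiler bug rather than a program relying on undefined behavior.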
Verification and Application of Program Transformations
Program transformation and refactoring are fundamental elements of the software development process. From the beginning, there have been attempts to support refactoring with software tools that reliably and efficiently carry out behavior-preserving program transformations that improve software quality. Bug finding based on static analysis and refactoring transformations attract great interest both in academia and in research and development, but their role is even more important at companies building highly complex software. Tools supporting software development are becoming ever more precise and reliable, but there is still plenty of room for improvement.
This dissertation discusses definition and verification methods with which we can build more reliable and more widely usable program transformation tools. The work covers both static and dynamic verification. First, it presents a novel, concise description language for L-attribute grammars, which we map to a random data generator used for property-based testing. This is accompanied by a case study that presents a grammar of the Erlang programming language and its use in testing. Besides testing, we also examine the question of formal correctness proofs; to this end we introduce a language for describing refactorings, in which we can give specifications that are both executable and automatically provable. The language is based on context-sensitive and conditional term rewriting, strategies, and so-called refactoring schemes. Last but not least, a special application of program transformations is presented, in which a refactoring framework is used as a preprocessor to extend the programming language it processes. With the latter method, code migration can be easily implemented in the Erlang language.
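The grammar-to-generator mapping described above can be pictured as a recursive random expander over grammar rules, biased toward terminals so that generation terminates. A toy Python sketch (the grammar and names are hypothetical, standing in for the dissertation's L-attribute grammar machinery):

```python
import random

# Hypothetical toy grammar for arithmetic-expression test data.
GRAMMAR = {
    "expr": [["int"], ["expr", "+", "expr"], ["(", "expr", ")"]],
}

def gen(symbol, depth=0, rng=random):
    """Recursively expand `symbol` into a random string; past a depth
    limit, force the smallest alternative so recursion terminates (a
    standard trick in grammar-based random generation)."""
    if symbol not in GRAMMAR:
        return str(rng.randint(0, 9)) if symbol == "int" else symbol
    choices = GRAMMAR[symbol]
    if depth > 3:
        choices = [min(choices, key=len)]
    return "".join(gen(s, depth + 1, rng) for s in rng.choice(choices))
```

In property-based testing, each generated string would be fed to the tool under test and a correctness property (e.g. "refactored code behaves identically") checked on the result.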
Doctor of Philosophy
Compilers are indispensable tools for developers. We expect them to be correct. However, compiler correctness is very hard to reason about, partly because of the daunting complexity of compilers. In this dissertation, I explain how we constructed a random program generator, Csmith, and used it to find hundreds of bugs in mature open-source compilers such as the GNU Compiler Collection (GCC) and the LLVM Compiler Infrastructure (LLVM). The success of Csmith depends on its ability to be expressive and unambiguous at the same time. Csmith is composed of a code generator and a GTAV (Generation-Time Analysis and Validation) engine, which work together to produce expressive yet unambiguous random programs. The expressiveness of Csmith comes from the code generator, while its unambiguity is assured by GTAV, which efficiently performs program analyses, such as points-to analysis and effect analysis, to avoid ambiguities caused by undefined or unspecified behaviors. During our 4.25 years of testing, Csmith found over 450 bugs in GCC and LLVM. We analyzed the bugs by sorting them into categories, studying their root causes, locating them in the compilers' source code, and evaluating their importance. We believe these analysis results are useful to future random testers as well as to compiler writers and users.
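Wrong-code bugs like these are typically exposed by differential testing: each random program is compiled and run under many compiler and optimization-level configurations, and an output that disagrees with the consensus marks a suspect configuration. A minimal sketch of the comparison step (function name hypothetical):

```python
from collections import Counter

def vote(outputs):
    """Given {compiler_config: program_output}, return the majority output
    and the configs that disagree with it -- the suspected wrong-code
    miscompilations to investigate."""
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    suspects = sorted(c for c, o in outputs.items() if o != majority)
    return majority, suspects
```

In practice the program's output is a checksum of its global variables, so a single value summarizes the whole execution.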
Automatic Test Case Reduction for OpenCL
We report on an extension to the C-Reduce tool, for automatic reduction of C test cases, to handle OpenCL kernels. This enables an automated method for detecting bugs in OpenCL compilers: generating large random kernels using the CLsmith generator, identifying kernels that yield result differences across OpenCL platforms and optimisation levels, and using our novel extension to C-Reduce to automatically reduce such kernels to minimal forms that can be filed as bug reports. A major part of our effort involved the design of ShadowKeeper, a new plugin for the Oclgrind simulator that provides accurate detection of accesses to uninitialised data. We present experimental results showing the effectiveness of our method for finding bugs in a number of OpenCL compilers.
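At its core, automatic test-case reduction repeatedly deletes pieces of the failing kernel and keeps any deletion that preserves the "interestingness" property (e.g. the cross-platform result difference still reproduces). A greedy delta-debugging-style sketch in Python (much simplified relative to C-Reduce's source-aware transformation passes):

```python
def reduce_case(lines, interesting):
    """Shrink `lines` while interesting(lines) stays true: try deleting
    chunks of decreasing size, keeping every deletion that preserves the
    property, until no single line can be removed."""
    chunk = len(lines) // 2 or 1
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and interesting(candidate):
                lines = candidate  # keep the smaller reproducer
            else:
                i += chunk         # this chunk is needed; move on
        chunk //= 2
    return lines
```

The interestingness test is where tools like ShadowKeeper matter: it must reject candidates whose "difference" comes from newly introduced undefined behavior (such as reads of uninitialised data) rather than from a compiler bug.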
Towards Automating Grammar Equivalence Checking
We consider from a practical perspective the (generally undecidable) problem of checking equivalence of context-free grammars. We present both techniques for proving equivalence and techniques for finding counter-examples that establish non-equivalence. Among the key building blocks of our approach is a novel algorithm for efficiently enumerating and sampling words and parse trees from arbitrary context-free grammars; the algorithm supports polynomial-time random access to words belonging to the grammar. Furthermore, we propose an algorithm for proving equivalence of context-free grammars that is complete for LL grammars, yet can be invoked on any context-free grammar, including ambiguous grammars. Our techniques successfully find discrepancies between different syntax specifications of several real-world languages and are capable of detecting fine-grained incremental modifications performed on grammars. Our evaluation shows that our tool improves significantly on the existing state-of-the-art tools. In addition, we used these algorithms to develop an online tutoring system for grammars that we then used in an undergraduate course on computer language processing. On questions involving grammar constructions, our system was able to automatically evaluate the correctness of 95% of equivalence questions: it disproved 74% of cases and proved 21% of them. This opens up the possibility of using our tool in massive open online courses to introduce grammars to large populations of students.
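The enumeration-and-sampling building block can be pictured with a classic dynamic program: count the derivations of each word length per nonterminal, then "unrank" an index into a concrete word, which is exactly the polynomial-time random access mentioned above (pair it with a uniformly random index to sample). A toy Python version for a grammar in Chomsky normal form (the example grammar is hypothetical; this counts derivations, which equal words when the grammar is unambiguous):

```python
from functools import lru_cache

# Hypothetical CNF grammar: S -> A B | a ;  A -> a ;  B -> A A | b
RULES = {
    "S": [("A", "B"), "a"],
    "A": ["a"],
    "B": [("A", "A"), "b"],
}

@lru_cache(maxsize=None)
def count(nt, n):
    """Number of derivations of length-n words from nonterminal nt."""
    total = 0
    for prod in RULES[nt]:
        if isinstance(prod, str):        # terminal rule nt -> t
            total += 1 if n == 1 else 0
        else:                            # binary rule nt -> B C
            b, c = prod
            total += sum(count(b, k) * count(c, n - k) for k in range(1, n))
    return total

def unrank(nt, n, i):
    """Return the i-th (0-indexed) length-n word derivable from nt."""
    for prod in RULES[nt]:
        if isinstance(prod, str):
            if n == 1:
                if i == 0:
                    return prod
                i -= 1
        else:
            b, c = prod
            for k in range(1, n):        # split point between B and C
                block = count(b, k) * count(c, n - k)
                if i < block:
                    return (unrank(b, k, i // count(c, n - k)) +
                            unrank(c, n - k, i % count(c, n - k)))
                i -= block
    raise IndexError("rank out of range")
```

With the counts memoized, each unrank call touches polynomially many table entries, so random access to the i-th word of a given length stays polynomial in n.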
Automating Grammar Comparison
We consider from a practical perspective the problem of checking equivalence of context-free grammars. We present techniques for proving equivalence, as well as techniques for finding counter-examples that establish non-equivalence. Among the key building blocks of our approach is a novel algorithm for efficiently enumerating and sampling words and parse trees from arbitrary context-free grammars; the algorithm supports polynomial-time random access to words belonging to the grammar. Furthermore, we propose an algorithm for proving equivalence of context-free grammars that is complete for LL grammars, yet can be invoked on any context-free grammar, including ambiguous grammars. Our techniques successfully find discrepancies between different syntax specifications of several real-world languages, and are capable of detecting fine-grained incremental modifications performed on grammars. Our evaluation shows that our tool improves significantly on the existing state-of-the-art tools. In addition, we used these algorithms to develop an online tutoring system for grammars that we then used in an undergraduate course on computer language processing. On questions involving grammar constructions, our system was able to automatically evaluate the correctness of 95% of the solutions submitted by students: it disproved 74% of cases and proved 21% of them.
Large Language Models for Compiler Optimization
We explore the novel application of Large Language Models to code
optimization. We present a 7B-parameter transformer model trained from scratch
to optimize LLVM assembly for code size. The model takes as input unoptimized
assembly and outputs a list of compiler options to best optimize the program.
Crucially, during training, we ask the model to predict the instruction counts
before and after optimization, and the optimized code itself. These auxiliary
learning tasks significantly improve the optimization performance of the model
and improve the model's depth of understanding.
We evaluate on a large suite of test programs. Our approach achieves a 3.0%
improvement in reducing instruction counts over the compiler, outperforming two
state-of-the-art baselines that require thousands of compilations. Furthermore,
the model shows surprisingly strong code reasoning abilities, generating
compilable code 91% of the time and perfectly emulating the output of the
compiler 70% of the time.