20 research outputs found

    Automated Test Generation for OpenCL Kernels using Fuzzing and Constraint Solving

    Get PDF

    Automated testing for GPU kernels

    Get PDF
    Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current processors (CPUs) in computers. These advantages along with recent advances in the programmability of GPUs have made them widely used in various general-purpose computing domains. However, this has also made testing GPU kernels critical to ensure that their behaviour meets the requirements of the design and specification. Despite the advances in programmability, GPU kernels are hard to code and analyse due to the high complexity of memory sharing patterns, striding patterns for memory accesses, implicit synchronisation, and combinatorial explosion of thread interleavings. Existing few techniques for testing GPU kernels use symbolic execution for test generation that incur a high overhead, have limited scalability and do not handle all data types. In this thesis, we present novel approaches to measure test effectiveness and generate tests automatically for GPU kernels. To achieve this, we address significant challenges related to the GPU execution and memory model, and the lack of customised thread scheduling and global synchronisation. We make the following contributions: First, we present a framework, CLTestCheck, for assessing the quality of test suites developed for GPU kernels. The framework can measure code coverage using three different coverage metrics that are inspired by faults found in real kernel code. Fault finding capability of the test suite is also measured by the framework to seed different types of faults in the kernel and reported in the form of mutation score, which is the ratio of the number of uncovered faults to the total number of seeded faults. Second, with the goal of being fast, effective and scalable, we propose a test generation technique, CLFuzz, for GPU kernels that combines mutation-based fuzzing for fast test generation and selective SMT solving to help cover unreachable branches by fuzzing. Fuzz testing for GPU kernels has not been explored previously. Our approach for fuzz testing randomly mutates input kernel argument values with the goal of increasing branch coverage and supports GPU-specific data types such as images. When fuzz testing is unable to increase branch coverage with random mutations, we gather path constraints for uncovered branch conditions, build additional constraints to represent the context of GPU execution such as number of threads and work-group size, and invoke the Z3 constraint solver to generate tests for them. Finally, to help uncover inter work-group data races and replay these bugs with fixed work-group schedules, we present a schedule amplifier, CLSchedule, that simulates multiple work-group schedules, with which to execute each of the generated tests. By reimplementing the OpenCL API, CLSchedule executes the kernel with a fixed work-group schedule rather than the default arbitrary schedule. It also executes the kernel directly, without requiring the developer to manually provide boilerplate host code. The outcome of our research can be summarised as follows: 1. CLTestCheck is applied to 82 publicly available GPU kernels from industry-standard benchmark suites along with their test suites. The experiment reveals that CLTestCheck is capable of automatically measuring the effectiveness of test suites, in terms of code coverage, faulting finding capability and revealing data races in real OpenCL kernels. 2. CLFuzz can automatically generate tests and achieve close to 100% coverage and mutation score for the majority of the data set of 217 GPU kernels collected from open-source projects and industry-standard benchmarks. 3. CLSchedule is capable of exploring the effect of work-group schedules on the 217 GPU kernels and uncovers data races in 21 of them. The techniques developed in this thesis demonstrate that we can measure the effectiveness of tests developed for GPU kernels with our coverage criteria and fault seeding methods. The result is useful in highlighting code portions that may need developers' further attention. Our automated test generation and work-group scheduling approaches are also fast, effective and scalable, with small overhead incurred (average of 0.8 seconds) and scalability to large kernels with complex data structures

    Symbolic execution of verification languages and floating-point code

    Get PDF
    The focus of this thesis is a program analysis technique named symbolic execution. We present three main contributions to this field. First, an investigation into comparing several state-of-the-art program analysis tools at the level of an intermediate verification language over a large set of benchmarks, and improvements to the state-of-the-art of symbolic execution for this language. This is explored via a new tool, Symbooglix, that operates on the Boogie intermediate verification language. Second, an investigation into performing symbolic execution of floating-point programs via a standardised theory of floating-point arithmetic that is supported by several existing constraint solvers. This is investigated via two independent extensions of the KLEE symbolic execution engine to support reasoning about floating-point operations (with one tool developed by the thesis author). Third, an investigation into the use of coverage-guided fuzzing as a means for solving constraints over finite data types, inspired by the difficulties associated with solving floating-point constraints. The associated prototype tool, JFS, which builds on the LibFuzzer project, can at present be applied to a wide range of SMT queries over bit-vector and floating-point variables, and shows promise on floating-point constraints.Open Acces

    Metamorphic Testing for Software Libraries and Graphics Compilers

    Get PDF
    Metamorphic Testing is a testing technique which mutates existing test cases in semantically equivalent forms, by making use of metamorphic relations, while avoiding the oracle problem. However, these required relations are not readily available for a given system under test. Defining effective metamorphic relations is difficult, and arguably the main obstacle towards adoption of metamorphic testing in production-level software development. One example application is testing graphics compilers, where the approximate and under-specified nature of the domain makes it hard to apply more traditional techniques. We propose an approach with a lower barrier of entry to applying metamorphic testing for a software library. The user must still identify relations that hold over their particular library, but can do so within a development-like environment. We apply methods from the domains of metamorphic testing and fuzzing to produce complex test cases. We consider the user interaction a bonus, as they can control what parts of the target codebase is tested, potentially focusing on less-tested or critical sections of the codebase. We implement our proposed approach in a tool, MF++, which synthesises C++ test cases for a C++ library, defined by user-provided ingredients. We applied MF++ to 7 libraries in the domains of satisfiability modulo theories and Presburger arithmetic,. Our evaluation of MF++ was able to identify 21 bugs in these tools. We additionally provide an automatic reducer for tests generated by MF++, named MF++R. In addition to minimising tests exposing issues, MF++R can also be used to identify incorrect user-provided relations. Additionally, we investigate the combined use of MF++ and MF++R in order to augment code coverage of library test suites. We assess the utility of this application by contributing 21 tests aimed at improving coverage across 3 libraries.Open Acces

    A Generative and Mutational Approach for Synthesizing Bug-exposing Test Cases to Guide Compiler Fuzzing

    Get PDF
    Random test case generation, or fuzzing, is a viable means for uncovering compiler bugs. Unfortunately, compiler fuzzing can be time-consuming and inefficient with purely randomly generated test cases due to the complexity of modern compilers. We present ComFuzz, a focused compiler fuzzing framework. ComFuzz aims to improve compiler fuzzing efficiency by focusing on testing components and language features that are likely to trigger compiler bugs. Our key insight is human developers tend to make common and repeat errors across compiler implementations; hence, we can leverage the previously reported buggy-exposing test cases of a programming language to test a new compiler implementation. To this end, ComFuzz employs deep learning to learn a test program generator from open-source projects hosted on GitHub. With the machinegenerated test programs in place, ComFuzz then leverages a set of carefully designed mutation rules to improve the coverage and bug-exposing capabilities of the test cases. We evaluate ComFuzz on 11 compilers for JS and Java programming languages. Within 260 hours of automated testing runs, we discovered 33 unique bugs across nine compilers, of which 29 have been confirmed and 22, including an API documentation defect, have already been fixed by the developers. We also compared ComFuzz to eight prior fuzzers on four evaluation metrics. In a 24-hour comparative test, ComFuzz uncovers at least 1.5× more bugs than the state-of-the-art baselines

    Correct Optimized GPU Programs

    Get PDF

    Simulation methodologies for mobile GPUs

    Get PDF
    GPUs critically rely on a complex system software stack comprising kernel- and user-space drivers and JIT compilers. Yet, existing GPU simulators typically abstract away details of the software stack and GPU instruction set. Partly, this is because GPU vendors rarely release sufficient information about their latest GPU products. However, this is also due to the lack of an integrated CPU-GPU simulation framework, which is complete and powerful enough to drive the complex GPU software environment. This has led to a situation where research on GPU architectures and compilers is largely based on outdated or greatly simplified architectures and software stacks, undermining the validity of the generated results. Making the situation even more dire, existing GPU simulation efforts are concentrated around desktop GPUs, making infrastructure for modelling mobile GPUs virtually non-existent, despite their surging importance in the GPU market. Still, mobile GPU designers are faced with the challenge of evaluating design alternatives involving hundreds of architectural configuration options and micro-architectural improvements under tight time-to-market constraints, to which currently employed design flows involving detailed, but slow simulations are not well suited. In this thesis we develop a full-system simulation environment for a mobile platform, which enables users to run a complete and unmodified software stack for a state-of-the-art mobile Arm CPU and Mali Bifrost GPU powered device, achieving 100\% architectural accuracy across all available toolchains. We demonstrate the capability of our GPU simulation framework through a number of case studies exploring modern, mobile GPU applications, and optimize them using functional simulation statistics, unavailable with other approaches or hardware. Furthermore, we develop a trace-based performance model, allowing architects to rapidly model GPU configurations in early design space exploration

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, exibility, and efficiency of tools and algorithms for building computer-controlled systems

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, exibility, and efficiency of tools and algorithms for building computer-controlled systems

    Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

    Full text link
    The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, thus placing significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which supports efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd
    corecore