358 research outputs found

    An Application-Specific Instruction Set for Accelerating Set-Oriented Database Primitives

    Get PDF
    The key task of database systems is to efficiently manage large amounts of data. A high query throughput and a low query latency are essential for the success of a database system. Lately, research focused on exploiting hardware features like superscalar execution units, SIMD, or multiple cores to speed up processing. Apart from these software optimizations for given hardware, even tailor-made processing circuits running on FPGAs are built to run mostly stateless query plans with incredibly high throughput. A similar idea, which was already considered three decades ago, is to build tailor-made hardware like a database processor. Despite their superior performance, such application-specific processors were not considered to be beneficial because general-purpose processors eventually always caught up so that the high development costs did not pay off. In this paper, we show that the development of a database processor is much more feasible nowadays through the availability of customizable processors. We illustrate exemplarily how to create an instruction set extension for set-oriented database rimitives. The resulting application-specific processor provides not only a high performance but it also enables very energy-efficient processing. Our processor requires in various configurations more than 960x less energy than a high-end x86 processor while providing the same performance

    Verification of loop parallelisations

    Get PDF
    Writing correct parallel programs becomes more and more difficult as the complexity and heterogeneity of processors increase. This issue is addressed by parallelising compilers. Various compiler directives can be used to tell these compilers where to parallelise. This paper addresses the correctness of such compiler directives for loop parallelisation. Specifically, we propose a technique based on separation logic to verify whether a loop can be parallelised. Our approach requires each loop iteration to be specified with the locations that are read and written in this iteration. If the specifications are correct, they can be used to draw conclusions about loop (in)dependences. Moreover, they also reveal where synchronisation is needed in the parallelised program. The loop iteration specifications can be verified using permission-based separation logic and seamlessly integrate with functional behaviour specifications. We formally prove the correctness of our approach and we discuss automated tool support for our technique. Additionally, we also discuss how the loop iteration contracts can be compiled into specifications for the code coming out of the parallelising compiler

    Verified lifting of stencil computations

    Get PDF
    This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.United States. Department of Energy. Office of Science (Award DE-SC0008923)United States. Department of Energy. Office of Science (Award DE-SC0005288

    Modular Synthesis of Sketches Using Models

    Get PDF
    One problem with the constraint-based approaches to synthesis that have become popular over the last few years is that they only scale to relatively small routines, on the order of a few dozen lines of code. This paper presents a mechanism for modular reasoning that allows us to break larger synthesis problems into small manageable pieces. The approach builds on previous work in the verification community of using high-level specifications and partially interpreted functions (we call them models) in place of more complex pieces of code in order to make the analysis modular. The main contribution of this paper is to show how to combine these techniques with the counterexample guided synthesis approaches used to efficiently solve synthesis problems. Specifically, we show two new algorithms; one to efficiently synthesize functions that use models, and another one to synthesize functions while ensuring that the behavior of the resulting function will be in the set of behaviors allowed by the model. We have implemented our approach on top of the open-source Sketch synthesis system, and we demonstrate its effectiveness on several Sketch benchmark problems.National Science Foundation (U.S.) (Grant NSF-1116362)National Science Foundation (U.S.) (Grant NSF-1139056)United States. Dept. of Energy (Grant DE-SC0005372

    Bridging the Gap Between General-Purpose and Domain-Specific Compilers with Synthesis

    Get PDF
    This paper describes a new approach to program optimization that allows general purpose code to benefit from the optimization power of domain-specific compilers. The key to this approach is a synthesis-based technique to raise the level of abstraction of general-purpose code to enable aggressive domain-specific optimizations. We have been implementing this approach in an extensible system called Herd. The system is designed around a collection of parameterized kernel translators. Each kernel translator is associated with a domain-specific compiler, and the role of each kernel translator is to scan the input code in search of code fragments that can be optimized by the domain-specific compiler embedded within each kernel translator. By leveraging general synthesis technology, it is possible to have a generic kernel translator that can be specialized by compiler developers for each domain-specific compiler, making it easy to build new domain knowledge into the overall system. We illustrate this new approach to build optimizing compilers in two different domains, and highlight research challenges that need to be addressed in order to achieve the ultimate vision
    • …
    corecore