    Application-Specific Memory Subsystems

    Get PDF
    The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application. General-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed, but when this is not the case a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future. In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort. We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM). Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware.
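    The abstract describes searching over memory subsystem designs by replaying recorded address traces. As a rough illustration of that idea only, the sketch below replays a toy trace through a set-associative LRU cache model and randomly explores a few cache parameters, keeping the configuration with the fewest misses; the cost model, the search strategy, and all names are illustrative assumptions, not the dissertation's actual superoptimizer or simulator.

```python
# Hypothetical sketch of trace-driven search over cache parameters.
# The miss-count cost model and the random search are illustrative
# assumptions, not the dissertation's actual tool.
import random

def simulate_misses(trace, total_bytes, line_bytes, ways):
    """Count misses for a set-associative, LRU cache on an address trace."""
    num_sets = max(1, total_bytes // (line_bytes * ways))
    sets = [[] for _ in range(num_sets)]   # each set holds tags in LRU order
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        idx = line % num_sets
        tags = sets[idx]
        if line in tags:
            tags.remove(line)              # hit: move to MRU position
        else:
            misses += 1
            if len(tags) >= ways:
                tags.pop(0)                # evict the LRU line
        tags.append(line)
    return misses

def search(trace, iterations=200):
    """Randomly explore cache configurations and keep the best one seen."""
    sizes = [1 << k for k in range(10, 21)]      # 1 KiB .. 1 MiB
    lines = [16, 32, 64, 128]
    assoc = [1, 2, 4, 8]
    best = None
    for _ in range(iterations):
        cfg = (random.choice(sizes), random.choice(lines), random.choice(assoc))
        cost = simulate_misses(trace, *cfg)
        if best is None or cost < best[0]:
            best = (cost, cfg)
    return best

if __name__ == "__main__":
    # A toy trace: strided accesses mixed with a small hot region.
    trace = [i * 64 for i in range(4096)] + [(i % 512) * 8 for i in range(4096)]
    misses, (size, line, ways) = search(trace)
    print(f"best config: {size} B, {line} B lines, {ways}-way -> {misses} misses")
```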

    Unbounded Superoptimization

    Get PDF
    Our aim is to enable software to take full advantage of the capabilities of emerging microprocessor designs without modifying the compiler. Towards this end, we propose a new approach to code generation and optimization. Our approach uses an SMT solver in a novel way to generate efficient code for modern architectures and to guarantee that the generated code correctly implements the source code. The distinguishing characteristic of our approach is that the size of the constraints does not depend on the candidate sequence of instructions. To study the feasibility of our approach, we implemented a preliminary prototype, which takes LLVM IR code as input and uses the Z3 SMT solver to generate ARMv7-A assembly. The prototype handles arbitrary loop-free code (not only basic blocks) as both input and output. We applied it to small but tricky examples used as standard benchmarks for other superoptimization and synthesis tools. We are encouraged to see that Z3 successfully solved the complex constraints that arise from our approach. This work paves the way to employing recent advances in SMT solvers and has the potential to advance SMT solvers further by providing a new category of challenging benchmarks drawn from an industrial application domain.
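    The core of the approach is asking an SMT solver whether any input can distinguish the source code from a candidate instruction sequence. The sketch below shows that correctness check in miniature using the z3-solver Python bindings; the scalar "source" and "candidate" expressions are stand-ins for the real LLVM IR and ARMv7-A encodings, which are far richer, so treat this as an assumption-laden toy rather than the prototype described above.

```python
# Minimal sketch of the equivalence check behind SMT-based superoptimization,
# using the z3-solver Python bindings. The expressions below are toy stand-ins
# for LLVM IR and ARMv7-A code.
from z3 import BitVec, BitVecVal, Solver, Not, sat

x = BitVec("x", 32)

# Source semantics: the IR computes x * 5.
source = x * BitVecVal(5, 32)

# Candidate loop-free code: a shift-and-add sequence, (x << 2) + x.
candidate = (x << 2) + x

# The candidate is correct iff no input makes the two results differ.
s = Solver()
s.add(Not(source == candidate))
if s.check() == sat:
    print("counterexample:", s.model())
else:
    print("candidate verified equivalent for all 32-bit inputs")
```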

    Maximally robust controllers for multivariable systems

    No full text
    Published version

    Minotaur: A SIMD-Oriented Synthesizing Superoptimizer

    Full text link
    Minotaur is a superoptimizer for LLVM's intermediate representation that focuses on integer SIMD instructions, both portable and specific to x86-64. We created it to attack the problem of finding missing peephole optimizations for SIMD instructions; this is challenging because there are many such instructions and they can be semantically complex. Minotaur runs a hybrid synthesis algorithm in which instructions are enumerated concretely, but literal constants are generated by the solver. We use Alive2 as a verification engine; to do this we modified it to support synthesis and also to support a large subset of Intel's vector instruction sets (SSE, AVX, AVX2, and AVX-512). Minotaur finds many profitable optimizations that are missing from LLVM. It achieves limited speedups on the integer parts of SPEC CPU2017, around 1.3%, and it speeds up the test suite for the libYUV library by 2.2% on average, with a maximum of 1.64x, when targeting an Intel Cascade Lake processor.
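    To make the hybrid synthesis idea concrete, the sketch below enumerates a handful of candidate instruction shapes and leaves each shape's literal constant symbolic, letting the solver pick a value that matches the specification on all inputs. The shapes, the specification, and the use of plain 32-bit scalars are invented for illustration; Minotaur itself works on LLVM IR vector code and verifies candidates with Alive2.

```python
# Hedged sketch of "hybrid" synthesis as the abstract describes it: candidate
# instruction shapes are enumerated concretely, while literal constants are
# left symbolic and filled in by the solver. Everything here is a toy example,
# not Minotaur's actual pipeline.
from z3 import BitVec, ForAll, Solver, sat

x = BitVec("x", 32)
c = BitVec("c", 32)          # literal constant chosen by the solver

# Specification: clear the low 4 bits of x.
spec = (x >> 4) << 4

# Concretely enumerated candidate shapes, each with one symbolic constant.
shapes = [
    ("x & c", lambda x, c: x & c),
    ("x | c", lambda x, c: x | c),
    ("x + c", lambda x, c: x + c),
    ("x ^ c", lambda x, c: x ^ c),
]

for name, shape in shapes:
    s = Solver()
    # Ask for a constant that makes this shape match the spec on all inputs.
    s.add(ForAll([x], shape(x, c) == spec))
    if s.check() == sat:
        print(f"{name} works with c = {s.model()[c]}")
        break
else:
    print("no single-instruction rewrite found among these shapes")
```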