Minotaur: A SIMD-Oriented Synthesizing Superoptimizer
Minotaur is a superoptimizer for LLVM's intermediate representation that
focuses on integer SIMD instructions, both portable and specific to x86-64. We
created it to attack problems in finding missing peephole optimizations for
SIMD instructions, which is challenging because there are many such instructions
and they can be semantically complex. Minotaur runs a hybrid synthesis
algorithm where instructions are enumerated concretely, but literal constants
are generated by the solver. We use Alive2 as a verification engine; to do this
we modified it to support synthesis and also to support a large subset of
Intel's vector instruction sets (SSE, AVX, AVX2, and AVX-512). Minotaur finds
many profitable optimizations that are missing from LLVM. It achieves limited
speedups on the integer parts of SPEC CPU2017, around 1.3%, and it speeds up
the test suite for the libYUV library by 2.2%, on average, and by 1.64x
maximum, when targeting an Intel Cascade Lake processor.
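The hybrid search described in this abstract (instructions enumerated concretely, constants left as holes to be filled) can be illustrated with a toy sketch. This is not Minotaur's implementation: the 8-bit scalar domain, the four candidate operations, and the function names are assumptions made for illustration, and an exhaustive search over constants stands in for the solver query that Minotaur issues through Alive2.

```python
from typing import Callable, Optional, Tuple

MASK = 0xFF  # assumption: model values as 8-bit unsigned integers

# Hypothetical candidate instructions, listed roughly from cheapest to
# most expensive. Each takes an input x and a constant hole c.
OPS = [
    ("and", lambda x, c: x & c),
    ("add", lambda x, c: (x + c) & MASK),
    ("shl", lambda x, c: (x << c) & MASK),
    ("mul", lambda x, c: (x * c) & MASK),
]

def synthesize(target: Callable[[int], int]) -> Optional[Tuple[str, int]]:
    """Enumerate instructions concretely; search for the constant per instruction."""
    for name, op in OPS:
        # Brute force over c replaces the solver call of a real synthesizer.
        for c in range(MASK + 1):
            if all(op(x, c) == target(x) for x in range(MASK + 1)):
                return name, c
    return None

# A multiplication by 8 is rediscovered as a left shift by 3.
result = synthesize(lambda x: (x * 8) & MASK)
no_result = synthesize(lambda x: (x * x) & MASK)
```

Checking every 8-bit input plays the role of verification here; a real superoptimizer would instead ask a verifier to prove equivalence over all inputs.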
Superoptimization of WebAssembly Bytecode
Motivated by the fast adoption of WebAssembly, we propose the first
functional pipeline to support the superoptimization of WebAssembly bytecode.
Our pipeline works over LLVM and Souper. We evaluate our superoptimization
pipeline with 12 programs from the Rosetta code project. Our pipeline improves
the code section size of 8 out of 12 programs. We discuss the challenges faced
in superoptimization of WebAssembly with two case studies.
Comment: 4 pages, 3 figures. Proceedings of MoreVMs: Workshop on Modern
Language Runtimes, Ecosystems, and VMs (2020)
Learned Query Superoptimization
Traditional query optimizers are designed to be fast and stateless: each
query is quickly optimized using approximate statistics, sent off to the
execution engine, and promptly forgotten. Recent work on learned query
optimization has shown that it is possible for a query optimizer to "learn
from its mistakes," correcting erroneous query plans the next time a plan is
produced. But what if query optimizers could avoid mistakes entirely? This
paper presents the idea of learned query superoptimization. A new generation of
query superoptimizers could autonomously experiment to discover optimal plans
using exploration-driven algorithms, iterative Bayesian optimization, and
program synthesis. While such superoptimizers will take significantly longer to
optimize a given query, superoptimizers have the potential to massively
accelerate a large number of important repetitive queries being executed on
data systems today.
Application-Specific Memory Subsystems
The disparity in performance between processors and main memories has
led computer architects to incorporate large cache hierarchies in
modern computers. These cache hierarchies are designed to be
general-purpose in that they strive to provide the best possible
performance across a wide range of applications. However, such a memory
subsystem does not necessarily provide the best possible performance for
a particular application.
Although general-purpose memory subsystems are desirable when the
workload is unknown and the memory subsystem must remain fixed,
when this is not the case a custom memory subsystem may be beneficial.
For example, in an application-specific integrated circuit (ASIC) or
a field-programmable gate array (FPGA) designed to run a particular
application, a custom memory subsystem optimized for that application
would be desirable. In addition, when there are tunable
parameters in the memory subsystem, it may make sense to change these
parameters depending on the application being run. Such a situation
arises today with FPGAs and, to a lesser extent, GPUs, and it is
plausible that general-purpose computers will begin to support
greater flexibility in the memory subsystem in the future.
In this dissertation, we first show that it is possible to create
application-specific memory subsystems that provide much better
performance than a general-purpose memory subsystem. In addition,
we show a way to discover such memory subsystems automatically using
a superoptimization technique on memory address traces gathered
from applications. This allows one to generate a custom memory subsystem
with little effort.
We next show that our memory subsystem superoptimization technique can
be used to optimize for objectives other than performance. As an example,
we show that it is possible to reduce the number of writes to the main
memory, which can be useful for main memories with limited write
durability, such as flash or Phase-Change Memory (PCM).
Finally, we show how to superoptimize memory subsystems for streaming
applications, which are a class of parallel applications. In particular, we
show that, through the use of ScalaPipe, we can author and deploy streaming
applications targeting FPGAs with superoptimized memory subsystems.
ScalaPipe is a domain-specific language (DSL) embedded in the Scala
programming language for generating streaming applications that can be
implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we
are able to demonstrate actual performance improvements using the
superoptimized memory subsystem with applications implemented in hardware.
Are There Good Mistakes? A Theoretical Analysis of CEGIS
Counterexample-guided inductive synthesis (CEGIS) is used to synthesize
programs from a candidate space of programs. The technique is guaranteed to
terminate and synthesize the correct program if the space of candidate programs
is finite. But the technique may or may not terminate with the correct program
if the candidate space of programs is infinite. In this paper, we perform a
theoretical analysis of the counterexample-guided inductive synthesis technique. We
investigate whether the set of candidate spaces for which the correct program
can be synthesized using CEGIS depends on the counterexamples used in inductive
synthesis, that is, whether there are good mistakes which would increase the
synthesis power. We investigate whether the use of minimal counterexamples
instead of arbitrary counterexamples expands the set of candidate spaces of
programs for which inductive synthesis can successfully synthesize a correct
program. We consider two kinds of counterexamples: minimal counterexamples and
history bounded counterexamples. The history bounded counterexample used in any
iteration of CEGIS is bounded by the examples used in previous iterations of
inductive synthesis. We examine the relative change in power of inductive
synthesis in both cases. We show that the synthesis technique using minimal
counterexamples (MinCEGIS) has the same synthesis power as CEGIS, whereas the
synthesis technique using history bounded counterexamples (HCEGIS) has different
power from CEGIS, and neither dominates the other.
Comment: In Proceedings SYNT 2014, arXiv:1407.493
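For readers unfamiliar with the loop this abstract analyzes, the following is a minimal sketch of CEGIS over a finite candidate space, where termination is guaranteed. The candidate programs, the specification, and the bounded input domain are all hypothetical, and the exhaustive check over the domain stands in for a real verifier.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Program = Tuple[str, Callable[[int], int]]  # (name, function)

def cegis(candidates: Sequence[Program],
          spec: Callable[[int], int],
          domain: Sequence[int]) -> Tuple[Optional[str], List[int]]:
    """Return the name of a correct candidate (or None) and the counterexamples used."""
    examples: List[int] = []
    while True:
        # Inductive step: pick any candidate consistent with all examples so far.
        guess = next(
            (p for p in candidates
             if all(p[1](x) == spec(x) for x in examples)),
            None,
        )
        if guess is None:
            return None, examples  # candidate space exhausted
        # Verification step: search for an input on which the guess is wrong.
        cex = next((x for x in domain if guess[1](x) != spec(x)), None)
        if cex is None:
            return guess[0], examples  # verified on the whole domain
        # The counterexample rules out this guess in every later iteration.
        examples.append(cex)

candidates = [
    ("x", lambda x: x),
    ("x*x", lambda x: x * x),
    ("x+x", lambda x: x + x),
]
name, used = cegis(candidates, spec=lambda x: 2 * x, domain=range(-4, 5))
```

Because this verifier returns the first failing input in domain order, the choice of domain ordering decides which counterexample is reported; that choice is the kind of degree of freedom the paper studies.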
Populating the Peephole Optimizer of a Smart Contract Compiler
Developing compiler optimizations, especially for new, rapidly evolving smart contract languages, can be onerous and error-prone, but is especially important for smart contracts, where deployment and execution directly translate to monetary cost and which cannot change once deployed. One common optimization technique is the use of peephole optimizations, replacement rules that are applied using pattern-matching. These rules are normally constructed using human expertise, which is both time-consuming and far from systematic in exploring opportunities for optimization. In this work we propose a pipeline to automatically populate the peephole optimizer of a smart contract compiler. We apply superoptimization to an existing code base to obtain sequences of instructions that can be replaced by cheaper, observationally equivalent instructions. We then generate peephole optimization rules by extracting the underlying patterns of these optimizations. We provide a case study of our approach and a prototype implementation for bytecode of the Ethereum Virtual Machine, the tool ppltr, which combines the superoptimizer ebso and the rule generator sorg. Then we evaluate our approach by generating and applying nearly 1k peephole optimization rules extracted from 2k optimizations obtained from deployed bytecode.
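As a rough illustration of what applying peephole rules by pattern matching looks like, the sketch below rewrites a list of EVM-style opcodes with three hand-written rules. The rules and the `apply_rules` helper are assumptions chosen for illustration; they are not output of ppltr, ebso, or sorg, and the sketch ignores concerns such as jump destinations that a real optimizer has to respect.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (pattern, replacement)

# Hypothetical rules: each pattern leaves the stack unchanged, so it can
# be replaced by the empty sequence.
RULES: List[Rule] = [
    (("PUSH1 0x00", "ADD"), ()),  # x + 0 == x
    (("SWAP1", "SWAP1"), ()),     # swapping twice restores the stack
    (("DUP1", "POP"), ()),        # duplicate, then discard the copy
]

def apply_rules(code: Sequence[str], rules: Sequence[Rule] = RULES) -> List[str]:
    """Rewrite until no rule matches anywhere in the instruction list."""
    out = list(code)
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            n = len(pattern)
            i = 0
            while i + n <= len(out):
                if tuple(out[i:i + n]) == pattern:
                    out[i:i + n] = replacement  # splice in the cheaper sequence
                    changed = True
                else:
                    i += 1
    return out

optimized = apply_rules(
    ["PUSH1 0x05", "SWAP1", "SWAP1", "PUSH1 0x00", "ADD", "MUL"]
)
```

Repeating the pass until a fixed point is reached matters because one rewrite can expose another: removing `DUP1 POP` from `SWAP1 DUP1 POP SWAP1` makes the two `SWAP1` instructions adjacent.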