A spill code minimization algorithm for loops
Loops are the main source of parallelism in applications. Finding an optimal register allocation for loops has been an open problem for some time. In this case, optimal refers to the minimization of spills from registers to memory. In this paper we address this problem and present an optimal but exponential algorithm that allocates registers to loop bodies such that the spill code is minimal. We also show heuristic modifications to the algorithm that perform as well in practice as the exponential approach. Finally, we examine this algorithm's feasibility in production compilers.
Path splitting--a technique for improving data flow analysis
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 83-87). By Massimiliano Antonio Poletto.
Eliminating Branches using a Superoptimizer and the GNU C Compiler
Although this paper uses the RS/6000 for all its examples, the techniques described here are applicable to most machines.
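As a rough illustration of the idea named in the title, the sketch below exhaustively searches short straight-line instruction sequences for one that computes absolute value without a branch. It is a toy under stated assumptions, not the superoptimizer or the GNU C Compiler integration the paper describes: the 8-bit word size, the instruction set (`neg`, `not`, `sar7`, `add`, `sub`, `and`, `or`, `xor`), and the names `run`, `candidates`, `superoptimize`, and `abs8` are all hypothetical choices made to keep exhaustive checking cheap.

```python
from itertools import product

MASK = 0xFF  # assumption: 8-bit words, so all 256 inputs can be checked


def sar7(x):
    """Arithmetic shift right by 7: 0xFF if the sign bit is set, else 0x00."""
    return MASK if x & 0x80 else 0


# Hypothetical branch-free instruction set for this toy machine.
UNARY = {
    "neg": lambda a: -a & MASK,
    "not": lambda a: ~a & MASK,
    "sar7": sar7,
}
BINARY = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def run(program, x):
    """Execute a program; value 0 is the input, each instruction appends one value."""
    values = [x]
    for op, a, b in program:
        if op in UNARY:
            values.append(UNARY[op](values[a]))
        else:
            values.append(BINARY[op](values[a], values[b]))
    return values[-1]


def candidates(length):
    """Enumerate every program with exactly `length` instructions."""
    def choices(step):
        regs = range(step + 1)  # instruction `step` may read values 0..step
        unary = [(op, a, None) for op in UNARY for a in regs]
        binary = [(op, a, b) for op in BINARY for a in regs for b in regs]
        return unary + binary
    return product(*(choices(i) for i in range(length)))


def superoptimize(target, max_length=3):
    """Return the first shortest program matching `target` on all 256 inputs."""
    for length in range(1, max_length + 1):
        for program in candidates(length):
            if all(run(program, x) == target(x) for x in range(256)):
                return program
    return None


def abs8(x):
    """Reference semantics: absolute value of a signed 8-bit word, wrapped to 8 bits."""
    signed = x - 256 if x & 0x80 else x
    return abs(signed) & MASK


if __name__ == "__main__":
    print(superoptimize(abs8))
```

One well-known branch-free sequence in this instruction set is `m = sar7(x); t = xor(x, m); r = sub(t, m)`; the search may return that sequence or another of the same length, since it stops at the first match.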
Application-Specific Memory Subsystems
The disparity in performance between processors and main memories has
led computer architects to incorporate large cache hierarchies in
modern computers. These cache hierarchies are designed to be
general-purpose in that they strive to provide the best possible
performance across a wide range of applications. However, such a memory
subsystem does not necessarily provide the best possible performance for
a particular application.
General-purpose memory subsystems are desirable when the workload is
unknown and the memory subsystem must remain fixed. When this is not
the case, however, a custom memory subsystem may be beneficial.
For example, in an application-specific integrated circuit (ASIC) or
a field-programmable gate array (FPGA) designed to run a particular
application, a custom memory subsystem optimized for that application
would be desirable. In addition, when there are tunable
parameters in the memory subsystem, it may make sense to change these
parameters depending on the application being run. Such a situation
arises today with FPGAs and, to a lesser extent, GPUs, and it is
plausible that general-purpose computers will begin to support
greater flexibility in the memory subsystem in the future.
In this dissertation, we first show that it is possible to create
application-specific memory subsystems that provide much better
performance than a general-purpose memory subsystem. In addition,
we show a way to discover such memory subsystems automatically using
a superoptimization technique on memory address traces gathered
from applications. This allows one to generate a custom memory subsystem
with little effort.
We next show that our memory subsystem superoptimization technique can
be used to optimize for objectives other than performance. As an example,
we show that it is possible to reduce the number of writes to the main
memory, which can be useful for main memories with limited write
durability, such as flash or Phase-Change Memory (PCM).
Finally, we show how to superoptimize memory subsystems for streaming
applications, which are a class of parallel applications. In particular, we
show that, through the use of ScalaPipe, we can author and deploy streaming
applications targeting FPGAs with superoptimized memory subsystems.
ScalaPipe is a domain-specific language (DSL) embedded in the Scala
programming language for generating streaming applications that can be
implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we
are able to demonstrate actual performance improvements using the
superoptimized memory subsystem with applications implemented in hardware.
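The trace-driven search described in this abstract can be sketched in miniature as follows. This is only an illustrative toy, not the dissertation's superoptimizer: it assumes a single direct-mapped, write-back cache, a trace of `(address, is_write)` pairs, and an exhaustive sweep over power-of-two configurations, and the names `simulate` and `search` are hypothetical. The `objective` argument mirrors the abstract's point that the same search can target either performance (misses) or writes to main memory (write-backs).

```python
from itertools import product


def simulate(trace, num_lines, line_bytes):
    """Simulate a direct-mapped, write-back cache over (address, is_write) pairs.

    Returns (misses, writebacks). A write-back is counted when a dirty line
    is evicted; lines still dirty at the end of the trace are not flushed.
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    misses = writebacks = 0
    for address, is_write in trace:
        block = address // line_bytes
        index = block % num_lines
        if tags[index] != block:
            misses += 1
            if tags[index] is not None and dirty[index]:
                writebacks += 1  # evicted dirty line goes to main memory
            tags[index] = block
            dirty[index] = False
        if is_write:
            dirty[index] = True
    return misses, writebacks


def search(trace, budget_bytes, objective="misses"):
    """Try every configuration within the capacity budget; keep the cheapest.

    Returns (cost, capacity_bytes, num_lines, line_bytes), or None if no
    configuration fits. Ties are broken in favor of smaller capacity.
    """
    line_counts = [2 ** k for k in range(0, 7)]  # 1..64 lines (assumed range)
    line_sizes = [2 ** k for k in range(2, 7)]   # 4..64 bytes (assumed range)
    best = None
    for num_lines, line_bytes in product(line_counts, line_sizes):
        capacity = num_lines * line_bytes
        if capacity > budget_bytes:
            continue
        misses, writebacks = simulate(trace, num_lines, line_bytes)
        cost = misses if objective == "misses" else writebacks
        candidate = (cost, capacity, num_lines, line_bytes)
        if best is None or candidate < best:
            best = candidate
    return best


if __name__ == "__main__":
    # Two writes to conflicting addresses, then a read of the first one.
    trace = [(0, True), (64, True), (0, False)]
    print(search(trace, budget_bytes=128, objective="writebacks"))
```

An exhaustive sweep is only practical for a search space this small; a realistic memory subsystem has far more tunable components, which is why a guided search over candidates would be needed in practice.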