Re-targetable tools and methodologies for the efficient deployment of high-level source code on coarse-grained dynamically reconfigurable architectures
Reconfigurable computing traditionally consists of a data path machine (such as an FPGA)
acting as a co-processor to a conventional microprocessor. This involves partitioning the application such that the data path intensive parts are implemented on the reconfigurable fabric, and
the control flow intensive parts are implemented on the microprocessor. Often the two parts
have to be written in different languages. New highly parallel data path architectures allow parallelism approaching that of FPGAs, but are able to be reconfigured very rapidly. As a result, it
is possible to use these architectures to perform control flow in a manner similar to a microprocessor, and thus a complete program can be described from an unmodified high-level language
(in particular C). This overcomes the historical instruction-level parallelism (ILP) wall. Existing microprocessor tool flows are insufficient to make full use of the available parallelism.
Data path machines are typically programmed via HDL tools from the ASIC design world.
This expresses algorithms at a lower level than that at which application algorithms are typically developed. The work in this thesis builds upon earlier work to allow applications to be described
from high-level languages, by employing low-level optimisations in the compiler back-end and
working from the assembly, to maximise parallel efficiency. This consists of scheduling, where
known techniques are used to pack instructions into basic blocks that map well to the reconfigurable core (optimising spatial efficiency); then automatic pipelining is applied to dramatically
improve the achievable throughput (optimising temporal efficiency). Together these can be
thought of as “instruction-level parallelism done right”. Speed-ups of more than an order of
magnitude were achieved, yielding throughputs of 180-380M Pixels/s on typical image signal
processing tasks, matching the performance of hard-wired ASICs. Furthermore, conventional software-based simulation technologies for data path machines are
too slow for use in application verification. This thesis demonstrates how a high-speed software
emulator can be created for self-controlled dynamically reconfigurable data path machines,
using a static serialisation of the data paths in each configuration context. This yields run-time
performance several orders of magnitude higher than existing techniques, making it suitable for
use in feedback-directed optimisation.
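As a rough illustration of what a static serialisation of a data path might look like, the sketch below orders the operations of one configuration context once, ahead of time, so that emulation becomes a flat loop over a fixed list instead of an event-driven simulation. This is not the thesis's implementation: the `context` dictionary format and the names `serialise` and `emulate` are assumptions made for the example, and the data path is assumed to be an acyclic graph of simple two-input operations.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
import operator


def serialise(context):
    """Return a fixed evaluation order for one configuration context.

    `context` (hypothetical format) maps a node name to (function, [source names]).
    Sources that are not nodes themselves are treated as external inputs.
    """
    # Map each node to the internal nodes it depends on.
    graph = {
        name: [src for src in sources if src in context]
        for name, (_, sources) in context.items()
    }
    # static_order() yields every node after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


def emulate(context, order, inputs):
    """Evaluate the context sequentially using a precomputed order."""
    values = dict(inputs)
    for name in order:
        function, sources = context[name]
        values[name] = function(*(values[src] for src in sources))
    return values


# A toy data path: out = (a + b) * c - a
context = {
    "out": (operator.sub, ["scaled", "a"]),
    "scaled": (operator.mul, ["sum", "c"]),
    "sum": (operator.add, ["a", "b"]),
}
order = serialise(context)  # computed once per context
result = emulate(context, order, {"a": 2, "b": 3, "c": 4})
```

Because the order is computed once per context, the per-cycle cost of emulation in this sketch is a single pass over the list, which is the general idea behind trading simulation generality for speed.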
Execution-based Scheduling for VLIW Architectures
We describe a new dynamic software scheduling technique for VLIW architectures, which compiles into VLIW code the program paths that are actually executed. Unlike trace processors or DIF, the technique executes operations speculatively on multiple paths through the code, is resilient to branch mispredictions, and can achieve the very large dynamic window sizes necessary for high ILP. Aggressive optimizations are applied to frequently executed portions of the code. Encouraging performance results were obtained on SPECint95 and TPC-C. The technique can be used for binary translation for achieving architectural compatibility with an existing processor, or as a VLIW scheduling technique in its own right.
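To make the idea of compiling an executed path into VLIW code more concrete, the sketch below greedily packs the operations of a single executed trace into fixed-width instruction words. It is a deliberately simplified illustration, not the algorithm described in the abstract: it follows one path only (no multi-path speculation), tracks only read-after-write dependencies, and assumes each register is written once. The function name `pack_trace` and the `(destination, sources)` trace format are hypothetical.

```python
def pack_trace(trace, width):
    """Greedily pack an executed trace into VLIW words of at most `width` operations.

    `trace` is a list of (destination, [sources]) pairs in execution order.
    Assumption: each destination is written once, so only true
    (read-after-write) dependencies need to be respected.
    """
    produced_in = {}  # register -> index of the word that computes it
    words = []        # each word is a list of destinations issued together

    for destination, sources in trace:
        # An operation may issue no earlier than the word after its last producer.
        earliest = max(
            (produced_in[src] + 1 for src in sources if src in produced_in),
            default=0,
        )
        # Find the first word at or after `earliest` that still has a free slot.
        slot = earliest
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1
        if slot == len(words):
            words.append([])
        words[slot].append(destination)
        produced_in[destination] = slot

    return words


# Five operations from one executed path, packed for a 2-wide machine.
trace = [
    ("r1", ["a", "b"]),
    ("r2", ["c", "d"]),
    ("r3", ["r1", "r2"]),
    ("r4", ["e"]),
    ("r5", ["r3", "r4"]),
]
words = pack_trace(trace, width=2)
# words == [["r1", "r2"], ["r3", "r4"], ["r5"]]
```

Here the independent operation `r4` is hoisted into a spare slot beside `r3`, so five sequential operations fit into three words; a real scheduler would additionally have to handle register reuse, memory ordering, and branches.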