Datapath and memory co-optimization for FPGA-based computation
With the large resource densities available on modern FPGAs, it is often the available
memory bandwidth that limits the parallelism (and therefore performance) that can be
achieved. For this reason, the focus of this thesis is the development of an integrated
scheduling and memory optimisation methodology to allow high levels of parallelism to be
exploited in FPGA-based designs.
A manual translation from C to hardware is first investigated as a case study,
exposing a number of potential optimisation techniques that have not been exploited in
existing work. An existing outer loop pipelining approach, originally developed for VLIW
processors, is extended and adapted for application to FPGAs. The outer loop pipelining
methodology is first developed to use a fixed memory subsystem design and then extended
to automate the optimisation of the memory subsystem. This approach allocates arrays
to physical memories and selects the set of data reuse structures to implement to match
the available and required memory bandwidths as the pipelining search progresses. The
final extension to this work is to include the partitioning of data from a single array across
multiple physical memories, increasing the number of memory ports through which data
may be accessed. The facility for loop unrolling is also added to increase the potential for
parallelism and exploit the additional bandwidth that partitioning can provide.
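
The interplay between array partitioning and loop unrolling can be illustrated with a small C sketch. It is not taken from the thesis: the names `partition_cyclic`, `sum_single` and `sum_partitioned`, the sizes `N` and `BANKS`, and the choice of a cyclic (interleaved) partitioning are all assumptions made for illustration. The idea is that once the elements of one array are spread over two physical memories, a loop unrolled by two can read one element from each memory per iteration, which hardware with one port per memory could serve in the same cycle.

```c
#define N 8      /* hypothetical array length */
#define BANKS 2  /* hypothetical number of physical memories */

/* Baseline: a single array behind a single port, one read per iteration. */
static int sum_single(const int a[N]) {
    int s = 0;
    for (int i = 0; i < N; i++) {
        s += a[i];
    }
    return s;
}

/* Cyclic partitioning (an assumed scheme): element i is stored in
 * bank i % BANKS at offset i / BANKS. */
static void partition_cyclic(const int a[N], int bank[BANKS][N / BANKS]) {
    for (int i = 0; i < N; i++) {
        bank[i % BANKS][i / BANKS] = a[i];
    }
}

/* Loop unrolled by BANKS: each iteration touches each bank exactly once,
 * so the two reads are independent and target different memories. */
static int sum_partitioned(int bank[BANKS][N / BANKS]) {
    int s0 = 0;
    int s1 = 0;
    for (int j = 0; j < N / BANKS; j++) {
        s0 += bank[0][j];  /* read from memory 0 */
        s1 += bank[1][j];  /* read from memory 1 */
    }
    return s0 + s1;
}
```

Calling `partition_cyclic` on an input array and then `sum_partitioned` on the resulting banks gives the same total as `sum_single` on the original array, while halving the number of loop iterations. In software this only models the data layout; the bandwidth gain appears when each bank is mapped to its own on-chip memory.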
We describe our approach based on formal methodologies and present the results
achieved when these methods are applied to a number of benchmarks. These results show
the advantages of both extending pipelining to levels above the innermost loop and the
co-optimisation of the datapath and memory subsystem.