From loop transformation to hardware generation
Multimedia applications are examples of a class of algorithms that are both calculation- and data-intensive and have real-time requirements. As a result, dedicated hardware acceleration is often needed. Usually the on-chip memory is not sufficient to store all data and has to be extended with external memory. The bandwidth to this memory often becomes a bottleneck. Loop transformations are needed to reduce the required bandwidth by improving temporal and spatial data locality. They can also unveil the parallelism present in the algorithm. The polyhedral model offers a flexible program representation that allows this kind of transformation to be automated. The class of applications that can be transformed with the polyhedral model fits very well into the class of applications that can benefit from hardware acceleration. This paper describes how existing tools that generate software from a polyhedral program representation have been extended to generate a VHDL description of a hardware controller. The corresponding data path is generated semi-automatically. Combining the generation of controller and data path creates a fast path to hardware. Our techniques enable easy exploration of the design space by generating many implementation variants. The techniques are demonstrated on an inverse discrete wavelet transform, resulting in several synthesizable designs, one of which has been hand-optimized for an FPGA implementation. The results outperform those of a commercial C-to-VHDL compiler: the generated variants run 5 to 10 times faster while consuming fewer resources.
Hardware Generation from the Polyhedral Model
Multimedia applications are examples of a class of algorithms that are both calculation- and data-intensive and have real-time requirements. As a result, dedicated hardware acceleration is often needed. Usually the on-chip memory is not sufficient to store all data and has to be extended with an external memory. The bandwidth to this memory often becomes a bottleneck. Loop transformations are needed to reduce the required bandwidth by improving temporal and spatial data locality, and may also help to unveil the parallelism present in the algorithm. The polyhedral model offers a flexible program representation that allows this kind of transformation to be automated. The class of applications that can be transformed with the polyhedral model fits very well into the class of applications that can benefit from hardware acceleration. This paper describes how existing tools that generate software from a polyhedral program representation have been extended to generate a VHDL description of a hardware controller. The corresponding data path is generated semi-automatically. Combining the generation of controller and data path creates a fast path to hardware. Our techniques make it easy to explore the design space by generating many implementation variants. After a promising candidate has been selected, hand-optimization may lead to a better final implementation. The techniques are demonstrated on an inverse discrete wavelet transform, resulting in several synthesizable designs, one of which has been hand-optimized for an FPGA implementation.
Finding and applying loop transformations for generating optimized FPGA implementations
When implementing multimedia applications, solutions in dedicated hardware are chosen only when the required performance or energy-efficiency cannot be met with a software solution. The performance of a hardware design critically depends upon having high levels of parallelism and data locality. Often a long sequence of high-level transformations is needed to sufficiently increase the locality and parallelism. The effect of the transformations is known only after translating the high-level code into a specific design at the circuit level. When the constraints are not met, hardware designers need to redo the high-level loop transformations, and repeat all subsequent translation steps, which leads to long design times.
We propose a method to reduce design time through the synergistic combination of techniques (a) to quickly pinpoint the loop transformations that increase locality; (b) to refactor loops in a polyhedral model and check whether a sequence of refactorings is legal; (c) to generate efficient structural VHDL from the optimized refactored algorithm.
Implementing these techniques in a tool suite reduces design time to hours instead of days or weeks. A 2D inverse discrete wavelet transform was taken as a case study. The results outperform those of a commercial C-to-VHDL compiler, and compare favorably with existing published approaches.
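As an illustration of step (b) in the abstract above, checking whether a loop refactoring is legal, the following is a minimal sketch and not the tool suite's actual method. The paper works on the polyhedral model; this sketch assumes the much simpler classical setting in which every data dependence is summarized by a constant distance vector, and a loop permutation is legal when every permuted vector stays lexicographically non-negative. The function names `lex_nonnegative` and `permutation_is_legal` are hypothetical.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]


def lex_nonnegative(vec: Vector) -> bool:
    """Return True if the first non-zero component is positive (or all are zero)."""
    for component in vec:
        if component > 0:
            return True
        if component < 0:
            return False
    # All-zero vector: a loop-independent dependence, unaffected by loop order.
    return True


def permutation_is_legal(distances: Sequence[Vector], perm: Sequence[int]) -> bool:
    """Check a loop permutation against constant dependence distance vectors.

    perm[k] is the index of the original loop placed at nesting depth k.
    Assumption: all dependences are exact constant distances (a simplification
    of the polyhedral dependence analysis described in the abstract).
    """
    for d in distances:
        permuted = tuple(d[p] for p in perm)
        if not lex_nonnegative(permuted):
            return False  # the dependence source would run after its sink
    return True


if __name__ == "__main__":
    # Dependences carried by the i-loop and the j-loop separately.
    print(permutation_is_legal([(1, 0), (0, 1)], (1, 0)))
    # A (1, -1) dependence becomes (-1, 1) after interchange.
    print(permutation_is_legal([(1, -1)], (1, 0)))
```

In this simplified setting, interchanging the two loops is accepted for the first dependence set and rejected for the second, because the interchange would reverse the order of a dependent pair of iterations.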
Reconfigurable optical interconnects for parallel computer systems: design space issues
In highly parallel computer systems, reconfigurable interconnect network topologies can improve performance by adaptively increasing the communication bandwidth where it is most needed. In electrical reconfigurable interconnect networks (e.g. crossbars or multi-stage networks), high reconfigurability can only be achieved at the cost of both chip area and network latency. The facts that short-distance optical link latencies are rapidly decreasing and that new technologies allow optical reconfigurability make optical interconnects an interesting alternative for overcoming these interconnection issues. Optical interconnection technologies indeed offer several possibilities to increase network connectivity without drastically increasing the chip area and the delay costs. In this paper we study the bandwidth and latency requirements of inter-processor and processor-memory interconnects for shared-memory parallel computers when the processor clock increases up to 10 GHz. We also investigate new enabling technologies and discuss their potential use in architectures based on reconfigurable optical interconnects.