Introduction
General-purpose microprocessors are designed to deliver low-latency computation at maximal clock frequencies, which leads to high power consumption. In terms of both performance and power consumption, latency-tolerant applications can be implemented efficiently on architectures that provide high throughput, such as Imagine [4], Score [3], RaPiD [5], and the PCI-PipeRench [2].
Stream architectures [2] are fully pipelined, high throughput microarchitectures. Algorithms are executed by mapping dataflow graphs to hardware and streaming the data through the architecture. In the best case, when all the loops are fully unrolled, the dataflow graph is acyclic and the clock frequency of the design is the data-rate.
In this work we develop StReAm, a domain-specific compiler for programming FPGAs. StReAm is built on top of the module generation environment PAM-Blox [1].
StReAm: Object-oriented Programming FPGAs
StReAm is a domain-specific tool built on top of PAM-Blox. In the case of an adder, the template parameter is the bit-width of the adder. Instantiating a particular object from the template class creates an adder of the appropriate size. Thus, the designer can adapt the arithmetic units to the specific needs of the application. The benchmarks in Section 6 give a number of examples of application-specific extensions and demonstrate the features mentioned above.
[...] required for automatic scheduling and placement. The schedule() function also creates all the required FIFO buffers and supplies the sequential and serial components with start signals. Finally, placement methods of hardware objects determine relative placement within the hardware objects.
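The role of the bit-width template parameter can be illustrated with a small software model. This is only a sketch: the class name `AdderModel`, its `add` method, and the 32-bit limit are assumptions made for illustration and are not taken from PAM-Blox or StReAm, which generate hardware rather than evaluate sums in software.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a width-parameterized adder. The name and
// interface are illustrative only; they are not the PAM-Blox or StReAm API.
template <unsigned Width>
class AdderModel {
    static_assert(Width >= 1 && Width <= 32, "model supports 1 to 32 bits");

public:
    // Bit-width chosen at instantiation time, as with a template class.
    static constexpr unsigned width = Width;

    // Adds two operands and wraps the result to Width bits, as a fixed-width
    // adder without a carry-out would.
    static constexpr std::uint32_t add(std::uint32_t a, std::uint32_t b) {
        return static_cast<std::uint32_t>(
            (static_cast<std::uint64_t>(a) + b) & mask());
    }

private:
    // Bit mask with the lowest Width bits set.
    static constexpr std::uint64_t mask() {
        return (static_cast<std::uint64_t>(1) << Width) - 1;
    }
};
```

Instantiating `AdderModel<8>` and `AdderModel<16>` yields two distinct types, so the same source expression produces an 8-bit or a 16-bit unit depending only on the template argument.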
StReAm currently supports arrays of the hardware integer type mint, expressions combining mint values with C++ integers (which result in hardware constants), and static 'for' loops.
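How these three features can interact is sketched below under stated assumptions. The class `HwInt` stands in for the hardware integer type, and `Node`, `graph()`, `weighted_sum()`, and `count_nodes()` are hypothetical helpers; none of these names or data layouts come from StReAm. The point of the sketch is that the C++ loop executes while the graph is being recorded, so a static loop is unrolled into an acyclic graph, and each C++ integer operand becomes a constant node.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical graph node; the layout is illustrative, not StReAm's.
struct Node {
    std::string op;  // "input", "const", "add", or "mul"
    int value;       // meaningful only for "const" nodes
    int lhs, rhs;    // operand node indices, -1 if unused
};

// Graph that is recorded while ordinary C++ expressions execute.
inline std::vector<Node>& graph() {
    static std::vector<Node> g;
    return g;
}

// Stand-in for a hardware integer: a handle to a node in the recorded graph.
class HwInt {
public:
    HwInt() : id_(emit("input", 0, -1, -1)) {}       // new stream input
    HwInt(int c) : id_(emit("const", c, -1, -1)) {}  // C++ integer -> constant
    int id() const { return id_; }

    // Overloaded operators record arithmetic units instead of computing values.
    friend HwInt operator+(const HwInt& a, const HwInt& b) {
        return HwInt(emit("add", 0, a.id_, b.id_), Raw{});
    }
    friend HwInt operator*(const HwInt& a, const HwInt& b) {
        return HwInt(emit("mul", 0, a.id_, b.id_), Raw{});
    }

private:
    struct Raw {};
    HwInt(int id, Raw) : id_(id) {}
    static int emit(const std::string& op, int value, int lhs, int rhs) {
        graph().push_back(Node{op, value, lhs, rhs});
        return static_cast<int>(graph().size()) - 1;
    }
    int id_;
};

// Weighted sum over an array of hardware integers. The loop bound is known
// when the graph is built, so the loop is fully unrolled in the graph.
inline HwInt weighted_sum() {
    const int coeff[4] = {1, 2, 3, 4};
    HwInt x[4];                   // four stream inputs
    HwInt acc = x[0] * coeff[0];  // coeff[0] becomes a constant node
    for (int i = 1; i < 4; ++i) {
        acc = acc + x[i] * coeff[i];
    }
    return acc;
}

// Counts recorded nodes of a given kind.
inline int count_nodes(const std::string& op) {
    int n = 0;
    for (const Node& node : graph()) {
        if (node.op == op) ++n;
    }
    return n;
}
```

Calling `weighted_sum()` on an empty graph records four inputs, four constants, four multipliers, and three adders, with no loop structure left in the graph.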
Calling the <Name>::build() function creates the dataflow
Scheduling Stream Architectures
The scheduling algorithm creates distributed FIFO buffers and component start signals (for sequential components). The scheduler also calculates overall latency and data-rate for a given stream architecture.
Operator overloading creates a dataflow graph with arithmetic units as nodes. The scheduling algorithm traverses this graph, which holds pointers to the components, their data dependencies, and sequencing information such as component latency and throughput. The scheduler retrieves the sequencing information from the state of the hardware objects.
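One common way to derive FIFO depths and overall latency from such a graph is a longest-path delay-balancing pass, sketched below. This is a generic technique shown under assumptions, not necessarily the algorithm used by StReAm: the types `Component` and `Schedule` and the function `schedule_graph()` are hypothetical, every component is assumed to accept one input per cycle, and producers are assumed to appear before their consumers in the vector. Data-rate calculation and start signals for sequential components are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical component record; the fields are illustrative only.
struct Component {
    int latency;                      // cycles from input to output
    std::vector<std::size_t> inputs;  // indices of producers (all smaller
                                      // than this component's own index)
};

// Result of the balancing pass.
struct Schedule {
    std::vector<int> start;              // cycle at which each component fires
    std::vector<std::vector<int>> fifo;  // fifo[v][k]: delay on k-th input of v
    int total_latency;                   // cycles until the last output
};

// Longest-path delay balancing over a graph given in topological order.
inline Schedule schedule_graph(const std::vector<Component>& g) {
    Schedule s;
    s.start.assign(g.size(), 0);
    s.fifo.resize(g.size());
    s.total_latency = 0;
    std::vector<int> done(g.size(), 0);  // cycle at which each output is valid

    for (std::size_t v = 0; v < g.size(); ++v) {
        // A component can start once its slowest input has arrived.
        for (std::size_t u : g[v].inputs) {
            s.start[v] = std::max(s.start[v], done[u]);
        }
        // Faster inputs are delayed by a FIFO so all operands align.
        for (std::size_t u : g[v].inputs) {
            s.fifo[v].push_back(s.start[v] - done[u]);
        }
        done[v] = s.start[v] + g[v].latency;
        s.total_latency = std::max(s.total_latency, done[v]);
    }
    return s;
}
```

For the expression a*b + c with a 3-cycle multiplier and a 1-cycle adder, the pass places a FIFO of depth 3 on the c input of the adder and reports an overall latency of 4 cycles.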
Example: Inverse Discrete Cosine Transform (IDCT)
The IDCT is used in signal and image processing (e.g., the MPEG and H.263 standards). We implement an 8x8 1-dimensional IDCT. The StreaModule below is based on an optimized IDCT implementation [6].
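For reference, the 8-point one-dimensional transform can be written directly from its definition. The following sketch is neither the StreaModule nor the optimized algorithm of [6]; it is a plain floating-point evaluation of the orthonormal IDCT, and the function name `idct8` is an assumption made for illustration. Such a direct form is mainly useful as a check against which an optimized fixed-point version can be compared.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Direct-form 8-point IDCT (orthonormal scaling):
//   x[n] = sum_k c(k) * X[k] * cos((2n + 1) * k * pi / 16),
//   c(0) = sqrt(1/8), c(k) = 1/2 for k > 0.
// Hypothetical reference function, not the optimized implementation.
inline std::array<double, 8> idct8(const std::array<double, 8>& X) {
    const double pi = std::acos(-1.0);
    std::array<double, 8> x{};
    for (std::size_t n = 0; n < 8; ++n) {
        double sum = 0.0;
        for (std::size_t k = 0; k < 8; ++k) {
            const double scale = (k == 0) ? std::sqrt(1.0 / 8.0) : 0.5;
            const double angle =
                static_cast<double>((2 * n + 1) * k) * pi / 16.0;
            sum += scale * X[k] * std::cos(angle);
        }
        x[n] = sum;
    }
    return x;
}
```

An input with only the DC coefficient set to sqrt(8) produces a constant output of 1 at all eight positions, which gives a quick sanity check of the scaling.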
Conclusions and Future Work
StReAm applies the object-oriented design methodology to programming FPGAs. FPGAs offer the flexibility to adapt the number representation, precision, and arithmetic algorithm to the particular needs of the application. Yet, in general, it is difficult to explore completely different arithmetic solutions. Combining module generation with a high-level programming tool in C++ lets the programmer explore the flexibility of FPGAs at the arithmetic level and write the algorithms in the same language and environment. Thus, StReAm enables the design of very efficient FPGA circuits at a fraction of the design effort required for hand-designed solutions.
For StReAm, the key enabling C++ technologies are dynamic operator overloading and template functions. Furthermore, just as in PAM-Blox, the class hierarchy, inheritance, template classes, and method overloading enable efficient code reuse and management of large systems.
References

