Abstract. MorphoSys is a reconfigurable architecture for computation-intensive applications. It combines coarse-grain and fine-grain reconfiguration techniques to optimize the hardware for the application domain. M2, the current implementation, is developed as an IP core and synthesized in TSMC 0.13 micron technology. Experimental results show that, for multimedia applications, MorphoSys achieves performance comparable to ASICs, with the added benefit that it can be reconfigured for a different application in one clock cycle.
Introduction
Reconfigurable systems are an intermediate approach between Application Specific Integrated Circuits (ASICs) and general purpose processors. They have wider applicability than ASICs while offering comparable performance. Multimedia applications, in turn, comprise several subtasks with different characteristics. This feature, together with large sets of input and output data, leads to an uneconomical solution in ASICs and a low-performance solution on general purpose architectures. Reconfigurable systems are therefore considered an alternative approach for developing architectures for multimedia and DSP applications. Raw [1], PipeRench [2], Garp [3] and MorphoSys [4] are ongoing research projects in this area. This paper introduces M2, a new implementation of MorphoSys. Compared to M1, the previous implementation, M2 has high-performance functional units and an optimized memory architecture. The M2 implementation goes through a fully automated methodology starting from an IP core.
Section 2 describes the basic structure of MorphoSys. Section 3 describes the M2 reconfigurable cell, with emphasis on its new features. Section 4 discusses the parallel structure of MorphoSys. Section 5 evaluates some multimedia applications on M2.
MorphoSys Architecture
Figure 1 shows the basic building blocks of MorphoSys and their connections. The RC-Array is the reconfigurable part of the system. It consists of an 8 by 8 array of reconfigurable cells (RCs). The configuration data is stored in the context memory. During execution, the context word is loaded from the context memory into the context registers of the reconfigurable cells. The frame buffer is an embedded data memory in MorphoSys. It receives data from the external memory and feeds the RC-Array with the appropriate data. All data movements between the MorphoSys memory elements and the external memory are handled by the DMA controller. TinyRisc [5] is a general purpose 32-bit RISC processor. It controls the sequence of operations in MorphoSys and executes the non-data-parallel operations.
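The context-loading step described above can be illustrated with a small behavioral model. This is a minimal sketch and not the M2 implementation: the names `ReconfigurableCell`, `ContextMemory`, `make_array` and `load_context`, the integer encoding of a context word, and the assumption that one context word is broadcast to all 64 cells are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List

ARRAY_DIM = 8  # the RC-Array is 8 by 8 (from the text)


@dataclass
class ReconfigurableCell:
    # Holds the current context word; the integer encoding is hypothetical.
    context_register: int = 0


@dataclass
class ContextMemory:
    # Stored configuration data, one integer per context word (assumption).
    words: List[int] = field(default_factory=list)

    def read(self, index: int) -> int:
        return self.words[index]


def make_array() -> List[List[ReconfigurableCell]]:
    """Build an 8 by 8 grid of cells with cleared context registers."""
    return [[ReconfigurableCell() for _ in range(ARRAY_DIM)]
            for _ in range(ARRAY_DIM)]


def load_context(cells: List[List[ReconfigurableCell]],
                 memory: ContextMemory, index: int) -> None:
    """Copy one context word into every cell's context register.

    Broadcasting the same word to all cells is an assumption of this sketch.
    """
    word = memory.read(index)
    for row in cells:
        for cell in row:
            cell.context_register = word
```

Calling `load_context(cells, memory, i)` stands for switching the array to configuration `i`. The sketch models only the data movement, not its timing.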

Reconfigurable Cell Architecture
Reconfigurable cells (RCs) are the main processing units in MorphoSys. Each RC consists of four types of basic elements: functional units for arithmetic and logic operations, memory elements that feed the functional units and store their results, input and output modules that connect the cells to form the RC-Array, and a fine-grain reconfigurable logic block.
MorphoSys Parallel Architecture
The MorphoSys parallel architecture is determined by how the reconfigurable cells are connected to each other and to the memory. This section discusses both structures.
Reconfigurable Cell Array
The RC-Array forms the parallel architecture of MorphoSys. In M2, all RCs in a single row or column of the RC-Array are connected together, whereas in M1 the connection is only pyramid-based. The high connectivity in M2 simplifies data movement between RCs. Another new feature of the RC-Array is register sharing between the RCs in a row or column: all RC registers in a row or column can be accessed by the other RCs of that row or column in a single cycle. These features simplify the programming of the system.
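The row and column register sharing described above can be pictured with a small behavioral model. This is a sketch under stated assumptions: the class `RCArray`, its methods, and the number of registers per RC (`NUM_REGS`) are hypothetical. Only the 8 by 8 array size and the same-row or same-column access rule come from the text.

```python
ARRAY_DIM = 8   # 8 by 8 RC-Array (from the text)
NUM_REGS = 4    # registers per RC: hypothetical, not given in the text


class RCArray:
    def __init__(self):
        # regs[row][col][k] is register k of the RC at (row, col).
        self.regs = [[[0] * NUM_REGS for _ in range(ARRAY_DIM)]
                     for _ in range(ARRAY_DIM)]

    @staticmethod
    def shares_line(a, b):
        """Return True if cells a and b lie in the same row or the same column."""
        return a[0] == b[0] or a[1] == b[1]

    def read_shared(self, reader, owner, k):
        """Read register k of `owner` on behalf of `reader`.

        Only same-row or same-column access is modeled as a direct read;
        any other pair of cells is rejected.
        """
        if not self.shares_line(reader, owner):
            raise ValueError("owner is not in the reader's row or column")
        row, col = owner
        return self.regs[row][col][k]
```

For example, the cell at `(2, 0)` can read a register of the cell at `(2, 5)` directly, while the cell at `(0, 0)` cannot, because it shares neither a row nor a column with `(2, 5)`.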
Frame Buffer
The frame buffer is a dual-port memory. It receives data from the external memory and provides it to the RC-Array and TinyRisc. The two ports can access the frame buffer simultaneously, so the RC-Array or TinyRisc can read or write while the DMA controller transfers data between the frame buffer and the external memory. Figure 3 shows the frame buffer interface in the system. The frame buffer is connected to the RC-Array through a reconfigurable bus. To exchange data between the frame buffer and the RC-Array, the bus configuration must first be established by loading the appropriate data into the frame buffer configuration tables; the read or write operation can then be performed. The flexibility of the frame buffer makes it simple to access data, reorder it, and feed the processing elements as fast as possible.
Algorithm Mapping and Performance Analysis
[...] input stream is loaded to all RCs. After the address pointer of the first input is increased, the second input is loaded one by one to the whole RC-Array and a multiply-accumulate (MAC) operation is performed. This approach takes 256 cycles in total, and the result is 32 bits wide. Figure 4 compares the results of these benchmarks with the TMS320 [7].
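The multiply-accumulate mapping described above can be illustrated in software. Because the start of that description is missing, the exact data layout of the benchmark cannot be recovered, so the following is only a sketch of a broadcast MAC across 64 cells with a 32-bit accumulator. The function names, the layout (one private stream per cell plus one broadcast stream), and the two's-complement wrap-around are assumptions, not the paper's mapping.

```python
NUM_CELLS = 64          # 8 by 8 RC-Array (from the text)
MASK32 = 0xFFFFFFFF     # the text states a 32-bit result


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def broadcast_mac(per_cell_streams, broadcast_stream):
    """Cell c accumulates per_cell_streams[c][t] * broadcast_stream[t].

    One iteration of the outer loop stands for one step: a single element
    of the broadcast stream is delivered to every cell, and each cell
    performs one multiply-accumulate. This layout is an assumption.
    """
    acc = [0] * NUM_CELLS
    for t, b in enumerate(broadcast_stream):
        for c in range(NUM_CELLS):
            acc[c] = to_signed32(acc[c] + per_cell_streams[c][t] * b)
    return acc
```

In this model the number of steps equals the length of the broadcast stream. The sketch makes no claim to reproduce the 256-cycle count reported for the benchmark.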
Conclusion
This paper has introduced M2, a new implementation of MorphoSys. M2 follows the basic concepts of MorphoSys but is optimized for computation-intensive applications. It is a coarse-grain reconfigurable architecture with a set of fine-grain blocks. Its memory architecture is highly optimized to meet the high demands for data movement and shuffling in multimedia applications. Experimental results show that the MorphoSys architecture has performance comparable to multimedia processors and ASICs.
