Abstract. This paper presents new mappings of 2D and 3D geometrical transformations onto the MorphoSys (M1) reconfigurable computing (RC) prototype [2], improving the system's performance as a graphics accelerator [1] [2] [3] [4] [5]. Three algorithms are mapped: two for 2D transformations and one for 3D transformations. The results indicate improved performance, and we explain both the speedup achieved and the advantages of the mapping. The transformations were run on an 8x8 RC array, and numerical examples were simulated with mULATE, the program that simulates MorphoSys operations, to validate our results. Comparisons are presented with other systems, namely Intel processing systems and the Celoxica RC-1000 FPGA.
Introduction
Reconfigurable computing (RC) is growing in popularity and attracting increasing research effort. It combines reconfigurable hardware with programmable processors: an application is mapped so that the workload is divided between a general-purpose processor and the reconfigurable device. RC can offer higher speed than general-purpose processors and wider functionality than application-specific integrated circuits (ASICs), which makes it a good solution for applications that require both a wide range of functionality and high speed [1].
MorphoSys Design
MorphoSys, designed and implemented at the University of California, Irvine, is one such emerging RC system. It is composed of: 1) an array of reconfigurable cells called the RC array, 2) its configuration data memory, called the context memory, 3) a control processor (TinyRISC), 4) a data buffer called the frame buffer, and 5) a DMA controller [2].
The link to the formal publication is via https://doi.org/10.1007/3-540-46117-5_111
A program runs on MorphoSys as follows: general-purpose operations are handled by the TinyRISC processor, while operations with a high degree of parallelism or regularity, or with intensive computation, are mapped to the reconfigurable array (RC array).
Geometrical Transformations

First 2D Algorithm Mapping
The main use of MorphoSys, as of any parallel processor, is the fast execution of computationally demanding algorithms, and computer graphics algorithms are one such family. Geometrical transformations are a basic part of computer graphics operations and require fast matrix operations, in particular matrix multiplication, which is the core of every geometrical transformation. This paper focuses on mapping matrix multiplication onto MorphoSys for 2D computer graphics transformations, using the internal RC configurations that the array supports. The algorithm maps onto the M1 RC array as follows. The elements of matrix A are passed row by row through the context words, and the elements of matrix B are broadcast row by row to the columns of the RC array. The multiplication stage (row x column) uses the CMUL ALU operation, so that the output of each reconfigurable cell is Out(t) = A x B. The products from the ALU of each RC are then accumulated row-wise, so that the first row of the output matrix C is obtained from the last column of the RC array. This is done with the ALU operation CMULOADD, where Out(t + 1) = Out(t) + Out[From Left Cell]. Finally, the contents of column 7 of the RC array are stored back to the frame buffer and then to main memory. The same steps are repeated with the same context word, but with a different constant field holding the next row of matrix A, until the whole result matrix C has been obtained.
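The row-by-row procedure above can be checked with a small behavioural model. The sketch below is a plain Python simulation, not MorphoSys code. It assumes that cell (r, k) holds element B[k][r], that the context-word constant for column k is A[i][k], and that the CMULOADD accumulation ripples one column per step from left to right; the function names are hypothetical and timing is not modelled.

```python
# Behavioural sketch of the first 2D mapping on an 8x8 RC array.
# Assumptions (not taken from the paper): cell (r, k) holds B[k][r],
# the constant field for column k carries A[i][k], and CMULOADD
# accumulates left to right one column at a time. Not cycle-accurate.

N = 8  # the RC array is N x N


def rc_array_row(a_row, b):
    """Return row i of C = A * B, given row i of A as the constant field."""
    # CMUL stage: cell (r, k) multiplies B[k][r] by its column constant A[i][k].
    out = [[a_row[k] * b[k][r] for k in range(N)] for r in range(N)]
    # CMULOADD stage: Out(t + 1) = Out(t) + Out[From Left Cell].
    for k in range(1, N):
        for r in range(N):
            out[r][k] += out[r][k - 1]
    # Column N - 1 (column 7) now holds the finished row of C.
    return [out[r][N - 1] for r in range(N)]


def rc_matmul(a, b):
    """Same context, new constant field (next row of A) on each pass."""
    return [rc_array_row(a[i], b) for i in range(N)]


def reference_matmul(a, b):
    """Plain triple-loop product, used only to check the model."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


if __name__ == "__main__":
    A = [[i + j for j in range(N)] for i in range(N)]
    B = [[(i * j) % 5 for j in range(N)] for i in range(N)]
    print(rc_matmul(A, B) == reference_matmul(A, B))  # True
```

Because every row of C is read from the same physical column, the host only ever needs to transfer column 7 back to the frame buffer after each pass.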
Second 2D Algorithm Mapping
Building on the algorithm above, this section introduces a new mapping that takes advantage of the MorphoSys RC array topology. The new mapping uses the upper-left quadrant together with the bottom-right quadrant of the array, and the matrices are taken to be of size 4x4.
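The text does not spell out how the two quadrants share the work. One plausible reading, assumed here and not confirmed by the source, is that each active quadrant runs an independent 4x4 instance of the first mapping, so that two 4x4 products are formed side by side and each row accumulation spans only four columns. The helper names below are hypothetical.

```python
# Sketch of the second 2D mapping, ASSUMING each active quadrant runs an
# independent 4x4 instance of the first mapping (not stated in the paper).

N = 8  # full RC array size
Q = 4  # quadrant size, equal to the 4x4 matrix size


def quadrant_cells(row0, col0):
    """Physical (row, column) cells covered by a Q x Q quadrant."""
    return {(row0 + r, col0 + c) for r in range(Q) for c in range(Q)}


UPPER_LEFT = quadrant_cells(0, 0)    # rows 0-3, columns 0-3
BOTTOM_RIGHT = quadrant_cells(Q, Q)  # rows 4-7, columns 4-7


def quadrant_row(a_row, b):
    """One row of a 4x4 product: multiply, then accumulate over 4 columns."""
    out = [[a_row[k] * b[k][r] for k in range(Q)] for r in range(Q)]
    for k in range(1, Q):
        for r in range(Q):
            out[r][k] += out[r][k - 1]
    # Last local column: physical column 3 (upper left) or 7 (bottom right).
    return [out[r][Q - 1] for r in range(Q)]


def two_quadrant_matmul(a1, b1, a2, b2):
    """A1*B1 in the upper-left quadrant and A2*B2 in the bottom-right one."""
    c1, c2 = [], []
    for i in range(Q):
        # Both quadrants process row i of their own A matrix in the same pass.
        c1.append(quadrant_row(a1[i], b1))
        c2.append(quadrant_row(a2[i], b2))
    return c1, c2
```

Under this assumption the two quadrants never share a cell, so their contexts can be configured independently.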
3D Geometrical Transformations
The basic purpose of composing transformations is to gain efficiency by applying a single composed transformation to a point, rather than applying a series of transformations one after the other. Our proposed mapping assumes column broadcast mode, in which all the cells in the same column perform the same function. The functions the RCs must perform are: Out(t+1) = A x C, Out(t+1) = A x C + Out(t), and Out(t+1) = A x C - Out(t).
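To illustrate how the three functions listed above can express a 3D transformation, the sketch below builds a rotation about the z axis from them alone, then checks that one composed 4x4 homogeneous matrix gives the same result as applying scaling, rotation, and translation one after the other. The function names, the choice of a z-axis rotation, and the two-step operand order are illustrative assumptions, not the paper's actual mapping.

```python
import math

# The three RC functions named in the text (A: data operand, C: constant).
def f_mul(a, c, out):      # Out(t+1) = A x C
    return a * c


def f_mul_add(a, c, out):  # Out(t+1) = A x C + Out(t)
    return a * c + out


def f_mul_sub(a, c, out):  # Out(t+1) = A x C - Out(t)
    return a * c - out


def rotate_z_rc(x, y, z, theta):
    """Rotate (x, y, z) about the z axis using only the three RC functions."""
    c, s = math.cos(theta), math.sin(theta)
    # x' = x*cos - y*sin: first Out = y*sin, then Out = x*cos - Out
    x_new = f_mul_sub(x, c, f_mul(y, s, 0.0))
    # y' = x*sin + y*cos: first Out = x*sin, then Out = y*cos + Out
    y_new = f_mul_add(y, c, f_mul(x, s, 0.0))
    return x_new, y_new, z


# 4x4 homogeneous matrices acting on column vectors (x, y, z, 1).
def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def rotate_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def matmul4(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply4(m, p):
    """Apply an affine 4x4 matrix to point p; w stays 1, so it is dropped."""
    v = (*p, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def compose(*mats):
    """compose(T, R, S) = T * R * S, i.e. S is applied to the point first."""
    result = mats[0]
    for m in mats[1:]:
        result = matmul4(result, m)
    return result
```

For example, `apply4(compose(T, R, S), p)` matches `apply4(T, apply4(R, apply4(S, p)))` up to floating-point rounding; once the composed matrix is formed, each point needs one matrix-vector product instead of three.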
Performance Analysis
Performance is evaluated by the execution speed of the algorithms. The MorphoSys system is assumed to operate at a frequency of 100 MHz. Table 1 compares the suggested systems, recalling previous findings from [4] [5], and gives the speedup factor of M1 over each of the other suggested systems. The speedup factor is calculated as the ratio of the number of cycles required by the other system to the number required by M1. Comparisons with the Celoxica RC-1000 FPGA are shown in Table 2.
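The speedup factor and the corresponding execution time follow directly from cycle counts. The helpers below are a hypothetical sketch; only the 100 MHz clock comes from the text, and no figures from Table 1 or Table 2 are reproduced. A cycle-count ratio equals the wall-clock speedup only when both systems run at the same clock frequency; otherwise each count must first be converted to time with its own clock rate.

```python
# Speedup and execution time as described in the performance analysis.
# The 100 MHz figure is from the text; the function names are hypothetical.

M1_CLOCK_HZ = 100e6  # MorphoSys M1 assumed to run at 100 MHz


def speedup_factor(other_cycles, m1_cycles):
    """Cycle-count ratio; a value above 1 means M1 needs fewer cycles."""
    if m1_cycles <= 0:
        raise ValueError("m1_cycles must be positive")
    return other_cycles / m1_cycles


def execution_time_us(cycles, clock_hz=M1_CLOCK_HZ):
    """Execution time in microseconds for a cycle count at a given clock."""
    return cycles * 1e6 / clock_hz
```

Passing a different `clock_hz` to `execution_time_us` converts another system's cycle count to time, which allows a wall-clock comparison when the clocks differ.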
