A new technique for mapping combinational circuits to Fine-Grain Cellular-Architecture FPGAs is presented. The proposed tree restructuring algorithm preserves local connectivity and allows direct mapping of the tree to the cellular array, thus eliminating the traditional routing phase. The developed bus assignment algorithm efficiently utilizes medium and long distance routing resources (buses). The method is general and can be used for any Fine-Grain Cellular-Architecture FPGA. To demonstrate our techniques, the ATMEL 6000 series FPGA was used as a target architecture. The results are very encouraging.
INTRODUCTION
Field Programmable Gate Arrays (FPGAs) have become a popular design technology for designers seeking fast and cost-effective implementation of their circuits. In recent years a lot of effort was spent on the development of technology mapping and layout synthesis methods for two categories of these devices, namely Look-Up-Table based (LUT-based) and row-based FPGAs. A number of architecture-specific technology mapping approaches were developed [6,7], but most of the placement and routing techniques were adopted from semi-custom design styles like standard cells and gate arrays, with some modifications. The other types of programmable architectures, Cellular-Architecture (CA) type FPGAs and Complex Programmable Logic Devices (CPLDs), have not drawn much attention from the CAD research community. In this paper, we focus on layout synthesis for one of these programmable architectures, namely CA-type FPGAs [2]. The main feature of these devices, which distinguishes them from the other types of FPGAs, is the local connectivity between logic blocks placed as a symmetrical array. Logic blocks are usually of small granularity and of the standard-cell type with a limited number of inputs and outputs. Local or global buses are used for longer-distance connections.
The "macro block" approach which is currently used in the industry to solve the layout problem for these devices is based on macro-generators. A technology independent representation of a circuit design is covered with a minimum number of relatively small standard subfunctions (macros) which usually have non-uniform shapes. Placement of macros is usually performed using a simulated annealing algorithm, which places the macros far from each other to assure that in the routing phase all connections between macros [...] [...,10]. In order to generate the circuit layout some of these methods require an additional layout synthesis step and some others do not. For example, if logic functions are represented as a two-dimensional array, they can be directly mapped onto a given CA-type architecture. The spectral methods based on orthogonal expansions [10] and the restricted factorization method [8] belong to the latter type. The approaches [1,5,10] based on trees and decision diagrams [6], which preserve local connectivity but still require the placement and routing steps, have also been reported. In most cases, however, when the tree is finally mapped to a rectangular area, the triangular structure of the tree may waste a large amount of area. Therefore, new comprehensive solutions to the optimized mapping of such trees to the regular, locally-connected arrays are of interest.
In this paper, we propose a new approach to the mapping of binary trees onto FPGAs with local [...]
Fig. 1 A generic CA-type FPGA
The set of logic functions which can be implemented in one logic block is defined by the block architecture. All primitive logic functions like OR, NOR, NAND, AND, EXOR, 2-input mux, and a few combinations of the above gates can be realized by a logic block. All gates are two-input gates and the maximum number of outputs from a logic block is two.
A tree structure is very useful for mapping circuits to the CA-type FPGAs, as the connections between logic blocks, represented as the vertices of the tree, are local and each node in the tree has connections only to its parent and child nodes. In a binary tree, the maximum node degree is three, and therefore such a tree can be realized using only adjacent cells and local connections in the cellular array.
The exponential growth of the number of nodes as a function of the tree level can, however, result in a very inefficient mapping. For most real functions, the tree does not get as wide at the bottom as could be expected. The tree expands from the root for a few levels, then the width of the tree tends to stay constant, and decreases towards the leaves of the tree. Therefore, by developing a good restructuring method, that shape can be easily mapped to the cellular architecture without wasting many logic blocks for routing.
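The width behavior described above can be checked on any binary tree by counting nodes per level. The following is a minimal sketch, not part of the TRM package; the `Node` class and the `level_widths` helper are hypothetical names introduced only for illustration.

```python
class Node:
    """Hypothetical binary-tree node used only for this illustration."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def level_widths(root):
    """Return the number of nodes on each level, root level first."""
    widths = []
    frontier = [root] if root is not None else []
    while frontier:
        widths.append(len(frontier))
        # Collect the existing children of the current level, left to right.
        frontier = [child
                    for node in frontier
                    for child in (node.left, node.right)
                    if child is not None]
    return widths


# A small tree that widens for one level, stays constant, then narrows.
tree = Node("a",
            Node("b", Node("d", Node("f")), None),
            Node("c", None, Node("e")))
widths = level_widths(tree)
# Width a complete binary tree would have on the same levels.
complete = [2 ** k for k in range(len(widths))]
```

Comparing `widths` with `complete` gives a rough idea of how much area a naive triangular mapping would reserve but never use.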
In general, any binary decision diagram or binary tree can be used as an input to our restructuring algorithm. However, to get better results, a tree which can provide the best matching between the structure of the tree and the functionality of logic blocks in a given FPGA should be selected.
MAPPING PROBLEM FORMULATION
PRMT, which represents a given Boolean function, is modeled as a binary tree T = (V, E) which consists of the ordered set of nodes V and the set of directed edges E defined as follows: [...]
[...] program is used to produce the PRMT representation of the given function. Next, an optional technology mapping step is introduced to perform technology-specific optimization. The tree is then restructured using the MSBT technique, and then mapped to the target architecture. Finally, the primary inputs are assigned to the local buses in the bus assignment phase.
Technology Mapping
This phase is specific to the architecture of the target [...]
Squashed and Modified-Squashed Binary Tree
The Squashed Binary Tree [3] approach was chosen as it gives us the possibility to shape the tree into a rectangular form which closely resembles the CA-type architecture. The rectangular shape can be directly mapped to the array satisfying the design restrictions. Mapping an SBT to the CA-type architecture is a straightforward process, as we place each node of the SBT in one column of the target array and then make the necessary connections. The details are explained in Sec 4.3.
The squashed binary tree is formed by projecting the binary tree onto its leaves. Starting from the root, nodes are projected onto their left-most descendants.
Next, the tree is traversed in the bottom-up direction [5], and if a node has two children, the process is repeated starting with the child node which was not projected earlier. Figures 4(b) and 4(c) represent the Squashed Binary Tree (SBT) and the Modified Squashed Binary Tree (MSBT) of the binary tree shown in Fig. 4(a). Fig. 5 shows the squashed binary tree representation of the PRM tree from Fig. 3. To obtain a more compact shape after mapping, the original SBT algorithm was modified and the Modified SBT (MSBT) algorithm was implemented in our package. The modified squashed binary tree is formed by projecting nodes of the binary tree onto its leaves in a depth-first manner. A node can be projected onto its left or right descendant depending on the situation (i.e., the left-most descendant constraint is relaxed). As a result, a tree with a smaller number of nodes is generated. [...] mapping is performed. The predictability of the signal delays is a very important advantage of this approach. The area and delays can be optimized by properly choosing the order of the projected nodes.
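The left-most projection used by the original SBT can be illustrated with a short sketch. It is one possible reading of the description above, not the implementation from the TRM package: every leaf is given its own column in left-to-right order, and every internal node takes the column of its left-most leaf descendant, while the row is the node depth. The `Node` class and the `squash` function are hypothetical names, and the relaxed left-or-right choice of the MSBT is not modeled.

```python
class Node:
    """Hypothetical binary-tree node used only for this illustration."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def squash(root):
    """Map each node label to (column, row) under left-most projection.

    Assumption: column = index of the left-most leaf below the node,
    row = depth of the node in the original binary tree.
    """
    placement = {}
    next_column = 0

    def visit(node, depth):
        nonlocal next_column
        children = [c for c in (node.left, node.right) if c is not None]
        if not children:
            # A leaf opens a new column.
            column = next_column
            next_column += 1
        else:
            # Visit children left to right; the node joins the chain of
            # its first child, i.e. the column of its left-most leaf.
            child_columns = [visit(c, depth + 1) for c in children]
            column = child_columns[0]
        placement[node.label] = (column, depth)
        return column

    if root is not None:
        visit(root, 0)
    return placement


tree = Node("a", Node("b", Node("d"), Node("e")), Node("c"))
placement = squash(tree)
```

In this example the chain a, b, d shares column 0, while e and c each start a new column, so the number of columns equals the number of leaves.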
Bus Assignment
Once the MSBT is created it can be directly mapped to the CA-type array, and only the necessary logic blocks required for routing need to be added. The exact number and the exact locations of these additional routing blocks are defined by the MSBT and hence are known a priori. Then, to complete all the connections, the bus assignment has to be performed. The primary (decomposition) variables from the original PRMT have to be assigned to the local buses. We have developed an efficient heuristic which assigns the variables to the local buses such that the number of local buses used by the same variable and the number of cells needed to distribute the same signal to different buses are minimized.
The steps of the bus assignment algorithm are as follows:
Step a: For each decomposition variable di, calculate Mi, the total number of nodes in the PRMT to which the variable di is assigned. Form a list L of variables for which bus assignment has to be done. Initially this list contains all the primary variables.
[...]
Step e: If the list is not empty, repeat Step c and Step d until all assignments are completed.
The completed mapping of the MSBT of our leading example to the 6000 series FPGA is shown in Fig. 9.
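Step a of the bus assignment can be sketched as a simple counting pass. This is only an illustration under assumed data structures: the mapping `node_variables` (PRMT node identifier to decomposition variable) and the function name `step_a` are hypothetical, and the remaining steps are not modeled here.

```python
from collections import Counter


def step_a(node_variables):
    """Count, for each decomposition variable, the PRMT nodes that use it.

    node_variables: dict mapping a node identifier to the name of the
    decomposition variable assigned to that node (assumed representation).
    Returns (counts, pending), where counts[d] plays the role of Mi and
    pending is the initial list L of variables awaiting bus assignment.
    """
    counts = Counter(node_variables.values())
    # Initially every primary variable is on the list, in first-seen order.
    pending = list(counts)
    return counts, pending


# Hypothetical example: four PRMT nodes controlled by three variables.
example = {"n1": "x1", "n2": "x2", "n3": "x1", "n4": "x3"}
counts, pending = step_a(example)
```

Here `counts["x1"]` is 2 because two nodes use `x1`, and `pending` starts with all three variables.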
RESULTS
The TRM was tested on a set of MCNC benchmarks. Since at present TRM can handle only single output functions, we modified the MCNC benchmarks by extracting single output functions. We compared the area and the number of local buses used by the Squashed Binary Tree (SBT) method versus the Modified Squashed Binary Tree (MSBT) approach, and the MSBT approach versus the commercially available ATMEL (IDS) tools. We assumed the ATMEL 6000 as the target architecture. The results are presented in Table I and Table II. Table I shows the results of the SBT and MSBT approaches for the modified set of MCNC benchmarks.
The second column, "PRMT", shows the number of gates required to implement the function in PRMT form (tree generated by REMIT). In the GROUPING section of Table I, column "L" shows the number of logic blocks used to implement the logic and column "C" shows the number of connecting cells added due to A-B restrictions of the ATMEL architecture [2]. Column "GT" shows the total number of cells. The SBT and MSBT column sections present the results of the SBT and MSBT approaches, respectively. R is the number of routing logic blocks added when constructing the SBT or MSBT. T is the total number of logic blocks required to realize the function. RT represents the size (in terms of number of cells) of the smallest rectangle enclosing the mapped circuit. The results clearly show that the MSBT approach has significantly reduced the total number of logic blocks required to implement a given function, and the size of the enclosing rectangle is also smaller.
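The RT measure can be illustrated with a small helper that computes the smallest axis-aligned rectangle around a set of occupied cells. This is a sketch under the assumption that a mapped circuit is given as (column, row) cell coordinates; the function name `enclosing_rectangle` and the sample coordinates are hypothetical and not taken from the tables.

```python
def enclosing_rectangle(cells):
    """Return (width, height, area) of the smallest rectangle, measured
    in cells, that encloses all occupied (column, row) positions."""
    cells = list(cells)
    if not cells:
        return (0, 0, 0)
    columns = [c for c, _ in cells]
    rows = [r for _, r in cells]
    width = max(columns) - min(columns) + 1
    height = max(rows) - min(rows) + 1
    return (width, height, width * height)


# Hypothetical placement of five cells on the array.
placed = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 1)]
width, height, area = enclosing_rectangle(placed)
used = len(set(placed))
```

The difference between `area` and `used` is the number of cells inside the rectangle that the mapped circuit leaves unoccupied.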
The layouts of the leading example are shown to illustrate the differences between our final layout and the layout generated by the ATMEL tools. The ATMEL generated layout is shown in Fig. 10, and the layout generated by our TRM package in Fig. 11. It can be easily noticed that our layout is more compact and gives better utilization of the chip resources, and therefore better performance. In Table II the comparison between our method, Tree Restructuring Mapping (TRM), and the commercial tools package (IDS) is presented using the MCNC benchmarks. B is the number of buses, L is the number of logic blocks used for implementing the logic, C is the number of cells used for routing, and A represents the rectangular area occupied by the core of the design (without I/O pins). We compared only the core area of the mapped design because the ATMEL tools perform bus assignment in an inefficient way. The resulting area is very large and contains a lot of unused logic cells. As can be seen from Table II, our methods give much more compact layouts than the ATMEL tools. The number of local buses and logic blocks used for routing is much smaller for all run examples. The L/R ratio, logic blocks to routing blocks, is high for our method, which gives more "logic power" to the implementation of the designs and improves performance.
To compare the performance improvement we have done timing analysis using the ATMEL (IDS) package. The design generated by our TRM package is entered using the interactive editor of the ATMEL (IDS) tools, and then we run the timing analysis from the ATMEL (IDS) package on both implementations. The longest path delays obtained are shown in Table III. The average improvement achieved with our approach is around [...]%.
CONCLUSIONS
We proposed a new tree restructuring method for mapping combinational circuits onto CA-type FPGAs. By preserving the local connectivity among the logic blocks, the routing phase was completely eliminated.
The mapping process is straightforward, and therefore enables predictability of the signal delays, which is a very important advantage of this method.
Our TRM program is independent of the logic optimization steps as long as the function is represented as a binary tree. Our method is a general method and can be applied to a general class of CA-type FPGAs. The results on some MCNC benchmarks show that our method is better both in area and delay when compared to commercially available tools. Currently, we are working toward extending the TRM to multi-output functions.
[...] "Logic Synthesis for Cellular FPGAs Based on Orthogonal Expansions," Proc. IFIP WG 10.5 Workshop [...]
