Tree Restructuring Approach to Mapping Problem in Cellular Architecture FPGAS by Ramineni, Narahari
Portland State University 
PDXScholar 
Dissertations and Theses Dissertations and Theses 
2-10-1995 
Tree Restructuring Approach to Mapping Problem in 
Cellular Architecture FPGAS 
Narahari Ramineni 
Portland State University 
Recommended Citation 
Ramineni, Narahari, "Tree Restructuring Approach to Mapping Problem in Cellular Architecture FPGAS" 
(1995). Dissertations and Theses. Paper 4914. 
https://doi.org/10.15760/etd.6790 
This Thesis is brought to you for free and open access. It has been accepted for inclusion in Dissertations and 




The abstract and thesis of Narahari Ramineni for the Master of Science in Electrical 
and Computer Engineering were presented February 10, 1995, and accepted by the thesis 






Representative of the Office of Graduate Studies 
DEPARTMENT APPROVAL: 
 
Rolf Schaumann, Chair 
Department of Electrical Engineering 
************************************************ 
ACCEPTED FOR PORTLAND STATE UNIVERSITY BY THE LIBRARY






An abstract of the thesis of Narahari Ramineni for the Master of Science in
Electrical and Computer Engineering presented on February 10, 1995.
Title: Tree Restructuring Approach to Mapping Problem in Cellular Architecture
FPGAs
This thesis presents a new technique for mapping combinational circuits to
Fine-Grain Cellular-Architecture FPGAs. We represent the netlist as a binary tree
with decision variables associated with each node of the tree. The functionality of
the tree nodes is chosen based on the target FPGA architecture. The proposed tree
restructuring algorithms preserve local connectivity and allow direct mapping of the
trees to the cellular array, thus eliminating the traditional routing phase.
Predictability of the signal delays is another important advantage of the developed
approach. The developed bus-assignment algorithm efficiently utilizes the
medium-distance routing resources (buses). The method is general and can be used for
any Fine-Grain CA-type FPGA. To demonstrate our techniques, the ATMEL 6000 series
FPGA was used as the target architecture. An area and delay comparison between
our methods and commercial tools is presented using a set of MCNC benchmarks.
Final layouts of the implemented designs are included. Results show that the
proposed techniques outperform the available commercial tools for ATMEL 6000
FPGAs in both area and delay.




A thesis submitted in partial fulfillment of the 
requirements for the degree of 
MASTER OF SCIENCE 
in 
ELECTRICAL AND COMPUTER ENGINEERING 





I would like to thank Dr. Malgorzata Chrzanowska-Jeske, my advisor, for providing
guidance in my research work and supporting my career growth. I thank her for
the methodical introduction to the latest research through the seminars and the
reading and conference group she organized, which have helped me to gain knowledge
beyond my research area. I would like to thank Dr. Marek A. Perkowski and Dr. Maria
Balogh for serving on my committee and for their numerous suggestions in the
preparation of the thesis.
I am grateful to Naveen Buddi for his patience in going through my thesis,
papers, and presentations and for making valuable suggestions and corrections. A
special person I want to mention here is Ajith Kumar Dasari. We have been supportive
of each other in completing the academic work and building our professional careers.
I also thank all my co-researchers for their suggestions on my thesis work.
I would like to acknowledge the travel grants I received from the Academically
Controlled Activities Committee at Portland State University. It was through their
generous funding that my participation in the 36th Midwest Symposium on Circuits &
Systems and the first publication of this thesis work were made possible.
My special thanks go to Ms. Shirley Clark for her support through these
years. I would also like to thank Laura Riddell and all the office staff for providing
the facilities and help.
This thesis work is dedicated to my mother, who provided me with the best education
right from primary school and consistently encouraged and supported me to
pursue higher education, without which this work would not have been possible.

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I Introduction
II Cellular Arrays
II.1 Cellular Logic
II.2 Generic CA-type FPGA
III Layout-Driven Logic Synthesis
IV Architecture of ATMEL 6000 FPGA
IV.1 The Symmetrical Array
IV.2 The Bussing Network
IV.3 Cell States
IV.4 Architecture Restrictions
V Our Approach
V.1 Tree Restructuring
V.2 Problem Formulation
VI Description of Algorithms
VI.1 Technology Mapping
VI.2 Squashed Binary Tree (SBT)
VI.3 Modified Squashed Binary Tree (MSBT)
VI.4 Bus Assignment
VI.5 Multi-Output Functions
VI.6 Comparison with Other Methods
VII Results
VIII Conclusions
References
APPENDIX A
APPENDIX B
LIST OF TABLES
TABLE
I Cell Utilization SBT Vs. MSBT
II Area Comparison TRM Vs. ATMEL (IDS)
III Delay Comparison TRM Vs. ATMEL (IDS)
LIST OF FIGURES
FIGURE
1.1 Typical CAD System for FPGAs
2.1 Maitra Cascade
2.2 Redundant Maitra Cascade
2.3 Two Rail Cascade
2.4 A Generic CA-type FPGA
2.5 Logic Block of Algotronix FPGA
2.6 Logic Block of Motorola FPGA
3.1 Circuit Realization of Shannon and Davio I and II Expansion
3.2 PRM Tree of the Boolean Function from Example 1
4.1 Symmetric Array of the ATMEL 6000 Chip
4.2 Bussing Network
4.3 Basic Cell
4.4 Combinatorial States
4.5 AND-EXOR Realization
4.6 Register and Constant States
4.7 Four Possible Input Configurations to a Cell
5.1 Binary Tree to Squashed Binary Tree Transformation
5.2 TRM Flowgraph
6.1 Illustration of Type I Mismatch
6.2 "Switch Cell" - Solution to Type I Mismatch
6.3 A Two Cell Realization of the EXOR-AND Chain
6.4 One Cell Realization of the EXOR-AND Chain
6.5 Grouped PRMT of the Example from Figure 3.2
6.6 A Grouped PRMT as a Binary Tree
6.7 Squashed Binary Tree (SBT)
6.8 Mapping from SBT to the Cellular Array
6.9 Critical Path Control
6.10 Modified Squashed Binary Tree (MSBT)
6.11 Mapping MSBT to the Cellular Array
6.12 Final Mapping of MSBT with Bus Assignment
6.13 CMLA: Two-Dimensional Array
6.14 CMLA: Realization of a Function
7.1 Layout of MCNC Sx1 Benchmark Generated by ATMEL Tools
7.2 Layout of MCNC Sx1 Benchmark Generated by TRM Package
CHAPTER I 
INTRODUCTION 
Field Programmable Gate Arrays (FPGAs) have become a popular design technology
for designers seeking fast and cost-effective implementations of their circuits.
FPGAs are regular arrays of programmable logic blocks connected by programmable
interconnections. Both the logic blocks and the interconnections are programmable by
the user, that is, the circuit or system designer. The available FPGA architectures
differ in the size and functionality of the logic blocks, in the floorplan of the chip,
and in the structure of the routing domain. Based on these differences, FPGAs can be
classified into four basic groups: Complex PLDs, Look-Up-Table, Row-Based, and
Fine-Grain Cellular-Architecture FPGAs [7]. The programming technologies available for
these devices are anti-fuses, static RAM cells, EPROM transistors, and EEPROM
transistors. The existence of different chip architectures, combined with fast and cheap
redesign, lets the designer choose an FPGA that has the best features
for the specific application. Therefore, FPGAs provide an excellent solution to the
demands of shrinking development cycles, rapid prototyping, and applications that
benefit from in-field re-programming.
In recent years much effort has been spent on the development of technology
mapping and layout synthesis methods for the two most popular categories of these
devices, namely look-up-table based (LUT-based) and row-based FPGAs. A number
of architecture-specific technology mapping approaches were developed
[9,10,11,37,38,39,40], but most of the placement and routing techniques were adopted,
with some necessary modifications, from semi-custom design styles such as standard
cells and gate arrays. The other two types of programmable architectures have so far
not drawn much attention from the research community. In this thesis we focus on
layout synthesis for one of these other categories of FPGAs, namely Cellular-Architecture
(CA) FPGAs [2,8,13]. The main feature that distinguishes these devices
from the other groups is the local connectivity between logic blocks placed in a
symmetrical array. Logic blocks are usually of small granularity and of the
standard-cell type, with a limited number of inputs and outputs. Local or global buses
are used for longer-distance connections. The method of separate technology mapping,
placement and routing, used for other FPGAs, is of little value for these devices,
primarily because of the local connectivity. Therefore, new comprehensive methods for
the layout-synthesis problem need to be developed to efficiently utilize the potential
of CA-type FPGAs.
The typical steps involved in implementing a circuit on programmable hardware
devices are shown in Fig. 1.1. If this traditional approach is used for CA-type FPGAs,
a large number of logic cells are used for wiring connections or left unused [13]. This
problem is mainly caused by not preserving local connectivity during the synthesis
steps. Frequently, local buses need to be used to complete even very short connections,
which increases circuit delay. To avoid excessive use of the local buses, an alternative
is to use different logic implementations with logic cells used as wiring cells.
But this leads to wasting a large number of logic cells for wiring in the technology
mapping.
The "macro block" approach, which is currently used in industry [13] to solve
the layout problem for these devices is based on macro-generators. A technology inde-







Figure 1.1 Typical CAD System for FPGAs
relatively small standard subfunctions (macros) which usually have non-uniform
shapes. Placement of macros is usually performed using a simulated annealing
algorithm, which places the macros far from each other to ensure that in the routing
phase all connections between macros can be completed. This technique leaves a lot of
unused cells around the placed macros for possible use as routing blocks.
Consequently, the number of cells which need to be used for routing between macros is
very large. In the ATMEL 6000 series, on average, about 70% of the area occupied by a
design is used for wiring connections or left unused [2,13]. This problem is mainly
caused by not creating a locally connected netlist during the synthesis steps. As the
routing resources are very limited, efficient usage of these resources can significantly
reduce the area occupied by the design and thereby increase the capacity of the chip and
improve circuit performance.
Recently, several logic synthesis approaches applicable to CA-type FPGAs have
been presented [1,11,17,43,44]. In order to create a circuit layout, some of these
methods require an additional, separate layout synthesis step and others do not. For
example, if logic functions are represented as a two-dimensional array, they can be
directly mapped onto a given CA-type architecture. The spectral methods based on
orthogonal expansions [17] and Universal XOR Forms [16], and the restricted
factorization method [11] based on the classical cellular arrays [15], belong to the
latter type. While the spectral methods are more general and usually lead to a better
solution, the algebraic one leads to much more efficient algorithms. Approaches
[1,6,12,17] based on trees and decision diagrams [14], which preserve local
connectivity well, have also been reported. In most cases, however, when the tree is
finally mapped to a rectangular area, the triangular structure of the tree may waste a
large amount of area. Therefore, new comprehensive solutions to the optimized mapping
of such trees to regular, locally connected arrays are of interest.
Among the new approaches to logic synthesis for CA-type FPGAs, recently
introduced decision diagrams are based on the decomposition of Boolean functions using
combinations of the Shannon and the two Davio expansions. Function representations
based on EXOR gates are especially attractive for CA-type FPGAs, because EXOR gates
are usually available in the logic blocks of these devices. The cost of these EXOR
gates is comparable with that of other gates, because fan-in is usually low. In
addition, EXOR-gate-based representations usually give a more compact implementation
of the function. These various decision diagrams have the structure of a binary tree
with decomposition variables associated with each node of the tree.
In this thesis, we propose a new approach to mapping the above-mentioned
decision diagrams onto FPGAs with localized connections, such as the CA-type
FPGAs. The approach is based on restructuring the binary tree before the final mapping
is performed. A Squashed Binary Tree (SBT) [3] approach and a new Modified Squashed
Binary Tree (MSBT) approach are used to restructure a binary tree such that, when it
is mapped to the CA-type FPGA, the occupied area has a rectangular rather than a
triangular shape. The method developed here is applicable to any general binary tree.
To illustrate our approach, we present results for input represented as a Permuted
Reed-Muller Tree (PRMT), which is obtained by applying the Davio I expansion [14] to a
Boolean function. Our general approach is presented using the ATMEL 6000 series of
FPGAs as the target architecture, but it should be emphasized that it can be adapted
to other CA-type FPGAs, such as those from Motorola, Algotronix or Pilkington. The
algorithms are written in C and implemented on a SPARC 10 workstation.
The thesis is organized as follows. In Chapter II we present a generic model of
the CA-type FPGA and discuss research on cellular arrays. Chapter III describes the
logic synthesis methods used to generate various binary decision diagrams, especially
the Permuted Reed-Muller Tree, and its relation to the architecture of the CA-type
FPGAs. In Chapter IV the architecture and restrictions of the ATMEL 6000 FPGA are
discussed. Our general Tree Restructuring Method (TRM) approach, based on the Squashed
Binary Tree (SBT) algorithm, is presented together with the formulation of the problem
in Chapter V. Chapter VI discusses the solution methods for the tree restructuring
problem. The new solution method introduced by the author, the Modified SBT (MSBT)
approach, is discussed along with the Bus-Assignment algorithm. We also compare
our method to the other methods currently being developed for CA-type FPGAs.
Results comparing the SBT and MSBT approaches are given in Chapter VII, which also
shows that our approach gives better results in both area and delay than
the ATMEL IDS tools for all examined designs. Conclusions from our research are
given in Chapter VIII.
CHAPTER II 
CELLULAR-ARRAYS 
In this chapter we describe a generic CA-type FPGA. The local connectivity
among the logic blocks makes these FPGAs resemble the cellular arrays which were
studied in the sixties and seventies. In Section 2.1 the methods used to realize
digital logic in cellular arrays are described. The current research in logic synthesis
[11,12] for these new CA-type FPGAs is an extension of the methods developed for
cellular logic. In Section 2.2 we present a generic model of the CA-type FPGA.
2.1 Cellular Logic 
Cellular logic deals with mathematical models as well as synthesis and analysis
techniques for digital networks in cellular arrays. "A Cellular array is a 1-, 2-, or
3-dimensional iterative arrangement of similar or identical logic cells with a uniform
interconnection pattern on the cells" [32]. Synthesis methods for cellular arrays,
namely the Maitra cascades [27], Two-Rail cascades [24], and Cutpoint cellular logic
[28], have been developed and studied; however, they were never used in any practical
design [24,26,27,28,29,30,31,32,33,34,36].
The cellular arrays can be classified into simple one-dimensional cellular arrays,
"multi-rail cascades", and two-dimensional arrays. The simple one-dimensional arrays
are also known as Maitra cascades [27]. A Maitra cascade is a one-dimensional
array of 2-input, 1-output binary combinational cells and is shown in Fig. 2.1. The ver-




y = F(x, y)
Figure 2.1 Maitra Cascade 
input variables. In this cascade, each cell is capable of producing any one of the six-
teen possible binary functions of two inputs. It was shown, however, by Stone and 
Korenjak [34] that even using redundant cascades, in which certain vertical inputs are 
connected to more than one cell, not all functions are realizable in this cellular array. 
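
The incompleteness of such cascades can be checked by brute force for a small case. The following Python sketch is illustrative only and is not taken from the cited works; the names `cell`, `cascade_output` and `realizable_functions` are hypothetical. It assumes a simplified irredundant model in which the horizontal input of the first cell is fed by one of the variables, every later cell receives one further variable on its vertical input, and each cell may implement any of the sixteen two-input functions. Under these assumptions it enumerates every three-variable cascade and tests whether the majority function is among the realized functions.

```python
from itertools import permutations, product


def cell(fn, x, y):
    """Evaluate a 2-input cell; fn is a 4-bit truth table (0..15)."""
    return (fn >> (2 * x + y)) & 1


def cascade_output(order, fns, assignment):
    """Evaluate a cascade: order lists the variable used at each position."""
    y = assignment[order[0]]          # assumed: first horizontal input is a variable
    for var, fn in zip(order[1:], fns):
        y = cell(fn, assignment[var], y)
    return y


def realizable_functions(n):
    """Truth tables of all n-variable functions realized by some cascade."""
    inputs = list(product((0, 1), repeat=n))
    found = set()
    for order in permutations(range(n)):
        for fns in product(range(16), repeat=n - 1):
            found.add(tuple(cascade_output(order, fns, a) for a in inputs))
    return found


funcs = realizable_functions(3)
majority = tuple(int(sum(a) >= 2) for a in product((0, 1), repeat=3))
print(len(funcs), "of 256 three-variable functions are realizable in this model")
print("majority realizable:", majority in funcs)
```

In this model each cofactor of the output with respect to the last variable must be a constant, the previous signal, or its complement, which is why a function such as majority (whose cofactors are AND and OR) falls outside the realizable set.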




F(x1, x2, x3)
Figure 2.2 Redundant Maitra Cascade 
The same deficiency exists for generalized Maitra cascades [31], which are used to
realize multi-valued functions.
One attempt to overcome the logical incompleteness of simple one-dimensional 
arrays is the two-rail cascades method. Short [33] has shown that every binary func-
tion is realizable by means of 3-input, 2-output cells, as shown in Fig. 2.3. 
Figure 2.3 Two Rail Cascade
To realize a single binary function with the synthesis methods developed by Short,
only one of the final outputs was required. Yoeli and Turner [36] extended the
treatment of two-rail cascades to both output signals and showed that two-rail cascades
are functionally complete for realizing an arbitrary pair of Boolean functions of any
number of variables. Among the synthesis methods for two-rail cascades, four major
approaches can be noted: those introduced by Short [33], Yoeli [36], Elspas [25]
(which generalizes Yoeli's), and Dvorak [24]. An extension of two-rail cascades is the
multi-rail cascade. Here, instead of two rails, the cells are assumed to have more than
two horizontal inputs and outputs. The disadvantage of multi-rail cascades is their
serial structure, which makes them slow. In addition, the methods developed were
not feasible, because the number of cells required to realize a function grows
exponentially with the number of variables.
Two-dimensional arrays [27,28,29,30] provide another attempt to overcome the
limitations of Maitra cascades. These arrays realize a two-level representation of a
function, where each minterm is realized in one column of the array and the outputs are
ORed in a row of gates (called the collector row) attached to the Maitra cascade
columns. This approach was mostly used for two-level Sum of Products and positive
polarity AND/XOR representations rather than the more general multi-level ones. It was
shown by Minnick [29] that an arbitrary function of n variables can be realized with no
more than $(n+1)\,2^{n-2}$ cells with the two-dimensional approach.
Some multi-level representations are geared towards specialized cellular arrays. One
such example is the functionally complete cutpoint array [28]. These arrays are
composed of columns of Maitra cascades where each cell needs to realize only six
possible functions of two input variables. The "cutpoint" in this array refers to the
specification bits in each cell that program the type of operation it performs. The
main deficiency of this architecture is the large number of cells that do not perform
any actual function. Furthermore, there is no communication between the horizontal
and vertical inputs. Hence, if a signal needs to be connected to both horizontal and
vertical inputs, unnecessary logic duplication results.
Another attempt at multi-level representation of functions was the Unate Cellular
Logic by Mukhopadhyay [31]. In this approach the cells are assumed to be unate
two-input functions, i.e. all functions except XOR and XNOR. Each cascade in this
array can realize a unate function, and the whole array is considered to be a
two-dimensional arrangement of unate cascades. In the synthesis method, a test for
unate cascade realizability is provided.
Minnick introduced the cobweb array [29], in which the basic structure of the
cutpoint array is modified to allow more interconnections and thus communication
between more cells. Along the same lines, Akers [23] introduced the "Rectangular
Logic Array", where each cell in the cutpoint array receives an additional input
from a non-immediate neighboring cell. In practice this makes the array resemble a
three-dimensional structure. Besides the above-mentioned methods, certain
special-purpose arrays have been reported [32] which utilize special features of their
target structures; these include adder, multiplier, threshold, sorting, coding, and
interconnection arrays.
2.2 Generic CA-type FPGA 
The cellular FPGAs on the market inherit some of the characteristics of the cellular
arrays discussed above, such as identical cells with uniform interconnections between
the cells, low fan-in/fan-out, and fine granularity, with each cell implementing a
simple logic function or a storage element. The added feature of local and express
buses for medium- and long-range communication makes them more flexible from the
synthesis point of view than the cellular arrays reviewed above. This makes them well
suited for regularly structured logic such as datapaths and cellular automata, and for
compute-intensive applications [2] such as convolution filtering, motion estimation for
real-time video, and real-time image manipulation.
The generic cellular FPGA looks like the one shown in Fig. 2.4. The CA-type
FPGA is a regular array of locally connected programmable logic blocks. Each logic
block is directly connected to a limited number of neighbors, usually four or eight, and
to a small number of local and global (express) buses, usually four or eight,
which are used for medium- and long-distance connections. Local and global buses run
horizontally and vertically. In addition to logic and storage functions, cells can also
be used as wires. The buses also provide the link between the array and the I/O blocks.
In homogeneous FPGAs, all logic blocks are identical except the input and output ones.
The set of logic functions which can be implemented in one logic block is defined by the
block architecture.
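
The notion of direct neighbor connections can be sketched in a few lines of Python. This is an illustrative model only: it assumes the four-neighbor variant of the generic array, and the function names `direct_neighbors` and `connection_kind` are hypothetical, not part of any vendor tool.

```python
def direct_neighbors(row, col, rows, cols):
    """Cells reachable from (row, col) without a bus, assuming 4-neighbor connectivity."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def connection_kind(src, dst):
    """Classify a connection between two distinct cells by Manhattan distance."""
    distance = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    if distance == 0:
        raise ValueError("source and destination are the same cell")
    # Adjacent cells connect directly; anything longer needs a bus or wiring cells.
    return "direct" if distance == 1 else "bus or wiring cells"


print(direct_neighbors(0, 0, 4, 4))       # corner cell has only two neighbors
print(connection_kind((1, 1), (1, 2)))    # adjacent cells
print(connection_kind((0, 0), (2, 0)))    # two cells apart
```

A netlist in which most connections are classified as direct is what the text calls locally connected; every other connection consumes either a bus segment or cells used as wires.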
Figure 2.4 A Generic CA-type FPGA
Fig. 2.5, Fig. 2.6 and Fig. 4.3 show the logic blocks of the Algotronix [41], Motorola
[42] and ATMEL [2] FPGAs, respectively. The Function Unit shown in Fig. 2.5
implements the expected function, while the multiplexers select the input signals to be
applied to the function unit [41]. The logic cell shown in Fig. 2.6 can be programmed
as an AND gate. Alternatively, each cell is capable of a secondary function such as a
D flip-flop, XOR or wired-OR gate. Logic synthesis and technology mapping methods
which can efficiently realize the functionality in logic blocks and optimally utilize
routing resources should be developed.
Figure 2.5 Logic Block of Algotronix FPGA
The industrial method for mapping functions into ATMEL FPGAs is based on the
"Macro Generator Approach". Current research on logic and layout synthesis
for these CA-type FPGAs includes two-level AND/XOR realization, universal XOR
forms (canonical forms of functions based on XOR) [16,17], the Complex Maitra Logic
Array (CMLA) approach [11,12] and the Simulated Evolution (SE) method [12]. The
AND/XOR and the universal XOR form methods lead to fewer products than their
AND/OR counterparts [45,46], therefore requiring less layout area to implement the
circuit. However, identifying the minimal multi-level XOR canonical representation of a
function with a large number of variables is computationally expensive. CMLA is








Figure 2.6 Logic Block of Motorola FPGA
complex term is a product term and is a sequence (row) of AND, OR and XOR opera-
tors with corresponding literals. In the CMLA, the input variables of the Boolean func-
tion are in vertical buses. The input plane consists of rows of logic cells representing 
the multi-level terms (Complex terms). The terms are then XORed/ORed together in 
the collecting (output) plane. A method based on linear sorting and SE performs map-
ping of a general DAG representing a Boolean function onto the chip considering the 
restrictions during the placement phase. Comparisons of these methods to our approach 
are made in Chapter VI. 
CHAPTER III 
LAYOUT-DRIVEN LOGIC SYNTHESIS 
A binary tree representation of a circuit is a very useful structure for mapping
it to a CA-type FPGA, because the connections between logic blocks, represented as
the vertices of the tree, are local: each node in the tree is connected only to its
parent node and its child nodes. In a binary tree the maximum number of connections
for a node is three, and therefore this configuration can be realized using only
adjacent logic blocks and local connections in the cellular array. The exponential
growth of the number of nodes as a function of the tree level can, however, result in a
very inefficient mapping. Fortunately, for most real functions, the tree does not
become as wide at the bottom as might be expected. The tree expands from the
root for a few levels, then its width tends to stay constant, and finally decreases
towards the leaves. Therefore, with a good restructuring method, a
binary tree can be mapped to the cellular architecture without wasting too many
logic blocks for routing.
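
The width profile described above can be made concrete with a short sketch. The Python below is illustrative only and is not part of the TRM package (which, as stated in Chapter I, is written in C); the `Node` class and the functions `level_widths` and `bounding_box_utilization` are hypothetical names. It counts the nodes on each level of a binary tree and computes the fraction of a depth-by-maximum-width rectangle that the nodes would fill if each level were simply placed on its own row, a rough proxy for the area lost to a non-rectangular tree shape.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A binary-tree node: one parent and at most two children (three connections)."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level_widths(root: Optional[Node]) -> List[int]:
    """Return the number of nodes on each level, root level first."""
    widths = []
    level = [root] if root is not None else []
    while level:
        widths.append(len(level))
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return widths


def bounding_box_utilization(root: Node) -> float:
    """Fraction of a (depth x max width) rectangle occupied by tree nodes."""
    widths = level_widths(root)
    return sum(widths) / (len(widths) * max(widths))


# A small tree that widens for two levels and then narrows again.
tree = Node("r",
            Node("a", Node("c", Node("g")), Node("d", Node("h"))),
            Node("b", Node("e", Node("i")), Node("f")))

print(level_widths(tree))
print(bounding_box_utilization(tree))
```

A complete binary tree of the same depth would have widths 1, 2, 4, 8 and fill less than half of its bounding rectangle, which is the inefficiency that a restructuring step aims to avoid.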
Logic synthesis methods based on EXOR gates are very attractive because they
lead to functions being represented with a smaller number of gates (which means a
smaller number of logic blocks needed to implement the circuit on the FPGA) and also
because most FPGAs include an EXOR gate in their logic blocks. For a long time,
the EXOR gate was considered not useful for circuit implementation because its
realization in silicon, especially for large fan-in or fan-out, was slow compared to
other simple gates. However, with the introduction of FPGAs, the delay of an EXOR gate
became similar to the delay of other gates. For some types of FPGAs, for example the
LUT-based Xilinx series, the delay of the logic block depends on the number of input
variables and not on the function realized by that block. In CA-type FPGAs, which are
based on small-granularity logic blocks and localized connections, the fan-in and
fan-out of the EXOR gate are low. Recently, EXOR gates have been used more often
for the implementation of Boolean functions because of their easier testability [5]. It
has already been shown that the AND/EXOR representation of linear and nearly linear
functions costs less (in number of gates) than the inclusive (AND/OR) representation [5].
One of the most fundamental concepts for the decomposition of a logic function
is the Shannon expansion. The Shannon expansion can always be applied to a logic
function, in contrast to other types of Boolean decompositions such as the Ashenhurst
[19] or the Curtis [20] decomposition, which can be applied only to certain classes of
functions. By applying certain rules to the Shannon expansion we can generate the
Davio I and Davio II expansions as shown below. The well-known Shannon expansion is
given by [21,22]
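$$f(x_1, \ldots, x_i, \ldots, x_n) = x_i \cdot f(x_1, \ldots, x_i = 1, \ldots, x_n) + \bar{x}_i \cdot f(x_1, \ldots, x_i = 0, \ldots, x_n)$$
By applying the rule
$$\bar{a} = 1 \oplus a$$
we get the Davio I expansion:
$$f(x_1, \ldots, x_i, \ldots, x_n) = f(x_1, \ldots, x_i = 0, \ldots, x_n) \oplus x_i \cdot [f(x_1, \ldots, x_i = 0, \ldots, x_n) \oplus f(x_1, \ldots, x_i = 1, \ldots, x_n)]$$
The Davio II expansion is derived by applying the rule
$$a = 1 \oplus \bar{a}$$
$$f(x_1, \ldots, x_i, \ldots, x_n) = f(x_1, \ldots, x_i = 1, \ldots, x_n) \oplus \bar{x}_i \cdot [f(x_1, \ldots, x_i = 0, \ldots, x_n) \oplus f(x_1, \ldots, x_i = 1, \ldots, x_n)]$$
In short form we can represent the above expansions as:
$$f = x_i f_{x_i} + \bar{x}_i f_{\bar{x}_i} \tag{4}$$
$$f = f_{\bar{x}_i} \oplus x_i (f_{x_i} \oplus f_{\bar{x}_i}) = f_{\bar{x}_i} \oplus x_i g \tag{5}$$
$$f = f_{x_i} \oplus \bar{x}_i (f_{x_i} \oplus f_{\bar{x}_i}) = f_{x_i} \oplus \bar{x}_i g \tag{6}$$
where
$f_{x_i}$ = cofactor w.r.t. $x_i$
$f_{\bar{x}_i}$ = cofactor w.r.t. $\bar{x}_i$
$g = f_{x_i} \oplus f_{\bar{x}_i}$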
The decompositions represented by Equations (5) and (6) are called Davio I and Davio II,
respectively. The circuit realization of Equation (4) is a multiplexer, while Equations
(5) and (6) describe an AND-EXOR gate structure, as shown in Fig 3.1. Since we have chosen
the ATMEL 6000 series FPGA as our target architecture, and an AND-EXOR combination can be
realized in one logic cell of that architecture, we have selected the input format
accordingly. The TRM package is explained in detail in the later chapters.
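The three expansions can be checked numerically. The following is a minimal sketch, not part of the TRM package: the helper names (`cofactor`, `shannon`, `davio1`, `davio2`, `equivalent`) and the three-variable test function `example` are assumptions introduced only for illustration. It rebuilds a function from its cofactors using each expansion and compares the result with the original on every input combination.

```python
from itertools import product

def cofactor(f, i, value):
    """Return f with argument i fixed to `value` (it still accepts all n arguments)."""
    return lambda *x: f(*x[:i], value, *x[i + 1:])

def shannon(f, i):
    """f = x_i * f_{x_i} + not(x_i) * f_{not x_i}."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda *x: (x[i] & f1(*x)) | ((1 - x[i]) & f0(*x))

def davio1(f, i):
    """f = f_{not x_i} XOR x_i * (f_{x_i} XOR f_{not x_i})."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda *x: f0(*x) ^ (x[i] & (f0(*x) ^ f1(*x)))

def davio2(f, i):
    """f = f_{x_i} XOR not(x_i) * (f_{x_i} XOR f_{not x_i})."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda *x: f1(*x) ^ ((1 - x[i]) & (f0(*x) ^ f1(*x)))

def equivalent(f, g, n):
    """Compare two n-input functions on all 2**n input combinations."""
    return all(f(*x) == g(*x) for x in product((0, 1), repeat=n))

def example(a, b, c):
    """Hypothetical test function: a*b + not(c)."""
    return (a & b) | (1 - c)

for i in range(3):
    for expand in (shannon, davio1, davio2):
        assert equivalent(example, expand(example, i), 3)
```

All three expansions reproduce the original function for every choice of expansion variable, which is why they can be mixed freely when a decision diagram is built.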
Any combination of Shannon and Davio I and II expansions can be used to produce decision
diagrams [1,14]. A Binary Decision Diagram (BDD) is a Directed Acyclic Graph (DAG) with a
single root node. The terminal (leaf) nodes represent the values 0 and 1, while
non-terminal nodes represent Boolean functions. The function associated with the root
node specifies the function represented by the entire BDD. Each non-terminal node has an
associated variable and two outgoing edges. The function represented by a non-terminal
node is specified by its cofactors with respect to its associated expansion variable.
[Figure 3.1 panels: a. Circuit Realization of Equation (4); b. Circuit Realization of Equation (5); c. Circuit Realization of Equation (6)]
Figure 3.1 Circuit Realization of Shannon and Davio I and II Expansions
By applying only the Davio I expansion to a given Boolean function, a Functional Decision
Diagram (FDD) is created, which is a binary tree with nodes representing AND and EXOR
gates. This FDD, with only a positive decomposition variable associated with each node of
the binary tree, is called a Reed-Muller Tree (RMT) [1]. The name comes from the fact that
once the tree is flattened, it represents a Reed-Muller canonical form. If the order of
variables used for decomposition is the same in all branches of the tree, it is called a
Non-Permuted Reed-Muller Tree (RMT). If the orders are not the same, it is called a
Permuted Reed-Muller Tree (PRMT).
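As an illustration of how repeated Davio I expansion flattens into a Reed-Muller form, here is a small sketch for the non-permuted case (the same variable order in every branch). It is not the REMIT program; the function names and the truth-table encoding (first variable is the most significant index bit) are assumptions made for this example.

```python
def reed_muller(table):
    """Positive-polarity Reed-Muller coefficients of a truth table.

    `table` has 2**n entries; the first variable selects the upper or lower half.
    Each recursion level is one Davio I step: f = f0 XOR x * (f0 XOR f1).
    """
    if len(table) == 1:
        return list(table)
    half = len(table) // 2
    f0, f1 = table[:half], table[half:]
    g = [p ^ q for p, q in zip(f0, f1)]      # g = f0 XOR f1
    return reed_muller(f0) + reed_muller(g)  # terms without x, then terms with x

def monomials(coeffs, names):
    """List the product terms whose coefficient is 1."""
    n = len(names)
    terms = []
    for m, c in enumerate(coeffs):
        if c:
            lits = [names[k] for k in range(n) if (m >> (n - 1 - k)) & 1]
            terms.append("".join(lits) or "1")
    return terms

# Hypothetical function f(a, b, c) = a XOR (b AND c), index = 4a + 2b + c.
table = [0, 0, 0, 1, 1, 1, 1, 0]
print(monomials(reed_muller(table), "abc"))  # ['bc', 'a']
```

A permuted tree (PRMT) would choose the expansion variable independently in each branch, which this fixed-order sketch does not attempt.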
By applying different combinations of Shannon and Davio I and II expansions, different
decision diagrams can be obtained.
Generalized Reed-Muller Tree (GRMT): The expansion tree in which each of the variables
appears in either positive polarity or negative polarity, but not both. It is obtained by
applying Davio I and II repeatedly. This results in a tree with an AND-EXOR gate
structure.
Pseudo-Kronecker Reed-Muller Tree (KRMT): The expansion tree in which the variables
appear in positive polarity, negative polarity, or both, but with a single fixed order of
expansion variables in the tree levels. It is obtained by applying Shannon, Davio I and
Davio II repeatedly. This results in a tree with a combination of AND gates, EXOR gates
and multiplexers.
All of the above decision diagrams can be used for the ATMEL 6000 by introducing an
appropriate technology mapping (grouping) algorithm to efficiently map the tree onto the
FPGA. We have chosen the PRMT as the input to our algorithm because the technology
mapping [4] was ready for the PRMT at the start of this work.
The PRMT of a given Boolean function, which is generated by the program REMIT [1], is used
as the input to our algorithm. The PRM tree structure is very well suited for ATMEL FPGAs
because it preserves the local connectivity between the cells and, even more importantly,
its AND-EXOR structure matches the AND-EXOR configuration of the ATMEL logic block.
Example 1
The Boolean function in SOP (sum-of-products) representation that is used as the running
example for all steps of our method is given below. The example is the benchmark function
"5xl" from MCNC.
f = a bcdef g + a ef g + aef 









.i specifies the number of inputs (variables).
.o specifies the number of outputs.
.p specifies the number of product terms (cubes).
0 the variable is in negative polarity
1 the variable is in positive polarity
- the variable is not in the product term (don't care)
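To make the notation concrete, the sketch below reads a small PLA description and decodes one cube. It is only an illustration: the function names and the three-input sample text are assumptions, not the listing of the example function, and only the `.i`, `.o` and `.p` directives described above are interpreted.

```python
def parse_pla(text):
    """Split a PLA description into its header counts and its cube rows."""
    header, cubes = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("."):
            parts = line.split()
            if len(parts) == 2 and parts[0] in (".i", ".o", ".p"):
                header[parts[0]] = int(parts[1])
            continue  # other directives are ignored in this sketch
        inputs, outputs = line.split()
        cubes.append((inputs, outputs))
    return header, cubes

def parse_cube(cube, names):
    """Map an input cube such as '1-0' to {variable: polarity}; '-' is skipped."""
    if len(cube) != len(names):
        raise ValueError("cube length must match the number of variables")
    literals = {}
    for name, symbol in zip(names, cube):
        if symbol == "1":
            literals[name] = True      # positive polarity
        elif symbol == "0":
            literals[name] = False     # negative polarity
        elif symbol != "-":
            raise ValueError(f"unexpected symbol {symbol!r}")
    return literals

# Hypothetical three-input, one-output description with two product terms.
SAMPLE = """
.i 3
.o 1
.p 2
1-0 1
011 1
"""
header, cubes = parse_pla(SAMPLE)
print(header, cubes, parse_cube(cubes[0][0], "abc"))
```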
Using the program REMIT [1], the PRMT for the Boolean function from Example 1 is
generated, as shown in Fig. 3.2.
The PRMT is used as the input to our algorithm. In Fig. 3.2 the root of the tree
represents the output of the function. The logic gates are represented with generic logic
gate symbols and the primary (expansion) variables (a, b, c ... ) which are associated













Figure 3.2 PRM Tree of the Boolean Function from Example 1 
CHAPTER IV 
ARCHITECTURE OF ATMEL 6000 FPGA 
The logic synthesis method discussed in Chapter III, combined with the "Tree
Restructuring Method" discussed in the later chapters, leads to a complete package which
can take a function specified in pla format and generate the final layout in any cellular
architecture. In this work, we have included a technology mapping stage specific to the
ATMEL 6000 chip. Because the technology mapping targets the ATMEL 6000 chip, this chapter
reviews the general features of the architecture and points out the main restrictions
that lead to the presented algorithms.
4.1 The Symmetrical Array 
At the heart of the ATMEL 6000 architecture is a symmetrical array of identical 
cells which is shown in Fig 4.1. Except for "repeaters" spaced every eight cells, the 
array is continuous and completely uninterrupted from one edge to the other. In addi-
tion to logic and storage functions, cells can also be used as wires. Buses support fast, 
efficient communication over medium and long distances. 
Figure 4.1 Symmetric Array of the ATMEL 6000 Chip
4.2 The Bussing Network
There are two kinds of buses: local and express, as shown in Fig 4.2. Express buses are
not directly connected to the cells; they are used for global connections via the local
buses. Local buses are the link between the array of cells and the global bussing
network. There are two vertical local buses for every column of cells, and two horizontal
local buses for every row of cells. Every cell in the array has read/write access to two
vertical and two horizontal buses. In addition, each cell provides the ability to make a
90 degree turn between either of the two vertical buses and either of the two horizontal
buses. Express buses are the fastest way to cover long, straight-line distances within
the array. Each express bus is paired with a local bus, so there are two express buses
for each column and row of cells. Connective units called repeaters,
spaced every eight cells, divide each bus, both local and express, into segments span-




















4.3 Cell States
















There is one base cell, shown in Fig 4.3, from which thirty-five logical functions can be
implemented. Twenty of these are purely combinatorial cell states providing all primitive
logic functions such as NOR, NAND, AND, OR and the 2-input multiplexer, as well as some
combinations of the primitive gates, as shown in Fig 4.4.
Figure 4.3 Basic Cell






The AND/EXOR realization is of special interest to us. The logic synthesis method REMIT,
discussed in Chapter III, decomposes a Boolean function into an AND/EXOR tree. The
AND/EXOR realization in a logic cell of the ATMEL 6000 is shown in Fig 4.5.









Also supported are six register states and four constant states shown in Fig 4.6. 
In addition to the four local bus connections, a cell receives eight inputs and pro-
vides two outputs to its North, South, East, and West neighbors. These ten inputs and 
outputs are divided into two classes: A and B. There is an A input and a B input for 
each neighboring cell and a single A output and single B output driving all four neigh-
bors. For outside connections, an A output is always connected to an A input and a B 








Figure 4.6 Register and Constant States
Within the cell, the four A inputs and the four B inputs enter two separate,
independently configurable multiplexers. Cell flexibility is enhanced by allowing each
multiplexer to also select the logical constant 1. The two multiplexer outputs enter the
two upstream AND gates. Write access to the four local buses is controlled by the
tri-state buffer.




4.4 Architecture Restrictions 
Each node can have only one input from the local bus and at most two inputs 
from the neighbors. All possible input configurations can be described as follows and 
are shown in Fig 4.7. 
1. 2 inputs from adjacent cells (one must be to 'A' input and the other to 'B' input) 
+ 1 input from local bus. 
2. 2 inputs from adjacent cells (one must be to 'A' input and the other to 'B' input) 
3. 1 input from adjacent cell+ 1 input from local bus. 
4. 1 input from local bus. 
5. 1 input from adjacent cell. 
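The restrictions listed above can be written as a small predicate. This is a sketch under stated assumptions: the function name and the encoding of a cell's inputs (a list of 'A'/'B' input types driven by adjacent cells, plus a count of local-bus inputs) are hypothetical and chosen only to mirror the five configurations.

```python
def valid_cell_inputs(neighbor_inputs, bus_inputs):
    """Check one cell's inputs against the restrictions of Section 4.4.

    neighbor_inputs: list of input types ('A' or 'B') driven by adjacent cells.
    bus_inputs: number of signals taken from the local buses.
    """
    if bus_inputs not in (0, 1):                 # at most one local-bus input
        return False
    if len(neighbor_inputs) > 2:                 # at most two neighbor inputs
        return False
    if any(kind not in ("A", "B") for kind in neighbor_inputs):
        return False
    if len(neighbor_inputs) == 2 and set(neighbor_inputs) != {"A", "B"}:
        return False                             # two neighbors need one A and one B
    return len(neighbor_inputs) + bus_inputs >= 1  # a cell needs some input

print(valid_cell_inputs(["A", "B"], 1))  # True  (configuration 1)
print(valid_cell_inputs(["A", "A"], 0))  # False (both on the A input)
```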
Figure 4.7 Four Possible Input Configurations to a Cell
CHAPTER V 
TREE RESTRUCTURING APPROACH 
In this chapter the general tree restructuring approach based on the Squashed Binary Tree
(SBT) developed here is discussed, and the problem formulation is presented.
5.1 TREE RESTRUCTURING 
In our approach, we generate the Boolean network in the AND/EXOR tree form, which can
easily be realized in the target ATMEL 6000 architecture. The main feature of the
architecture, local connectivity, is also preserved in this AND/EXOR tree form.
Furthermore, the tree restructuring method is used to generate the planar embedding of
that Boolean network such that no routing constraints are violated. The routing
restrictions used in our algorithm are defined by the generic CA-type FPGA architecture
and by specific limitations of the ATMEL chip. We assume that each gate has only one
output in a generic CA-type FPGA. The inputs are limited to a maximum of three, as
dictated by the limitations of the ATMEL architecture stated in Chapter IV, Section 4.4.
If the tree is mapped directly onto the cellular array without restructuring, many cells
and local buses are wasted for routing. This directly impacts the throughput of the chip,
as the utilization factor L/R [number of cells used for logic (L) / number of cells used
for routing (R)] will be low if more cells are used for routing than for logic in a given
design. Using the stated constraints, the tree is restructured in such a way that when it
is mapped to a flat surface the shape of the embedded tree is close to a rectangle and
routing resources are used efficiently.
To achieve this goal, the squashed binary tree algorithm [3] is first used to reshape the
representation of the binary tree. The squashed binary tree is formed by projecting the
nodes of the binary tree onto its leaves. Starting from the root, each node is projected
onto its left-most descendant. Whenever a node has two child nodes, this process is
repeated starting with the child node which was not projected earlier. Fig 5.1 (b) shows
the squashed binary tree for the binary tree shown in Fig 5.1 (a).
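The projection described above can be sketched as a short recursive procedure. This is one reading of the description, not the implementation from [3]: the function name, the dictionary encoding of the tree, and the rule that remaining children are visited bottom-up along each chain are assumptions, and the seven-node tree is a hypothetical example rather than the tree of Fig 5.1.

```python
def squash(children, root):
    """Collapse a tree into chains, one chain per leaf.

    children: dict mapping a node to its list of children, left to right.
    Returns the chains in left-to-right order; each chain follows left-most
    children from its first node down to a leaf.
    """
    columns = []

    def project(start):
        chain = [start]
        while children.get(chain[-1]):            # follow the left-most child
            chain.append(children[chain[-1]][0])
        columns.append(chain)
        for node in reversed(chain):              # revisit the chain bottom-up
            for other in children.get(node, [])[1:]:
                project(other)                    # children not yet projected

    project(root)
    return columns

# Hypothetical tree: 0 -> (1, 2), 1 -> (3, 4), 2 -> (5, 6).
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
print(squash(tree, 0))  # [[0, 1, 3], [4], [2, 5], [6]]
```

The number of chains equals the number of leaves, which is the property the later mapping step relies on when it assigns one column per SBT node.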
[Figure 5.1 panels: a. Binary Tree; b. Squashed Binary Tree Representation of (a), with nodes (0,1,3) (4) (2) (5)]
Figure 5.1 Binary Tree to Squashed Binary Tree Transformation
5.2 MAPPING PROBLEM FORMULATION 
The PRMT which represents a given Boolean function is modeled as a rooted acyclic graph
T = (V, E), which consists of the ordered set of vertices V and the set of directed edges
E defined as follows:
* V = { v_j | v_j represents a node of the PRM Tree which can be realized in one logic
block of a given architecture, or a wire cell }
* E = { e_j | e_j is an edge from v_j to v_{j+1}, which represents a functional relation
between these two nodes }. The direction of an edge represents the direction of signal
flow in the actual design. The vertices of the graph T are labeled with the primary
signals (expansion variables) entering the nodes represented by the vertices. The fan-out
of each v_j is equal to 1 and the fan-in of each v_j is not greater than 2.
The physical resources of the CA-type FPGA are represented as the undirected graph
G_P(V_P, E_P) with the ordered set of vertices V_P and the set of edges E_P defined as
* V_P = { v_p | v_p represents a logic cell of the CA chip }
* E_P = { e_p | e_p represents the programmable connections between adjacent cells }
The vertices are numbered according to their positions in the column-row matrix of the
CA-type chip.
Mapping Problem Formulation: Given the undirected physical graph G_P(V_P, E_P) and the
network graph T(V, E) representing a design, find a mapping of the graph T to the physical
graph G_P that satisfies the routing constraints of the given architecture and that
minimizes the rectangular envelope (area) covered by the design and the number of logic
blocks used for routing.
The main advantage of the approach described here is that it eliminates the traditional
routing phase needed to realize digital logic. Once the SBT is created it can be directly
mapped to the CA-type array, and only the necessary logic blocks assigned for routing
need to be added.
The input to the SBT algorithm is the graph T(V, E), which represents the netlist in the
form of the Permuted Reed-Muller Tree generated by the REMIT program. The primary
(expansion) variables associated with the vertices of the PRMT will be distributed on the
chip using local buses.
5.3 TRM Flowgraph 
The overall flow graph of the Tree Restructuring Method (TRM) is shown in Fig. 5.2. The
input to the logic optimization phase is in pla format. The output of the logic
optimization phase is a binary tree. In this thesis the logic optimization program REMIT
is used to produce the PRMT representation of the given function. Next, an optional
technology mapping step is introduced to perform technology-specific optimization. The
tree is then restructured using the SBT technique, and finally the primary inputs are
assigned to the local buses in the bus assignment phase. The sub-sections in Chapter VI
describe the algorithms used in the TRM. A formal description of the steps in the TRM
package is as follows.
STEP 1: Generate Squashed Binary Tree.
The nodes of the SBT are labelled v_b1, v_b2, ..., v_bn, where n is the number of leaf
nodes in the original PRMT tree. Each node v_bi is a set of vertices (v_1, v_2, ..., v_m)
of the PRMT tree which were collapsed to that node. An edge exists between v_bi and v_bj
if there exists an edge e_kl between v_k and v_l, where v_k is a vertex which was
collapsed to node v_bi, and v_l is a vertex which was collapsed to node v_bj.
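The edge rule of Step 1 can be illustrated as follows. The sketch assumes the SBT nodes are already available as lists of collapsed vertices; the function name, the pair encoding of tree edges (k, l), and the sample data are hypothetical.

```python
def sbt_edges(tree_edges, columns):
    """Derive SBT edges (i, j) from tree edges (k, l).

    columns[i] is the list of tree vertices collapsed into SBT node i.
    An SBT edge (i, j) exists when some tree edge joins a vertex of node i
    to a vertex of a different node j.
    """
    owner = {v: i for i, column in enumerate(columns) for v in column}
    edges = set()
    for k, l in tree_edges:
        if owner[k] != owner[l]:          # edges inside one SBT node disappear
            edges.add((owner[k], owner[l]))
    return sorted(edges)

# Hypothetical tree edges and the chains obtained for the same tree.
tree_edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
columns = [[0, 1, 3], [4], [2, 5], [6]]
print(sbt_edges(tree_edges, columns))  # [(0, 1), (0, 2), (2, 3)]
```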
[Figure 5.2 boxes: Technology Mapping (Grouping); Generate SBT; Map SBT to CA; Bus Assignment; Netlist to Programming Device]
Figure 5.2 TRM Flowgraph
STEP 2: Perform mapping of the SBT to the target CA-type FPGA.
For all nodes v_bi of the SBT, place all vertices belonging to v_bi in the same column of
the target cellular array. Node v_bi will occupy as many rows as there are vertices
collapsed to it. Then place the vertices belonging to v_b2 in column two, starting with
Row(j) defined by the edge between v_b1 and v_b2. The edge between v_b1 and v_b2 is
defined by the edge between v_k ∈ v_b1 and v_j ∈ v_b2, where j is the number of the row in
which v_k ∈ v_b1 is placed on the cellular array. This mapping is continued until all
nodes are placed. Add additional routing cells if required.
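Step 2 can be sketched as below. This is a simplified reading rather than the TRM implementation: it assumes that the first vertex of each SBT node is placed in the row of its parent vertex, that the root sits in row 0, and that every cell lying strictly between a parent and its child in that row becomes a routing cell. The function name and the sample tree are hypothetical.

```python
def map_to_array(columns, parent):
    """Place SBT nodes column by column and collect routing cells.

    columns: SBT nodes (lists of vertices) in left-to-right order.
    parent: dict mapping each non-root vertex to its parent vertex.
    Returns (position, routing, area) with positions as (row, column).
    """
    position, routing = {}, []
    for col, chain in enumerate(columns):
        head = chain[0]
        if head in parent:
            start_row, parent_col = position[parent[head]]
            # cells between the parent's column and this column act as wires
            routing.extend((start_row, c) for c in range(parent_col + 1, col))
        else:
            start_row = 0                      # the root starts the first column
        for offset, vertex in enumerate(chain):
            position[vertex] = (start_row + offset, col)
    rows = 1 + max(r for r, _ in position.values())
    return position, routing, rows * len(columns)

columns = [[0, 1, 3], [4], [2, 5], [6]]
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
position, routing, area = map_to_array(columns, parent)
print(position[2], routing, area)  # (0, 2) [(0, 1)] 12
```

In this example the edge from vertex 0 to vertex 2 skips one column, so a single routing cell is needed, and the bounding rectangle is 3 rows by 4 columns.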
STEP 3: Perform bus assignments. 
The steps of the bus assignment algorithm are explained in Chapter VI.4.
Based on experiments, we noticed that a more compact shape with a smaller number of
routing cells could be achieved by relaxing the left-most-descendant constraint used to
construct the SBT. We have introduced a new algorithm called the Modified Squashed Binary
Tree (MSBT), which gives better results than the SBT implementation both in the
rectangular area and in the number of routing cells used by the given design (see Table
I). The MSBT is explained in detail in Chapter VI.
CHAPTER VI 
DESCRIPTION OF ALGORITHMS
In this chapter we discuss the Tree Restructuring Method (TRM) package, which includes
the Technology Mapping (Grouping), Modified Squashed Binary Tree (MSBT), and Bus
Assignment algorithms. We discuss the extension of this method to multi-output functions
and compare TRM with other methods used to realize digital logic in CA-type FPGAs.
6.1 TECHNOLOGY MAPPING 
In general, any decision diagram or binary tree can be used as input to our restructuring
algorithm, but to get better results, a tree which provides the best match between the
structure of the tree and a given FPGA should be selected. The PRM tree structure is very
well suited for ATMEL FPGAs, as it preserves the local connectivity between the cells and
gives the flexibility to perform technology mapping for efficient utilization of the
available resources, i.e. both the cells and the buses.
The logic synthesis program REMIT produces a PRM tree with EXOR, AND and NOT gates. Though
the PRM tree preserves the local connectivity, it cannot be used directly as the input to
the restructuring algorithm (which will be discussed later), as it violates certain
architecture constraints of the ATMEL 6000 series FPGAs. For example, in Fig 6.1 we see
that in the netlist from REMIT the output of one EXOR gate is directly connected to the
input of another EXOR gate, but in the ATMEL 6000 this cannot be realized. The
architecture constraint (type mismatch) states that Output type 'B' cannot be connected
to Input type 'A'. The "switch cell" is introduced between the parent cell and





Figure 6.1 Illustration of Type I Mismatch 
Similarly, a "Connection cell" is added when two signals assigned to local buses 
have to enter the same cell; because only one signal can be taken from the local bus as 
shown in Fig. 6.2. In this phase a grouping of cells is done to save area as well as 









Figure 6.2 "Switch Cell" - Solution to Type I Mismatch 
and AND at level 2 of the tree, but in ATMEL 6000 this can be realized in one cell as 
shown in Fig 6.4. The effect of such grouping is that we save one cell (area) and
improve performance because the delay for each cell in FPGAs is almost the same and





Figure 6.3 A Two Cell Realization of EXOR-AND Chain 
The technology mapping is performed using a simple grouping algorithm [4] 
which analyzes the tree and groups successive AND and EXOR gates into one node 
and adds wire cells to overcome the architecture constraints. 
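The grouping idea can be illustrated with a small tree rewrite. This is only a sketch of the general idea, not the algorithm of [4]: the tuple encoding of gates, the `AND-EXOR` node label, and the function names are assumptions, and the insertion of wire cells for architecture constraints is not modelled.

```python
def group(node):
    """Merge an EXOR gate with an AND gate feeding it into one AND-EXOR node.

    node is ("VAR", name) or (gate, left, right) with gate in {"AND", "EXOR"}.
    """
    if node[0] == "VAR":
        return node
    gate, left, right = node[0], group(node[1]), group(node[2])
    if gate == "EXOR":
        if right[0] == "AND":
            return ("AND-EXOR", left, right[1], right[2])
        if left[0] == "AND":
            return ("AND-EXOR", right, left[1], left[2])
    return (gate, left, right)

def count_cells(node):
    """Count gate nodes; each one is assumed to occupy one logic cell."""
    if node[0] == "VAR":
        return 0
    return 1 + sum(count_cells(child) for child in node[1:])

# Hypothetical Davio I node: f0 EXOR (x AND g).
tree = ("EXOR", ("VAR", "f0"), ("AND", ("VAR", "x"), ("VAR", "g")))
print(count_cells(tree), count_cells(group(tree)))  # 2 1
```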
Figure 6.4 One Cell Realization of EXOR-AND Chain
In Fig 6.5 the grouped PRM tree for the example from Chapter III (Fig 3.2) is shown. To
simplify the representation of the circuit shown in Fig 6.5, the binary tree
representation of that circuit is shown in Fig 6.6, where nodes represent gates and the
connections between gates are represented as directed edges indicating the direction of
signal flow. For easy comparison of the two representations, the gate numbers from











Figure 6.6 A Grouped PRMT as a Binary Tree 
6.2 SQUASHED BINARY TREE 
The Squashed Binary Tree approach was chosen because it makes it possible to shape the
tree into a rectangular form which closely matches the CA-type architecture. The
rectangular shape is easy to map to the array while satisfying the design restrictions.
Mapping an SBT to the CA-type architecture is a straightforward process: we place each
node of the SBT into one column of the target array and make the necessary connections.
The squashed binary tree is formed by projecting the binary tree onto its leaves.
Starting from the root, each node is projected onto its left-most descendant. The process
is repeated starting with the child nodes which were not previously projected, traversing
the tree in the bottom-up direction. In Fig 6.7 the squashed binary tree representation
of the PRM tree from Fig 6.6 is shown.
[Figure 6.7 node labels: (2) (14,16,2016,17); T1 T2 ... T15]
Figure 6.7 Squashed Binary Tree (SBT)
T1 ... T15 are the nodes of the SBT. Fig 6.8 shows the mapping of the SBT from Fig 6.7 to
the cellular array. We map T1 to column 1 of the array, T2 to column 2, and continue up to
T15. Since there is a directed edge e34 from T1 to T2, we connect cell 4 in column 2 to
cell 3 in column 1. Similarly, there is a directed edge between T1 and T7, so we connect
them by using the cells in Row 1 between cell '0' and cell '2' as "routing cells". The
"routing cells" are indicated by 'R'. It can be observed from Fig 6.7, that one advantage
of SBT approach





Figure 6.8 Mapping from SBT to the Cellular Array
Each node of the SBT will be implemented in one column of the CA-type array. The number of
nodes of the SBT is equal to the number of columns of the CA-type array used for design
implementation, and the maximum number of vertices of the grouped PRM tree which are
projected into one node of the SBT determines the number of rows required.
Using this approach, the signal delays can be calculated before performing the actual
mapping. We know that in FPGAs all cells have approximately the same delay. Assuming that
the delay of each cell is 1 unit, the delay from cell '8' to the output cell '0' is 7
units. Further, area and delays can be optimized by choosing the proper order of the
nodes to be mapped on the array. For example, let us consider the path from cell '21' to
cell '0' in Fig. 6.8. The delay of this path is 14 units; let us assume further that it is
a critical path. By changing the order of the node collapsing when the SBT is formed, we
can get a new order of the vertices: T15, T14, ..., T7, T1, T2, ..., T6. Using this new
order, the new mapping shown in Fig 6.9 is obtained.
Figure 6.9 Critical Path Control
We can observe that the delay of the path from cell '21' to cell '0' is reduced from 14
units to 9 units. Therefore, by adding the ordering heuristic we can minimize the length
of the critical paths. The predictability of the signal delays is a very important
advantage of this approach.
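Because every connection is local, a path delay can be estimated from the placement alone. The sketch below is an illustration under stated assumptions: each cell-to-cell hop, including hops through routing cells, costs one unit, and whether the source and output cells themselves are counted is left open. The function name and the sample placement are hypothetical, not taken from Fig 6.8.

```python
def path_delay(cell, position, parent):
    """Count unit hops from `cell` to the root along the placed tree.

    position: dict vertex -> (row, column) on the array.
    parent: dict vertex -> parent vertex (the root has no entry).
    A hop between adjacent cells costs one unit, so an edge that spans
    several columns pays for each routing cell it passes through.
    """
    hops = 0
    while cell in parent:
        (r1, c1), (r2, c2) = position[cell], position[parent[cell]]
        hops += abs(r1 - r2) + abs(c1 - c2)
        cell = parent[cell]
    return hops

position = {0: (0, 0), 1: (1, 0), 3: (2, 0), 4: (1, 1),
            2: (0, 2), 5: (1, 2), 6: (0, 3)}
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
print(path_delay(5, position, parent))  # 3
```

Moving the column that holds a critical leaf closer to the root's column shortens the horizontal part of its path, which is the effect the reordering in the text exploits.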
6.3 MODIFIED SQUASHED BINARY TREE 
To obtain a more compact shape of the design, the original SBT algorithm was modified and
the Modified SBT (MSBT) algorithm was implemented in our package. The modified squashed
binary tree is formed by projecting nodes of the binary tree onto its leaves in a
depth-first manner. By allowing a choice of which child node is used for projection (i.e.
the left-most-descendant constraint is relaxed), much better optimization was achieved.
The modified squashed binary tree T_b(V_b, E_b) consists of the set of vertices V_b and
the set of directed edges E_b.
* V_b = { v_b | v_b represents the vertices of the tree T(V, E) projected onto the same
leaf }
* E_b = { e_b | e_b represents the directed edge from v_bi to v_bj if any of the vertices
of the tree T(V, E) which were collapsed to vertex v_bi is connected to any of the
vertices of the tree T(V, E) which were collapsed to vertex v_bj }. Here the directed
edge represents the direction of signal flow.





[Figure 6.10 node labels: (2,14,16,2016,17,19,20,21) (1020) (23,24,2024,25,27,1027) (28); T3 T4 T5 T6]
Figure 6.10 Modified Squashed Binary Tree (MSBT)
Mapping the MSBT to the cellular array is done in the same way as was explained for 
SBT previously. Fig.6.11 shows the mapping of the MSBT from Fig. 6.10 to the cellu-
lar array. 
Figure 6.11 Mapping MSBT to the Cellular Array
6.4 BUS ASSIGNMENT 
Once the MSBT is created it can be directly mapped to the CA-type array, and only the
logic blocks required for routing need to be added. The exact number and the exact
locations of these additional routing blocks are defined by the MSBT structure and hence
are known a priori. Then, to complete all connections, the bus assignment has to be
performed. The primary (or decomposition) variables from the original PRMT have to be
assigned to the local buses. An efficient heuristic has been developed which assigns the
variables to the local buses such that the number of local buses used by the same variable
and the number of cells needed to distribute the same signal to the different buses are
minimized. Each cell in the ATMEL 6000 has access to four local buses, but only one of
them can be used as an input. Once a signal is put on a bus, that signal can be used only
by the logic blocks in that particular row or column, as the case may be. The bus
assignment algorithm described below uses a greedy approach. Using this approach, a
feasible bus assignment has been achieved for all examples, with good utilization of bus
resources. The steps of the bus assignment algorithm are given below.
Step a: For each decomposition variable d_i, calculate M_i, the total number of vertices
in the PRMT to which the variable d_i is assigned. Form a list 'L' of variables for which
bus assignment has to be done. Initially this list contains all the primary variables.
Step b: For all d_i, calculate R_ij, the total number of nodes of the MSBT named with
variable d_i and placed in row R_j of the cellular array. If M_i = R_ij, then assign
variable d_i to the upper local bus UR_j of the row R_j. If another variable is already
assigned to UR_j, then assign d_i to the bottom local bus BR_j of the row R_j. Remove d_i
from the list 'L'. If BR_j is occupied, then take the next variable from the list. If the
list is empty, EXIT.
Calculate C_ij, the total number of nodes of the MSBT named with variable d_i and placed
in column C_j of the cellular array. If M_i = C_ij, then assign variable d_i to the left
local bus LC_j of the column C_j. If another variable is already assigned to LC_j, then
assign d_i to the right local bus RC_j of the column C_j. Remove d_i from the list. If
RC_j is occupied, then go to Step c. If the list is empty, EXIT.
Step c: Calculate N_i for each variable d_i in the list, where N_i = M_i/2 (if M_i is
even) and N_i = (M_i/2) + 1 (if M_i is odd). Substitute M_i = N_i.
Step d: For all d_i: if M_i ~ R_ij, then assign d_i to UR_j. If UR_j is already assigned,
then assign the variable to BR_j. Decrement M_i by R_ij. If M_i = 0, remove d_i from the
list. If BR_j is occupied, then take the next variable from the list. If the list is
empty, EXIT. If M_i ~ C_ij, then assign d_i to LC_j. If LC_j is already assigned, then
assign the variable to RC_j. Decrement M_i by C_ij. If M_i = 0, remove d_i from the list.
If RC_j is occupied, then go to Step c.
If the list is empty, EXIT. If the list is not empty, then go to Step e.
Step e: If the list is not empty, repeat Step c and Step d until all connections are
completed.
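Steps a and b can be sketched as follows. This is a simplified illustration, not the heuristic implemented in the TRM package: it handles only variables whose occurrences all lie in a single row or a single column, tries the column buses as a fallback when both row buses are taken, and leaves the splitting of Steps c to e out. The function name, the dictionary encoding of the placement, and the sample data are hypothetical.

```python
from collections import Counter

def assign_whole_variables(cells):
    """Greedy assignment for variables confined to one row or one column.

    cells: dict (row, column) -> decomposition variable entering that cell.
    Returns (buses, pending): buses maps (kind, index, bus name) to a variable,
    pending lists variables that would need the splitting steps.
    """
    total = Counter(cells.values())                               # M_i
    in_row = Counter((var, r) for (r, _), var in cells.items())   # R_ij
    in_col = Counter((var, c) for (_, c), var in cells.items())   # C_ij
    buses, pending = {}, []

    def try_assign(var, counts, kind, names):
        for (v, index), n in counts.items():
            if v == var and n == total[var]:      # all occurrences on this line
                for name in names:
                    if (kind, index, name) not in buses:
                        buses[(kind, index, name)] = var
                        return True
        return False

    for var in sorted(total):
        if not (try_assign(var, in_row, "row", ("upper", "bottom"))
                or try_assign(var, in_col, "col", ("left", "right"))):
            pending.append(var)
    return buses, pending

cells = {(0, 0): "a", (0, 1): "a", (1, 0): "b", (2, 0): "b",
         (0, 2): "c", (1, 1): "c"}
print(assign_whole_variables(cells))
```

In the sample placement, variable a fits on a bus of row 0, b fits on a bus of column 0, and c is spread over two rows and two columns, so it is left for the later steps.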
The completed mapping of the MSBT of our leading example to the ATMEL 
6000 series FPGA is shown in Fig. 6.12. 
Figure 6.12 Final Mapping of MSBT with Bus Assignment
6.5 MULTI-OUTPUT FUNCTIONS 
The algorithms discussed in Sections 6.3 and 6.4 work for multi-input, single-output
functions. This restriction comes from the logic synthesis method used. However, most
real circuits are multi-input, multi-output functions, which are usually represented as
directed acyclic graphs (DAGs). Therefore, one possible approach would be to decompose
the DAG into a number of trees. This can be done using, for example, the cone partitioning
method developed in [18]. The cone partitioning method was developed taking into account
the I/O limitations of FPGAs and the critical paths within the circuit partitions.
6.6 COMPARISON OF OUR METHOD WITH OTHER METHODS 
DAG-Approach: In the general DAG mapping method developed in [12], a 
Boolean function is modeled as DAG, where each node represents the logic which can 
be realized in one logic block of the target architecture and an edge represents the con-
nectivity between the cells. All the cells connected to the same primary input variable 
are placed on a single column of the target architecture using the linear column order-
ing method. The final placement is further tuned using an iterative improvement based 
approach (Simulated Evolution) [12]. 
In this method the initial placement is done using alternate columns to place the 
grouped cells, in order to accomodate routing cells later in the iterative improvement 
phase. This leads to a lot of unused cells, and they are considered to be wasted when 
the rectangular area of the mapped circuit is taken into account. 
The iterative method based on Simulated Evolution is a non-deterministic 
algorithm. In the event of an incremental design change this is a disadvantage: a 
different placement changes the routing paths, and hence the timing of the entire 
circuit may change. 
To reduce the complexity of placement, this method uses only vertical buses to carry 
the input variables. However, the I/O pads are usually placed on both the horizontal 
and vertical boundaries of FPGAs. If a variable has to come from a horizontal 
boundary, the number of cells used as wires increases and the throughput of the 
chip is reduced. 
Two-Dimensional Array Approach: This method [11] synthesizes a given circuit 
in multi-level SOP/ESOP form. The set of functions realizable by each of 
the logic cells is limited to a finite set of simple logic gates that includes an EXOR 
gate. The CA-type FPGA is represented as two distinct planes, an input plane (columns 
of logic cells that represent multi-level terms) and an output plane (which collects 
the terms), as shown in Fig. 6.13. 
Figure 6.13 CMLA: Two-Dimensional Array 
A complex term is defined as a string of literals connected by a set of operators in 
which no literal appears more than once. The operators in a complex term can be 
AND, OR, EXOR, or their complements. All operations in a complex term 
must be performed in sequence, from left to right or from right to left, and all 
operators have the same priority. 
For example, 
(a ⊕ c), (a + b)c 
are two complex terms when the operations are performed from left to right. 
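The equal-priority, left-to-right rule can be made concrete with a small evaluator. The sketch below is only an illustration of the definition above. The function name eval_complex_term and the operator table OPS are assumptions, and complemented literals are not modeled.

```python
from typing import Callable, Dict, List

# Two-input operators allowed in a complex term, plus their complements.
OPS: Dict[str, Callable[[bool, bool], bool]] = {
    "AND": lambda x, y: x and y,
    "OR": lambda x, y: x or y,
    "EXOR": lambda x, y: x != y,
    "NAND": lambda x, y: not (x and y),
    "NOR": lambda x, y: not (x or y),
    "EXNOR": lambda x, y: x == y,
}


def eval_complex_term(literals: List[str], operators: List[str],
                      values: Dict[str, bool]) -> bool:
    """Evaluate a complex term strictly left to right, all operators equal priority."""
    if len(set(literals)) != len(literals):
        raise ValueError("a literal may appear only once in a complex term")
    if len(operators) != len(literals) - 1:
        raise ValueError("need exactly one operator between adjacent literals")
    acc = values[literals[0]]
    for op, lit in zip(operators, literals[1:]):
        acc = OPS[op](acc, values[lit])
    return acc


# (a + b)c evaluated left to right: first a OR b, then AND c.
values = {"a": True, "b": False, "c": False}
left_to_right = eval_complex_term(["a", "b", "c"], ["OR", "AND"], values)
print(left_to_right)
```

With a = 1, b = 0, c = 0 the term evaluates to 0, whereas the usual precedence rules would read the same string as a + bc = 1. This is why the evaluation order is part of the definition.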
In a CA-type FPGA, because of the local connectivity between adjacent blocks, a 
complex term can be realized using a column of logic cells, with the primary variables 
assigned to local buses, one per column of cells. Let us consider the example 
from [12]. The result of the factorization phase is: 
f0 = (b d + a)c + b d a, which is two complex terms; 
f1 = b d, which is one complex term; 
f2 = b d a c, which is one complex term. 
Fig. 6.14 shows the realization of the above complex terms with the variable order 
set to (b, d, a, c). One advantage of this approach is that no separate routing step is 
required to realize a complex term [12]. 
This method was developed for a generic CA-type FPGA. However, if we attempt a 
realization with the ATMEL 6000 [2] FPGA as the target architecture, we see a Type I 
mismatch in the last row (EXOR-EXOR) in Fig. 6.14. To avoid the mismatch we must 
add a switch cell between the two EXOR gates, which means moving the 
entire column associated with the expansion variable "c" to the next column. In the 
ATMEL architecture an AND-EXOR gate combination can be realized in one cell, but the 
CMLA method does not take advantage of this powerful feature. As a result, the 
additional cell increases the delay of the design. Thus the development of a generalized 
method should be followed by a device-specific technology mapping technique. 
Figure 6.14 CMLA: Realization of a Function 
Macro-Generator Approach: The current industrial method (ATMEL) uses a 
macro-generator approach, in which basic modules are provided in a library and 
automatic placement and routing techniques are used to place and connect the modules. 
This method does not provide any opportunity for synthesis of general-purpose functions 
for which a decomposition into submodules is not known. In addition, the modules 
have irregular shapes, and routing requires many cells to be used just for connections. 
It is therefore difficult to achieve good performance in the implemented circuits. 
CHAPTER VII 
RESULTS 
TRM was applied to a set of MCNC benchmarks. Since TRM is applicable only to 
single-output functions, we modified the MCNC benchmarks to extract single-output 
functions. We compare the area and the number of local buses used by the Squashed 
Binary Tree (SBT) method versus the Modified Squashed Binary Tree (MSBT) approach, 
and by the MSBT approach versus the commercially available ATMEL (IDS) tools. We 
assume the ATMEL 6000 as the target architecture. The results are presented in Table I 
and Table II. 
Table I shows the results of the SBT and MSBT approaches for the modified set 
of MCNC benchmarks. The second column, "PRMT", shows the number of gates 
required to implement the function in PRMT form (tree generated by REMIT). 
In the GROUPING section of Table I, column "L" shows the number of logic 
blocks used to implement the logic and column "C" shows the number of connecting 
cells added because of the A-B restrictions of the ATMEL architecture [2]. Column "GT" 
shows the total number of cells. The SBT and MSBT column sections present the results 
of the SBT and MSBT approaches, respectively. R is the number of routing logic blocks 
added when constructing the SBT or MSBT. T is the total number of logic blocks required 
to realize the function. RT represents the size (in number of cells) of the 
smallest rectangle enclosing the mapped circuit. The results show that the MSBT 
approach significantly reduces the total number of logic blocks required to implement 
a given function, and that the enclosing rectangle is also smaller. 
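The quantities reported in Table I can be related by a short sketch. It assumes that GT = L + C and T = GT + R, which the column descriptions suggest but do not state explicitly, and that RT is the area of the bounding box of all occupied cells. The layout coordinates and the helper names are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (column, row) of an occupied cell in the array


def enclosing_rectangle(cells: List[Cell]) -> int:
    """RT: number of cells in the smallest rectangle enclosing the mapped circuit."""
    cols = [c for c, _ in cells]
    rows = [r for _, r in cells]
    return (max(cols) - min(cols) + 1) * (max(rows) - min(rows) + 1)


def cell_counts(logic: int, connecting: int, routing: int) -> Tuple[int, int]:
    """Return (GT, T) under the assumption GT = L + C and T = GT + R."""
    gt = logic + connecting
    t = gt + routing
    return gt, t


# Hypothetical layout of five occupied cells forming a staircase.
layout = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
rt = enclosing_rectangle(layout)  # 3 columns by 3 rows
gt, t = cell_counts(logic=3, connecting=1, routing=1)
print(gt, t, rt)
```

In this toy layout only 5 of the 9 cells inside the rectangle are used, which is the kind of gap between T and RT that the table is meant to expose.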
Table I: Cell Utilization SBT Vs. MSBT 
MCNC   PRMT         GROUPING        SBT             MSBT 
       # of nodes   L | C | GT      R | T | RT      R | T | RT 
To illustrate the differences between our final layout and the layout generated by 
the ATMEL tools, we present one detailed example. Fig. 3.2 showed the ViewLogic 
schematic of the MCNC 5X1 benchmark represented as a PRMT. The 
ATMEL-generated layout for that example is shown in Fig. 7.1, and the layout generated 
by our TRM package in Fig. 7.2. Our layout is visibly more 
compact and makes better use of the chip resources, and therefore gives 
better performance. Table II compares our Tree Restructuring 
Mapping (TRM) method with the ATMEL commercial tools package (IDS) on 
the MCNC benchmarks. B is the number of buses, L is the number of logic blocks 
used for implementing the logic, C is the number of cells used for routing, and A 
represents the rectangular area occupied by the core of the design (without I/O pins). We 
compare only the core area of the mapped design because the ATMEL tools perform bus 
assignment inefficiently, so the resulting area is very large and contains many 
unused logic cells. As Table II shows, our method gives much more 
compact layouts than the ATMEL tools. The number of local buses and the number of logic 
blocks used for routing are much smaller for all examples run. The L/R ratio of logic 
blocks to routing blocks is high for our method, which gives more "logic power" to the 
implementation of the designs and improves performance. 
Table II Area Comparison TRM Vs. ATMEL (IDS) 
MCNC        ATMEL (IDS)           Our Program (TRM) 
            B    L    C    A      B    L    C    A 
5X10        46   17   23   90     19   15   6    27 
f5          28   11   12   45     10   8    6    12 
misex54     58   24   27   180    19   18   13   35 
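The logic-to-routing ratio and the area reduction discussed above can be recomputed directly from the rows of Table II. The sketch below is only a convenience for reading the table. It takes the ratio as L divided by C (the routing-cell column), and the dictionary layout and function names are assumptions.

```python
from typing import Dict, Tuple

Row = Tuple[int, int, int, int]  # (B, L, C, A) as defined for Table II

# Values copied from Table II.
table_ii: Dict[str, Dict[str, Row]] = {
    "5X10":    {"ATMEL": (46, 17, 23, 90),  "TRM": (19, 15, 6, 27)},
    "f5":      {"ATMEL": (28, 11, 12, 45),  "TRM": (10, 8, 6, 12)},
    "misex54": {"ATMEL": (58, 24, 27, 180), "TRM": (19, 18, 13, 35)},
}


def logic_to_routing(row: Row) -> float:
    """Ratio of logic blocks (L) to cells used for routing (C)."""
    _, logic, routing, _ = row
    return logic / routing


def area_reduction(atmel: Row, trm: Row) -> float:
    """Fraction of the ATMEL core area saved by the TRM layout."""
    return 1.0 - trm[3] / atmel[3]


for name, rows in table_ii.items():
    print(name,
          round(logic_to_routing(rows["ATMEL"]), 2),
          round(logic_to_routing(rows["TRM"]), 2),
          round(area_reduction(rows["ATMEL"], rows["TRM"]), 2))
```

For every benchmark listed, the ATMEL ratio is below 1 and the TRM ratio is above 1, so the TRM layouts spend more cells on logic than on routing.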
To compare performance, we carried out timing analysis using the 
ATMEL (IDS) package. The design generated by our TRM package was entered using the 
interactive editor of the ATMEL (IDS) tools, and the timing analysis from 
the ATMEL (IDS) package was then run on both implementations. The longest path delays 
obtained are shown in Table III. The average improvement achieved with our approach is 
around 50%. 
Table III Delay Comparison TRM Vs. ATMEL (IDS) 
MCNC     ATMEL (ns)   TRM (ns) 
5X10     71.14        27.90 
f5       57.30        26.70 
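The relative delay improvement for the rows listed in Table III can be checked with a few lines. This is only a sketch over the values shown; the dictionary and the function name are assumptions.

```python
from typing import Dict, Tuple

# (ATMEL delay, TRM delay) in ns, copied from Table III.
delays_ns: Dict[str, Tuple[float, float]] = {
    "5X10": (71.14, 27.90),
    "f5": (57.30, 26.70),
}


def improvement(atmel_ns: float, trm_ns: float) -> float:
    """Fractional reduction of the longest path delay relative to ATMEL."""
    return (atmel_ns - trm_ns) / atmel_ns


per_benchmark = {name: improvement(a, t) for name, (a, t) in delays_ns.items()}
for name, frac in per_benchmark.items():
    print(f"{name}: {100 * frac:.1f}% shorter longest path")
```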






Figure 7.1 Layout of MCNC 5X1 benchmark generated by ATMEL (IDS) Tools 
Figure 7.2 Layout of MCNC 5X1 benchmark generated by TRM 
We proposed a new tree restructuring method for mapping combinational circuits 
onto CA-type FPGAs. With this approach the routing phase is completely eliminated 
by preserving the local connectivity among the logic blocks. The mapping process 
is straightforward and therefore makes the signal delays predictable, which is 
a very important advantage of this method. 
Research in [43,44] shows that by allowing various polarities of variables we 
can obtain trees with fewer nodes than the PRMT, and that with proper technology 
mapping developed for these new methods we can obtain a significant improvement in the 
final layout. Our program TRM is independent of the functionality of the logic gates 
in the netlist obtained from the logic optimization steps, as long as the function is 
represented as a binary tree. Our method is general and can be applied to a general 
class of CA-type FPGAs. The results on some MCNC benchmarks show that our 
method is better in both area and delay than the commercially available 
tools. 
The next phase of work should be directed towards extending TRM to multi-output 
functions. Further improvements can be achieved by combining the ordering 
of the node collapsing in the MSBT with the bus assignment procedure. The minimization 
of the longest path can be used as a constraint in ordering the collapsing nodes when 
the MSBT is formed. The bus assignment heuristic could also be further improved. 
My contribution in this thesis work is as follows. I investigated and implemented the 
SBT approach as a solution for mapping a binary tree to Cellular Arrays. I developed and 
implemented the MSBT and Bus-Assignment algorithms. I developed the framework 
for VHDL/SCHEMATIC design entry for the TRM package. 
REFERENCES 
[1] L. Wu, and M. A. Perkowski, "Minimization of Permuted Reed-Muller Trees for 
Cellular Logic Programmable Gate Arrays," In H. Gruenbacher and R. Hartenstein 
(eds.), LNCS, No. 705, Springer Verlag, pp. 78-87, 1993. 
[2] ATMEL Corporation, CMOS Integrated Circuit Data Book, 1993-94. 2125 O'Neil 
Drive, San Jose, CA, 95131. 
[3] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, 
Hypercubes, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1992. 
[4] Z. Songhua, "Grouping of The Permuted Reed-Muller Tree," Report, Portland State 
University, 1993. 
[5] T. Sasao and Ph. Besslich, "On The Complexity of Mod-2 Sum PLA's," IEEE 
Trans. Comput., vol. 39, No. 2, pp. 262-266, Feb. 1990. 
[6] N. Ramineni, and M. Chrzanowska-Jeske, "A Routing-Driven Mapping For Cel-
lular-Architecture FPGA's", 36th Midwest Symposium On Circuits & Systems pp. 
308-311, Aug 1993. 
[7] M. Chrzanowska-Jeske, "Architecture and Synthesis Issues in FPGAs," Proc. of 
Northcon, pp. 102-105, October 1993. 
[8] F. Furtek, "An FPGA Architecture for Massively Parallel Computing," 2nd Interna-
tional Workshop on Field Programmable Logic and Applications, August 1992. 
[9] N. B. Bhat and D. D. Hill, "Routable Technology Mapping for FPGA's," 1992 
ACM First International Workshop on FPGAs, pp. 143-148, 1992. 
[10] R. J. Francis, J. Rose, and Z. Vranesic, "Chortle-crf: Fast Technology Mapping for 
Lookup Table-Based FPGAs," Proc. 28th DAC, pp. 227-233, June 1991. 
[11] A. Sarabi, N. Song, M. Chrzanowska-Jeske, M. Perkowski, "A Comprehensive 
Approach to Logic Synthesis and Physical Design for Two-dimensional Logic 
Arrays," Proc. of 31st DAC, pp. 321-326, June 1994. 
[12] A. K. Dasari, N. Song, M. Chrzanowska-Jeske, "Layout Driven factorization and 
fitting for Cellular architecture FPGAs," Proc. of Northcon, pp. 106-111, 1993. 
[13] Concurrent Logic, Inc., Seminar at PSU, Nov. 17th, 1992. 
[14] U. Kebschull, E. Schubert, W. Rosenstiel, "Multilevel Logic Synthesis Based on 
Functional Decision Diagrams," Proc. IEEE European Design Automation Confer-
ence, pp. 43-47, 1992. 
[15] A. Mukhopadhyay, "Cellular Logic," in "Recent Developments in Switching 
Theory," Ed. Mukhopadhyay, A., pp. 281-285, 1971. 
[16] M. A. Perkowski, A. Sarabi, and F. R. Beyl, "XOR Canonical Forms of Switching 
Functions," Proc. of the IFIP WG 10.5 Workshop on Applications of the Reed-Muller 
Expansion in Circuit Design, Hamburg, Germany, pp. 27-32, September 1993. 
[17] I. Schaefer, M. A. Perkowski, H. Wu, "Multilevel Logic Synthesis for Cellular FPGAs 
Based on Orthogonal Expansions," Proc. IFIP WG 10.5 Workshop on Applications 
of the Reed-Muller Expansion in Circuit Design, Hamburg, Germany, pp. 42-51, 
Sept. 1993. 
[18] G. Saucier, D. Brasen, J. P. Hiol, "Partitioning With Cone Structures," Proc. IEEE 
ICCAD 93, pp. 236-239, 1993. 
[19] R. L. Ashenhurst, "The Decomposition of Switching Functions," Proc. Int'l Symp. 
Theory of Switching Functions, pp. 74-116, 1959. 
[20] H. A. Curtis, "Generalized Tree Circuit - The Basic Building Block of an Extended 
Decomposition Theory," Journal of ACM, vol. 10, pp. 562-581, 1963. 
[21] M. Davio, J. P. Deschamps, and A. Thayse, Discrete and Switching Functions, 
McGraw-Hill, 1978. 
[22] M. A. Perkowski and P. D. Johnson, "Canonical Multi-Valued Input Reed-Muller 
Trees and Forms," 3rd NASA Symposium on VLSI Design, 1991. 
[23] S. B. Akers, "A Rectangular Logic Array," IEEE Trans. on Comput. C-21, No. 8, 
pp. 848-857, 1972. 
[24] V. Dvorak, "A Two-Rail Cascade Synthesis of Boolean Functions," IEEE Trans. on 
Comput. vol. C-17, No. 6, pp. 592-596, 1968. 
[25] B. Elspas, The Theory of Multirail Cascades, Recent Developments in Switching 
Theory, Ed. A. Mukhopadhyay, Academic Press, pp. 315-367, 1971. 
[26] S. N. Kukreja and I. Chen, "Combinational and Sequential Cellular Structures," 
IEEE Trans. on Comput. C-22, No. 9, pp. 813-823, 1973. 
[27] K. K. Maitra, "Cascaded Switching Networks of Two-Input Flexible Cells," IRE 
Trans. Electron. Comput. EC-11, pp. 136-143, 1962. 
[28] R. C. Minnick, "Cutpoint Cellular Logic," IEEE Trans. on Electron. Comput. 
EC-13, pp. 685-698, 1964. 
[29] R. C. Minnick, "Cobweb Cellular Arrays," AFIPS Conf. Proc., 1965. 
[30] R. C. Minnick, "A Survey of Microcellular Research," J. of ACM, vol. 14, No. 2, 
pp. 203-241, 1967. 
[31] A. Mukhopadhyay, "Unate Cellular Logic," IEEE Trans. on Comput. vol. C-18, No. 
2, pp. 114-121, 1969. 
[32] A. Mukhopadhyay and H. S. Stone, "Cellular Logic," in Recent Developments in 
Switching Theory, Ed. A. Mukhopadhyay, Academic Press, pp. 281-285, 1971. 
[33] R. A. Short, "Two-Rail Cellular Arrays," AFIPS Conf. Proc. vol. 27, pt. 1, pp. 
355-369, 1965. 
[34] H. S. Stone and A. J. Korenjak, "Canonical Form and Synthesis of Cellular 
Cascades," IRE Trans. Electron. Comput. vol. EC-14, No. 6, pp. 852-862, 1965. 
[35] S. Swamy, "On Generalized Reed-Muller Expansion," IEEE Trans. on Comput. vol. 
C-21, pp. 1008-1009, 1972. 
[36] M. Yoeli, "Group-Theoretic Approach to Two-Rail Cascades," IRE Trans. Electron. 
Comput. vol. EC-14, pp. 815-822, 1965. 
[36] K. Chaudhary and M. Pedram, "A Near Optimal Algorithm for Technology 
Mapping Minimizing Area under Delay Constraints," Proc. 29th ACM/IEEE Design 
Automation Conf., pages 492-98, 1992. 
[37] K. Karplus, "Xmap: A Technology Mapper for Table-Lookup Field-Programmable 
Gate Arrays," In Proceedings of the Design Automation Conference, pages 
240-243, 1991. 
[38] Jason Cong and Yuzheng Ding, "On Area/Depth Trade-off in LUT-Based FPGA 
Technology Mapping," 30th ACM/IEEE DAC, pp. 213-218, 1993. 
[39] Jason Cong and Yuzheng Ding, "An Optimal Technology Mapping Algorithm for 
Delay Optimization in Lookup-Table Based FPGA Designs," Proc. IEEE ICCAD, 
pp. 48-53, November 1992. 
[40] A. Bedarida, S. Ercolani, and G. DeMicheli, "A New Technology Mapping 
Algorithm for the Design and Evaluation of Fuse/Antifuse-based Field-Programmable 
Gate Arrays," 1st ACM Workshop on FPGAs, pp. 103-108, Berkeley, CA, February 
1992. 
[41] Algotronix Ltd., Configurable Array Logic User Manual, Edinburgh, UK, 1991. 
[42] Motorola Inc., MPA10XX Field Programmable Gate Arrays Product Brief, USA, 
1993. 
[43] A. Sarabi, P. F. Ho, K. Iravani and M. A. Perkowski, "Minimal Multi-level 
Realization of Switching Functions Based on Kronecker Functional Decision Diagrams," 
IEEE Int. Workshop Logic Synthesis, pp. P3-1, 1993. 
[44] P. F. Ho, M. A. Perkowski, "Free Kronecker Decision Diagrams and their 
Application to ATMEL 6000 FPGA Mapping," Proc. Euro-DAC'94 with Euro-VHDL'94, 
pp. 8-13, September 19-23, 1994, Grenoble, France. 
[45] N. Song and M. A. Perkowski, "EXORCISM-MV-2: Minimization of Exclusive 
Sum of Products Expressions for Multiple-Valued Input Incompletely Specified 
Functions," Proc. of IEEE Int. Symp. on Multi-Valued Logic, pp. 132-137, 1993. 
[46] T. Sasao, "EXMIN2: A Simplification Algorithm for Exclusive-Sum-Of-Products 
Expressions for Multiple-Valued-Input Two-Valued-Output Functions," IEEE 
Trans. on CAD, vol. 12, No. 5, pp. 621-632, 1993. 
[47] ATMEL Corporation, Integrated Development System, Macro Library I and II, 
Version 1.10, Release Series 1.10c, August 1993. 2125 O'Neil Drive, San Jose, CA, 
95131. 





Figure A.7 ViewLogic Schematic of MCNC misex54 
Technology Mapping (Grouping) 
Generate MSBT 
Map MSBT to CA 
Bus Assignment 
Netlist to Programming Device 
Figure B.1 Framework: TRM with VHDL/SCHEMATIC Design Entry 
Warp3 is a trademark of Cypress Semiconductor 
VHDL description of 4 to 1 MUX (MUX41) with Enable. 
ENTITY Mux1of4 IS 
  PORT( 
    EN: IN BIT; 
    S0: IN BIT; 
    S1: IN BIT; 
    D0: IN BIT; 
    D1: IN BIT; 
    D2: IN BIT; 
    D3: IN BIT; 
    Y:  OUT BIT); 
END Mux1of4; 
ARCHITECTURE unitsim OF Mux1of4 IS 
BEGIN 
  Y <= EN AND ((D0 AND NOT(S1) AND NOT(S0)) 
           OR (D1 AND NOT(S1) AND S0) 
           OR (D2 AND S1 AND NOT(S0)) 
           OR (D3 AND S1 AND S0)); 
END unitsim; 
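As a quick sanity check of the Boolean expression in the architecture body, the sketch below mirrors it in Python and compares it exhaustively with the intended behaviour, namely that the select lines (S1, S0) choose one of D0 to D3 when EN is high. The function names mux41 and check_mux41 are assumptions and are not part of the TRM framework.

```python
from itertools import product


def mux41(en: bool, s1: bool, s0: bool,
          d0: bool, d1: bool, d2: bool, d3: bool) -> bool:
    """Mirror of the VHDL assignment to Y in the unitsim architecture."""
    return bool(en and (
        (d0 and not s1 and not s0)
        or (d1 and not s1 and s0)
        or (d2 and s1 and not s0)
        or (d3 and s1 and s0)
    ))


def check_mux41() -> bool:
    """Compare the expression with 'EN and D[2*S1 + S0]' on all 128 input patterns."""
    for en, s1, s0, *d in product([False, True], repeat=7):
        expected = bool(en and d[2 * s1 + s0])
        if mux41(en, s1, s0, *d) != expected:
            return False
    return True


print(check_mux41())
```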
The pla format output of the input VHDL design from Warp 3 VHDL based 










Fig. B.2 is the representation of the output netlist (blif format) from the program 
REMIT [1]. The grouped PRMT representation after the technology mapping phase is 
shown in Fig. B.3. The final mapping with bus assignment for the MUX41 is shown 
in Fig. B.4. 









Figure B.4 MUX41 with Bus-Assignment 
Comparing Fig. B.4 and Fig. B.5, it is evident that the proposed framework 
can produce good results. With our TRM, the MUX41 uses 10 cells for logic and 3 for 
wire, and occupies a rectangular area of 18 cells. The ATMEL (IDS) [47] implementation 
uses 13, 5, and 20 cells for logic, wire, and rectangular area, respectively. In both 
figures, the cells needed for connecting to the buses are not included. 
MUX41: 4-to-1 Multiplexer, Layout for Shape 1 
Figure B.5 MUX41 implementation in ATMEL (IDS) [47] 
