FPGA technology mapping optimization by rewiring algorithms. by Tang, Wai Chung. & Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
FPGA Technology Mapping 
Optimization by Rewiring 
Algorithms 
TANG Wai Chung 
A Thesis Submitted in Partial Fulfilment 
of the Requirements for the Degree of 
Master of Philosophy 
in 
Computer Science and Engineering 
©The Chinese University of Hong Kong 
June 2005 
The Chinese University of Hong Kong holds the copyright of this thesis. Any 
person(s) intending to use a part or whole of the materials in the thesis in a 
proposed publication must seek copyright release from the Dean of the Grad-
uate School. 
Abstract of thesis entitled: 
FPGA Technology Mapping Optimization by Rewiring Algo-
rithms 
Submitted by TANG Wai Chung 
for the degree of Master of Philosophy 
at The Chinese University of Hong Kong in June 2005 
FPGA technology mapping is an important design automation
problem that strongly affects floorplanning, placement and
routing. Depth-optimal technology mapping algorithms have been
proposed and produce high-quality mapping solutions for delay
minimization. However, such algorithms do not reduce area
consumption further by applying powerful logic transformation
techniques.
Our work is the first successful attempt to apply rewiring
techniques, which are state-of-the-art logic transformation
techniques, to the FPGA technology mapping problem. We focus on
reducing the number of LUTs used while keeping depth optimality.
Our approach is based on a greedy but effective heuristic that
chooses good alternative wires to transform the network so that
fewer LUTs are needed. Experimental results show that our
approach reduces the number of LUTs by about 20% relative to the
initial mapping solutions of FlowMap, FlowSYN and CutMap. Our
final optimized mapping solutions are the best among all
network-flow-based mapping algorithms and are also comparable to
logic-based approaches such as BDS-pga.
Our result also provides solid recommendations on the choice of 
rewiring techniques which can provide flexible trade-offs between 
optimization effort and runtime constraints. REWIRE allows 
complete transformation ability while GBAW can carry out the
optimization process at high speed. The proposed greedy LUT
minimization approach is highly practical and further alleviates
























I would like to thank my labmates for inspiring discussions on
different rewiring techniques, which gave me a lot of insight
into this work.
I would also like to acknowledge the UCLA VLSI CAD lab for
providing the source code of FlowMap, FlowSYN and CutMap.





1 Introduction
2 Rewiring Algorithms
2.1 REWIRE
2.2 RAMFIRE
2.3 GBAW
3 FPGA Technology Mapping
3.1 Problem Definition
3.2 Network-flow-based Algorithms for FPGA Technology Mapping
3.2.1 FlowMap
3.2.2 FlowSYN
3.2.3 CutMap
4 LUT Minimization by Rewiring
4.1 Greedy Decision Heuristic for LUT Minimization
4.2 Experimental Result
5 Conclusion
Bibliography
List of Figures
2.1 (A) Original Circuit (B) Modified Circuit
2.2 MA values for s-t-1 fault on (g1, g6)
2.3 Unobservability and Uncontrollability Propagation Rules (Basic)
2.4 Circuit Containing a GBAW LOCAL 13 Pattern
2.5 Resultant Circuit After Rewiring based on GBAW LOCAL 13 Pattern
2.6 Local 13 Pattern (A) and its Backward Wire (B)
3.1 Schematic of a SRAM-based 3-LUT
3.2 Simple Boolean Network & its Corresponding Circuit
3.3 FPGA mapping example
3.4 Label Calculation in FlowMap
3.5 Label Calculation in FlowMap (Cont.)
4.1 Initial Mapping Solution
4.2 Final Mapping Solution after transformation
4.3 Algorithm Flow of REMAP: Greedy LUT Minimization
List of Tables
3.1 Relationship between k and LUT Utilization
4.1 Experimental Results: REWIRE on FlowMap
4.2 Experimental Results: REWIRE on FlowSYN
4.3 Experimental Results: REWIRE on CutMap
4.4 Experimental Results: RAMFIRE on FlowMap
4.5 Experimental Results: RAMFIRE on FlowSYN
4.6 Experimental Results: RAMFIRE on CutMap
4.7 Experimental Results: GBAW on FlowMap
4.8 Experimental Results: GBAW on FlowSYN
4.9 Experimental Results: GBAW on CutMap





Chapter 1
Introduction
Rewiring techniques have been developed by the EDA research
community over the past decade. These techniques find
alternate wires for a given target wire and allow logic
transformation of the network without changing the behaviour of
the circuit at its primary outputs. They have been applied
successfully to various design automation problems such as
circuit partitioning, which demonstrates their power for
efficient and flexible logic synthesis.
We studied the FPGA technology mapping problem and reviewed
several depth-optimal mapping algorithms. Statistics show that
the logic block components in an FPGA are not fully utilized by
current algorithms. Moreover, logic transformation is not
included or not fully used in these algorithms. This motivates
us to investigate the feasibility of adding rewiring techniques
to technology mapping algorithms.
The thesis is organized as follows: chapter 2 introduces rewiring
algorithms and compares their abilities and speed; chapter 3
reviews network-flow-based technology mapping algorithms, which
guarantee depth-optimal mapping solutions; chapter 4 explains
our approach, discusses our extensive experiments and analyses
the experimental results; chapter 5 gives the final conclusion.
Chapter 2 
Rewiring Algorithms 
Redundancy addition and removal (RAR), or rewiring, modifies
the structure of a logic circuit by adding redundant wire(s) or
gate(s) so that other parts of the circuit become redundant and
removable, while the function implemented by the circuit remains
unchanged throughout the process. We call the wire to be removed
the target wire and the new wire to be added to the circuit the
alternate wire. Rewiring algorithms are techniques that identify
target / alternate wire pairs for a given circuit. We denote a
wire by a pair of gates: (s, d) refers to a wire from gate s to
gate d.
Consider the example circuit shown in figure 2.1(A), where the
target wire is (g1, g6). The circuit functionality is
maintained by:
1. Replacing (c, g5) with the output of an AND gate whose
inputs are g1 and c.
2. Removing (g1, g6).
The modified circuit is shown in figure 2.1(B). The output
function y remains (a + b)(ab + c) = (a + b)c + ab after
rewiring.
Figure 2.1: (A) Original Circuit (B) Modified Circuit
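The equality of the two forms of y quoted above, (a + b)(ab + c) and (a + b)c + ab, can be checked exhaustively, since a 3-input function has only eight input assignments. The following is a minimal sketch; the function names are illustrative and not taken from the thesis.

```python
from itertools import product


def y_factored(a, b, c):
    # y = (a + b)(ab + c)
    return (a or b) and ((a and b) or c)


def y_expanded(a, b, c):
    # y = (a + b)c + ab
    return ((a or b) and c) or (a and b)


def equivalent(f, g, n_inputs):
    """Return True if f and g agree on every input assignment."""
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=n_inputs)
    )


print(equivalent(y_factored, y_expanded, 3))
```

The same exhaustive comparison is how a small rewiring step can be sanity-checked by hand; for realistic circuits the rewiring algorithms below rely on redundancy tests instead of enumeration.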
Although rewiring was first designed for logic minimization,
the technique has proved to be powerful in areas such as
post-layout timing optimization [13], circuit partitioning [7] [15]
and FPGA routing [4]. These examples show that rewiring
algorithms provide flexible and powerful logic transformations
that can be used to optimize circuit performance for different
goals. This strongly motivates us to apply rewiring techniques
to the optimization of FPGA technology mapping.
To analyze and compare the performance of different rewiring
techniques extensively, we chose the three best rewiring
techniques, namely REWIRE, RAMFIRE and GBAW. The techniques
were applied and tested with our technology mapping
optimization in order to characterize their abilities for our
problem. They are briefly introduced in the following sections.
2.1 REWIRE 
REWIRE [5] is based on the idea of mandatory assignments (MA).
These are the assignments that are required for a test of a
fault to exist. If the set of MAs for a fault is inconsistent,
no test is possible for that fault. REWIRE tries to add wires
that force the set of MAs for the stuck-at fault test of the
target wire to become inconsistent; that is, after the addition
of the wire, the stuck-at fault of the target wire becomes
untestable and the target wire becomes redundant. It then
ascertains that the added wire itself is also redundant, so that
its addition does not change the circuit functionality. REWIRE
uses fast filters to screen out wires that cannot become
redundant, so that fewer redundancy tests are needed.
Figure 2.2: MA values for s-t-1 fault on (g1, g6)
To illustrate the idea of mandatory assignments, we consider
again the circuit shown in figure 2.1(A). If we consider a
stuck-at-1 fault on the wire (g1, g6), the MA values are as
shown in figure 2.2. If we insert an AND gate with inputs g1 and
c in front of g5, the MA values become inconsistent, making the
stuck-at fault untestable. The target wire is hence removable
after the addition of the alternate wire. REWIRE identifies
alternate wires for different target wires based on this simple
idea of creating conflicts in the MAs.
REWIRE is found to be the most powerful rewiring algorithm: it
finds the largest number of target / alternate wire pairs.
However, it consumes a large amount of computation time due to
the high computational complexity of logic propagation and
justification.
Figure 2.3: Unobservability and Uncontrollability Propagation Rules (Basic)
2.2 RAMFIRE 
RAMFIRE [2][3] is another ATPG¹-based rewiring algorithm. It
utilizes a polynomial-time redundancy identification technique
called FIRE [12] to reduce unnecessary redundancy checking. FIRE
applies the concepts of uncontrollability and unobservability
directly to implications. A signal is assigned 1(0) if we cannot
control the signal to be 1(0), and a signal is marked * when no
fault on this signal line can be propagated to a primary output
for observation. During the implications, uncontrollability
propagates according to rules based on gate types and
unobservability propagates mainly backwards. The propagation
rules from [2] are reproduced in figure 2.3 for reference.
The algorithm first performs implications of 1 and 0 on the
target wire to obtain two sets of uncontrollability and
unobservability values, S(1) and S(0). The alternate wire(s)
are then identified by checking the potential for redundancy
based on the following definition:
A gate is potentially redundant in S if
• the gate output is assigned unobservable (*) in S, or
• the gate output has a MA of 0(1) in the stuck-at-fault test
and a value of 0(1) in S,
given that the gate is in the transitive fanout cone of the target
wire.
¹ Automatic test pattern generation.
Experimentally, RAMFIRE has a speed up of 20 times over 
REWIRE but REWIRE can find 70% more alternative wires than 
RAMFIRE. 
2.3 GBAW 
GBAW [16] does not apply any logic implications. Instead, it
depends on a set of graph configurations called "patterns".
Patterns are pre-defined graph representations of sub-circuits
that contain alternative wires. An example pattern is shown in
figure 2.6. The target wire to the NOR gate can be replaced by
the alternative wire to the AND or NAND gate. Figure 2.4 shows a
circuit containing the pattern local 13 (the pattern is enclosed
in the dashed box). When GBAW looks for alternative wires in
this circuit, the local 13 pattern is matched and we can identify
(c, g5) or (g2, g5) as the target wire. Suppose we replace
(g2, g5) by a new alternative wire (g2, g6) (equivalent to
adding a new AND gate after g6); we then get the logically
equivalent circuit shown in figure 2.5.
Figure 2.4: Circuit Containing a GBAW LOCAL 13 Pattern
Figure 2.5: Resultant Circuit After Rewiring based on GBAW LOCAL 13 Pattern
GBAW finds alternative wires by matching the circuit against
the library of patterns. GBAW is able to find alternate wires
at high speed; on average it is around 150 times faster than
REWIRE. However, GBAW can locate only 23% of the alternate
wires found by REWIRE.
Figure 2.6: Local 13 Pattern (A) and its Backward Wire (B)
• End of chapter. 
Chapter 3 
FPGA Technology Mapping 
The Field Programmable Gate Array (FPGA) is a popular
technology among digital designers, especially in the Application
Specific Integrated Circuit (ASIC) area. The reusability and
programmability of FPGAs dramatically reduce design turn-around
time and development cost. FPGAs are also widely used in
benchmark testing and prototyping of digital circuits.
One of the most important components of an FPGA is the logic
block, which is responsible for implementing all the
combinatorial functions of the circuit. In a conventional FPGA,
the logic block is implemented by lookup tables (LUTs), which
are found to be an area-efficient way of implementing Boolean
functions in VLSI. Usually, LUTs of a fixed size are used
throughout the FPGA chip; the size of every LUT is denoted by k,
which is commonly chosen to be 4 or 5. Every LUT is implemented
by 2^k memory cells with a k-bit address decoder. The inputs to
a Boolean function are taken as an address to read the
corresponding bit pre-loaded in the memory cells. Therefore a
k-LUT can implement any k-variable Boolean function by saving
the truth table of the function in the memory cells. Figure 3.1
shows a possible structure of a 3-LUT.¹
Figure 3.1: Schematic of a SRAM-based 3-LUT
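The address-based lookup described above can be illustrated with a small software model of a k-LUT. This is only a sketch: the class name, the most-significant-bit-first address convention and the majority-function example are assumptions made for illustration, not details of any particular FPGA.

```python
class LUT:
    """A k-input lookup table storing one output bit per input combination."""

    def __init__(self, k, func):
        self.k = k
        # Pre-load 2**k memory cells with the truth table of func.
        self.cells = [int(bool(func(*self._bits(addr)))) for addr in range(2 ** k)]

    def _bits(self, addr):
        # Decode an address into k input bits, most significant bit first.
        return [(addr >> (self.k - 1 - i)) & 1 for i in range(self.k)]

    def read(self, *inputs):
        # The k inputs together form the address of the cell to read.
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | int(bool(bit))
        return self.cells[addr]


# A 3-LUT programmed with the 3-input majority function.
majority = LUT(3, lambda a, b, c: (a and b) or (a and c) or (b and c))
print(majority.cells)
print(majority.read(1, 0, 1))
```

Because only the stored bits change from one function to another, any Boolean function of at most k variables fits in the same structure, which is why the mapping problem only has to bound the number of inputs of each node.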
During the design automation process, after a circuit is
synthesized by technology-independent optimization algorithms,
we have to carry out technology mapping to assign functions to
the LUTs. When the circuit consists of gates with few inputs,
the mapping process needs to collect and group the gates
together to fit into LUTs; when the circuit is originally in a
complex form, for instance a PLA, the mapping process is
required to break the circuit down into smaller pieces that can
be implemented by LUTs. The following section explains the
technology mapping problem more formally.
¹ Wired-OR: the outputs of the SRAM cells are connected through a shared bus.
3.1 Problem Definition 
A circuit is modeled as a Boolean network. There is a set of
nodes PI representing the primary inputs and another set of
nodes PO representing the primary outputs. All other nodes in
the network are called internal, and these nodes are associated
with specific functions. The function type can be simple (AND,
OR, NOT, XOR) or complex. Every wire in the circuit is
represented by an edge between two nodes. The incoming edges of
a node are called its fanins and the outgoing edges are called
its fanouts. Nodes in PI have only fanouts while nodes in PO
have only fanins. If the in-degree of every node is less than or
equal to k, the network is k-bounded. Clearly, a k-bounded
network can be implemented by an FPGA using k-LUTs as logic
blocks. A simple Boolean network and its corresponding circuit
are shown in figure 3.2.
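As a small illustration of these definitions, a Boolean network can be stored as a map from each node to the list of nodes driving it. The representation and the example network below are assumptions made for this sketch; they are not the data structures used in the thesis.

```python
def is_k_bounded(fanins, k):
    """True if every node has at most k incoming edges."""
    return all(len(srcs) <= k for srcs in fanins.values())


def primary_inputs(fanins):
    # Nodes with no fanin edges.
    return sorted(node for node, srcs in fanins.items() if not srcs)


def primary_outputs(fanins):
    # Nodes that never drive another node, i.e. have no fanout edges.
    drivers = {src for srcs in fanins.values() for src in srcs}
    return sorted(node for node in fanins if node not in drivers)


# Hypothetical network: g2 has three fanins, so it is 3-bounded but not 2-bounded.
fanins = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"],
    "g2": ["b", "c", "d"],
    "g3": ["g1", "g2"],
    "y": ["g3"],
}
print(primary_inputs(fanins), primary_outputs(fanins))
print(is_k_bounded(fanins, 3), is_k_bounded(fanins, 2))
```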
Figure 3.2: Simple Boolean Network & its Corresponding Circuit
The FPGA technology mapping problem is formulated as fol-
lows: 
• INPUTS: A Boolean network N, LUT size k, delay model D
• OUTPUT: A k-bounded network N'
• OBJECTIVES: 
1. Minimize the number of LUTs used to map the circuit, 
denoted by . 
2. Minimize the delay of N', d(N'), according to D.
The original network can be k-bounded or k-unbounded. In the
latter case, we have to decompose the k-unbounded network into
a k-bounded one before we can process it with the technology
mapping algorithms. This process is usually referred to as gate
decomposition. Note that gate decomposition alters the initial
network structure supplied to the mapping algorithms and thus
affects the final mapping solutions. Figure 3.3 shows an
example of the mapping problem. In this example, k is chosen to
be 3. The gates within each dashed polygon can be packed into a
3-LUT.
Figure 3.3: FPGA mapping example
For quick but accurate delay estimation, the unit delay model
can be used to estimate the overall delay of the circuit. In the
unit delay model, every node is assumed to contribute a delay of
1 to signal propagation, and therefore the total delay of the
circuit is determined by the depth of the circuit. In other
words, the number of nodes on the longest signal path (the
critical path) is the total delay under this model. The depth of
the network can be found by a depth-first search (DFS).
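The depth computation under the unit delay model can be sketched as a memoized depth-first search over the fanin map. The sketch assumes that primary inputs are counted as depth 0 and that every other node adds one unit; the network is a hypothetical example.

```python
from functools import lru_cache


def network_depth(fanins):
    """Depth of a DAG under the unit delay model (primary inputs count as 0)."""

    @lru_cache(maxsize=None)
    def level(node):
        srcs = fanins[node]
        if not srcs:
            return 0  # primary input
        # A node is one unit deeper than its deepest fanin.
        return 1 + max(level(src) for src in srcs)

    return max(level(node) for node in fanins)


# Hypothetical network: the longest path is a -> g1 -> g3 -> g5.
fanins = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "c"],
    "g5": ["g3", "g4"],
}
print(network_depth(fanins))
```

The recursion visits each node once thanks to the cache; for very deep networks an explicit stack or a topological sweep would avoid Python's recursion limit.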
3.2 Network-flow-based Algorithms for FPGA 
Technology Mapping 
In this section, we introduce three network-flow-based
algorithms for the technology mapping problem. These algorithms
are guaranteed to produce mapping solutions with optimal depth.
Therefore, in the later design process, the wiring delay of the
circuit can be reduced as much as possible. This is important
when the wiring delay dominates the overall delay of the whole
chip, as in deep sub-micron VLSI technology. In our work, these
three algorithms are used and compared for the best result.
3.2.1 FlowMap 
FlowMap [9] is the first depth-optimal technology mapping
algorithm developed. The algorithm first applies DMIG [6] to
decompose the network into a network composed of small gates.
The authors of FlowMap argue that small gates can be packed and
grouped more efficiently by their algorithm than gates with many
inputs. Experimentally, they showed that the depth of the mapped
network is smallest when the original network is first
decomposed into 2-input gates.
After gate decomposition, the algorithm enters the labeling
phase. The algorithm calculates a label l(t) for every node t in
topological order. The label l(t) gives the minimum depth of any
mapping solution of the subnetwork rooted at node t, denoted
by N_t. Moreover, l(t) is either equal to the maximum label p
of the nodes in the fanin of t or one more than this maximum.
FlowMap first collapses all nodes with label p in N_t together
with t to get a new network N_t', then computes the
maximum-volume min-cut of N_t' by the classic network flow
technique. If the cut size is less than or equal to k, the label
l(t) is set to p; otherwise l(t) = p + 1, indicating that a new
LUT is used to map N_t.
After label calculation, FlowMap starts k-LUT generation with
a list of PO nodes. It iteratively takes a non-PI node from the
list and generates a LUT to implement the function of all the
nodes with the same label. The fanins of this newly generated
LUT are then put on the list.
To illustrate the label calculation, we show the network for the
circuit of the previous example in figure 3.4. For simplicity,
we take k = 3. Suppose that during the labeling phase we need to
compute the label of node g8 with p = 2 (l(g7) = 2). We collapse
node g7 together with g8 and consider the collapsed node as the
sink node. After adding a dummy source node connected to all PI
nodes, we find a minimum cut of the network by the network flow
technique. Figure 3.5 shows the collapsed network and the graph
used for flow calculation. The min-cut simply separates the sink
node from all other nodes, which implies that nodes g7 and g8
can be collected together and implemented by a 3-LUT. Since the
cut size equals 3, the label of node g8 is 2, the same as that
of g7.
Figure 3.4: Label Calculation in FlowMap
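The labeling phase can be sketched in Python as follows. This is a simplified illustration rather than the FlowMap implementation: it only decides whether a cut of at most k nodes exists (by pushing at most k + 1 units of flow through unit node capacities) and does not compute the maximum-volume cut or generate LUTs. The fanin-map representation, the function names and the example network are assumptions made for this sketch.

```python
from collections import deque

INF = 10 ** 9  # capacity of edges that must not be cut


def cone(fanins, t):
    """All nodes in the transitive fanin cone of t, including t."""
    seen, stack = {t}, [t]
    while stack:
        for u in fanins[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def has_k_feasible_cut(fanins, nodes, merged, k):
    """True if at most k nodes separate the PIs from the collapsed sink."""
    if any(not fanins[v] for v in merged):
        return False  # a PI collapsed into the sink cannot be cut off
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0) + c
        cap.setdefault(v, {}).setdefault(u, 0)  # residual (reverse) edge

    for v in nodes:
        if v in merged:
            continue
        add_edge((v, "in"), (v, "out"), 1)  # cutting node v costs 1
        if not fanins[v]:
            add_edge("source", (v, "in"), INF)
    for v in nodes:
        head = "sink" if v in merged else (v, "in")
        for u in fanins[v]:
            if u not in merged:
                add_edge((u, "out"), head, INF)
    flow = 0
    while flow <= k:
        parent = {"source": None}  # breadth-first search for an augmenting path
        queue = deque(["source"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            return True  # max flow equals flow <= k, so such a cut exists
        v = "sink"
        while parent[v] is not None:  # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
    return False


def flowmap_labels(fanins, order, k):
    """Label every node; `order` must list the nodes in topological order."""
    label = {}
    for t in order:
        if not fanins[t]:
            label[t] = 0  # primary input
            continue
        p = max(label[u] for u in fanins[t])
        nodes = cone(fanins, t)
        merged = {t} | {v for v in nodes if label.get(v) == p}
        label[t] = p if has_k_feasible_cut(fanins, nodes, merged, k) else p + 1
    return label


# Hypothetical network of 2-input gates.
fanins = {
    "a": [], "b": [], "c": [], "d": [], "e": [],
    "g1": ["a", "b"], "g2": ["c", "d"],
    "g3": ["g1", "g2"], "g4": ["g3", "e"],
}
order = ["a", "b", "c", "d", "e", "g1", "g2", "g3", "g4"]
labels3 = flowmap_labels(fanins, order, 3)
labels5 = flowmap_labels(fanins, order, 5)
print(labels3["g4"], labels5["g4"])
```

In this example g4 needs two levels of 3-LUTs, because g3 depends on four primary inputs, but a single 5-LUT can absorb the whole cone. Node splitting (an "in" and an "out" copy joined by a unit-capacity edge) is one common way to turn a node-cut problem into an edge-cut problem.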
Figure 3.5: Label Calculation in FlowMap (Cont.)
FlowMap has a polynomial time complexity of O(kmn), where n
and m are the number of nodes and the number of edges in N
respectively. Therefore the algorithm is extremely fast even for
large circuits with thousands of gates. However, during our
experiments we found that the mapping solutions of FlowMap do
not guarantee high utilization of LUTs, despite their depth
optimality. In other words, many k-LUTs do not have all k inputs
used, and the situation gets worse as k increases. One prime
reason is that the algorithm finds it harder and harder to push
more nodes into an existing LUT when no modification of the
network is allowed during mapping. Table 3.1 shows the
statistics collected.
                    Number of inputs used
Circuit   2             3             4             5
3-LUT
alu2      120           146           0             0
alu4      186           290           0             0
des       1528          982           0             0
Total     1834 (56.4%)  1418 (43.6%)  0             0
4-LUT
alu2      68            59            87            0
alu4      108           107           179           0
des       346           292           1076          0
Total     522 (22.5%)   458 (19.7%)   1342 (57.8%)  0
5-LUT
alu2      37            23            37            77
alu4      77            45            ?             110
des       361           202           ?             641
Total     475 (23.4%)   270 (13.3%)   453 (22.4%)   828 (40.9%)
(? = entry illegible in the source)
Table 3.1: Relationship between k and LUT Utilization
3.2.2 FlowSYN 
FlowMap does not take the logic information of the network into
consideration during the technology mapping process. Clearly,
by re-synthesizing the network, the delay and the number of
LUTs can be further reduced. FlowSYN [8] augments FlowMap with a
logic resynthesis technique. When FlowSYN finds a min-cut
of size greater than k, instead of incrementing the label by 1,
it tries to re-synthesize the subnetwork based on functional
decomposition. The principal decomposition form is
$f(x_1, x_2, \ldots, x_r) = f'(g_1, \ldots, g_j, x_{k+1}, \ldots, x_r)$,
where $g_i$ for $i = 1$ to $j$ are functions of $x_1, x_2, \ldots, x_k$,
with $j < k$ and $r > k$. If such a decomposition is possible,
every $g_i$ can be implemented by a k-LUT and the number of
inputs is reduced by at least 1 from f to f'. The function is
then decomposed recursively until all subfunctions fit into
k-LUTs. Such recursive decomposition allows the label to remain
p, and thus there is a higher chance of a smaller depth in the
final mapping solution.
The functional decompositions are carried out efficiently using
ordered binary decision diagrams (OBDDs) [1]. Nevertheless,
FlowSYN is expected to need more memory and computation time.
3.2.3 CutMap 
CutMap [10] introduced two important concepts: depth relaxation
on non-critical nodes, and node cost functions based on the
maximum fanout free cone.
CutMap relaxes the rule of assigning depth-optimal labels to
nodes that are not on the critical path in order to gain room
for area minimization. Instead, it keeps track of the difference
between the current label and the depth-optimal label. During
label calculation for non-critical nodes, the node collapsing
that FlowMap uses for depth optimization is not performed; it is
replaced by the computation of a minimum-cost k-feasible cut.
The cost of every node is determined by the maximum fanout
free cone (MFFC) it belongs to. The root node of every MFFC is
assigned cost zero, since such nodes are more likely to be
implemented by a single LUT and have a high number of fanouts.
PI and PO nodes have cost zero and all other nodes have cost one.
Consequently, this avoids cuts through the MFFCs and prevents
nodes from being implemented by unnecessary LUTs.
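For reference, an MFFC can be computed with a simple closure rule. The sketch below assumes the usual definition, which the text does not spell out: the MFFC rooted at a node v contains v and every internal node whose fanouts all lie inside the cone, so that each of its paths to an output passes through v. The function names and the example network are illustrative.

```python
def fanouts_of(fanins):
    """Invert a fanin map into a fanout map."""
    fanouts = {node: [] for node in fanins}
    for node, srcs in fanins.items():
        for src in srcs:
            fanouts[src].append(node)
    return fanouts


def mffc(fanins, root):
    """Maximum fanout free cone rooted at `root` (primary inputs excluded)."""
    fanouts = fanouts_of(fanins)
    cone = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for u in fanins[v]:
            is_internal = bool(fanins[u])
            # u joins the cone only if none of its fanouts escapes the cone.
            if u not in cone and is_internal and all(w in cone for w in fanouts[u]):
                cone.add(u)
                frontier.append(u)
    return cone


# Hypothetical network: g2 is shared by g3 and g4.
fanins = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "c"],
    "g5": ["g3", "g4"],
}
print(sorted(mffc(fanins, "g3")))
print(sorted(mffc(fanins, "g5")))
```

The shared node g2 stays out of the cone of g3 because it also feeds g4, but it belongs to the cone of g5, where both of its fanouts are absorbed.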
CutMap preserves the depth optimality of FlowMap while at the
same time minimizing the number of LUTs used through min-cost
cut computation. Clearly, the computation of such cuts
dominates the runtime of the algorithm. Although the algorithm
is sped up by careful case handling, its time complexity is
$O(2kmn^{\lfloor k/2 \rfloor + 1})$. Taking k = 4 or 5, CutMap
runs slower than FlowMap by a factor of about $n^2$.
• End of chapter. 
Chapter 4 
LUT Minimization by Rewiring 
From our discussion of several network-flow-based FPGA
technology mapping algorithms, we can see that most of the logic
information available in the network is not utilized to reduce
the number of LUTs in the mapping solution. Although FlowSYN
employs functional decomposition inside the algorithm, its main
goal is still depth optimization. CutMap, on the other hand,
tries to reduce the number of LUTs solely by taking better cuts
along non-critical paths of the network and does not exploit
the logic information contained in the network.
Therefore we explored the possibility of linking logic
transformation with technology mapping so that we can, on the
one hand, keep the depth optimality provided by FlowMap /
FlowSYN / CutMap and, on the other, transform the network so
that fewer LUTs are used in the mapping solution. We chose
network-flow-based mapping algorithms since they provide
optimal-depth solutions, which lets us focus on LUT reduction
using logic transformation.
To illustrate the idea, we first present an example of LUT
reduction by rewiring. Consider the sub-circuit shown in figure
4.1. Suppose initially we get 4 LUTs rooted at g1, g5 and g6
respectively, as bounded by the dotted lines. Moreover, we
assume here that l(g1) equals l(g5). If we find an alternate wire
(g2, g5) for the target wire, we can allow more efficient
packing with only 3 LUTs, and the overall depth of the circuit is
unchanged. The final mapping is shown in figure 4.2.
Rewiring was chosen to provide the transformations applied to
the network. The first reason for this choice is that rewiring is
powerful: it has been proved that any logic transformation can
be achieved by applying rewiring. Rewiring therefore gives us a
high chance of reaching a good starting network for the
mapping algorithms. The second reason is that rewiring
techniques usually work better on networks decomposed into small
gates, so we can link rewiring with the technology mapping
algorithms seamlessly.
Figure 4.1: Initial Mapping Solution
Figure 4.2: Final Mapping Solution after transformation
4.1 Greedy Decision Heuristic for LUT Minimization
Rewiring provides a set of target / alternative wire pairs
for a given network. In our approach, each pair is evaluated
directly using the mapping algorithms to determine the number
of LUTs and the depth after the transformation is applied to the
network. Even though the evaluation can be completed in
polynomial time¹, we would like to make as few trial evaluations
as possible. Therefore, heuristically, we always take the first
alternate wire that reduces the number of LUTs with no penalty
in depth. After the transformation, the depth is thus smaller or
at least unchanged, while fewer LUTs are used to map the circuit.
The greedy strategy is applied throughout the whole optimization
process until we reach a local minimum where no further
improvement can be made by transforming the network. The
algorithm is outlined in figure 4.3.
¹ When k = 4 or 5, FlowMap runs in O(kmn) time and FlowSYN and CutMap run in approximately O(kmn^3) time.
Algorithm ReMap
Input: Boolean network N
Output: LUT-minimized network N'
1. find a set of alternate wires W of N
2. for each alternate wire w in W
3.   do transform N according to w
4.     Nf ← flowmap(N)
5.     if lut(Nf) < lut(N) and d(Nf) ≤ d(N)
6.       then N ← Nf
7.         goto step 1
8. /* no more possible reduction */
9. return N
Figure 4.3: Algorithm Flow of REMAP: Greedy LUT Minimization
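The greedy flow of figure 4.3 can be sketched in Python as shown below. This is not the thesis implementation (which was written in C on top of SIS): the rewiring engine, the mapper and the two metrics are passed in as hypothetical callables, and the sketch keeps the unmapped network and its mapped solution as separate values rather than overwriting N. The toy stand-ins at the end exist only to make the control flow runnable.

```python
def remap(network, find_alternate_wires, apply_rewire, map_network, lut_count, depth):
    """Greedy LUT minimization in the spirit of figure 4.3.

    All five callables are hypothetical stand-ins supplied by the caller.
    """
    best = map_network(network)
    improved = True
    while improved:
        improved = False
        for pair in find_alternate_wires(network):
            candidate = apply_rewire(network, pair)
            mapped = map_network(candidate)
            # Accept the first rewiring that saves LUTs without adding depth.
            if lut_count(mapped) < lut_count(best) and depth(mapped) <= depth(best):
                network, best = candidate, mapped
                improved = True
                break  # restart the search on the transformed network
    return network, best


# Toy stand-ins: a "network" is just an integer LUT budget, and going
# below 5 LUTs is modelled as costing one extra level of depth.
toy_result = remap(
    network=10,
    find_alternate_wires=lambda n: [+1, -2, -1],
    apply_rewire=lambda n, delta: n + delta,
    map_network=lambda n: {"luts": n, "depth": 3 if n >= 5 else 4},
    lut_count=lambda m: m["luts"],
    depth=lambda m: m["depth"],
)
print(toy_result)
```

In the toy run the search stops at 5 LUTs: every remaining move either adds LUTs or increases the depth, which corresponds to the point where no further improvement can be made by transformation.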
We conducted a series of experiments linking the rewiring
techniques with the technology mapping algorithms based on our
greedy decision heuristic in order to verify the usefulness of
this approach. REWIRE, RAMFIRE and GBAW were chosen to work
with FlowMap, FlowSYN and CutMap. The experimental results
and analysis are given in the following section.
4.2 Experimental Result 
We conducted all the following experiments with 12 small to
medium sized MCNC benchmark circuits on a personal computer
with a 2.4-GHz CPU and 1028 MB of memory. The experiments were
implemented in C with the SIS [11] library on a Fedora Core 3
Linux platform.
We started with REWIRE on the three mapping algorithms since
REWIRE is the most powerful technique, which enables us to
fully understand the extent of the effectiveness of the
approach. Tables 4.1, 4.2 and 4.3 show the initial depth, the
initial number of LUTs, the new depth, the optimized number of
LUTs, the reduction percentage and finally the execution time
measured in minutes.
We can see that REWIRE can readily reduce the number of
LUTs used for mapping. The overall percentage reduction is
22.7%, 24.0% and 22.1% on FlowMap, FlowSYN and CutMap
respectively. In general the depth of the circuit remains the
same after the optimization. This suggests that REWIRE does not
lengthen the depth very much during the transformation for
fewer LUTs, and that there are margins on each level of LUTs to
accommodate a few more levels of small gates, so that the final
mapping depth is kept unchanged. For small-sized circuits,
REWIRE completed optimization within one to three minutes,
while for medium-sized circuits the time taken was around two
to three hours on FlowMap and FlowSYN. The runtime is longer on
CutMap due to the higher complexity of evaluation: it takes
around six to eight hours to optimize a medium-sized circuit.
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU (min)
comp     4        27       4      ?       ?         0.4
f51m     3        ?        3      ?       ?         0.4
5xp1     3        30       3      ?       13.3      0.8
pcler8   3        47       3      35      25.5      0.2
b9_n2    3        47       3      ?       19.1      0.3
ttt2     3        ?        3      42      ?         3.6
term1    4        ?        4      ?       ?         10.6
C880     8        ?        7      106     27.9      20.5
alu2     9        ?        9      ?       ?         261.0
duke2    4        ?        4      ?       25.3      114.6
misex3   7        ?        7      ?       ?         234.1
x3       5        ?        5      ?       13.9      63.7
Total             1302            1006    22.7      710.2
(? = entry illegible in the source)
Table 4.1: Experimental Results: REWIRE on FlowMap
We then proceeded to experiments with RAMFIRE; the results are
shown in tables 4.4, 4.5 and 4.6. The reduction brought by
RAMFIRE is quite satisfactory even when compared with the
results of REWIRE: the reduction percentages are 19.3%, 16.7%
and 15.4% respectively, and the ratios to the results of REWIRE
are 84.8%, 69.5% and 69.9%. However, RAMFIRE used only one-fifth
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU (min)
comp     4        27       4      ?       ?         0.4
f51m     3        ?        3      16      ?         1.0
5xp1     2        ?        2      18      14.3      1.0
pcler8   3        47       3      ?       ?         0.2
b9_n2    3        47       3      39      17.0      0.3
ttt2     3        59       3      41      ?         4.0
term1    4        76       4      52      ?         10.7
C880     8        146      7      106     27.4      23.3
alu2     9        164      9      114     30.5      316.8
duke2    4        187      4      139     ?         121.1
misex3   7        221      7      ?       24.9      258.9
x3       5        ?        5      216     14.6      59.1
Total             ?               968     24.0      796.8
(? = entry illegible in the source)
Table 4.2: Experimental Results: REWIRE on FlowSYN
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU (min)
comp     4        22       4      21      4.5       1.7
f51m     3        ?        3      ?       21.4      1.3
5xp1     3        ?        3      24      11.1      1.2
pcler8   3        34       3      ?       14.7      0.2
b9_n2    3        42       3      37      11.9      ?
ttt2     3        48       3      42      12.5      2.7
term1    4        ?        4      49      21.0      21.6
C880     8        ?        7      92      27.6      171.8
alu2     9        150      9      118     21.3      1450.2
duke2    4        153      4      120     21.6      334.4
misex3   8        ?        7      146     28.4      3573.6
x3       5        ?        5      179     22.5      140.3
Total             1129            879     22.1      5699.8
(? = entry illegible in the source)
Table 4.3: Experimental Results: REWIRE on CutMap
of the CPU time to complete the set of experiments. From this we
believe RAMFIRE is quite promising for producing substantial LUT
reduction even if only a short amount of CPU time is allowed.
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU
comp     4        ?        4      ?       3.7       0.1
f51m     3        28       3      ?       7.1       0.1
5xp1     3        ?        3      27      10.0      0.1
pcler8   3        47       3      35      25.5      0.0
b9_n2    3        47       3      ?       17.0      0.1
ttt2     3        ?        3      44      29.0      0.9
term1    4        77       4      59      23.4      1.7
C880     8        147      7      109     25.9      8.4
alu2     9        ?        9      136     21.8      35.4
duke2    4        ?        4      148     20.4      10.7
misex3   7        225      7      180     20.0      55.2
x3       5        252      5      222     11.9      21.8
Total    -        1302     -      1051    19.3      134.5
Table 4.4: Experimental Results: RAMFIRE on FlowMap
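The FlowMap figures quoted for RAMFIRE against REWIRE can be re-derived from the table totals. The following is a minimal Python sketch of that arithmetic, not part of any tool used in the experiments: the function names are illustrative, and reading the 84.8% "ratio" as LUTs removed by RAMFIRE divided by LUTs removed by REWIRE is an assumption that happens to reproduce the quoted number. The RAMFIRE CPU total is taken as the sum of the per-circuit entries of Table 4.4.

```python
# Totals transcribed from Tables 4.1 and 4.4 (FlowMap as the initial mapper).
INIT_LUTS = 1302      # LUTs in the initial FlowMap solutions
REWIRE_LUTS = 1006    # LUTs after REWIRE (Table 4.1)
RAMFIRE_LUTS = 1051   # LUTs after RAMFIRE (Table 4.4)
REWIRE_CPU = 710.2    # total CPU time reported for REWIRE (Table 4.1)
# Per-circuit CPU entries of Table 4.4; their sum is used as the RAMFIRE total.
RAMFIRE_CPU_ROWS = [0.1, 0.1, 0.1, 0.0, 0.1, 0.9, 1.7, 8.4, 35.4, 10.7, 55.2, 21.8]


def reduction_pct(initial: int, optimized: int) -> float:
    """Percentage of LUTs removed relative to the initial mapping."""
    return 100.0 * (initial - optimized) / initial


def relative_savings_pct(initial: int, strong: int, fast: int) -> float:
    """LUTs removed by the fast method as a percentage of those removed by the strong one.

    Assumed interpretation of the "ratio" quoted in the text.
    """
    return 100.0 * (initial - fast) / (initial - strong)


ramfire_cpu = sum(RAMFIRE_CPU_ROWS)
cpu_share_pct = 100.0 * ramfire_cpu / REWIRE_CPU

print(f"REWIRE reduction:           {reduction_pct(INIT_LUTS, REWIRE_LUTS):.1f}%")
print(f"RAMFIRE reduction:          {reduction_pct(INIT_LUTS, RAMFIRE_LUTS):.1f}%")
print(f"RAMFIRE/REWIRE LUT savings: {relative_savings_pct(INIT_LUTS, REWIRE_LUTS, RAMFIRE_LUTS):.1f}%")
print(f"RAMFIRE/REWIRE CPU time:    {cpu_share_pct:.1f}%")
```

Under these assumptions the CPU share comes out a little under one fifth, which agrees with the one-fifth figure stated in the text.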
Finally we repeated the experiments with GBAW, the fastest
rewiring technique. GBAW clearly finishes the optimization in an
extremely short time, yet we need to analyze its power
specifically in reducing LUTs. Tables 4.7, 4.8 and 4.9
show the experimental results for our analysis. Here we show the
CPU time in seconds since GBAW finished every circuit within 10
seconds. Nevertheless, the alternative wires found by GBAW are
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU
comp     4        ?        4      ?       ?         0.1
f51m     3        ?        3      16      36.0      0.2
5xp1     2        ?        2      18      14.3      0.2
pcler8   3        47       3      35      25.5      0.0
b9_n2    3        47       3      39      17.0      0.1
ttt2     3        ?        3      42      28.8      1.0
term1    4        76       4      ?       21.1      1.9
C880     8        146      7      109     25.3      9.6
alu2     9        ?        9      133     18.9      41.9
duke2    4        187      4      ?       0.0       1.0
misex3   7        221      7      173     21.7      64.6
x3       5        ?        5      223     11.9      18.9
Total    -        ?        -      1061    16.7      139.5
Table 4.5: Experimental Results: RAMFIRE on FlowSYN
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU
comp     4        22       4      21      4.5       0.6
f51m     3        28       3      ?       3.6       0.3
5xp1     3        ?        3      26      3.7       0.4
pcler8   3        34       3      29      14.7      0.1
b9_n2    3        42       3      36      14.3      0.5
ttt2     3        48       3      43      10.4      0.8
term1    4        ?        4      59      4.8       3.8
C880     8        127      7      96      24.4      141.2
alu2     9        150      9      116     22.7      403.1
duke2    4        ?        4      125     18.3      64.6
misex3   8        ?        8      159     22.1      750.9
x3       5        ?        5      217     6.1       34.5
Total    -        1128     -      954     15.4      1400.8
Table 4.6: Experimental Results: RAMFIRE on CutMap
not very useful for LUT reduction: we can only reduce 1.8%,
1.6% and 1.2% of the LUTs on FlowMap, FlowSYN and CutMap
respectively using GBAW. The problem is especially serious when
GBAW works on small-sized circuits, where in most cases no
reduction is found.
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU(s)
comp     4        27       4      ?       0.0       0.57
f51m     3        28       3      28      0.0       0.42
5xp1     3        30       3      29      ?         0.1
pcler8   3        47       3      47      0.0       0.16
b9_n2    3        47       3      47      0.0       0.21
ttt2     3        ?        3      61      1.6       0.07
term1    4        77       4      74      3.9       0.44
C880     8        147      8      146     0.7       0.4
alu2     9        174      9      171     1.7       0.81
duke2    4        ?        4      179     3.8       0.8
misex3   7        225      7      221     1.8       1.42
x3       5        252      5      249     1.2       4.19
Total    -        1302     -      1279    1.8       9.6
Table 4.7: Experimental Results: GBAW on FlowMap
To sum up, we found that REWIRE and RAMFIRE are strong
optimization tools for the technology mapping algorithms under
experiment. They are capable of reducing at least 15% of the LUTs
used by all three mapping algorithms. The final mapping solution
is the best when we apply REWIRE on CutMap, and it uses only
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU(s)
comp     4        27       4      27      0.0       0.63
f51m     3        ?        3      18      28.0      0.14
5xp1     2        ?        2      ?       0.0       0.58
pcler8   3        47       3      47      0.0       0.18
b9_n2    3        47       3      47      0.0       0.22
ttt2     3        ?        3      56      5.1       0.08
term1    4        ?        4      74      2.6       1.82
C880     8        146      8      145     0.7       ?
alu2     9        ?        9      164     0.0       8.55
duke2    4        187      4      187     0.0       ?
misex3   7        221      7      217     1.8       6.07
x3       5        ?        5      250     1.2       4.67
Total    -        ?        -      1253    1.6       34.6
Table 4.8: Experimental Results: GBAW on FlowSYN
Circuit  Init. d  Init. n  New d  Opt. n  Red. (%)  CPU(s)
comp     4        ?        4      22      0.0       2.89
f51m     3        28       3      28      0.0       1.52
5xp1     3        ?        3      ?       0.0       2.13
pcler8   3        34       3      33      2.9       0.09
b9_n2    3        42       3      42      0.0       0.82
ttt2     3        48       3      48      0.0       5.26
term1    4        ?        4      60      3.2       6.46
C880     8        127      8      127     0.0       74.46
alu2     9        ?        9      150     0.0       105.32
duke2    4        ?        4      148     3.3       11.46
misex3   8        204      7      ?       1.5       34.9
x3       5        ?        5      ?       0.9       14.25
Total    -        1128     -      1115    1.2       259.6
Table 4.9: Experimental Results: GBAW on CutMap
879 LUTs to map all 12 circuits; compared to the 1302 LUTs
used by FlowMap alone, this is already a 33% improvement.
Such an improvement can readily ease floorplanning, placement and
routing in later design stages, and also allows the whole design
process to be completed in a shorter time.
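The quoted improvement can be checked directly from the table totals (879 LUTs for REWIRE on CutMap against 1302 LUTs for FlowMap alone). Written out as a worked equation:

```latex
\[
  \frac{1302 - 879}{1302} \;=\; \frac{423}{1302} \;\approx\; 0.325
\]
```

That is roughly one third fewer LUTs, in line with the 33% figure quoted above.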
On the other hand, we do not recommend the use of GBAW
for the mapping optimization problem, since the alternative wires
found by GBAW are more limited in type when compared to the
ATPG-based rewiring techniques (REWIRE, RAMFIRE). Despite its
high speed, the reduction in LUTs is not very significant.
Besides network-flow-based mapping algorithms, logic-based
algorithms have also been proposed in the literature. One of the
logic-based algorithms is BDS-pga, introduced in [14]. The
algorithm first decomposes the network using OBDDs and finalizes
the mapping with FlowMap. Table 4.10 shows the comparison between
our approach using REWIRE on CutMap and BDS-pga. Our work is
comparable with BDS-pga, with better results in several circuits.
Yet BDS-pga can perform better in XOR-intensive circuits like
alu2 or comp.
         REWIRE + CutMap    BDS-pga + FlowMap
Circuit  new d    opt. n    delay    LUTs
comp     4        21        3        14
b9_n2    3        37        3        40
C880     7        92        8        103
alu2     9        118       4        41
duke2    4        120       8        173
rot      7        213       10       223
Table 4.10: Comparison between REWIRE on CutMap and BDS-pga
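To make the comparison in Table 4.10 easier to read, the short Python sketch below tallies which flow uses fewer LUTs on each circuit. The data are transcribed from the table; the dictionary layout and the helper name are illustrative assumptions and not part of either tool.

```python
# Rows of Table 4.10:
# circuit -> ((depth, LUTs) for REWIRE + CutMap, (delay, LUTs) for BDS-pga + FlowMap)
TABLE_4_10 = {
    "comp":  ((4, 21), (3, 14)),
    "b9_n2": ((3, 37), (3, 40)),
    "C880":  ((7, 92), (8, 103)),
    "alu2":  ((9, 118), (4, 41)),
    "duke2": ((4, 120), (8, 173)),
    "rot":   ((7, 213), (10, 223)),
}


def fewer_luts(table):
    """Split circuits by which flow uses fewer LUTs (a tie goes to neither list)."""
    rewire = [c for c, (r, b) in table.items() if r[1] < b[1]]
    bds = [c for c, (r, b) in table.items() if b[1] < r[1]]
    return rewire, bds


rewire_wins, bds_wins = fewer_luts(TABLE_4_10)
print("REWIRE + CutMap uses fewer LUTs on:", rewire_wins)
print("BDS-pga + FlowMap uses fewer LUTs on:", bds_wins)
```

On these six rows BDS-pga comes out ahead only on comp and alu2, the two circuits the text singles out as XOR-intensive.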
• End of chapter. 
Chapter 5 
Conclusion 
We proposed a new approach to optimize FPGA technology mapping
by using rewiring techniques. Our approach can effectively
reduce the number of LUTs by 15-18% on average, and it works
even better on large circuits where a lot of alternative wires can
be found. Our results are already better than those of all
network-flow-based mapping algorithms found in the literature and
also very comparable to logic-based mapping algorithms.
We also conducted experiments to verify the power of various
rewiring algorithms on the LUT minimization problem. We found
that REWIRE is the most powerful technique, which can on average
reduce more than 20% of the LUTs in the initial mapping solution.
However, when design time is a concern, we recommend the use of
RAMFIRE since it can reduce 15% of the LUTs in one-fifth of the
CPU time used by REWIRE.
The thesis reveals our preliminary effort in applying logic
transformation to technology mapping, and the result is clearly
encouraging. The LUT reduction is advantageous to all later
physical design processes after technology mapping. Therefore we
believe this coherent approach is a very good and challenging
research area, and we truly hope more effort will be made in the
future.
• End of chapter. 
Bibliography 
[1] R. E. Bryant. Graph-based algorithms for Boolean function
manipulation. IEEE Trans. Comput., C-35(8):677-691, Aug. 1986.
[2] C. W. Chang and M. Marek-Sadowska. Single-pass redundancy
addition and removal. In ICCAD, pages 606-609, 2001.
[3] C. W. Chang and M. Marek-Sadowska. Who are the alternative
wires in your neighborhood (alternative wires identification
without search). In Great Lakes Symposium on VLSI,
pages 103-108, 2001.
[4] S. C. Chang, K. T. Cheng, N. S. Woo, and M. Marek-Sadowska.
Post-layout logic restructuring using alternative wires.
IEEE Trans. Computer-Aided Design, 6:587-596, June 1997.
[5] S. C. Chang, L. P. P. P. van Ginneken, and M. Marek-Sadowska.
Fast Boolean optimization by rewiring. In Proc. IEEE Int'l Conf.
on Computer-Aided Design, pages 262-269, Nov. 1996.
[6] K. C. Chen, J. Cong, Y. Ding, A. B. Kahng, and P. Trajmar.
DAG-Map: Graph-based FPGA technology mapping for delay
minimization, pages 7-20, Sept. 1992.
[7] D. I. Cheng, C. C. Lin, and M. Marek-Sadowska. Circuit
partitioning with logic perturbation. In Proc. Int. Conf. on
Computer Aided Design, pages 650-655, Nov. 1995.
[8] J. Cong and Y. Ding. Beyond the combinatorial limit in
depth minimization. In Proc. IEEE Int'l Conf. on Computer-Aided
Design, pages 110-114, 1993.
[9] J. Cong and Y. Ding. FlowMap: An optimal technology mapping
algorithm for delay optimization in lookup-table based
FPGA designs. IEEE Trans. Computer-Aided Design, 13:1-12,
June 1994.
[10] J. Cong and Y. Hwang. Simultaneous depth and area
minimization in LUT-based FPGA mapping. In Proc. ACM/SIGDA
International Symposium on FPGAs, 1995.
[11] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai,
A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and
A. Sangiovanni-Vincentelli. SIS: A system for sequential
circuit synthesis. Technical report, 1992.
[12] M. A. Iyer and M. Abramovici. FIRE: A fault-independent
combinational redundancy identification algorithm. IEEE
Trans. VLSI Syst., 4(2):259-301, June 1996.
[13] Y. M. Jiang, A. Krstic, K. T. Cheng, and M. Marek-Sadowska.
Post-layout logic restructuring for performance
optimization. In Proc. of Design Automation Conf., pages
662-665, 1997.
[14] P. K. N. Vemuri and R. Tessier. BDD-based logic synthesis for
LUT-based FPGAs. ACM Transactions on Design Automation
of Electronic Systems, 7(4):501-525, Oct. 2002.
[15] Y. L. Wu, C. C. Cheung, D. I. Cheng, and H. Fan. Further
improve circuit partitioning using GBAW logic perturbation
techniques. IEEE Trans. VLSI Syst., 11(3):451-460, June
2003.
[16] Y. L. Wu, W. Long, and H. Fan. A fast graph-based alternative
wiring scheme for Boolean networks. In International
VLSI Design Conference, pages 268-273, 2000.