Logic perturbation based circuit partitioning and optimum FPGA switch-box designs. by Cheung, Chak Chung. & Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Logic Perturbation Based Circuit Partitioning 
and Optimum FPGA Switch-Box Designs 
C H E U N G Chak Chung 
A Thesis Submitted in Partial Fulfillment 
of the Requirements for the Degree of 
Master of Philosophy 
in 
Computer Science and Engineering 
© T h e Chinese University of Hong Kong 
June, 2001 
The Chinese University of Hong Kong holds the copyright of this thesis. 
Any person(s) intending to use a part or the whole of the materials in this 
thesis in a proposed publication must seek copyright release from the Dean of 
the Graduate School. 
— - 。 
d � 丨 0鍵 
II • 丨 • 
Abstract 
In this thesis, a framework which integrates an efficient Graph-Based Alterna-
tive Wiring ( G B A W ) technique and state-of-the-art Multi-level hypergraph 
partitioner (hMETIS-Kway) for multi-way circuit partitioning is proposed. 
Moreover, the routability of Hyper-Universal Switch Box is examined by recon-
figuring the switch-box architecture of well-known Field-Programmable Gate 
Arrays (FPGAs) router V P R . 
Efficient circuit partitioning is gaining more and more importance with 
the increasing size of modern circuits. Conventionally, circuit partitioning is 
solved without altering the circuit by modeling a circuit as a hypergraph for 
the ease of applying graph algorithms. However, there exists rooms for fur-
ther improvement on even optimum hypergraph partitioning results, if logic 
information can be applied for circuit perturbation. Such logic transformation 
based partitioning techniques are usually more complicated and harder to for-
mulate. A multi-way partitioning framework which can couple any hypergraph 
partitioner is presented. This framework gives a novel logic perturbation tech-
nique for further improvement over very excellent partitioning results. This 
approach can integrate with any graph partitioner. Experiments on 2-, 3-, 
4-, and 5-way partitionings for various circuits of different sizes from M C N C 
benchmarks were performed. The experimental results showed that this par-
titioning approach can achieve a further 15% reduction in cut size for 2-way 
partitioning with an area penalty of only 0.33% over existing state-of-the-art 
graph partitioner. 
i 
An F P G A switch box is said to be universal (hyper-universal) if detailed 
routing can be performed on all possible surrounding 2-pin (multi-pin) net 
topologies which satisfy the global routing density constraints. A switch box 
is optimum if it is hyper-universal and the switches inside is minimum. It has 
been shown that if the net topology is restricted to 2-pin nets, then a 2-D (4-
way) switch box can be built to be universal with only 6W switches, where W 
is the global routing channel density. A previous work has constructed a for-
mal mathematical model of this optimum design problem for switch boxes with 
arbitrary dimensions, and given a scheme to produce hyper-universal designs 
with less than 6.7W switches for 4-way F P G A switch boxes. The investigation 
of the most common 4-way switch box case is presented, and gives new theo-
retical results followed by extensive experimental justification. It is shown that 
such an optimum switch box can be built with a very low number of additional 
switches beyond 6 W for today's practical range of low W (e.g. just 6W plus 
1 or 2 additional switches for W up to 7). Even for arbitrary large PF, the 
bound can be shown to be under 6MW. To make experimental comparison, 
experiments are conducted by running today's published best F P G A router 
V P R on large benchmarks for the popular disjoint structure and the proposed 






























First of all, I would like to express m y greatest gratefulness to m y supervisor, 
Professor David Yu-liang W u for his invaluable suggestions and comments on 
different research topics. His support and guidance in the past two years 
helped m e to work in the right direction in academic area and social issues. I 
am greatly impressed by his working style which work hard and never give up. 
I learnt “nothing is impossible to achieve”. I would also like to thank for his 
support and arrangement for the study in Chip Implementation Center (CIC), 
Taiwan 2000 summer. Moreover, I acquired lot of experiences and gained 
interests in Computer-Aided Design automation through our discussions. 
I greatly appreciate the members of m y dissertation committee, Professor 
Philip Heng-wai Leong, Professor Kin-hong Wong and Professor Hong-bing 
Fan for their useful feedback and suggestions. I would like to specially thank 
m y final year supervisor, Professor Philip Heng-wai Leong for introducing me 
to Professor David Yu-liang W u and teaching m e in hardware design. I am 
greatly indebted to Dr. David Ihsin Cheng, Professor Hong-bing Fan and 
Professor Jiping Liu for the collaboration works with them. I would also like 
to thank Professor Chak-kuen Wong and Professor Evangeline Fung-yu Young 
for their advice and courses which enriched m y background knowledge. 
I would also like to thank m y research partner Mr. Cliff Chin-ngai Sze for 
his encouragement and collaboration work on the development of G B A W . I 
would like to show m y appreciation to Dr. Xao-long Yuan for his assistance in 
m y research work. 
iii 
Many thanks go to Professor George Karypis of University of Minnesota 
and the authors of METIS and Professor Jonathan Rose of University of 
Toronto and the authors of V P R for their prominent works on Partitioning 
and F P G A Routing. 
Thanks also go to m y marvelous friends in R o o m 1026 who helped me 
to solve many problems and made m y life more enjoyable and unforgettable. 
Especially, I would express m y appreciation to Mr. Ivan Ka-ho Leung, Mr. 
Hiu-yung Wong, Miss Wing-seung Yuen, Mr. Norris Monk-ping Leong and 
Miss Polly Po-man W a n for their frank warm-hearted help. I would like to 
thank for all the friends who help m e and make m y life more colorful. 
M y unplumbed gratitude goes to m y parents, sister and brother-in-law 
for their endless love and encouragement and m y aunt Kit-chi Wong for her 
support and understanding. Last but not least, I would express m y love to m y 
best friend, Miss Victoria Chor-kiu Lee for her sacrifices and patience. They 
all brought untold happiness to m y life. 
iv 
VITA 
M a y 23, 1977 Born, Fujian, China 
Education 
B.Eng. in Computer Engineering, The Chinese University of Hong Kong 
Publications 
P.K. Tsang, C.C. Cheung, K.H. Leung, T.K. Lee and and P.H.W. Leong， 
"An Asynchronous Forth Microprocessor", Proceedings of the IEEE Region 
10 Conference (TENCON), Vol. 2, pp. 1079-1082, 1999. 
Y.L. W u , C.N. Sze, C.C. Cheung, "On Improved Graph-Based Alternative 
Wiring Scheme for Multi-Level Logic Optimization", IEEE International Con-
ference of Electronics Circuits and Systems (ICECS), 2000. 
C.C. Cheung, Y.L. W u , and D. Ihsin Cheng, "Further Improve Circuit Par-
titioning using G B A W Logic Perturbation Techniques" Proceedings of IEEE 
Design Automation and Test in Europe (DATE), pp. 233-239, March. 2001. 
Munich. 
H.B. Fan, J. Liu, Y.L. W u , and C.C. Cheung, “On Optimum Switch Box 
Designs for 2-D FPGAs" to appear in Proceedings of I E E E / A C M Design Au-
tomation Conference (DAC). June. 2001. Las Vegas. 
Y.L. Yu, C.C. Cheung, D. Ihsin Cheng and H.B. Fan, "Further Improve Cir-
cuit Partitioning using G B A W Logic Perturbation Techniques" under review 




• • • 
Acknowledgments m 
Vita V 
Table of Contents vi 
List of Figures x 
List of Tables xiv 
1 Introduction 1 
1.1 Motivation 1 
1.2 Aims and Contribution 4 
1.3 Thesis Overview 5 
2 VLSI Design Cycle 6 
2.1 Logic Synthesis 7 
2.1.1 Logic Minimization 8 
2.1.2 Technology Mapping 8 
2.1.3 Testability 8 
2.2 Physical Design Synthesis 8 
2.2.1 Partitioning 9 
vi 
2.2.2 Floorplanning Sz Placement . 10 
2.2.3 Routing 11 
2.2.4 Compaction, Extraction & Verification 12 
2.2.5 Physical Design of F P G A s 12 
3 Alternative Wiring 
3.1 Introduction 13 
3.2 Notation and Definitions 15 
3.3 Application of Rewiring 17 
3.3.1 Logic Optimization 17 
3.3.2 Timing Optimization 17 
3.3.3 Circuit Partitioning and Routing 18 
3.4 Logic Optimization Analysis 19 
3.4.1 Global Flow Optimization 19 
3.4.2 O B D D Representation 20 
3.4.3 Automatic Test Pattern Generation (ATPG) 22 
3.4.4 Graph Based Alternative Wiring ( G B A W ) 23 
3.5 Augmented G B A W 26 
3.6 Logic Optimization by using G B A W 28 
3.7 Conclusions 31 
4 Multi-way Partitioning using Rewiring Techniques 33 
4.1 Introduction 33 
4.2 Circuit Partitioning Algorithm Analysis 38 
4.2.1 The Kernighan-Lin (KL) Algorithm 39 
4.2.2 The Fiduccia-Mattheyses (FM) Algorithm 42 
4.2.3 Geometric Representation Algorithm 46 
4.2.4 The Multi-level Partitioning Algorithm 49 
4.2.5 Hypergraph M E T I S - hMETIS 51 
4.3 The G B A W Partitioning Algorithm 53 
vii 
4.4 Experimental Results 56 
4.5 Conclusions 58 
5 Optimum FPGA Switch-Box Designs - HUSB 62 
5.1 Introduction 62 
5.2 Background and Definitions 65 
5.2.1 Routing Architectures 65 
5.2.2 Global Routing 67 
5.2.3 Detailed Routing 67 
5.3 F P G A Router Comparison 69 
5.3.1 C G E 69 
5.3.2 S E G A 70 
5.3.3 T R A C E R 71 
5.3.4 V P R 72 
5.4 Switch Box Design 73 
5.4.1 Disjoint type switch box (XC4000-type) 73 
5.4.2 Anti-symmetric switch box 74 
5.4.3 Universal Switch box 74 
5.4.4 Switch box Analysis 75 
5.5 Terminology 77 
5.6 Hyper-universal (4，VK)-design analysis 82 
5.6.1 Hs is an optimum (4, 3)-design 84 
5.6.2 H4 is an optimum (4,4)-design 88 
5.6.3 Hi is a hyper-universal (4, i)-design for i = 5,6,7 . . . . 90 
5.7 Experimental Results 92 
5.8 Conclusions 95 
6 Conclusions 99 
6.1 Thesis Summary 99 
6.2 Future work 100 
viii 
6.2.1 Alternative Wiring 100 
6.2.2 Partitioning Quality 100 
6.2.3 Routing Devices Studies 100 
Bibliography 101 
A 5xpl - Berkeley Logic Interchange Format (BLIF) 115 
B Proof of some 2-local patterns 122 
C Illustrations of FM algorithm 124 
D HUSB Structures 127 
E Primitive minimal 4-way global routing Structures 132 
> 
ix 
List of Figures 
1.1 Moore's Law vs Industrial C P U size (from Intel) 2 
2.1 The Path to Silicon Chip 7 
2.2 VLSI Physical Design Cycle 9 
2.3 Example of Bi-partitioning 10 
2.4 Example of Floorplanning 11 
2.5 Example of Channel Routing 11 
2.6 Example of Compaction 12 
3.1 Example of alternative wiring 14 
3.2 Configuration of a sub-network 16 
3.3 Example of Timing Optimization by alternative wiring 18 
3.4 Example of Circuit Partitioning 18 
3.5 Example of Routing 19 
3.6 Example of Global Flow Optimization 20 
3.7 Flow graph for the circuit in 3.6 20 
3.8 Result of fanout re-connections 21 
3.9 O B D D representing / = xl • x2 + x3 21 
3.10 0-local pattern in G B A W 24 
3.11 1-local patterns in G B A W 25 
3.12 2-local patterns in G B A W 26 
3.13 Example of one 2-local Pattern Verification 27 
3.14 The Logic Optimization Process by G B A W and SIS 30 
V 
4.1 Circuit partitioning by rewiring 35 
4.2 Example of K L algorithm - First pass 42 
4.3 Example of K L algorithm - Second pass 42 
4.4 Bucket list structure 44 
4.5 Example of F M algorithm - Initial k Final 45 
4.6 A simple graph and its Laplacian 48 
4.7 Result of spectral partitioning and linear ordering of the vertices 
using the second eigenvector 48 
4.8 Result of spectral partitioning by using the modified Laplacian 
Q which incorporates the actual vertex size information 49 
4.9 The various phases of the Multi-level graph bisection 50 
4.10 Various hyper edge coarsening schemes 52 
4.11 Perturbations and gains 54 
4.12 Extract the Graph domain information for hMETIS-kway . • • . 55 
4.13 Algorithm of GBAW-Partitioner (GP) 61 
5.1 The architecture of a 2D F P G A 63 
5.2 A Switch module with i^ s 二 3 66 
5.3 A Connection module with Fc = 2 66 
5.4 Percentage of completion versus Fc for the circuit B N R E . • • • 67 
5.5 Example of Global Routing in F P G A 68 
5.6 Example of Detailed Routing in F P G A 68 
5.7 CGE: The F P G A Model with one Global Route 69 
5.8 CGE: The Coarse Graph and its Expanded Graph 70 
5.9 SEGA: The F P G A Model with one Global Route 71 
5.10 SEGA: The Coarse Graph with Express Edge 72 
5.11 XC4000-type Switch-Box Architectures for W = 3 and 4 . . . . 73 
5.12 Disjoint relationship between different tracks for W = 3 73 
5.13 Anti-symmetric Switch-Box Architectures for W 二 3 and 4 . . • 74 
xi 
5.14 Universal Switch-Box Architectures for W = 3 and 4 74 
5.15 Six different types of connections 75 
5.16 Example of routing over Disjoint Switch-box 75 
5.17 Example of routing over Anti-symmetric Switch-box 75 
5.18 Example of routing over Universal Switch-box 76 
5.19 Another comparison between Disjoint and Universal Switch-box 76 
5.20 (2,4) and (3,4) design 78 
5.21 A family of hyper-universal 4-way designs 82 
5.22 Structure of H7 83 
5.23 Detailed Routing of GR (Case 1) 86 
5.24 Detailed Routing for Table 3.2 87 
5.25 Detailed Routing of density 2 Global Routing in H^ 88 
5.26 Detailed Routing of GR in H^ 89 
5.27 Detailed Routing of GR in H^ 89 
5.28 (4,4)-Global Routing 91 
5.29 The structure of Disjoint S-Box, W = l l 93 
5.30 The structure of H'USB(4,n) (H'USB(4,7) and H'USB(4,4)) . . 93 
5.31 Detailed Routing by using Disjoint S-Box 94 
5.32 Detailed Routing by using H'USB S-Box 95 
5.33 Detailed Routing by using Disjoint S-Box on G R l 96 
5.34 Detailed Routing by using Hyper-Universal S-Box on GR\ . . . 96 
5.35 Routing result of e64 by using Disjoint S-Box, W = 9 97 
5.36 Routing result of e64 by using H'USB S-Box, W二8 98 
B.l New 2-local pattern 1 122 
B.2 New 2-local pattern 2 123 
B.3 New 2-local pattern 3 123 
B.4 New 2-local pattern 4 123 
C.l Example of F M algorithm - First pass 125 
xii 
C.2 Example of F M algorithm - Second pass 126 
D.l HI S-Box 127 
D.2 H2 S-Box 128 
D.3 H3 S-Box 128 
D.4 H4 S-Box 129 
D.5 H5 S-Box 129 
D.6 H6 S-Box 130 
D.7 H7 S-Box 130 
D.8 H8 S-Box 131 
D.9 H9 S-Box 131 
E.l GRl 132 
E.2 GRl,, GRl,, G R l , 132 
E.3 GRl,, GRl,, GRl,, G R l , 133 
E.4 GRl,, GRl,, GRl,, GRl,, GRl,, G R l , 133 
E.5 GRl,, GRl,, GRl,, GRl,, GRl,, G R l , 134 
E.6 GRl,^ GRl,, GRl,, GRl,,, 134 
E.7 G R j i, G R l G R l G R l ^  135 
E.8 135 
xiii 
List of Tables 
3.1 2-local pattern comparison 29 
3.2 Comparison between R A M B O and G B A W 31 
3.3 Results for logic optimization 32 
4.1 Alternative wire statistics of R A M B O and G B A W 57 
4.2 2-way partitioning results of hMetis-Kway & F M & G P 58 
4.3 3-way partitioning results of hMetis-Kway & F M & G P 59 
4.4 4-way partitioning results of hMetis-Kway & F M & G P 60 
4.5 5-way partitioning results of hMetis-Kway & F M & G P 60 
5.1 Comparison between different F P G A Routers 72 
5.2 Detailed routing result by C G E over different switch-box . • . • 76 
5.3 Primitive minimal 4-way global routings 81 
5.4 Primitive minimal 4-way global routings which are not a iso-
morphic 85 
5.5 Channel widths required for different benchmark circuits Fc = 





Integrated Circuit (IC) technology has evolved for many times from Small 
Scale Integration (SSI) to Very Large Scale Integration (VLSI). ICs are widely 
used in computers, electronic devices and even implantation of human body. 
According to Moore's law, the capacity of a chip is doubled for every 18-24 
months of the previous chip as shown in Figure 1.1. Due to the huge number 
of components of circuit design, it is not practical for designer to repeat the 
fabrication process. As a result, Computer-Aided Design (CAD) tools [SW96 
are developed for physical design which is one of the critical step in VLSI 
design cycle. The goal of physical design is to fit the design into a compact 
area for fabrication. Physical design is a very complex process and thus it 
is broken into many different stages from partitioning to routing. The netlist 
extraction by layout synthesis, design verification and validation check are also 
performed on the layout during physical design stage. This thesis addresses 
the two important steps of physical design, partitioning and routing especially 
for Field-Programmable Gate Arrays (FPGAs). 
In the era of deep sub-micron, many problems arise and designers have to 
tackle them such as the domination of interconnect delay, failure to meet timing 
requirement and insufficient rout ability on different devices. By considering 
1 
Chapter 1 Introduction 2 
wm •搬 mm i 細 , 
i { ) f i 
！ i ； : , - 5 0 0 
9U«V> —-„l …-
I I ； 'Ji'fUU 
: i S 转 
1M i 丨 丨 J 广‘'i^rtstirrr ^ 
I I 泰...m^m 
： i ( 
1PK i QA 
後 ,‘终。’ ： I _ 
=•、赛 丨 丨 I I 
Figure 1.1: Moore's Law vs Industrial C P U size (from Intel) 
more circuit information in the physical design stage, we can improve the over-
all performance, get a better utilization of chip area, better yield and reduce 
the cost. The idea behind is to apply Alternative Wiring techniques [EC95] in 
order to remove some wires along the critical path, or replace some wires in 
congested area with those in uncongested area. Alternative wire allows adding 
and removing wires inside the circuit without changing the circuit functionality. 
Thus, it can explore a large degree of freedom in many different physical design 
problems, such as logic optimization [EC95], circuit partitioning [CLMS95] and 
timing optimization [EEOU96 . 
The idea of VLSI circuit partitioning is to divide a circuit into smaller sub-
circuits (clusters). The objective is to minimize the interconnection between 
clusters with balance constraints. Due to the increase of the VLSI design com-
plexity, circuit partitioning is getting more and more crucial. The system may 
consist of several hundred million transistors and most of the algorithms in 
physical design cycle and logic synthesis can not handle such large problem 
size or become ineffective. As a result, circuit partitioning becomes the first 
step in physical design cycle. A large system becomes smaller manageable 
components which can be solved in a top-down hierarchical design methodol-
ogy. The feasibilities of placement, floorplanning, global routing and detailed 
routing also depend on the quality of the partitioning solution. 
Nowadays, in deep sub-micron VLSI design, the interconnection delays 
Chapter 1 Introduction 3 
dominate the gate delays. A larger die size means a longer routing delay on 
chip. A n effective partitioning algorithm becomes the fundermental key to 
improve system performance. There are many different approaches to tackle 
this partitioning problem including move-based approaches, geometric rep-
resentations, combinatorial formulations and clustering approaches [AK95b . 
Recently, a new trend is to integrate existing partitioning algorithms with 
multi-level approach [ZSC96, KAKS97]. Most of the algorithms model the 
circuit as graph (hypergraph) and seek for optimal solution. However, these 
methods do not consider the logic function of the circuit, that is the function 
performed by each node. 
Alternative wiring is a technique to transform a circuit into another but 
functionally equivalent circuit. It can be well-used for post-layout logic syn-
thesis, logic optimization and other C A D problems. Graph Based Alternative 
Wiring ( G B A W ) is newly introduced and it is very efficient in identifying al-
ternative wires. This motivates the integration of G B A W technique for circuit 
partitioning. 
Nowadays, logic designers have several good logic devices to realize the dig-
ital logic functions. These devices include Small-Scale Integration (SSI) chips 
such as 7400 series, Application-Specific Integrated Circuits (ASICs) and Pro-
grammable Logic Devices (PLDs) such as Simple PLDs (SPLDs), Complex 
PLDs (CPLDs) and Field-Programmable Gate Arrays (FPGAs) [BFRV92； • 
With the huge logic resources, short turn-around time and user-programmability, 
F P G A s are easy for logic designer to modify their sophisticated design and re-
duce the manufacturing time. After the first emergence into market, F P G A s 
have been continuously growing and in the leading place amongst other logic 
devices. A typical F P G A composes of three major components: logic modules, 
routing devices and Input/Output (I/O) devices. Routing resources compose 
of pre-fabricated wire segments and programmable switches (switch-box). 
Chapter 1 Introduction 4 
Switch modules are the most important components of the routing re-
sources in F P G A s . Programmable switches usually have high resistance and 
capacitance, and they consume large amount of area [ea88, ea89]. Thus an ex-
cessive number of switches means a slower and larger design. Higher routability 
means a smaller number of tracks to complete the design routing and reduces 
the size of F P G A . As a result, it is desirable to investigate the routability of 
switch-box for a better area performance design. 
The rewiring technique can also be applied to layout-driven logic synthesis. 
G B A W technique is able to locate alternative wires such that the congested 
wire can be removed, thus the whole design can become routable within the 
smaller number of tracks. 
1.2 Aims and Contribution 
There are two main parts in this thesis: circuit partitioning by using logic 
perturbation technique and F P G A routing employing Hyper Universal Switch 
Box (HUSB). 
Firstly, the identifying alternative wiring power of the G B A W is further 
improved by introducing new rewiring patterns. The comparison of logic op-
timizations between famous ATPG-based R A M B O and augmented G B A W 
is also presented. Based on new rewiring technique, logic optimizations on 
M C N C benchmark circuits are successfully conducted and the integration of 
G B A W with the well-known multi-level graph partitioner (hMETIS-kway) is 
also done. The results showed that it is possible to obtain an improved parti-
tioning result from near optimal solution. 
Secondly, the construction and mathematical model of Hyper Universal 
Switch Box (HUSB) are given. Further investigation of the quality of H U S B 
is done by giving new theoretical results and adopting efficient F P G A router 
V P R which is used to generate experimental justification. A n encouraging 
Chapter 1 Introduction 5 
result over the comparison between the popular disjoint structure (XC4000) 
and the proposed designs is obtained. 
Our proposed design flow for F P G A s has two stages. In the first stage, the 
interal switch-box architecture is modified in order to achieve high routability. 
In the second stage, the rewiring engine G B A W will be integrated into the 
whole F P G A design flow which is not yet completed. 
1.3 Thesis Overview 
This thesis is organized as follows. Chapter 2 presents the background of 
physical design. Chapter 3 introduces the alternative wiring engine ( G B A W ) 
and shows the comparison between R A M B O and G B A W on logic optimization. 
Chapter 4 discusses the new approach to further improve the partitioning 
solution by hMETIS-kway. Chapter 5 addresses the quality of F P G A routing 
result by V P R with enhanced H U S B (H'USB). Finally, Chapter 6 concludes 
this thesis and discusses the future works. 
Chapter 2 
VLSI Design Cycle 
A simple VLSI design cycle starts from system specification of a VLSI chip 
and ends as a tested and packaged chip. The design flow involves architectural, 
functional, logic, circuit design and the last step before fabrication is physical 
design which collects the net-list information of a circuit into layout. At the 
beginning of the VLSI design process, the user can get the specifications for the 
size, speed, power and the functionality of the VLSI system [She98, Mic94]. In 
the next step, the VLSI designers are required to design a system which meet 
all these constraints. Nowadays, the Hardware Design Language (HDL) such 
as V H D L and Verilog are commonly used in VLSI design. H D L language are 
used to express the circuit and timing information for simulation and verifica-
tion. Since the design scale is large and it is impossible to design by hand, the 
Computer-Aided Design (CAD) tools become crucial and widely used for a fast 
and correct design. The typical design flow is shown in Figure 2.1. Logic syn-
thesis is a step to translate the register transfer level (RTL) of a digital system 
to the gate net-list information. In the other words, logic synthesis is applied 
to the extraction from R T L language and deals with the logic optimization, 
testability and verification. In physical design step, there are many C A D tools 
developed for automatic layout, design rule check and extraction [She98]. A 
tested VLSI chip will be available after all these complex stages. 
6 
Chapter 2 VLSI Design Cycle 7 
System ^ VLSI system information 
Specification 
1 r 
High-level ^ VHDL or Verilog descriptions 
Design 
1 r 
Logic ^ generate optimal logic netlists 
Synthesis (minimum delay, area, power) 
Physical ^ Automatic layout tools. 
Design design rule checker, extractors 
Fabrication ^ F r o m Wafer to circuit 
1 r 
A Tested ^ ship to market 
Chip 
Figure 2.1: The Path to Silicon Chip 
2.1 Logic Synthesis 
Logic synthesis enables the mapping from R T L level to an optimized gate-level 
description. The generated description must fit the functional specification and 
user constraints. It basically begins with logic optimization and provides an 
efficient implementation to a high quality silicon chip. With the aid of C A D 
tools, the circuit designers are allowed to work at higher levels of abstraction 
and easily modify their design. Logic synthesis tools such as "Synopsys" not 
only reduce the logic but also perform timing analysis, technology mapping 
and ensure the testability of the generated net-list. In the process of logic 
synthesis, it aims at: 
• minimizing the layout chip area. 
• minimizing the critical path delay. 
• maximizing the testability such that all test vector can be found. 
Chapter 2 VLSI Design Cycle 8 
2.1.1 Logic Minimization 
Logic synthesis are usually divided into two-level synthesis (PLA) and Multi-
level synthesis [BHSV90]. Therefore, many optimization methods are devel-
oped and can solve the problem in reasonable computation time. The algo-
rithms include the minimization of don't cares [BBH+88], global flow [BT91], 
O B D D representation [Bry86] and rewiring techniques [CE93, WLFOO] which 
change the original circuit net-list information. 
2.1.2 Technology Mapping 
Technology mapping [Keu87] is the process to translate the abstract descrip-
tion such as H D L languages into a hardware description by using a specific 
technology. The result is a mapped circuit. Therefore, one description lan-
guage can be mapped into different architectures by varying the technology 
mapping library. 
2.1.3 Testability 
The circuit which cannot provides a complete single stuck-fault test vector 
is called an untestable circuit. Testability is an important criteria for circuit 
design, otherwise the fabricated chip cannot locate the fault wire. 
2.2 Physical Design Synthesis 
Physical design is also called as layout phase which is the process to determine 
the exact location for the logic devices and the interconnection between differ-
ent devices. Physical design is a complex process and it can be broken down 
into several steps which are shown in Figure 2.2. Many research groups focus 
on each of the sub-steps in order to obtain high performance design [She98 . 
Chapter 2 VLSI Design Cycle 9 
The cost of fabrication is very high, therefore the objective for physical design 





Physical I  
Design ^ ^.^. . Partitioning 




















Figure 2.2: VLSI Physical Design Cycle 
2.2.1 Partitioning 
Partitioning is the first step in physical design owing to the increase of prob-
lem size. Many of the algorithms in the latter steps become infeasible, thus 
partitioning breaks the original problem into small sub-problems and solves 
eventually. The output of partitioning is a set of clusters which contains a 
Chapter 2 VLSI Design Cycle 10 
set of logic gates, and the interconnections between clusters. In Figure 2.3, it 
shows a good and a bad bi-partitioning. The partitioning result for the good 
one is reduced from 9 to 1. 
s-働 
Cut \ Z Cut 
interconnections 
Bad bisection Good bisection 
Figure 2.3: Example of Bi-partitioning 
2.2.2 Floorplanning & Placement 
The objective of floorplanning is to arrange the components of an IC in order 
to have a compact layout. The two floorplans show in Figure 2.4 are slicing 
and non-slicing floorplan. Slicing floorplan can be recursively cut horizontally 
or vertically but non-slicing cannot. In the placement stage, the block to be 
placed are layouted and have pin assignment. After placement, the IC will 
have minimum area such that the interconnections between modules can be 
completely routed. Since a packed placement solution does not guarantee a 
routable design, the placement step is usually taken again after routing. 
Chapter 2 VLSI Design Cycle 11 
Slicing floorplan Non-slicing floorplan 
Figure 2.4: Example of Floorplanning 
2.2.3 Routing 
In order to connect the wires between blocks, routing step is usually done in 
two steps as the global routing and detailed routing. Global routing will specify 
the region used for detailed routing and detailed routing will determine the 
exact location for each connection. It aims at shorter wire length and less 
resources used. Figure 2.5 shows a failure channel routing example. 
Terminals 
\ 2 3 3 4 
fH it • • • # 
• congestion 
i ----- 二-----------' Z 
_ o ^ \ / i 1 � 
4 \ / 1 2 0 
Trunks 
Figure 2.5: Example of Channel Routing 
Chapter 2 VLSI Design Cycle 12 
2.2.4 Compaction, Extraction & Verification 
Compaction is a step to reduce the chip's total area and signal delay between 
modules of the circuit. Example of x-y coordinate compaction is shown in 
Figure 2.6. The circuit extraction step [BB91] consists of determining the 
circuit connectivity and calculating various electrical parameters such as re-
sistances and capacitances. The last step before fabrication is to verify the 








/ / t t 
’ , 
Before Compaction After Compaction 
Figure 2.6: Example of Compaction 
2.2.5 Physical Design of FPGAs 
The time-to-market for even ASIC chip is unacceptable for many applications, 
therefore the short turn-around time of F P G A s takes advantage and gain more 
popularity. Since the most of the physical design algorithms are inapplicable in 
FPGAs, the C A D tools development of F P G A s are separated from others. For 
example, the traditional channel routing algorithm [SW96] cannot be applied 





Alternative wiring technique has a wide range of applications such as circuit op-
timization [CE93，EC93, EC95, C M S C 9 6 , CvGLMS99], partitioning [CLMS95； 
and F P G A synthesis [CCWMS94b，CCWM97]. Alternative wiring refers to the 
reconstruction of circuit while keeping its functionality intact by addition and 
removal of wires. A wire in a circuit is redundant if its addition and removal 
does not change the functionality of the circuit. The concept of alternative 
wiring is to add a redundant wire to the circuit that in turn will make another 
wire become redundant and its removal would lead to some useful circuit trans-
formation for certain objectives. Due to the wide range of application, many re-
search groups are working on the efficiency of locating alternatives wiring. The 
traditional rewiring tools apply Automatic Test Pattern Generation (ATPG)， 
Implication analysis [KSM97] and Recursive learning [KP94] techniques in or-
der to search alternative wires. However, these kinds of techniques consume 
too much C P U expenditure, and another Graph-Based Alternative Wiring 
technique is proposed to replace the A T P G approaches. 
Consider the circuit given in Figure 3.1(a). This circuit is irredundant. 
By adding a connection from the output of gate g^ to the input of gate Qq 
(shown as a dotted line in Figure 3.1(b)), the functionality of the circuit does 
13 
Chapter 3 Alternative Wiring 14 
not change. In other words, the added connection is redundant. However, the 
addition of the connection causes two originally irredundant wires becoming 
redundant as shown in Figure 3.1(b). After removing these two wires and 
associated gates that either become floating ge or have a single fanin 々 4 and 
g、the circuit can be greatly optimized as shown in Figure 3.1(c). 





(a) irredundant circuit 
c N  
g4 ) 1 
b redundant 
e ^ 1 ^ 
^ redundant :) g ^ — ： 
^ 
f  
(b) adding a redundant wire 
e ^ ^ — — 
- g2 )—' 
c Y ^ J 。2 
: = B — — ^ p 
f — — 
(c) final irredundant circuit 
Figure 3.1: Example of alternative wiring 
Chapter 3 Alternative Wiring 15 
This chapter is organized as follows. The notation and definitions are intro-
duced in Section 3.1. The application of rewiring is presented in Section 3.2. 
In Section 3.3, logic optimization analysis is introduced. The details of Aug-
mented Graph-Based Alternative Wiring are stated in Section 3.4. Followed 
by the discussion on Logic Optimization by using G B A W and experimental 
results are presented in Section 3.5. Conclusion is drawn in Section 3.6. 
3.2 Notation and Definitions 
A combinational circuit can be represented by a Directed Acyclic Graph (DAG) 
where vertices correspond to the primary inputs (PI), primary outputs (P〇) 
and the internal gates of the circuit. PI and P〇 are nodes which have only 
outgoing edges and incoming edges respectively. A n internal node has at least 
two incoming edges and one outgoing edge and is associated with a Boolean 
function. Inverters are not considered as internal nodes, but as polarity of 
edges during logic domain perturbation. In a Boolean network, the in-degree 
of node y, denoted by d- (y), is defined as the number of edges entering y. 
The out-degree of node y, denoted by d+ (y), is defined as the number of edges 
leaving y. A node y is defined by a triplet {op,d~{y),d'^(y)), where op is the 
Boolean operator of y which can be any associative operator like A N D , O R , 
N A N D , or N O R . 
A wire is replaceable if it has at least one alternative wire. A graph con-
figuration D is used to m a p the logic function from a Boolean Network G. 
For each node rii in sub-network S in network G, rii is mapped to a triplet 
{op, Zi, 22) in D where op denotes the operator representing the Boolean func-
tion of rii and zi, i) are non-negative integers. All edges inside S are preserved, 
while the edges outside S are omitted in D. In most cases, h equals d— (jii) 
and Z2 equals d+ (rii). The element of a triplet (op, d-(y), (i+(y)) can also be 
don't care. For the first element, don't care means any operator. For the other 
Chapter 3 Alternative Wiring 16 
elements, don't care can be any positive integers. A configuration is used to 
denote a minimal pattern containing both the target and its alternative wire. 
The mapping is illustrated in Figure 3.2. 5 is a sub-network of G. Di 
and D2 are two mappable configurations of S. A A:-local pattern denotes 
a minimal sub-graph with the distance between the alternative wire and its 
target wire being k. The distance between two wires is defined as the difference 




b — _ _ > n ^ I r ^ 
i 
！ 
(a) Boolean network G and its sub-networks 
(AND,2,1) (AND,2,2) 
Di 
(b) A configuration of S, Dj 
(AND’dc’l) (AND，dc’dc) 
D2 
(c) Another configuration of S, D^ 
Figure 3.2: Configuration of a sub-network 
A wire is defined as a 2-point connection between a pair of source and sink 
nodes. W h e n a larger circuit is partitioned into two sub-circuits, we define the 
wires crossing the partitioning cut line as cut wires, a cut net is also defined as 
Chapter 3 Alternative Wiring 17 
a hyperedge connecting partitions and the cut cost as the number of partitions 
that the hyperedge connects. 
3.3 Application of Rewiring 
Alternative wiring greatly increases the flexibility of a circuit in which the wires 
of the circuit can be altered without changing the circuit functionality. Thus, 
the circuits before and after rewiring are said to be functionally equivalent. As 
a result, alternative wiring provides a wide range of applications in different 
steps of physical design automation. 
3.3.1 Logic Optimization 
As shown in Figure 3.1, the circuit size is reduced by adding and removing 
redundant wires. A smaller circuit containing the same logics always implies 
less resources to spend on each step of VLSI design. Therefore, logic opti-
mization by alternative wiring is always being the frontier of all VLSI design 
problems [CE93 . 
3.3.2 Timing Optimization 
The longest delay path (critical path) usually consists of more than one wire. 
If it is possible to break the critical path with wire which is not on the critical 
path. Alternative wiring technique is used to improve the performance as 
shown in Figure 3.3. The target wire is replaced by other alternative wires 
inside component 1 or 2. By several iterations which break the critical path, the 
longest path might be greatly reduced and high performance gain [EEOU96 
can be achieved. 
Chapter 3 Alternative Wiring 18 
component 1 component 2 
target wire 
I I \ 
I N ,, * • 
z 4 � 
critical / 、、、 
、、 path ‘ 、 \ i \ / 
\ / \ / \ / 1—1 / * \ / 
alternative wire 
Figure 3.3: Example of Timing Optimization by alternative wiring 
3.3.3 Circuit Partitioning and Routing 
Figure 3.4 shows the number of wires along the cut line can be reduced by re-
placing with other wires not in cut set. The application of rewiring over circuit 
partitioning will be further investigated in the next chapter. Rewiring tech-
niques can also be applied to resolve the unroutable F P G A design [ C C W M 9 7 
as shown in Figure 3.5. 
L . target wire , 。 




z 岸 ^ ����� 
/ � � � 
\ ^ / 、、< 
\ . ！ 
\ / \ / 
LJ cut line / \ / 
alternative wire 
Figure 3.4: Example of Circuit Partitioning 
Chapter 3 Alternative Wiring 19 
target wire 
|\ I I I I I 










Figure 3.5: Example of Routing 
3.4 Logic Optimization Analysis 
3.4.1 Global Flow Optimization 
Global Flow Optimization [BT91] allows the reconnection of immediate fanouts 
of a node to the input of other nodes. By modeling the problem as flow graph 
and solving the problem by maxfiow-mincut algorithm, the original circuit 
size can be reduced. As shown in Figure 3.6, one of the node s is picked 
from the circuit and has the following circuit diagram. By proper transfor-
mation [BT91], a flow graph is built and is solved by maxflow-mincut method 
as shown in Figure 3.7. The maxflow-mincut solution is 仍8, gg and gg. As a 
result, the original six connections from s is reduced to three without changing 
the circuit functionality as shown in Figure 3.8. Logic optimization is done by 
constructing the flow graph on every node in the circuit. 
Chapter 3 Alternative Wiring 20 
⑴ 
- 0 . � J > ^ s 




Figure 3.6: Example of Global Flow Optimization 
Figure 3.7: Flow graph for the circuit in 3.6 
3.4.2 OBDD Representation 
In [MKLC89], the Transduction (Transformation and Reduction) method is 
proposed for multilevel logic optimization which allows adding/deleting con-
nections and gate merging. However, the runtime and the memory storage 
exponentially increase when the number of inputs increases. It is because the 
Transduction method uses truth tables to represent the logic function. In or-
der to save C P U time and memory space, the same Transduction method is 
implemented by using Ordered Binary Decision Diagrams ( O B D D ) [Bry86 
Chapter 3 Alternative Wiring 21 
、⑴ 
= = = q nil 
J^^^LJ (。， ^ “ 
^^^^ —— 
二 ⑴ H>o-n 
_x<。） （1) ！ __ 
3 j^o [K 
£> ^ 
Figure 3.8: Result of fanout re-connections 
as the data structures for representing the logic functions. O B D D is a com-
pact logic representation and it is a reduced binary decision tree and requires 
less memory than other methods. As shown in Figure 3.9，a logic function 
f = xl ' x2 x3 is represented. 
0 1 
Figure 3.9: O B D D representing f = xl - x2 x3 
Chapter 3 Alternative Wiring 22 
3.4.3 Automatic Test Pattern Generation (ATPG) 
Nowadays, Multi-level logic synthesis becomes more practical than 2-level logic 
synthesis because they represent the majority of circuit designs [HS96]. Multi-
level circuits refer to those circuits have arbitrary number of gates between 
primary input and primary output. 2-level circuits are typically represented 
by S O P or P〇S forms, i.e. A N D plane plus O R plane. There are many 
methods which address the problem of Multi-level minimization such as con-
sidering the don't care set [BBH+88], and Automatic Test Pattern Generation 
(ATPG) [CE93]. A T P G techniques are built upon the fault models. There are 
many possible fault models and most common one is single stuck-at-0/1 fault 
model. If a wire connecting to A N D gate detects a stuck-at-1 fault, then it 
can be removed directly. A T P G will generate a vector for a stuck-at-fault test 
in the circuit. A vector means the zero or one value to all primary inputs. For 
a real circuit, if there exists a fault which is untestable that means there is no 
suitable test vector for such fault. The wire associated with this fault is said 
to be redundant. 
Based on this redundancy test, many A T P G based rewiring tools are pro-
posed. There are two main classes for A T P G tools, they are add-first based 
and target-first based A T P G techniques. Add-first means by adding one or 
some redundant wires into a circuit, some other inredundant wires become 
redundant and thus can be removed. Target-first means first choose one wire 
(target wire) in the circuit. By adding a suitable wire, this target wire becomes 
redundant and can be removed. 
By using SIS package [SSLea92] which developed by University of Califor-
nia, Berkeley, Redundancy Addition and Removal for Multilevel Boolean Logic 
Optimization ( R A M B O ) [EC95] is first proposed. R A M B O creates fault list 
for each node and removes all redundant wires. The objective is to reduce 
Chapter 3 Alternative Wiring 23 
the wire count so as the area of the circuit. R A M B O shows significant im-
provement over combinational and sequential circuit and even better results 
are obtained when incorporate with MIS-II [BRSVW87]. There are many im-
provement works and applications built upon R A M B O such as Perturb and 
Simplify [CMSC96], Rewiring [CGMS96, CvGLMS99] and post-layout in FP-
G A s [CCWM97]. 
3.4.4 Graph Based Alternative Wiring (GBAW) 
Graph-Based Alternative Wire ( G B A W ) is a newly proposed and efficient 
rewiring technique. It models a circuit as a directed acyclic graph (DAG) 
and searches alternative wires by checking graph matching between local sub-
networks and the pre-specified minimal sub-graph configurations. A configu-
ration is a minimal circuit pattern containing alternative wires within a given 
distance. Experiments show that the number of all such local minimal sub-
graph is limited. Most of the alternative wires are located topologically "near" 
to their target wires. It has been shown that about 96% of the closest alter-
native wires [WLFOO] are only 2-edge distant from their target wires. W h e n 
sub-network matches a pattern, G B A W can quickly determine the target wire 
and the corresponding alternative wires. Obviously, if Wr is an alternative wire 
of wt, then Wt is also an alternative wire of Wr. Both Wt and Wr are presented 
in a pattern. But in a sub-network, only one of them exists. In [WLFOO], it has 
shown that by using G B A W as a random perturbation engine, a competitive 
logic optimization result is obtained when comparing to R A M B O while the 
runtime is much reduced. 
0 local pattern 
In G B A W , there are more than 40 patterns for rewiring. These 40 patterns are 
grouped into three main types, 0-local, 1-local and 2-local. A 0-local pattern 
Chapter 3 Alternative Wiring 24 
is the edge distance between the target wire and alternative wire equals zero. 
O n the other words, it means a node substitution for which two nodes having 
the same logic function can be replaced by each other. If the two nodes have 
the same fanins set, they are said to be logically equivalent. Figure 3.10 shows 
the 0-local pattern in G B A W . 
(opi’k’dc) (dc,dc,dc) 
a货 z Z 
(5 
(op3，k，dc) (dc，dc，dc) 
Figure 3.10: 0-local pattern in G B A W 
1 local patterns 
There are three basic 1-local patterns in G B A W as shown in Figure 3.11. The 
bolded wire represents the target wire that can be removed if the dotted wire is 
newly added to the circuit. For example, in case 1.1, the target wire a ^ gi can 
be replaced by a 仍.Note that the operator of gi and 仍 can be changed to 
other logic gates. This change increases the search space of finding alternative 
wires in real circuit, and it reduces the runtime in identifying alternative wires. 
2 local patterns 
The alternative wire is 2-edge far away from the target wire is called 2-local 
pattern. The number of gates inside a 2-local pattern is increased as well as 
the number of possible 2-local pattern. Thus, it is necessary to include more 
2-local patterns into G B A W . In Figure 3.12, three 2-local patterns are shown. 
Chapter 3 Alternative Wiring 25 
AND AND (or NAND) 
\i，dc,1) (^ clc.dc) 
Gk：： & z：：^ 
� • • � � “ • 
\ 、 z 一 � � � � z 
(a) Ca se 1-1 ’ op ,=AND , op^ =AND (or NAND); or op^=OR , op2=OR(or N O R ) 
(0Pi’k’dc) 
(b) Ca se 1-2，op^=AND, op^ =AND (or NAND); or o p , = O R , op^^ORCor N O R ) 
a n d a n d (or NAND) 
"^i，dc，1) (^ clc.dc) 
G k ： © ：：：：：<$ 
z - Z 
(c) Ca se 1-3’ op^=NOR , opg =NAND (or AND); or op^=NAND, op2=OR(or N O R ) 
Figure 3.11: 1-local patterns in G B A W 
Proof of one 2 local pattern 
In order to maximize the searching power, pattern cluster is proposed to put 
similar patterns together. Currently, 5 clusters are built on top of more than 
30 2-local patterns and the new family members are continuously put into 
G B A W in order to enhance its searching ability. 
In Figure 3.13 below, a ^ gi is the target wire, let 仍 = ( a * x)' where 
X is the other inputs of gi.仍=(仍 * y)' where y is the other inputs of g2. 
g^ = {g2^zy where z is the other inputs of g^. Therefore, gs = 二 
{{a' + x')yyzy = {{a'y + x'yYzY = {{a'yYix'yyzy = (("4)(工之)'.Hence, this 
pattern is constructed and verified. Some other 2-local patterns are shown in 
Appendix B. 
Chapter 3 Alternative Wiring 26 
(N0R’dc’1) (NAND，dc,1) (NOR’dc，dc) 
Gk： O z：：^ 
� . . . .Z 
\ z �� z 
(a) Case 2-1 
(0R’dc’1) (AND’dc’1) 
Z , , or 
• z " (OR，dc，dc) 
(AND’k’dc) 
(b) Case 2-2 
dc (OR,dc’h) (AND,t,1) (OR’dc’dc) 
• ^ ^ (AND,s’dc) 
(c) Case 2-3 
Figure 3.12: 2-local patterns in G B A W 
3.5 Augmented GBAW 
Owing to the high applicability of alternative wiring technique, many related 
algorithms were published in last decade. Some research groups prefer al-
gorithms based on Automatic Test Pattern Generation (ATPG) to perform 
testing of redundancy since A T P G requires little memory to process large 
circuits. One of the circuit rewiring schemes based on A T P G is the well-
known Redundancy-Addition and Removal for Multilevel Boolean Optimiza-
tion ( R A M B O ) technique [CE93, EC93, EC95]. R A M B O applies ATPG-based 
Chapter 3 Alternative Wiring 27 
G ) r — 
Figure 3.13: Example of one 2-local Pattern Verification 
implication technique to identify redundant wires among a large number of can-
didate wires. It has demonstrated to be very flexible and practical in solving 
various C A D problems. Improvement in efficiency is achieved by eliminating 
unnecessary redundancy check [CvGLMS99, LWBOO] on the same wire but 
since the number of candidate wire for testing can still be large, R A M B O may 
still require a long running time due to the intrinsic exponential time property 
of A T P G algorithms. 
In [WLFOO], a new graph-based technique on identifying alternative wire, 
called Graph-Based Alternative Wiring (GBAW), was first proposed. It first 
models a circuit network as a directed acyclic graph (DAG) and identifies 
alternative wires by performing graph pattern matching between local sub-
graph of the network and the pre-specified minimal sub-graph configurations 
containing alternative wires within a given range limit. Experiments show 
that the number of all such local minimal sub-graph is limited and most of the 
alternative wires are located topologically "near" to their target wires. It has 
been shown that about 96% of the closest alternative wires are only 2-edge 
distant from their target wires. 
G B A W produces a competitive result in finding alternative wire when com-
paring to R A M B O . G B A W not only performs well in searching alternative 
wires but also runs very fast. Experiments show that on average, the C P U 
running time of G B A W is just 1.4% of that of R A M B O . This significant short 
Chapter 3 Alternative Wiring 28 
running time makes G B A W being a potentially considerable different technique 
for identifying alternative wires. The efficiency of G B A W is gained proba-
bly mainly because of its avoidance of running the could-be CPU-expensive 
Boolean implications. And since there exist common alternative-wire patterns 
that repeatedly occurs in the same circuit, it makes the one-time analysis effort 
of pattern based rewiring scheme practical. 
It was also observed that G B A W may not find many 2-edge distant pat-
terns, if it is limited to search the small pre-defined set of minimal configura-
tions like the early version shown in [WLFOO]. Therefore, a much extended 
G B A W scheme is presented, mainly 2-local pattern based, to greatly improve 
the effectiveness of G B A W . By including the concept of pattern cluster, the 
augmented G B A W is kept simple while expanding the pattern family. With 
the refined G B A W , an encouraging result in circuit optimization is achieved 
when using it as a perturbation engine. The refinement and achievement are 
presented in detail in the next section. 
3.6 Logic Optimization by using GBAW 
A much extended Graph-Based Alternative Wiring ( G B A W ) scheme to iden-
tify alternative wires in multi-level logic with promising results is presented. 
By modeling subsets of circuits as minimal graphs and applying purely graph-
based local pattern search technique, more than 40 graph patterns are found 
and they contain alternative wires within 2-edge distance from the target wire. 
Applying proper grouping technique for the similar patterns, the complexity 
of the rewiring technique can be reduced. Experimental results on M C N C 
benchmarks show that the technique is much faster than the ATPG-based 
technique R A M B O with competitive number of alternative wires found. With 
this augmented pattern family of alternative wires, it is able to find 30% more 
alternative wires compared to R A M B O with 75 times speedup on average. 
Chapter 3 Alternative Wiring 29 
G B A W is applied in logic minimization as a perturbation engine and simplify 
the target circuit by SIS algebraic operations. Results show a further re-
duction of 11.1% in literal count compared to applying algebraic operations 
alone. 
Table 3.1 shows the number of target wires which have 2-local alternative 
wires. The results are competitive with R A M B O (98%). For each target wire, 
it may have more than one alternative wire and some of them may be 2-local 
patterns. Since R A M B O uses A T P G techniques in locating the alternative 
wires, and the search space is much larger than G B A W . However, a promising 
result is obtained by G B A W since backward alternative wire can be done while 
current R A M B O does not. 
RAMBO GBAW Searched (%) 
Name (target/alter.) (target/alter.) (target/alter.) 
lo/n lo/IS 100/86 
9sym-hdl 5/6 5/8 100/133 
C1908 42/57 44/44 105/77 
C2670 85/99 83/94 98/95 
C3540 208/297 238/250 114/84 
C432 33/44 40/40 121/91 
C5315 76/113 69/73 91/65 
C6288 3/17 3/18 100/106 
C7552 132/219 76/82 58/37 
C880 27/62 27/27 100/44 
alu2 64/100 54/56 84/56 
alu4 120/198 84/86 70/43 
apex6 72/121 68/70 94/58 
b9_n2 8/10 3/3 38/30 
comp 28/44 17/17 61/39 
des 671/907 795/795 118/88 
duke2 35/64 29/29 83/45 
misex3 50/167 41/41 82/25 
rot 46/75 31/32 67/43 
sao2-hdl 25/38 9/9 36/24 
terml 41/77 27/28 66/36 
ttt2 28/68 10/13 36/19 
x3 71/82 76/76 107/93 
Total 1880/2886 1839/1909 98/66 
Table 3.1: 2-local pattern comparison 
The improved G B A W is implemented on Sun UltraSparc 5 workstation for 
M C N C benchmark circuits and results are shown in Table 3.2. The smallest 
benchmark circuit Bxpl.hlif is shown in Appendix. The speed and the ca-
pability of locating alternative wires between R A M B O and G B A W are also 
Chapter 3 Alternative Wiring 30 
compared. In the table, it is shown that G B A W are able to find 30% more 
alternative wires than R A M B O with only 1.38% C P U time on average. 
original circuit 
Decompose complex gates into 2-input gates 
by using SIS package "map", 
and library "mend . genlib" 
一 - - - - - - - -���� iterate 10 times 
, 、、： 
/f 、^、 
/ Random perturbation by GBAW �� 
(Different circuits are obtained) \ 
/ * I * 
I j 
I J 
I J I 
\ / 
�� Logic Simplification by SIS package , 
�� (script.algebraic, script.boolean) / 
Best Optimized circuit 
Figure 3.14: The Logic Optimization Process by G B A W and SIS 
To demonstrate the application of the proposed scheme, G B A W is deployed 
in the perturbation and simplification process and depicted in Figure 3.14. In 
this process, G B A W performed some random transformations on the bench-
mark circuits and the network structure of the circuit was changed. Then the 
circuits were minimized by MISII standard script script, algebraic provided in 
SIS package. Before the perturbation using G B A W , the circuits were mapped 
by SIS command map with mcncl.genlih which transforms all the complex 
gates into gates with 2 fanins. For each circuit, the process was iterated 10 
times and all the best results are shown in Table 3.3. The average reduction in 
circuit size is 43.0% compared to the original circuit. W h e n compared to the 
minimization with algebraic script alone, a further 11.1% reduction is achieved. 
Chapter 3 Alternative Wiring 31 
RAMBO RAMBO GBAW GBAW 
Name alt, wires CPU alt, wires CPU 
9sym-hdl 27 1.56 40 0.16 
C1355 185 12.82 250 0.89 
C1908 127 33.52 240 0.68 
C2670 267 83.57 344 1.33 
C3540 569 273.80 816 2.15 
C432 129 10.26 188 0.37 
C499 16 6.05 34 0.6 
C5315 511 155.91 713 2.88 
C6288 1352 361.18 2191 4.18 
C7552 1709 143.95 617 4.2 
C880 151 9.86 239 0.66 
alu2 169 214.71 263 0.84 
alu4 333 270.50 493 1.61 
apex6 239 34.32 377 1.23 
b9_n2 48 1.65 71 0.17 
comp 57 9.18 58 0.21 
des 1468 729.92 2204 8.7 
duke2 157 46.55 281 0.63 
fSlm 49 6.19 65 0.25 
misex 216 124.48 439 0.97 
my_adder 46 1.16 0 0.23 
pclerS 29 1.3 30 0.12 
rot 243 48.04 406 1.1 
sao2-hdl 104 16.86 153 0.39 
terml 106 16.81 169 0.37 
ttt2 68 9.68 133 0.34 
x3 ^ 23.13 ^ 1.2 
Total “ 8639 2657.13 — 11224 — 36.7 
Normalized | 1 | 1 | 1.2992 0.0138 
Table 3.2: Comparison between R A M B O and G B A W 
3.7 Conclusions 
In this chapter, an augmented Graph-Based Alternative Wiring ( G B A W ) scheme 
is presented. Although there are more than 40 patterns included, an attrac-
tive efficiency is still maintained by using proper grouping of pattern families. 
G B A W has forward and backward search capability and can identify alterna-
tive wire efficiently. Experimental results showed that it is capable to find 30% 
more alternative wires comparing with the forward search R A M B O version. 
G B A W has a good coverage of alternative wires with 75 times speedup on 
average. W h e n using G B A W as the perturbation engine and combining with 
SIS algebraic operations, there is a further reduction of 11.1% comparing with 
the results by algebraic operations alone. 
Chapter 3 Alternative Wiring 32 
Circuit Size(lit.)of Size reduction vs. 
Circuit ~^iginal MISII Perturb Original MISII 
^ 172 125 0.468 0.273 
9sym-hdl 232 138 93 0.599 0.326 
C1908 883 582 524 0.407 0.100 
C2670 1444 800 714 0.506 0.108 
C3540 2267 1398 1367 0.397 0.022 
C432 392 296 229 0.416 0.226 
C499 854 559 552 0.354 0.013 
C5315 3282 1970 1875 0.429 0.048 
C6288 5195 3393 3339 0.357 0.016 
C7552 4105 2423 2285 0.443 0.057 
alu2 777 488 474 0.390 0.029 
alu4 1470 921 904 0.385 0.018 
apex6 1417 856 839 0.408 0.020 
b9_n2 208 141 130 0.375 0.078 
comp 270 170 141 0.478 0.171 
des 6655 3915 3766 0.434 0.038 
duke2 676 430 416 0.385 0.033 
f51m 244 169 146 0.402 0.136 
misexS 990 644 593 0.401 0.079 
my_adder 339 304 197 0.419 0.352 
pclerS 174 105 96 0.448 0.086 
rot 1251 797 762 0.391 0.044 
sao2-hdl 439 271 221 0.497 0.185 
terml 439 263 220 0.499 0.163 
ttt2 376 237 203 0.460 0.143 
34614 21442 20211 Average reduction 
Normalized 1 0：58^ 43.0% 11.1% 
Normalized 1 0.9426 
Table 3.3: Results for logic optimization 
It is also observed that R A M B O is still more flexible in locating the maxi-
m u m number of all possible alternative wires, it might be a good approach to 
couple both the advantages of R A M B O and G B A W for various emphasis of 
C A D applications. The future direction is to extend the set of A:-local pattern 
to cover more alternative wires for practical complex gates, extract more quan-
titative analysis of alternative wire pattern distributions for different kinds of 
circuits, which could turn to be useful for new macro library cell designs. More 
efficient algorithm will be developed to choose the suitable alternative wire for 
perturbation and to apply the G B A W technique on other C A D problems such 
as floorplanning and partitioning. 
Chapter 4 
Multi-way Partitioning using 
Rewiring Techniques 
4.1 Introduction 
The objective of circuit partitioning is to divide the circuit into sub-circuits 
such that the size of each component is reasonable and the number of inter-
connections between the components is minimized. As design scale expands, 
partitioning becomes increasingly important to circuit design automation. 
Traditionally, circuit partitioning is done by simply modeling the circuit 
as a graph (or hypergraph). Graph partitioning problems are known to be 
NP-hard [WC91]. A comprehensive survey [AK95b] has presented the recent 
directions of partitioning. Commonly used partitioning algorithms can be cat-
egorized into three classes. The first class strictly abide by the modeling graph, 
with no attempt to change the graph. High quality results have been reported 
by several algorithms which include iterative improvement based [WC91, FM82, 
DD96], clustering based [YCL92], and spectrum (eigenvector) based [HK91, 
ZSC96]. The second class of algorithms may modify the graph through node 
replications [KN91, MW96，EHS97, YW98]. Improvement is achieved by sac-
rificing some area due to node replications. These two classes both perform 
the partitioning task on the graph without considering the logic function of 
33 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 34 
the circuit. The third class [CLMS95, BLSV92, C C M S 9 3 , KBZ94] couples the 
graph domain (nodes and their connections) and logic domain (function per-
form by each node). The tradeoff of improving the partitioning results is the 
expensive computational cost [BLSV92, CCMS93] and may only be applied to 
Field-Programmable Gate Array (FPGA) circuits [KBZ94；. 
Recently, many research works on multi-level partitioning are proposed [HL93a, 
HL93b, KK95c，KAKS97, AHK97b, WA98，KK99]. The general idea behind 
multi-level partitioning is to first cluster the whole problem by some useful 
algorithms to reduce the size, then apply a well-known graph domain parti-
tioner on the coarsened graph to get a good initial solution. The graph is 
then unclustered and a suitable partitioning refinement algorithm is applied in 
order to adjust the cut edge between partitions. The quality and the runtime 
by multi-level partitioning are very encouraging. In particular, Karypis and 
Kumar [KK99] proposes a partitioner called hMETIS-Kway. It first coarsens 
the hypergraph, then recursively bisects the graph into k-parts, followed by un-
coarsening the hypergraph with refinement algorithms. More recent research 
works [CKMOO, CLOOa, CLOOb], in comparison with hMetis-Kway, showed 
that the solution by hMetis-Kway is of high quality that the cut size cannot 
be further reduced greatly. 
Alternative wiring (rewiring) is the technique of adding single or multi-
ple redundant wires or gates to a circuit such that other wires or gates be-
come redundant and thus removable. This logic domain technique has been 
widely used for solving many logic level and physical level design problems 
[CE93, C M S C 9 6 , CLMS95, C C W M S 9 4 a , C C W M 9 7 , CMS94]. Circuit perfor-
mance can be improved by removing a wire on the critical path and adding its 
alternative wire elsewhere. Circuit routability can also be improved by substi-
tuting an unroutable wire in congested area by a routable alternative wire in 
some other circuit part. The cut size of a partition can be reduced by replacing 
the wires crossing the cut line. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 35 
r ^ - ^ — — ^ ^ —— 
94 J) 
(a) Partition before applying alternative wiring 
^ ^ O - n 「 。 1 
^ f _ L T W ： _ 
(b) Partition after applying alternative wiring 
Figure 4.1: Circuit partitioning by rewiring 
Figure 4.1 illustrates how rewiring can be used to further improve an al-
ready optimum partition result obtained by a typical graph domain partition 
algorithm. The global optimum partition result in the graph domain, with a 
cut size of 3, is shown in Figure 4.1(a). However, by applying the logic domain 
rewiring technique to replace a target wire (thick line) crossing the cut line by 
its alternative wire (dotted line), the cut size can be further reduced to 2 as 
shown in Figure 4.1(b) (without injecting area increase). From this example, 
rewiring can be applied to partitioning to further improve upon even optimum 
solution in the graph domain. Binding the logical domain rewiring technique 
with some graph domain partitioning tool enables a larger room for obtaining 
better results. The rewiring technique can be used either as a greedily guided 
optimization tool, or as a random perturbation tool which allows hill climbing 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 36 
on cut cost functions so as to pull the cost out of some local minima. 
The well-known ATPG-based rewiring technique, R A M B O , is a very pow-
erful technique for identifying alternative wires of a specified target wire for 
a given circuit. This technique has been used for logic perturbations and 
has been integrated with a graph domain partitioning algorithm to produce 
improved partitioning results [CLMS95]. However, as it always selects a non-
negative-gain wire in replacing a target wire and it only considers simple cases 
of adding and replacing one single wire, it is easily trapped in local minima. 
Besides, the ATPG-based rewiring technique, though powerful, tends to spend 
much running time due to the time-consuming Boolean implication operations. 
Moreover, the R A M B O rewiring tool can only handle 2-input gates, therefore 
the benchmark circuits must be somehow pre-processed. 
To investigate the possibility of perturbing the circuit without applying 
any Boolean operations, minimal circuit structures yielding rewiring patterns 
have been studied [WF99, WLFOO]. Based on benchmark circuits, there is 
an observation that the nearest existing alternative wire is quite close to its 
target wire. Therefore these minimal patterns tend to be small and repeatedly 
appeared in a circuit. As a result, instead of applying the ATPG-based logic 
implications repeatedly for a same pattern, the Graph-Based Alternative Wire 
( G B A W ) technique [WF99, WLFOO] employs a more efficient graph pattern 
matching operation to locate alternative wires. The basic idea of G B A W is to 
match the sub-circuit with some "pre-specified" patterns. Rewiring by G B A W 
can be done without doing any logic implication or redundancy check, hence 
it runs very fast. Besides considering the alternative wire which is close to 
the target wire from those small "pre-specified" patterns, distant alternative 
wires can also be located by propagating the matchings in a cascading way. By 
coupling R A M B O and G B A W as the perturbation engine, [WYCOO] proposed 
the bi-partitioning tool R G which also handles the 2-input gates and has a 
larger room for perturbation. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 37 
To expand the optimization space, the coupling notion is applied between 
graph and logic domain into the GBAW-Partitioner (GP). In graph domain, 
the well-known Fiduccia-Mattheyses (FM) partitioning algorithm [FM82] is 
chosen as the iterative move-based engine for its simplicity. In logic domain, 
an augmented G B A W is applied, which enhances the ability to locate more 
2-local alternative wires, as a greedily guided perturbation engine. In the ex-
periments, near optimum partition results were firstly obtained from the pure 
graph domain partitioner hMetis-Kway. Then the coupling of graph and logic 
domain optimization G P engine, was followed. Note that the logic pertur-
bation process can be coupled with any powerful graph domain partitioning 
tool, and G B A W itself is able to handle patterns with multiple-input gates. 
Experiments on this partition flow for 2-, 3-, 4-, and 5-way partitionings are 
carried out on various M C N C benchmarks ranging from small to fairly large 
circuits. The results show that such a graph-logic domain coupled partition-
ing approach can further cut down the cut size effectively with small C P U 
overhead. The results show that it is a very promising direction for circuit 
partitioning. 
This chapter is organized as follows. In Section 4.2, a brief introduction 
on partitioning algorithm analysis is described. In Section 4.3, the details of 
repartitioning by rewiring is shown. In Section 4.4, experimental results are 
presented. Conclusion is drawn in Section 4.5. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 38 
4.2 Circuit Partitioning Algorithm Analysis 
The partitioning task is ubiquitous to many subfields of VLSI C A D . One im-
portant practical example of this problem is the placement of the electronic 
components onto printed circuit boards, so as to minimize the number of con-
nections between cards. The components are modeled as the nodes of the 
graph, each circuit connections as edge, and the edge which interconnects be-
tween cards as a cut. Since connections between cards have high cost compared 
to connections within a board, the objective of partitioning in this example is 
to minimize the number of interconnections between cards. 
Partitioning heuristics address the increasing complexity of VLSI design 
systems with million of transistors, have a greater impact on system perfor-
mance since now designs are interconnect-dominated. In current VLSI designs, 
wire delays tend to dominate gate delays, thus poor partitioning results worse 
system performance. 
It is impossible to solve the above problem by exhaustive procedure even 
for small problem size. For example, suppose G has n nodes of size 1 to be 
partitioned into k subsets and each subset has size p, where kp = n. There 
are Q ) ways of choosing the first subset,(〜”)ways for the second and so on. 
Since the ordering of subsets is not important, the number of cases is 
For example, for n = 40, p = 10 (i.e. k = 4), the number of cases are greater 
than 102o. Therefore, any direct approach to finding an optimal solution will 
require an inordinate amount of computation, heuristic methods can produce 
good solutions (possibly even an optimal solution) quickly. In practice, it is 
more preferable to have several good solutions than one optimal one. 
There are several approaches to solve the partitioning problem, such as 
Move-based approaches, Geometric representations, Combinatorial formula-
tions and Clustering approaches [AK95b . 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 39 
The first "efficient" graph bisection heuristic was introduced by Kernighan 
and Lin [KL70] in 1970. It was further improved the time complexity by Fiduc-
cia and Mattheyses [FM82] in 1982. Both famous algorithms K L k F M use 
iterative improvement technique that are based on greedy strategy which start 
with some feasible initial solution and iteratively move to the best (improv-
ing) neighboring solution. The process ends when the algorithm cannot have 
any iterative improvement. The computational complexity of K L and F M are 
O(n^) and 0{n) respectively. K L k F M are both iterative improvement that 
apply move-based approaches. Other move-based approaches are simulated 
annealing, tabu search, genetic algorithm. 
Nowadays, this two famous algorithms are still use in practical applica-
tions. However, many extensions which base on the idea of this two algo-
rithms are evolved afterwards. Such as the Tie-breaking strategies by Kr-
ishnamurthy [Kri84], Sanchis's multi-way partitioning algorithm [San89] and 
Multi-level partitioning [KAKS97；. 
The K L k F M Algorithms are described in Section 4.2.1 and 4.2.2 re-
spectively. Their extensions are briefly presented at the end of Section 4.4.2. 
Geometric representation approaches are discussed in Section 4.4.3. The Multi-
level partitioning is introduced in Section 4.4.4. 
4.2.1 The Kernighan-Lin (KL) Algorithm 
The simplest partitioning problem is to find a minimal-cost partition of a 
given graph of 2n vertices (of equal size) into two subsets of n vertices each. 
K L algorithm is not only able to solve this uniform bisection problem, it also 
provides the basis for solving other general partitioning problems, e.g. larger 
problems, 2-way partitions of the vertices with unequal size. Two-way uniform 
partitioning is first described. Followed by the algorithm of K L and example 
is shown in later section. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 40 
Two-way Uniform Partitioning 
The algorithm works as follow, it first starts with any initial partition X , Y 
of P. By interchanging the vertices between X and Y, the total cost which 
connecting X , Y can be reduced, where cost Cij define as the cost of vertex i 
connect to vertex j. W h e n the algorithm stops, it always results in a local 
minimum X', Y'. 
Suppose A e X,B e Y is the two initial partition, and final partition 
results m B e X = A e Y = y the problem is to select the correct 
set A,B from X,Y without considering all the possible choices. Suppose x,y 
are vertices in P , an define 工 G X , an external cost E工 and internal cost 
external and internal costs difference D工 and the gain (reduction in cost) are 
Ex — y^^cy Orb 
^x — Y^aeX Cxa 
Dx — Ex Ix 
gain = Dx + Dy — 2cxy 
First, the algorithm computes the D values for all elements of P. Second, 
then it choose the maximum gain from a single interchange where 
9i — Dxi + Dy. — 
All the beginning of a pass, each vertex is unlocked which means it is free 
to be swapped. The vertex becomes locked after moved. K L iteratively swaps 
the pair of unlocked vertices with the highest gain until all vertices become 
locked. Therefore, gi is computed first, with 0Ci,yi locked temporary during 
this pass, and called Then the D values for all elements oi X - xi and 
Y — yi are recomputed. Then, a pair x'^, y'2 from X — x[ and Y -y[ is chosen 
such that g2 = D^ '^  + Dy'^  - I c ^ w is maximum where is not considered. 
Process continue until all nodes have been exhausted, (xg, 2/3),..., (x^, y,^ ) with 
corresponding 仍，…，彻 . 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 41 
If A = 4，• • . ， B = y�, y'2,..., y'k- Note that some p^ 's are negative 
and Yli 9i 二 K L choose k which will maximize the partial sum Yli=i 二 
gi = G,UG > 0 means there is a reduction in total cost by swapping A and B 
from X, Y. W h e n G = 0 the algorithm has arrived one of the local optimum 
partition. With different starting partition, the algorithm may reach different 
local optimum solution. 
Pseudo-Code 
The algorithm of K L is briefly shown below: 
• Start with an initial bisection P 二 X, Y. 
• Repeat 
- F o r i = 1 to n/2: 
* Choose a pair of free cells a G X,b e Y 
s.t. exchanging a and b gives the highest gain, gain(a,b). 
* Exchange and lock a and b. 
* Let gi = gain(a,b). 
— F i n d k s.t. G 二 + 仍 H h Qk is maximized and shuffle cells up 
to this k仇 step. 
• End if G 二 0. 
Illustration 
In Figure 4.2 and 4.3, a partition problem is solved by K L algorithm. There 
are totally 2 passes, the partial sum of G is maximized during each pass and 
terminated when equal to zero. In the first pass, the gain of swap{a, d) and 
swap{b, c) equals 2, therefore swap(a, d) is chosen and locked them after moved. 
Other move will result in negative gain. Therefore, the G in the first pass is 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 42 
equal to 2. In the second pass, the process is terminated since the maximized 
partial sum is equal to zero. 
First pass 
Cost = 10 广 Cost = 8 f Cost =10 
/ Locked \ / Locked \ j Locked 
I I Locked Locked ！ Locked 
g(a，c) = -1+3-3+1 = 0 g(b,c) = -4+1-2+3 = -2 
g(a，d) = -1+2-3+4 = 2 
g(b,c) = -1 +4-3+3 = 2 g2 = -2 if we swap (b，c) 
g(b,d) = -1+1-3+3 = 0 
gj = 2 if we swap (a,d) Therefore, G = partial max. of g； = g； = 2 
Figure 4.2: Example of K L algorithm - First pass 
Second pass 
Cost = 8 f Cost =10 f Cost = 8 
/ \ I Locked \ / Locked 
！ Locked ！ Locked Locked ！ Locked 
g(a，b) = -2+3-4+1 = -2 g(d,c) = -1+2-3+4 = 2 pj_�ess 
g(a,d) =-2+1-4+3 = -2 terminated! 
g(c,b) = -2+3-4+1 = -2 g2 = 2 if we swap (d,c) \ 
g(c,d) =-2+1-4+3 = -2 \ 
gj = -2 if we swap (a,b) Therefore, G = partial max. of g; = g! + g! = 0 
Figure 4.3: Example of K L algorithm - Second pass 
4.2.2 The Fiduccia-Mattheyses (FM) Algorithm 
The Fiduccia-Mattheyses algorithm is based on K L algorithm. They intro-
duced their efficient heuristic F M by reducing the time complexity per pass to 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 43 
0{p) where p is the number of terminals in P. The main difference between 
K L and F M are: 1) move only one cell each time, 2) vertices are weighted, 3) 
nets can be multi-terminal, 4) maintain a balanced partition after each move. 
The F M algorithm performs several passes, within each pass each cell moves 
at most once and returns the best solution observed in one pass. It terminates 
when there is no improvement of cost function in a pass. However, F M has 
reduced the time per pass to linear in the size of the vertices. 
A partition P = (X，Y) is balanced iff ⑶ 忠 ^ r for some constant r < 1 
where w{X) is the total size of the cells in X. To preserve balanced partition, 
cell b is able to move only if 
rW — Smax < W � S rW + Smax 
where size of each cell 二 s⑷,W 二 w(JC U Y) = and Smax = 
max{s{i)) is the maximum cell size. 
A special data structure is used for selecting which vertices to be moved 
in order to improve the running time. This structure is Bucket list structure, 
a cell with maximum gain can be found, remove a cell, insert a cell, update 
gain(^) for any cell i, update the M A X G A I N in linear time by using this bucket. 
There are two lists and one for each partition. 
The algorithm starts with a balanced partition which can be obtain by first 
sorting the vertex weight in decreasing order and placing them in X, Y alter-
nately. At the beginning of a pass, the gains for each cell in P are computed 
in linear time, and their value are inserted to the bucket list according to its 
gain. Once the lists are updated, the vertex to be moved is chosen by consid-
ering the maximum gain vertex Xmax in list X , or the maximum gain vertex 
y而 in list Y. Both vertices can be moved only if they do not violated the 
balance condition. Notice that a move may increase the cut-cost but results 
in hill-climbing out of the local minimum. W h e n no more possible move or if 
there are no more unlocked vertices, the pass terminates. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 44 
During an F M pass, a cell pointing to the M A X G A I N is selected from the 
bucket, and deleted from the linked list. The cell is locked after each move, 
and the gains of unlocked cells incident to the moved cell are updated in the 
bucket. If the bucket indexed by M A X G A I N becomes empty, the M A X G A I N 
will be decreased until it indexes a non-empty bucket. 
Bucket list structure 
F M algorithm uses the special data structure — Bucket List so as to improve 
the time-complexity. The cell to be moved is always selected by the pointer 
which pointed by M A X G A I N . The bucket itself will store different gain value, 
and a linked list will point to the specific bucket slot. This structure looks like 
a comb which can effectively improve the whole algorithm. 
+pmax  
Cell l ^ r ^ l C e l l . . . 
Gain I ——# N - ^ f l # F ^ - ^ 
-pmax " " “ r r i 
1 2 n 
Figure 4.4: Bucket list structure 
Pseudo-Code 
The algorithm of F M is briefly shown below: 
• Start with a balanced partition P = X, Y. 
• Repeat 
- F o r i = 1 to n: 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 45 
* Choose a free cells 6 G X U F 
s.t. moving b to the other side gives the highest gain, gain(b), 
and moving b preserves balance in P. 
* Move and lock b. 
* Let Qi = gain(b). 
— F i n d k s.t. G =仍 + H h gk is maximized and shuffle cells up 
to this k仇 step. 
• End if G = 0. 
Illustration 
The example of F M is shown in Figure 4.5 and Appendix C.l, C.2. The initial 
and final partition are shown in Figure 4.5 with the balanced constant r = 0.5, 
each cell with weight 二 1, total weight of P = 4. Therefore the cell in the 
partition is valid to move only when it satisfies 1 < w{X) < 3. The final 
cut-cost is changed from 10 to 6. 
Cost =10 Cost = 6 
I 
X Y 
Figure 4.5: Example of F M algorithm - Initial k Final 
Extension 1: Tie-Breaking Strategy 
W h e n picking the maximized partial sum G , break ties by looking ahead a 
certain number of steps. E.g. gk 二 gi = 2, and gk = 91+92+93 = 2 - 4 + 4 = 2, 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 46 
the algorithm introduces a look ahead ability into the original F M algorithm. 
At this situation, the algorithm should able to select whether swap one pair of 
cell or three pairs of cell, that can introduce a better solution. 
Extension 2: Multi-Way Partitioning 
Partition into more than 2 clusters. Extend the idea of F M + Krishnamurthy 
and handle multi-way network partitioning and this technique is very useful 
for partitioning a network into a large number of subsets. 
4.2.3 Geometric Representation Algorithm 
Spectral algorithms were first proposed for placement and partitioning by 
Hall [Hal70]. Based on a linear ordering of the vertices using the eigenvec-
tor associated with the second smallest eigenvalue of the Laplacian of a graph, 
a fast bi-partitioning method were developed [HK92]. Consequently, a /c-way 
spectral partitioning algorithm and new A:-way ratio-cut cost function were 
presented in [CSZ94]. Subsequent methods for spectral A;-way ratio-cut par-
titioning include MELO[AY95] and DP-RP[AK95a]. Additional approaches 
of spectral partitioning are presented in [Bar85 . 
The problem of /c-way partitioning can be formulated as below. Given 
a graph with a set of n vertices, V. Each vertex is associated with its size 
attribute. A partitioning of the graph is a division of the n vertices into k 
disjoint, non-empty subsets i^ i, P2,... , ft such that V = PiU P2U - - -U Pk. 
• The n x n adjacency matrix, A, is composed of entries aij which represent 
the sum of the weights of the edges between vertices i,j. 
• The nxn diagonal degree matrix, D, has entries da equal to the sum of 
the weights of all edges on vertex i. 
• The Laplacian matrix is defined as Q = D — A. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 47 
• Eh is the sum of the weights of the edges which have exactly one vertex 
in partition h. 
• \\Pf^\\ is the sum of the sizes of all vertices in partition h. 
• is the n X A: ratio-ed assignment matrix. It represents a solution to the 
partitioning problem. The entry rih has value when vertex i is in 
partition h and 0 otherwise. 
• M is the n X n diagonal matrix whose m a entry represents the size of 
vertex i. 
In [Hal70], Hall shows the (1-D) weighted quadratic placement problem 
can be solved by the eigenvectors of the Laplacian matrix. The total weighted 
squared distance between n points can be express as below. That is what we 
want to minimize 
之 二 2 i^j i^ i — 
and it can be rewritten in matrix notation (quadratic form) as 
minimize z 二 〜 s u b j e c t to x'^ • x = 1 
The minimum eigenvalue Ai 二 0 yields the trivial solution. The second 
smallest eigenvalue A2 is a lower bound on a non-trivial solution to the above 
problem. Based on this idea, Hagen and Kahng [HK92] integrate the two-
way ratio-cut partitioning and spectral method. Figure 4.6 and 4.7 show an 
example of spectral bi-partitioning. 
The A;-dimensional weighted quadratic placement problem consists of find-
ing an n X A; matrix X = [xij], minimizing 
z - \ Eti EL. ELi •ih -工财=trace{X^QX) 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 48 
Lap lac ian Q = D - A 
Q Q 2-1-1 0 
Z / \ - 1 2 - 1 0 
GJ © L �… -
Figure 4.6: A simple graph and its Laplacian 
\ Y ！ • V • 
i r ? j _ I f S l V,厂。.5 C.46 1 V, V3 V, 
iV^l 丨 〇 〇--0.5 ©V, 
I ； J/1 \ ； V2 0.5 0.46 
I 丄 U I � �” 
： ！ / ^ v ^ 丨 V, 0.5 - 0 . 7 3 
i W； W i _/ \ _ ——1 1  
i V2 i i V4 i / \ - 0 . 7 - 0 . 2 - 0 . 4 
‘ • ‘ 1st eigenvector of Q 2nd eigenvector of Q 
Figure 4.7: Result of spectral partitioning and linear ordering of the vertices 
using the second eigenvector 
To avoid the trivial solution of the constraint X'^X = I is used and the k 
eigenvectors of Q corresponding to the smallest k eigenvalues of Q and mini-
mize z. The aim is to optimize the A;-way ratio-cut cost function that is find a 
solution R such that minimize 
Eh 
臓 
By using R and \\Ph\\ which include the actual vertex sizes rather than the 
number of vertices in a partition, the new spectral partitioning incorporates 
vertex size information. By taking vertex sizes into account, the constraint 
rTr r= / is replaced by If MR 二 /• R is matrix which has a single non-zero 
entry in every row and for each column exactly one non-zero value among its 
entries. The generalized spectral partitioning formulation is defined as 
minimize trace{X^QX) subject to X ^ M X = I 
where trace{M) of a square matrix M is the sum of its diagonal entries. By 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 49 
using M = S^S and substituting Q = S-^^QS'^ and 文二 SX, the original 
formulation is transformed to 
minimize trace{X^QX) subject to X^X = I 
which is a standard quadratic assignment problem and can be solved by 
using the L A S O library [PS79]. Below shows an example which incorporates 




f' Q ~ G / V ‘ 。.2 。.25 V. V3 V. 
j xfy] v: 。'2 〇 0.2--o © V2 
； z 八 I V3 0 .2 0 .3 
\ d； / i © ， 、 - 。 \ 」 I 1 . 
\ V, / V V. / / . \ -0.2 -0.2 
\ z ». 1st eigenvector o fQ 2nd eigenvector of Q 
•广 � ‘ 
A 
Figure 4.8: Result of spectral partitioning by using the modified Laplacian Q 
which incorporates the actual vertex size information 
4.2.4 The Multi-level Partitioning Algorithm 
Recently, many research works on Multi-level Partitioning [BS93, HL93b, KK95c, 
ZCS99, AHK97b, KAKS97, KK99] which first clusters the whole problem by 
some useful algorithm to reduce the size (i.e. in a graph by collapsing ver-
tices and edges). Then, the reduced problem can be solved efficiently by any 
well-known partitioning algorithm. The last step is to do un-clustering while 
applying a suitable partitioning refinement algorithm in order to adjust the cut 
edge between partitions. Figure 4.9 shows the three phases in M P , coarsening 
phase, initial partitioning phase and un-coarsening and refinement phase. 
The first multilevel work is achieved by Barnard and Simon [BS93] in 1993. 
They used a multilevel approach to calculate the eigenvector for the spectral 
partitioning algorithm. Their work has greatly inspired the research group 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 50 
r — ^ r X " " ] 
N l J S i i 
\ ^ Projected partition T 
\ r—) rrn / 
Coarsening Phase \ G2 ( �� I / UnCoarsening and Refinement Phase 
Initial Partitioning Phase 
Figure 4.9: The various phases of the Multi-level graph bisection 
of Hendrickson [HL93b]. Hendrickson and Leland proposed Chaco [HL93a 
in 1993 and it works without the transferring of eigenvectors between lev-
els by instead transferring partitions between levels. Many research works 
and ideas have elaborated and built based on their work. In 1995, the re-
search group of Andrew B. Kahng has presented their work on multilevel 
circuit partitioning M L c and M L ^ where M L c is CLIP-based [DD96] and 
M L f is FM-based implementations. Meanwhile, the research group of Pak 
K. Chan has integrated multi-level scheme with their k-way hypergraph par-
titioning by the use of spectral methods[ZCS99, CSZ94, ZSC96] MKP. The 
algorithm also directly incorporates the vertex size information during the cal-
culation of eigenvalues. Another famous multilevel graph partitioning tools 
M E T I S [KK95b, KK95a, KK95d] was proposed by the group of Vipin Kumar 
in 1995. Their extension works include from bisection to k-way partitioning, 
from graph to hypergraph [KK99], parallel multilevel graph/hypergraph parti-
tioning ParMETIS [KK96] and k-way hypergraph partitioning hMETIS [KK99:. 
These multi-level tools have soon become industry standard due to their su-
perior performance in solving the partitioning problem efficiently. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 51 
4.2.5 Hypergraph METIS - hMETIS 
Karypis, Aggarwal, Kumar and Shekhar present a new hypergraph-partitioning 
algorithm hMETIS [KAKS97, KK99] that uses the multilevel paradigm. A 
sequence of coarser hypergraphs is constructed first and then applies initial 
partition (directly partition into /imparts), and this A:-way partitioning is pro-
jected back to the original graph with refinement. 
Coarsening Stage 
Coarsening phase leads to a small hypergraph which partitions into /c-parts is 
not significantly worse than directly applies /c-way partitioning on the original 
graph. It also allows local refinement techniques to become more effective, 
and reduce the size of hyperedges. There are three schemes combined together 
to form the coarsening stage of hMETIS, edge coarsening (EC)[AHK97b, 
W A 9 8 , KAKS97], hyperedge-coarsening (HEC)[KAKS97], modified hyperedge-
coarsening (MHEC). 
E C scheme is a heavy-edge maximal matching of the vertices of the hy-
pergraph. The vertices are visited in random order. For each vertex v, all 
unmatched vertices that belongs to the hyperedge incident to v are consid-
ered. The vertex u with highest weight is matched with v. 
In H E C , an independent set of hyperedges is selected and the vertices that 
belong to individual hyperedges are contracted together. The hyperedges are 
first sorted in non-increasing hyper edge-weight order and the hyperedges of 
the same weight are sorted in a non-decreasing hyperedge size order. 
M H E C can reduce the amount of hyperedge weight that is left exposed in 
successive coarse graphs. After applying H E C and the hyperedges to be con-
tracted have been selected, the list of hyperedges is traversed again. For each 
hyperedge that has not yet been contracted, the vertices that do not belong 
to any other contracted hyperedge are matched to be contracted together. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 52 
The overview of three schemes is in Figure 4.10. 
Edge Coarsening (EC) 
f p 鞭 0 
Hyperedge Coarsening (HEC) 
f p 鞭 Q 
Modified Hyperedge Coarsening (MHEC) 
Figure 4.10: Various hyperedge coarsening schemes 
Initial Partitioning Stage 
T w o algorithms are used for computing the initial partitioning. The first one 
simply creates a random bisection which roughly partition into two parts that 
have roughly the same vertex weight. The second one starts from a randomly 
selected vertex and grows a region around it in a breadth-first fashion until 
half of the vertices are in this region. Since both algorithms are randomized, 
10 initial partitionings at the coarsest graph are obtained. The /c-way initial 
partitioning is computed by the multilevel hypergraph bisection algorithm. 
Un-coarsening and Refinement Stage 
During un-coarsening phase, the coarser hypergraph is projected to the next 
level finer hypergraph, and a partitioning refinement is used. T w o different 
partitioning refinement algorithms are used. First one is F M which repeatedly 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 53 
moves vertices between partitions to improve the cut. The second algorithm 
is Hyperedge Refinement (HER) which moves groups of vertices between par-
titions so that an entire hyperedge is removed from the cut. 
4.3 The GBAW Partitioning Algorithm 
Assume that one pin is used in a partition for a net. The objective of a multi-
way partition is essentially to minimize the number of pins required to connect 
all partitions. Since some of the wires may have alternative wires, by replacing 
some cut wires by their alternative wires that are not cut wires, cut size can be 
reduced. The rewiring process may lead to some new circuit graph, and in turn 
help escaping from local minima led by graph domain partitioning process. 
A perturbation refers to the replacement of a target wire by its alternative 
wires. Figure 4.11 illustrates the gains regarding various perturbations in a 
circuit. In the figure, thick lines represent the target wires and dotted lines 
refer to their alternative wires. As shown in the example, there are positive, 
zero and negative gains. 
The hMetis-Kway [KK99] partitioning tool is used to provide a fast and 
near optimum solution which serves as the initial partition. The well-known 
F M partitioning algorithm [FM82] is selected as the graph domain partitioner 
for its simplicity and efficiency. In fact, any other graph domain partitioner 
can integrate into GP. Then, the rewiring technique G B A W aiming for further 
improvements is applied to perform logic perturbations which include: 
• substituting a wire with its alternative wire. 
• adding a gate and removing some wires. 
• adding one wire to remove other wires. 
• adding two gates to remove other wires and so on. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 54 
Q ： © S ) © S ^ 
(a) perturbations (b) gain = 2 (c) gain = 1 
s ^ s t e 
^ ^ 
(d) gain = 0 (e) gain = -1 (f) gain = -2 
Figure 4.11: Perturbations and gains 
Figure 4.13 gives the algorithm of GP. The graph domain information of 
the benchmark circuits are first extracted as shown in Figure 4.12. 
During the perturbation process GP, only cut wires will be selected as target 
wires for perturbations. G P first randomly select a cut wire as the target wire. 
Then, G B A W is used to find the alternative wire set SWa of the target wire. 
Finally, among the wire set SWa, the alternative wire with the highest gain 
is selected for perturbation. W h e n the SWa of the target cut wire is empty, 
G P may randomly select another cut wire for another trial. The number of 
iterations is set by m. The number of trials is limited by t times, k is the 
limit of perturbations. These limits serve to set some bounds for some useless 
runs when the total number of alternative wires of all cut wires is zero or very 
small. The main difference of algorithm G P and the algorithm in [CLMS95] lies 
in the condition of perturbations. In [CLMS95], a perturbation is performed 
only when the alternative wire of the selected cut wire has a non-negative gain. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 55 




[ logic domain - GBAW '^-—i i 
I ^ I 
J graph domain - FM ~ i 
； GP I 
} r 
multi-way partitioning results 
Figure 4.12: Extract the Graph domain information for hMETIS-kway 
However, in the experiments, some (hill-climbing) perturbations are allowed 
therefore can increases the chance of obtaining better solutions. A negative-
gain perturbation is also allowed to help escaping some local minimums. O n 
the other hand, the main difference between G P and the algorithm in [WYCOO 
is the perturbation engine used. In [WYCOO], a coupling scheme by R A M B O 
and G B A W (RG) is used. Since R A M B O is limited to 2-input gates circuit, 
R G is unable to handle multi-input gate circuits. By only integrating G B A W 
to GP, the partitioner can effectively locate nearly all the alternative wires of 
the circuit. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 56 
4.4 Experimental Results 
The algorithm GBAW-Partitioner (GP) was implemented in C and the ex-
periments were conducted on Sun Enterprise E4500 workstation with 8 G B 
memory in a single-processor configuration for circuits of various sizes from 
M C N C benchmarks. The benchmark circuit from M C N C is put at Ap-
pendix. The large benchmark circuits used in ISPD98 [AHK97a] are not ap-
plicable for the experiments due to the lack of logical domain information. 
Since the rewiring engine G B A W [WLFOO] is able to locate alternative wires 
of multiple input gates as well as 2-input gates, thus the circuit SIS [SSLea92 
simplification done by [CLMS95, W Y C O O ] can be skipped. 
Here an example of how G P works below is shown. Taking C35人0 as an 
example, the circuit is firstly partitioned by hMetis-Kway and has a initial 
hypergraph with cost 132. The algorithm of G P then further cut down the 
cost to 112 with the following steps. 
• G P searches all the alternative wires of the wires which lie along the cut 
line and replace them, and the original graph is changed with gain 5. 
• With logically equivalent but different graph, F M easily reduced the cost 
down to 122. 
• Random perturbation of the graph brings another reduction by 2. 
• Again, G P searches along the cut line and reduces the total cost by 5. 
• By switching between graph and logic domain, the cost is reduced to 112 
which is the final result. 
Table 4.1 lists the statistics of alternative wires on the benchmark circuits. 
Column "Circuit" shows the name of the circuit. Column "alt. wire" lists the 
number of alternative wires found in the circuits. Column "CPU" refers to the 
runtime in seconds. From the experimental results listed in Table 4.1, it shows 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 57 
that G B A W can find 30% more alternative wires than R A M B O on average 
with 75 times speedup. The G B A W which include many newly found 2-local 
patterns is now able to locate 98% of the 2-local alternative wires found by 
R A M B O . Therefore, with the enhanced G B A W the unnecessary redundancy 
check by R A M B O can be skipped. 
Circuit RAMBO CtBAW — 
“alt, wires CPU alt, wires CPU 
^ loTr ^ ^ 
9sym-hdl 27 1.56 40 0.16 
C1355 185 12.82 250 0.89 
C1908 127 33.52 240 0.68 
C2670 267 83.57 344 1.33 
C3540 569 273.80 816 2.15 
C432 129 10.26 0.37 
16 ^ ^ ^ 
C5315 511 155.91 713 2.88 
C6288 1352 361.18 2191 4.18 
C7552 1709 143.95 617 4.2 
C880 151 9.86 239 0.66 
alu2 169 214.71 263 0.84 
alu4 ^ 270.50 ^ 1-61 
^ ^732 3 77 1.23 
b9-n2 48 1.65 71 0.17 
comp 57 9.18 58 0.21 
des 1468 729.92 2204 8.7 
duke2 157 46.55 281 0.63 
fSlm 49 6.19 65 0.25 
misex ^ 124.48 439 0.97 
my_adder 46 TAG 0 0.23 
pclerS 29 1.3 30 0.12 
rot 243 48.04 406 1.1 
sao2-hdl 104 16.86 153 0.39 
terml 106 16.81 169 0.37 
ttt2 68 9.68 133 0.34 
x3 ^ 23.13 ^ 1-2 
Total “ 8639 — 2657.13 11224 36 .7— 
"NHT^alized I 1 I 1 I 1-2992 0.013l~ 
Table 4.1: Alternative wire statistics of R A M B O and G B A W 
In the experiments, the tolerance of area imbalance of G P is set to be ±20% 
of the average area in each partitioned block. Therefore the maximal area ra-
tios are 40%:60% and 16%:24% for 2-way and 5-way partitionings respectively. 
In order to explore the graph domain optimization, hMetis-Kway [KK99] was 
firstly run for each circuit. As a result, a nearly optimum partition solution 
was obtained. The next step is to select the best solution applying G P for logic 
perturbation to further improve the quality of the partitioning with A; = 60 
and t = 50. Table 4.2 to 4.5 list the experimental results for the 2- to 5-way 
partitionings respectively. Column "area" lists the area of the sub-circuit in 
terms of the number of gates. Column “#lits，，lists the total number of literals 
of the partitioned circuits. From the results, the area penalties for 2- to 5-way 
are 0.33%, 0.53%, 0.61% and 0.71% respectively. Column "cut cost" lists the 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 58 
total number of cut pins obtained for all partitioned blocks. Column "cut wire" 
lists the number of cut wires of the partitioning. Column "cpu" lists the cpu 
time (in seconds). From the results, they showed that applying logic perturba-
tion can further cut down the cut size of the good partitionings produced by 
purely graph domain based partitioner. The total number of literals is slightly 
increased because of the area cost of the added gates during perturbations. 
14.48%, 10.18%, 9.08% and 9.24% reduction in cut size are obtained for the 
2-, 3-, 4- and 5-way partitionings respectively. The last 2 columns show that 
the quality and C P U of G P are both much better than the results obtained 
by simply running F M for 250 times. 
u A/r 广 250 F M 
Dircuit hMetis-Kway ^ — 
S F i i ~ # U t s c u t c u t C P U ^ # l i t s c u t c u t C P U c u t C P U 
cost wire cost wire cost  
i P f l ~ r a I 30 I 31 0.33 ^ p 8 I f I I 28 1 38 
9sym-hdl 67:74 232 16 8 0.36 64:77 232 10 8 14 10 42 
C1355 331:269 1055 44 35 0.65 338:266 1059 36 35 74 46 266 
C1908 260:256 883 82 55 0.64 271:251 889 62 55 61 62 238 
rofirn 516.527 1444 42 23 0.78 517:531 1449 34 23 112 126 427 
CSsIS e i l e Z Ittr 132 93 1.77 614:663 228O 112 100 357 I8O 692 
C432 119:119 392 44 27 0-46 118:130 402 36 ^ ^ ^ — — J l — 
P4QQ 231-272 ^ 46 47 0 6 ~ 232:272 855 36 47 59 54 214 
§5315 918 1044 ^ 8 2 104 71 1.98 919:1047 3286 100 69 493 234 1232 
§6288 12987:二 i m 80 270 2.33 1309:1561 5209 78 189 960 430 1708 
C7552 1281:1141 4105 18 65 1.93 1286:1142 4111 18 64 727 228 1731 
C880 261:222 780 54 30 0.57 260:234 791 38 30 58 38 200 
alu2 190:232 777 84 93 0.81 190:236 781 80 95 115 88 150 
alu4 428:357 1470 140 106 1.16 438:360 1481 120 1 0 4 _ _ ^ 1 6 0 _ _ 3 2 2 
4357473 14T7 18 H 436:473 1418 16 11 J f 36 370 
bLn2 87:70 208 16 13 0.32 64:93 208 12 10 16 10 41 
romn 93 91 270 6 4 0.32 93:90 269 6 3 35 6 51 
f e r 1727:2112 6655 236 221 3.99 1565:2282 6663 146 331 752 332 2338 
duke2 175:211 676 80 61 0.7 162:233 684 74 72 98 82 128 
f51m 72-65 244 28 32 0.37 77:65 247 26 33 35 26 41 
= 3 2II2I5 990 80 69 0.88 301:250 1003 76 7 0 _ _ 1 4 9 _ _ 7 6 _ _ ^ 
my-adder 1067106 4 2 107:105 339 2 2 22 2 72 
n/ler8 58.72 174 8 14 0.27 58:72 174 8 14 22 8 31 
二 441；383 1251 54 38 0.80 442:384 1253 46 38 95 66 322 
sao2-hdl 136:114 439 26 16 0.47 128:123 440 16 16 31 16 89 
terml 124:148 439 28 14 0.53 130:148 445 24 14 62 28 88 
ttt2 120:107 376 10 11 0.39 102:125 376 8 11 f 8 72 
„3 471-384 1334 22 16 0.84 461:396 1336 20 16 4 40 321 
tooJarge 2161:"l913 7723 318 563 5.58 2162:1917 7728 308 544 699 1210 1995 
Total 45506 1850 | | 45656 1576 | 5476 3658 1 13495 
Average I I I I I +0.33%| -14.48% 
Table 4.2: 2-way partitioning results of hMetis-Kway & F M & G P 
4.5 Conclusions 
In this chapter, a scheme coupling the graph logic domain partitioners to 
explore a larger optimization room of circuit partitioning is proposed. The 
scheme is shown to be very efficient in terms of C P U expenditure and is also 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 59 
Circuit hMetis-Kway GP 250, FM 
i l i i #lits cut cut cpu i l iS # l i t s “ “ cut cut cpu cut cpu 
cost wire cost wire cost 
5xpl 37:43:52 f ^ ^ 1 ~ 5 3 " " " [ T l 0 . 6 7 40:47:50 240 [~49 p^S 
9sym-hdl 43:44:54 232 32 19 0.67 43:44:55 233 22 19 15 20 49 
C1355 217:196:187 1055 84 95 1.19 217:204:189 1065 80 93 157 143 269 
CI 908 148:186:182 883 115 82 1.10 146:191:185 889 98 98 127 134 226 
C2670 400:329:314 1444 85 59 1.34 402:337:312 1452 71 59 119 247 411 
C3540 376:464:423 2267 175 172 2.88 361:506:413 2284 165 224 357 413 664 
C432 67:92:79 392 56 46 0.67 72:95:83 ^ 54 70 67 50 91 
C499 193:169:141 8M 82 M 0 4 195:173:136 ^ ^ 64 62 108 188 
C5315 556:692:714 3282 117 114 3.31 521:709:739 3289 100 122 510 564 1125 
C6288 810:1086:960 5195 148 429 3.98 827:1061:978 5198 165 555 704 712 1697 
C7552 745:923:754 4105 94 129 3.42 750:941:739 4113 86 136 747 490 1271 
C880 139:178:166 780 80 58 1.06 145:182:163 787 61 58 60 103 202 
alu2 124:133:165 777 140 144 1.23 134:143:167 795 127 159 122 152 178 
alu4 230:307:248 1470 214 203 2.06 218:314:268 1489 194 211 231 283 361 
apex6 257:302:349 IIT? 75 47 243:321:358 T i M 6 3 70 223 113 447 
b9_n2 45:53:59 208 21 17 0.51 44:54:60 209 21 17 32 25 52 
comp 57:62:65 270 16 8 0.59 57:62:66 271 14 7 37 20 56 
des 1107:1512:1220 6655 322 342 7.06 1089:1535:1216 6656 236 628 776 657 2761 
duke2 113:147:126 676 128 107 1.18 106:143:154 693 116 112 103 121 162 
f51m 42:53:42 244 58 54 0.68 46:54:44 252 56 62 37 49 51 
misexS 153:209:176 990 128 114 1.50 154:208:192 1006 113 162 157 139 252 
my_adder 80:65:67 ^ 8 4 ^ 79:65:68 339 4 4 23 10 72 
pclerS 41:49:40 174 17 20 0.47 42:49:39 174 15 20 25 15 37 
rot 245:268:311 1251 85 68 1.44 233:272:330 1262 77 77 195 120 368 
sao2-hdl 95:83:72 439 62 46 0.86 95:94:68 446 56 57 65 56 112 
terml 77:88:107 439 54 38 0.88 78:91:108 443 44 38 32 54 111 
ttt2 64:89:74 376 33 34 0.78 66:90:75 380 33 34 55 37 86 
x3 271:262:322 1334 78 65 1.38 276:259:340 1354 67 65 106 130 419 
tooJarge 1562:1198:1314 7723 779 976 10.46丨 1606:1201:1284 7740 747 1096 734 1761 2 6 ^ 
Total 45506 3339 1 45748 2999 5914 6761 14370 
Average +0 .53%丨 -10 .18% 
Table 4.3: 3-way partitioning results of hMetis-Kway & F M & G P 
quite capable in bringing further improvements on good partition results pro-
duced by well-known partitioner hMetis-Kway. Without the integration with 
R A M B O , the input circuits is no longer limited to 2-input simple gate circuits. 
Experiments on 29 M C N C benchmark circuits for 2- to 5-way partitionings are 
conducted, and obtained further cutsize reductions from 14.48% to 9.24% upon 
the good results produced by hMetis-Kway. Moreover, the partitioning quality 
and C P U expenditure of G P are both better than running F M for 250 times. 
As G P can be integrated with any newly developed powerful graph partitioner, 
this partitioning scheme should be very practical and useful for many partition 
tasks. 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 60 
C i r c u i t h M e t i s - K w a y GP 250 F M ~ 
iTii #lits cut cut cpu i l i i # l i ts " " " cut cut cpu cut cpu 
cost wire cost wire cost  
5xpl 34:27:31:40 [ ~ 6 3 0 . 6 8 38:29:29:39 239 p ^ 
9sym-hdl 36:38:31:36 232 48 26 0.80 37:40:32:32 232 29 26 16 26 52 
C1355 174:158:130:138 1055 110 90 1.59 164:164:125:165 1073 96 108 166 185 305 
C1908 122:125:126:143 883 137 114 1.53 126:128:121:149 890 114 126 67 184 259 
C2670 262:257:262:262 1444 125 82 1.75 264:251:274:262 1452 91 114 122 274 481 
C3540 324:285:356:298 2267 245 211 3.36 337:280:378:281 2278 230 286 385 458 791 
C432 65:53:63:57 392 75 62 0.95 72:55:70:56 407 67 73 71 66 107 
C499 122:148:116:117 ^ ^ 77 0 6 123:150:112:120~ 856 81 81 131 145 218 
C5315 532:511:460:459 3282 201 152 3.91 537:510:461:461 3289 192 161 523 633 1325 
C6288 627:670:856:703 5195 186 582 4.52 627:678:842:716 5200 210 585 482 848 2051 
C7552 537:605:647:633 4105 66 153 4.37 511:634:648:638 4114 44 153 388 732 1551 
C880 108:113:136:126 780 82 58 1.33 109:123:138:123 790 69 121 124 107 230 
alu2 101:128:96:97 777 178 171 1.51 99:126:103:111 794 171 193 128 190 212 
alu4 162:197:204:222 1470 285 240 2.35 157:191:224:236 1493 257 308 123 352 452 
apex6 233:200:261:214 1417 75 45 EM 231:209:276:208“ 1433 65 77 233 139 555 
b9_n2 41:48:35:33 208 36 23 0.76 46:43:36:35 209 32 23 17 31 58 
comp 48:43:47:46 270 14 8 0.91 47:44:47:46 270 12 8 20 28 62 
des 938:1160:947:794 6655 359 515 7.51 889:1144:953:864 6665 295 713 774 716 3322 
duke2 116:95:96:79 676 159 127 1.33 113:82:115:89 698 147 137 112 158 183 
f51m 37:35:36:29 244 67 67 0.75 41:39:36:28 251 65 68 40 63 55 
misexS 157:138:132:111 990 188 177 1.70 160:149:143:107 1006 169 192 166 180 295 
my_adder 52:54:52:54 ^ 12 6 cTsi 52:53:52:55 339 6 6 24 23 81 
pclerS 30:28:33:39 174 24 25 0.57 31:28:32:39 174 22 25 26 20 40 
rot 193:191:208:232 1251 98 75 1.86 187:193:210:240 1256 93 91 205 166 457 
sao2-hdl 75:60:63:52 439 93 60 1.08 69:64:73:51 446 76 60 35 74 117 
terml 67:56:67:82 439 59 36 1.14 77:57:69:85 454 55 37 68 68 118 
ttt2 54:66:59:48 376 51 44 0.85 55:69:60:49 382 46 46 59 48 94 
x3 191:193:236:235 1334 78 52 1.86 190:196:249:244 1358 71 111 225 164 504 
tooJarge 862:1059:964:1189 7723 1041 1345 11.58| 823:1082:961:1221 7736 999 1685 756 2206 3281 
Total 45506 4250 45784 3864 5523 8342 17308 
Average | | | | +0.61%丨-9.08% 丨 丨 丨 I 
Table 4.4: 4-way partitioning results of hMetis-Kway & F M & G P 
Circuit hMetis-Kway GP . | 250. FM 
i^Fii #lits cut cut cpu aTea # l i t s c u t cut cpu cut cpu 
cost wire cost wire cost  
5xpl 26:21:23:31:31 1 ~ 7 8 [ ~ 7 1 0 . 8 9 23:22:32:29:31 f ^ P ^ 
9sym-hdl 32:30:24:24:31 232 57 31 0.96 32:27:28:25:30 233 33 31 16 35 55 
C1355 142:113:100:130:115 1055 122 97 1.95 142:115:95:144:123 1074 109 110 175 222 369 
C1908 91:111:93:117:104 883 153 111 1.70 84:118:84:123:112 888 119 131 72 206 313 
C2670 221:221:174:207:220 1444 133 95 2.26 186:229:195:217:229 1457 106 135 129 314 578 
C3540 214:258:223:307:261 2267 312 268 4.07 224:239:237:301:284 2288 265 341 206 551 990 
C432 42:46:43:59:48 392 80 74 1.25 47:48:41:57:57 404 77 88 75 73 126 
C499 107:104:85:103:104 ^ 107 ^ 1745 111:101:81:108:105 ^ 93 102 138 173 245 
C5315 371:315:370:497:409 3282 244 193 4.92 372:315:383:471:428 3289 232 268 558 706 1650 
C6288 517:496:525:651:667 5195 222 601 5.21 508:498:532:659:669 5204 241 601 820 876 2340 
C7552 568:504:396:533:421 4105 152 169 4.84 554:528:388:504:447 4095 135 220 787 836 1901 
C880 104:95:82:111:91 780 104 84 1.58 107:91:86:107:100 787 94 157 131 155 293 
alu2 71:78:79:95:99 777 209 198 1.92 67:93:88:89:99 791 192 228 130 227 245 
alu4 150:127:143:183:182 1470 310 291 3.05 147:130:171:181:173 1487 279 375 129 399 554 
apex6 205:205:148:157:193 TU7 132 ^ 2 ： ^ 206:206:199:145:171 1436 125 112 249 165 656 
b9_n2 25:30:28:39:35 208 36 27 0.84 24:29:33:39:39 214 36 32 36 39 65 
comp 39:42:30:41:32 270 24 15 0.97 30:42:31:42:38 269 20 15 21 36 73 
des 682-711:683:919:844 6655 484 382 10.03 667:699:667:954:889 6692 386 564 785 689 4246 
duke2 73:70:71:94:78 676 175 148 1.69 88:77:65:89:89 696 164 168 115 183 217 
f51m 27:21:24:35:30 244 81 78 0.93 32:30:21:27:32 249 77 86 39 76 59 
misexS 86:103:111:133:105 990 216 215 2.23 95:96:127:107:129 1006 203 276 173 240 364 
my adder 52:41:41:39:39 16 8 0 2 51:41:42:39:39 339 10 8 26 30 87 
pclerS 26:22:24:28:30 174 29 29 0.68 27:25:25:24:32 177 26 30 28 27 48 
rot 145:147:150:209:173 1251 127 97 2.20 160:151:142:197:190 1266 121 123 217 184 543 
sao2-hdl 63:51:45:43:48 439 109 74 1.46 61:60:46:44:48 448 97 93 66 93 145 
terml 66:55:48:56:47 439 71 44 1.34 66:58:45:65:55 453 66 59 71 79 146 
ttt2 46:55:35:45:46 376 59 58 1.15 47:53:36:51:45 383 56 72 62 65 112 
x3 193:190:136:152:184 1334 122 91 2.23 193:195:170:139:199 1375 112 171 243 173 605 
tooJarge 1003:851:661:782^7^ 7723 1221 1477 14.62| 975:879:723:765:743 7734 1159 1806 802 2503 4 4 7 ^ 
Total 45506 5185 4 5 8 2 8 4 7 0 6 6309 9324 21563 
Average | I I I +0.71% 丨 - 9 . 24% 丨 丨 丨 I 
Table 4.5: 5-way partitioning results of hMetis-Kway & F M & G P 
Chapter 4 Multi-way Partitioning using Rewiring Techniques 61 
Algorithm GP (best—partition, m, k, t ) { 
searchJ-imit = 0; 
n_perturbations = 0; 
curr_partition = best-partition; 
last—part it ion = best—partition; 
for i=l to m { 
while((n_perturbations < k) && (exit == fa l se ) ) { 
search-limit = 0; 
while(searchJ-imit < t ) { 
search_limit ++; 
randomly select a cut wire Wt ； 
use GBAW to find a l l alternative wires SWa for Wt； 






if (SWa = (t>H 
pick alternative wire Wi with the largest gain; 
replace Wt with Wi in curr_partition. 
c u r r - p a r t i t i o n = F M ( c u r r _ p a r t i t i o n )； 
n_perturbat ions = n_perturbation + 1; 
if (cost (curr_partitioii) < cost (last—partition)) 
last-partition = curr_partition; 
} 
for each wire w{ 
use GBAW to find a l l alternative wires SWa for w; 
do random perturbation on SWa in curr.partition; 
c u r r _ p a r t i t i o i i = F M ( c u r r _ p a r t i t i o n )； 
if (cost (curr_partition) < cost (last—partition)) 
last-partition = curr .partition; 
} } } 
Figure 4.13: Algorithm of GBAW-Partitioner (GP) 
Chapter 5 
Optimum FPGA Switch-Box 
Designs - HUSB 
5.1 Introduction 
Field Programmable Gate Array (FPGA), a kind of Very Large Scale Inte-
grated (VLSI) circuit, consists of an array of pre-fabricated functional blocks 
and wire segments with user-programmability of logic and routing resources. 
Because of their fast turn-around time and economic manufacturing cost for 
low volume designs, F P G A s have been used in a great amount of digital equip-
ments. F P G A technologies are commonly classified into three major categories: 
(1) Look-Up-Table (LUT), S R A M based (2) multiplexer, channel organized 
and anti-fused, and (3) LPLD, E P R O M based. In this chapter, the study 
of the optimum routing structure problems for the popular L U T and S R A M 
based two-dimensional (2-D) F P G A s is presented. The architecture of an in-
dustrial product of this type is described in [AR95, alt90, BFRV92, C W W 9 6 ] . 
The importance of routing resource issues in F P G A s is never over-emphasized. 
In commercial F P G A products, the routing resource consumes most of the chip 
area, and is responsible to most of the circuit delay. A typical 2 - D F P G A 
architecture is shown in Figure 5.1. The functional blocks (or logic cells) are 
marked by L, which are separated by vertical and horizontal channels. There 
62 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 6 3 
vertical channel . track ID wire segment 
1 23 4 Z ^ ^ 
、 f ^ r box ^ 
「 二 ^ ^ ^  
horizontal L L 3丄 - L pin of logic cell ^ ^ ^ W f r —— 
channel 二二 = : — _ _ _ L二：二 ：！—I, 
2 r Q — - S — - l l Z — switches / 
^：1-：：11 ：-jig [ZlffiqijlZ inCbox / 
logic cell p二=:"! __F二：二1 r^"""n 丄 . ^ [ 
T — _ ； ： 二 - — T — -t^ T 1 I : . . . ::,\�)rV:�:"^,.._|_i 
,___ ^ switches 9 :•?•::'/•.:..-•/..•::.'•::.. _ 2 
r^ 1 mSbox - . • • .-y 
[1=1= =•__|| I I ||__F=F=nr^^lF 干 ^Fl—— 3_ ；：.~3 
__ _ V. .• • • 二工二二 S 二工 =二 s 二3二：― 4 :<::藏..::,::::‘:々 .j 
-I—I—-|—I I—H—f—H—L I LL =1= j j ‘‘Y -
‘TTTT T T box^^^ I I I I 
Figure 5.1: The architecture of a 2D F P G A . 
are W (called channel density) prefabricated parallel wire segments running 
between each pair of adjacent L-cells in both vertical and horizontal channels. 
The wire segments in a vertical (or horizontal) channel are aligned into W 
vertical (or horizontal) tracks; each track within a channel is assigned an in-
teger in {1,..., W} as its track ID. There are C-boxes in the channel between 
adjacent L-cells. A Switch Box (S-box), located at each intersection of a verti-
cal and horizontal channels, contains programmable switches to connect wire 
segments running from its surrounding C-boxes. 
W h e n an F P G A is used to realize a specified Boolean function, the pins 
used to realize the Boolean function are partitioned into groups (called nets). 
Then the pins in each group are connected together to form a real net by using 
available wire segments and switches in both C-boxes and S-boxes; different 
nets are disconnected. The latter process is referred to as a routing. Con-
ventionally, the routing process is divided into two subsequent steps, global 
routing and detailed routing, although there is no absolute need for doing 
routing in these two phases. In this chapter, the term of global routing is 
used to specify the connection topologies for all nets. The detailed routing 
decides the exact assignment of wire segments and switches used to materi-
alize the complete routing. As the connectivity within C-boxes is complete, 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 6 4 
the routability of the entire chip is fully dependent on the structure and con-
nectivity of the S-boxes [AR95, BFRV92, C W W 9 6 , LW95, P W W Y 9 2 , WC94， 
W M S 9 7 , W T M S 9 4 , W T M S 9 6 ] . As the routing resource is relatively expen-
sive in F P G A chips, it is clearly desirable to design switch boxes (S-boxes) 
with maximized routability and minimum number of programmable routing 
switches. 
A n F P G A switch box is said to be hyper-universal if detailed routing can 
be performed on all possible multi-pin net topologies which satisfy the global 
routing density constraints. The switch-box is optimum if it is hyper-universal 
and the switches inside is minimum. 
It has been shown that if the net topology is restricted to 2-pin nets, then 
a 2-D (4-way) switch box can be built to be universal with only 6W switches, 
where W is the global routing channel density. As the routing resource is rel-
atively expensive in F P G A chips, searching for optimum switch box designs is 
clearly a topic with both theoretical challenges and immense commercial silicon 
reduction values. In [FLWOO], a formal mathematical model of this optimum 
switch box design problem for arbitrary dimensions has been constructed. It 
gave a scheme to produce hyper-universal designs with less than 6.7W switches 
for 4-way F P G A switch boxes. In this chapter, further investigation of this 
most common 4-way switch box case is done, and give new theoretical results 
followed by extensive experimental justifications. The results seem to be quite 
attractive and show that such an optimum switch box can be built with a very 
low number of additional switches beyond 6 W for today's practical range of 
low W's. If the number of switches of an optimum 4-way switch box with a 
density of W will be presented as e(4, W), then e(4, W) = 6W for W being 2, 
3，or 5; e(4, VT) - 6W + 1 for W = 4; e(4，6) < = 6W + 2, and e(4, 7) 6W 
+ 1. For arbitrary large W's, the bound can be shown to be under 6.341^. 
To make an even experimental comparison, V P R [BR97] which is today's best 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 65 
published F P G A router is selected for large benchmarks on the popular Dis-
joint F P G A switch box structure and the proposed designs. The results are 
quite encouraging. 
This chapter is organized as follows. Section 5.2 gives definitions of routing 
architectures. Section 5.3 introduces different F P G A routing algorithms and 
makes comparison. The switch-box analysis is described in Section 5.4. For 
completeness, some basis previous results on S-box designs, a decomposition 
property of global routings and terminology will also be mentioned in Section 
5.5. In Section 5.6，new results for optimum or hyper-universal S-boxes are 
provided. Section 5.7 shows the experimental results and conclusion is drawn 
in Section 5.8. 
5.2 Background and Definitions 
5.2.1 Routing Architectures 
The routing resources of F P G A consist of programmable components and wire 
segments. Since the number of switches in an F P G A is proportionally increase 
with the area and delay penalty, there is basically a trade-off between routabil-
ity and the performance for routing modules in F P G A . There are two main 
modules, switch-module and connection-module in F P G A . 
Switch-block Architecture 
In [RB91], the flexibility of the switch-block and connection block are studied. 
The number of programmable switches between a terminal and others inside 
a switch-box is represented by Fs as the flexibility of a switch-box. A switch 
module with flexibility of 3 is shown in Figure 5.2 which means each pin inside 
the switch-box is connected to three other pins. A large flexibility means a 
large number of choices. Only switch-box allows the change of channel in the 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 66 
overall routing process. 
2 1 0 
�� 
1 — 、、、  一 1 
\ 
\ 




2 1 0 
Switch-Blcok with Fs = 3 
Figure 5.2: A Switch module with = 3 
Connection-Module Architecture 
As shown in Figure 5.3, there are two wire segments from the logic box to the 
connection box. The connection box allows the connection for each edge on 
the wire segment. In Figure 5.4 which sourced from [RB91], the percentage of 
completion to successfully route the BNRE circuit is plotted versus the change 
in Fc on different Fg. It showed that for large switch-box, the increase in F。 
does not greatly improve the routability. 
2 1 0 
0 
〇 Logic Logic 
Block K——�卜- Block 
1 — K  
1 
C o n n e c t i o n B l o c k w i t h F。 = 2 
Figure 5.3: A Connection module with Fc = 2 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 67 
% c o m p l e t i o n  
Fs = 2 
100.00 - ： ， 厂 / 严 - ^ ^ ： ： ： ^ -F-s"-=--3-
95.。。_ / / / 〜 
90.00 - / / / / / / Fs = 5 
85.00 - / / / / / / 
80.00 - / ‘ / / / / Fs = 7 
/ ‘ ‘ / / / 75.00 — / , / / 
70.00 — 丨 ， ！ /' / / 
65.00 _ 丨 ： / / /' / 
60.00 — I > /// 
55.00 — I ‘ ,'/ / 
50.00 — 丨• / / / / 
‘//' / / 
45.00 — ‘ / / / / ‘//// 
40.00 — '/,''/ 
‘ I / 
7,/ 
35.00 — //// 
/// 
30.00 — '// 
25.00 — / 
1 1 1 1 1 I I ^ 
2.00 4.00 6.00 8.00 10.00 12.00 14.00 
Figure 5.4: Percentage of completion versus Fc for the circuit B N R E 
5.2.2 Global Routing 
In the phase of global routing, the routing algorithms will generate a "loose" 
route for each net, that means it is assign the routing regions for each net. For 
example, in Figure 5.5, there are two possible global route for the connection 
from the middle left logic box to the right logic box. The global routing 
algorithm will determine the best or the shortest net without going into the 
detail of the routing modules. 
5.2.3 Detailed Routing 
In the phase of detailed routing, the algorithms will find the actual geometric 
and physical wire for real connection. The exact layout will be produced after 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 68 
L --r-- L ::C：： L 
L —::c：： L :C：： L 
L IZ::C：： L :C：： L 
Figure 5.5: Example of Global Routing in F P G A 
detailed routing. For example, in Figure 5.6, if the upper global route is 
selected, then the detailed routing algorithm will go further and determine the 
connection inside switch. 
L L ::C：： L 
L :c：： L :C：： L 
3:：::= s s 三:::::= 
L --r-- L :c：： L 
Figure 5.6: Example of Detailed Routing in F P G A 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 69 
5-3 FPGA Router Comparison 
5.3.1 CGE 
In [BRV92], a new detailed routing algorithm which called Coarse Graph Ex-
pansion (CGE) and designed for F P G A is introduced. The algorithm supports 
different kinds of F P G A architecture. Since the wire segments in F P G A s are 
pre-placed, there exists competition between the same wiring segments when 
applying routing algorithm. Thus, C G E applies 2 phases to ensure the 100% 
routing on the industrial circuits. It can also optimize the routing delays for 
some critical paths. 
Example of Global route from (2,2) to (4,4) 
^ . \ ^ 
A. . - L ^ - jyiFM^H^i^ hwmfr - - L = Logic Blcok 
—W 4V IT w 
• 丨 J : : C = Connectioii Blcok 
3. .LgJTTTT. TTT：-i^.TTTT. .g. rTTTT s = switch Bicok 
W v V W 4 - V W 
： ： i ： ： pMn pLil M pUt 
2" - - Xj ^ i r C ‘ ‘ i-T ‘ 
‘ ‘ I • * I ‘ 
I I ^ I I I I ‘ ‘ . 
I I « ‘ 
: . • , • channel segment 
• • • _ ‘ 
I I I ' I I j I I I ‘ I 
1 - - f c T ^ - s- - i i i j ^ t i i - zzzzF^-
‘ “ I I < i • 1 * 
• I _ I I I I I I ‘ 
t • • • 
I • • ‘ 
書 • » • » 
- ’ I > j I I I 11 
0- - -L- -t^- [ZZZ-6-ZZZZ -L-
I I I • 0 1 2 3 4 
Figure 5.7: C G E : The F P G A Model with one Global Route 
The input to C G E detailed routing algorithm is the LocusRoute global 
router [Ros90] which divides multiple-pin nets into 2-pin nets and routes them 
in the shortest distance paths. In the first phase of C G E , it generates a number 
of alternatives for the detailed routing on each global route plus pruning to 
reduce the number of paths. In second phase, there is a simple cost function 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 70 
which is the sum of the costs of its edges used to evaluate which detailed 
routing should be chosen. 
Block Grid coordinates Block Grid coordinates 
L Q 2,2 L Q 2,2 
0 
C I C X). 2,3 
Expand JT 
S p 3 , 3 — • s O 9 O 3,3 
0/ 1 1 2 
C C n O O 3 . 
1 1 1 1 
^ o - L 6 6 6 6 
Coarse Graph (Global Route) Expanded Graph 
Figure 5.8: C G E : The Coarse Graph and its Expanded Graph 
As shown in Figure 5.7, it shows the F P G A model with a global route from 
grid coordinate (2,2) to (4,4). The coarse graph and its expanded graph are 
also shown in Figure 5.8 and it is graph representation for the alternatives in 
phase 1. 
5.3.2 SEGA 
In [LB93], they address the allocation of wire segments which best match with 
the length of connection such that S E G A is able to obtain better routing result 
than C G E . For example, if there is a connection from (0,0) to (4,4) as shown 
in Figure 5.9, S E G A builds an express edge from (1,0) to (1,3) instead of three 
short segments as shown in Figure 5.10. 
Since S E G A performs the matching between the connections and the wire 
segments length, it provides a high quality routing result and achieves good 
performance. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 71 
Example of Global route from (0,0) to (4,4) 
—: 1 ！ I ‘ 1 I _ ！ _ I I ‘ 
4- • - L- - Ir - - • • - - t! tr • L = Logic Blcok 
, -_ - t ——-^^― I •—•- [ I I » 
‘ •丨 ！ C = Connection Blcok 
3 - S = Switch Blcok 
, ^ ― < ~ ^ I I • I * 
• I ‘ 
2- • -L- 闷 -1- e- 丄-
I I I I "H ！ I • 
I . • channel segment 
. — t I » I • — I . I . 
Q - , -L- - tj (o ti" 
I I ‘ _ I • 
— , I 1 • 1 ‘ 
0 1 2 3 4 
Figure 5.9: S E G A : The F P G A Model with one Global Route 
5.3.3 TRACER 
In [CLWL95], a RAM-based F P G A routing algorithm ( T R A C E R ) is presented. 
T R A C E R basically consists of 2 router, an expansion router plus a rip-up and 
rerouter. The expansion router will use up all the F P G A routing resources 
without considering the connections of other nets. As a result, the existing 
violations are solved by rip-up and rerouter. 
The important issue for rerouter is to choose which net should be ripped up. 
The straight forward approach to select bad net might fall into local minima. 
Therefore T R A C E R does not choose the net causing a lot of violations to rip-
up, but instead there is a score for each net for the selection. The score is 
shown below where a is set by the user: 
scorein,) = a * + (1 — a) * num.violations. 
W h e n T R A C E R is unable to solved all the violation among different con-
gested nets in a certain time, the router will abort automatically. However, 
within feasible time, the T R A C E R will record the feasible solution and find a 
better solution. T R A C E R provides a better solution than C G E and S E G A . 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 72 
Block Grid coordinates Block Grid coordinates 
L Q O'O L O 0'。 
c • C 
I I I Express Edge 
s ^ Q 1,1 
Build Express Edge I 
s Q ® CJ 
^ 6 - e 
S 3,3 = 6 3.3 
。（！) 3,4 ^ 6 -
Coarae Graph (Global Route) Coarse Graph with Express Edge 
Figure 5.10: S E G A : The Coarse Graph with Express Edge 
5.3.4 VPR 
Versatile Place and Route (VPR) [BR97] was introduced in 1997 and is known 
as the best F P G A router nowadays. V P R can place and route an F P G A design 
into different F P G A architectures which is different from other F P G A router. 
The routing algorithm of V P R is based on the Pathfinder negotiated congestion 
algorithm [EMHB95] and it also applies two phases of routing, i.e. shortest 
path routing (Dijkstra's algorithm, a maze router [Lee61]), and ripping-up and 
re-routing. In the Table 5.1 shown below, the comparison between different 
routers with Fc = W and Fg = 3 and the switch-box architectures is Disjoint 
type (XC4000). The inputs are the placement results by Altor [RSV85:. 
Global R o u t i n g L o c u s R o u t e LocusRoute VPR TRACER VPR 
Detailed Routing CGE SEGA SEGA TRACER VPR 
9symml 9 9 7 6 6 
alu2 12 10 8 9 8 
alu4 15 13 10 11 9 
apex7 13 13 10 8 8 
example2 18 17 10 10 9 
k2 19 16 14 14 12 
terml 10 9 8 7 7 
tooJarge 13 11 10 9 8 
vda U U 12 n 10 
Total 123 112 89 85 77 
Table 5.1: Comparison between different F P G A Routers 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 73 
5.4 Switch Box Design 
5.4.1 Disjoint type switch box (XC4000-type) 
The most commonly used Switch-box is XC4000-type which is implemented 
in Xilinx XC4000 family. The architectures are illustrated in Figure 5.11 with 
VF = 3,4. Disjoint means the connection between different track is isolated as 
shown in Figure 5.12. There are totally 3 independent set of connections. 
,k K /fv:::k-- k 
/ 、、、/-'f-、、、 ： \ /； >'--:'-《、I \ 1 、、、 
/ J - - - 八 ： � � � ！ ��� �、、丨� � � j 
• z 、、• 、、_ 、、 \ 7 ‘ o" "J、、 I、、 •、、 ‘ 
^ . - -- -- --^If- -- -- -- --V -- -- -- -- - ^― �� I •〜、 j 、、 J 、\ I 、、、/ ;\ / 、、<Z/ ： 、、、I 、、、：、、、、,’ 、、、/ ！ i、、、 i、、、！ ,'/>'、： i \ ： \ >’ /卜、！ 、、、 \/ -^ --Y-" •“ “！ V、- '“―/V-
f I \ I ’ V ‘ L � ' / _ � � • � • \ ' f / : 、、、 ： 、、、： ，'、、， V' I \ ！ ;、、、 I 、、、:/ 
、了 I _\ ‘ / / , \ I • � • “I � 
、、,' ！、、 丨 丨、、，'/ / ： \ ； 、、、： ！ 、、丨 /; \ 
广、、 ：、、、 ： ：、、、;'' — r r — V ： 
, ' 、 、 ： 、 、 ： I A 7、 ：、、 、、、 ‘ ；人 \ ： \ I / 、、 /、、 \ , \ I ‘ ,~f、 ； \ 、、， \/ ：  ； 、、： 、、： \ 1 /y \ 
‘ 、_! V' , 、 ‘ \ I 、！、、 • 少f / 、 
、、、 f\ 1、、、 /1 J."-* 一— -'k 一 
、、 ：、、 、、‘':-；"‘ 、、 \ .、、 •；'*、/--!"? 
、、、、、、、:>小:八丨/ 、、、、丨 
~、、’i----、:f 、、t—— 1 - " f r 1 
(a) Disjoint S-Box when W = 3 (b) Disjoint S-Box when W = 4 
Figure 5.11: XC4000-type Switch-Box Architectures for W 二 3 and 4 
/K 分 / i \ 
'二-,，、、•、、：、、 :‘ 、、 J'"* * * » • » ‘ I * 
……-1\ l\/ / I \ 
/ , ��_ I ��_ • � � / ； 、、 〔-•»'-—K 卜 K — ^ ^ ^ ^ / 、 
X i \ i i V 、、,、 
\ i \i \/卜、、 ！ \  )i. ^  ’’ , 、 1  
虚 r. I Kj:::::z-
\ \ + 
TT；  
.... \ \ / i \ 
… - - / ' 丨 \ 
I ‘ / I V 
\ / I z i………•\ 
I 、、 、-、 ：/ • 、、 ： 
I f \ • ‘ 
I * \ I / 
\ I » I 
\ • / I ‘ 
\i/ 
Figure 5.12: Disjoint relationship between different tracks for W = 3 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 74 
5.4.2 Anti-symmetric switch box 
In Figure 5.13, it shows two different size of anti-symmetric switch with Fs = 3 
which is implemented in [RB91 . 
— 7 k 7 k ： A \ ,f、、,'.!、、、 
/ \ ' _ \ Z �� / I � � / • � V ' ‘ \ 
/ I \ / ！ A ' 、 、、、 / ! >： ' ' 、 、 ： ,、、； \ 
1 \ 、、 I 、、、 / 1 , 、、： Z 、、： 、、、 
！ / X'！ 、\ ！ 、\ — ； - 朱 -n- 
~ \ 广-”、-、- T\" -厂 、、、、z4' i、>:' i、、、、i X' 
\ ,方 ；\ i、、、、/ ,z、,:丨 I,'、、、、j 、、  
/, , 、、 ! 义 一 V ；t r. ^ /一 
A： i /、、、— V： i、、/ i i、v' i、、、/ 、、、、/;、、、、 i iX/、、、、、i/\i i/、、、、i '、、、 
/ \ 、、、： ’',、、、 、、、、Z个、、i、、、/: 
/ 1 、 、 ’ : 、 、 ： 、 、 _ [ ， 、 , * \ — � - -V ； \ ! �\J / ' �、、： ，/'! \ 
\ ： I / I / 厂 一 • 一 _ 垂 • 嘛 ^ • • 一 _ « • • • 垂 _ _ • 垂 广 释 一 - 一 y • • • 穩 響 》 _ ， 、 、 \ 身 、 , / � • � � • I , 
\ /、、、、/ \ 丨 X、V!'X、丨 / 
、 、 个 十 � r � � � r+ ' 、 、 卜 � f 
(a) Anti-symmetric S-Box, W = 3 (b) Anti-symmetric S-Box, W = 4 
Figure 5.13: Anti-symmetric Switch-Box Architectures for W = 3 and 4 
5.4.3 Universal Switch box 
In [ C W W 9 6 ] a so called Universal Switch Box (USB) structure, which is a 
4-way S-box of density W with 6W switches, has been proposed. However, 
this model only accommodates 2-pin nets, therefore designing a general S-box 
for all kinds of nets is important and necessary. In Figure 5.14, it shows two 
different U S B architectures. 
—7k""/!、、、/'k— """,'f\ ,4、、、yk ,4、、、 
/ \ / I �� / I �� / _ � � / • � � ‘ I � ‘ _ �� � �z • X / I �� / _ � ‘ 人 I A _ \ 
/ \ > I X I N / I z \ , Z 、 、 _ / � _ 、 
/ Z、、 • 广 、 丨 、、 / ： 、、丨 Z 、、，Z 、、， 、、 、、、丨Z \| 、、、— “ \--；‘： ；'TV :々 -、: T;: ；' 
'"：>： /f\ IN--…7— 、、、/ i / i、、/ i、、、、i V' 
V ' i / i、、、、、丨、、/ 、、、、!/' i,'、、、、丨、、、J,' \ 
、、、、、丨/ i V 乂 v ' : v ] 「 • 「 • 
一\"'"yiN 1 ；、、、、丨/、、、、i ：/\丨、、、、、 
\ 八 、、/ • \ •、、 Z • Z • \ / 
/ � ‘ \ _ I / V X _ \ » I » < 内、 
/ 、 、 、 、 • , ' 、 ， / ' 、 、 • 、 、 _ 、 、 • ‘ , Z、、 
、 广* • / * ‘ M/ K* \ 、、、 ！ \ / 1 \ / ： / ~ \ 一 
� A » K » � � / • I � � ‘ _ 
、、、、i/、、、、i,'、、、、！/ 、、、、丨Z\i,'、、、L八丨/ 
~、 T T f ~ 、「T r 、r 
(a) USB S-Box, W = 3 (b) USB S-Box, W = 4 
Figure 5.14: Universal Switch-Box Architectures for W — 3 and 4 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 75 
5.4.4 Switch box Analysis 
For 2-D type F P G A , there are six types of connection as depicted in Fig-
ure 5.15. If there are two type 1, two type 2, one type 3 and one type 5 
connections, the possible routing results are shown in Figure 5.16, 5.17, 5.18 
for Disjoint, Anti-symmetric and Universal switch-box respectively. 
,, Type 2 
Tvpe — T v p e 6 
Type 1 
Type ^ J 、 T y p e 5 
Figure 5.15: Six different types of connections 
. 称 、 r y ...涉、j\�| 
/ \ \ / \ \i /、、、、、 
\ 、、、、、、、、、j.-X' / 
\ ..--V >- \ - - - " ' j /  
Unroutable Routable 
Figure 5.16: Example of routing over Disjoint Switch-box 
— ^ / f . .---'k ~/k/]、、、、卜、、A\ ~ 
!\ /" .i, .、\ 、 \ / 、、‘ 、，、：V' 、‘‘ ’ I 、、、 V V - 、 >： 
\| /•--. i 、、、、/、、、 :<1 _ 
、、、/、、、丨 入-,'" V 丨\ ,'v" 
/ \ \ ./r\ /、、、、-. \i. .,'、、、、、、、 _/\i 
\ V 父 1 / \ VIA / \r-x-A / 
\ \ / I I \ \|/ � �� , I 十 ' � �� / 
Unroutable Unroutable Unroutable 
Figure 5.17: Example of routing over Anti-symmetric Switch-box 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 76 
From the above figures, the required connection is unroutable in anti-
symmetric switch-box but is is routable in both Disjoint and Universal switch-
box as shown in Figure 5.18. Another example is depicted in Figure 5.19, it 
shows the routability of U S B is higher than Disjoint type because Disjoint S-
box failed to route one type 1, one type 3 and one type 6. From [ C W W 9 6 ] the 
results are put in Table 5.2, U S B has higher flexibility than anti-symmetric and 
disjoint type S-box. A larger routing capacity always means a better routing 
solution, therefore it is necessary to improve the routability. 
/、、(>、、、、二-
• /r\、‘、、v"" 
A / i、、、、、、八 
八 \ i / A 
、、、.、、、、‘'i、、、/ 
Routable 
Figure 5.18: Example of routing over Universal Switch-box 
J>r\、\ /I、\ „, l \ 
_ j. N _ _《 I …-_、— 、、/ I 、、‘、、i / 、、、丨 丨，'’ \ 丨 、、，： / 、、、、、、丨 / 
/ r � � > � / ,'，:、、、 / ：� � � i V ；� � � i � � � 
/ i 、、、 i /\ / ！ 、、、、)/ I 、、 、、 • 、、、 I 1'''' 、、、 I /、、‘ I '' 
*、 • 、一 - I I * • Z 、 ！ , 
、个-z、、f Y Y 
Unroutable for Disjoint S-box Routable for USB 
Figure 5.19: Another comparison between Disjoint and Universal Switch-box 
Circuit Fc USB Disjoint Anti-symmetric 
BUSC 9 lO n 10 
DMA 10 11 14 11 
DFSM 10 11 15 11 
BNRE 12 12 14 14 
Z03 14 14 15 U  
Total I - II 58 I 69 I 60 
Table 5.2: Detailed routing result by C G E over different switch-box 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 77 
5.5 Terminology 
The terminology and symbols of graphs are referred to [BM76]. Let G = 
(V(G),E(G)) be a simple graph with vertex set V{G) and edge set E(G). 
V{G)\ (or \G\) and \E{G)\ are denoted as the number of vertices and edges in 
G, respectively. Let S C V{G). denotes the induced subgraph of G by S. 
Vi^ Vi^  …Vii is used to denote the path with consecutive vertices vi^.vi^,..., vi^ . 
Let W and k be positive integers with A: > 2. In [FLWOO], a (k, P^)-global 
routing is represented as a collection GR = {Ni\i = 1,..., /} of non-empty 
subsets of {1，...，k} such that each element of {1,2,..., k} belongs to exactly 
W subsets of GR. W is called the density of the global routing, and a (A;, W)-
global routing is also called a A;-way global routing with density W. Each 
Ni in GR is referred to as a net of the global routing. Note that a global 
routing GR is a multiple set; two equal sets in GR represent two different nets 
in the global routing. Note also that a net of cardinality n corresponds to an 
n-pin net. For simplicity, 1-pin nets are allowed to ensure that each element 
of {1,2,..., k} appears exactly W subsets of GR. 
Let GRi and GR2 be global routings and m a positive integer. The union 
of GRi and GR2 as multiple set is denoted by GRi + GR2, and mGRi is the 
union of m GRiS. 
Given a global routing a local and mathematical view, further view is taken 
to the track with ID j on the z-th side of an S-box as a vertex vij and a switch 
connecting the track with ID j on the z-th side and the track with ID m on the 
h-th side is an edge VijVm,h- Therefore, any /c-way S-box of density W can be 
represented as a A:-partite graph G on where Vi = {vij\j — 1,..., W } 
and each Vi is an independent set in G^ for z = 1,..., A:. The graph G is called as 
a (k, VF)-design. In particular, a 4-way S-box of density VF is a (4, VF)-design. 
Let G be a {k, PF)-design on (Vi,... ,14). A detailed [k, l^)-routing (or 
shortly detailed routing) of a {k, Vr)-global routing {Ni\i = 1,...,/} in G is a 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 78 
set of mutually vertex disjoint subgraphs {T{Ni)\i = 1..•，of G satisfying: 
(1) T{Ni) is a tree of \Ni\ vertices, and (2) \Vj H V{T{Ni))\ = 1 if j e N“ for 
i = l,..”l. T{Ni) is called a detailed routing of Ni. 
Hyper-universal design 
A hyper-universal (k, VK)-design on (Vi,...，T4) is a (k, ^ )^-design on (Fi,. • •，V^ ) 
such that it contains a detailed routing for each (A;, l^)-global routing. For ex-
ample, the complete A:-partite graph on (Vi,..., Vk) (in which, there is an edge 
joining each pair of vertices Vij and with ii ^ i) is a hyper-universal 
(A:, VF)-design. A hyper-universal (A;, T^)-design represents a k-way S-box of 
density W (also called a {k,W) S-box) which can accommodate any {k,W)-
global routings. A (2,4) and a (3,4) design are depicted in Figure 5.20. 
1 2 3 4 
1111 
1 2 3 4 1 2 3 4 
(2,4)- HUSB (3,4)- HUSB 
Figure 5.20: (2,4) and (3,4) design 
Optimum design 
A n optimum {k, VF)-design is a hyper-universal�k, VF)-design with the min-
imum number of edges. Clearly, the number of edges in an optimum (/c, W)-
design is uniquely determined by k and W, which is denoted by e(k, W). 
A global routing is called primitive if it does not contain two unequal nets 
of size 1. If a (Jc, Vl^ )-global routing GR is not primitive, then the unequal nets 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 79 
of size 1 are combined into nets of size 2 to obtain a primitive (k, PF)-global 
routing GB!. Any detailed global routing of GR' will induce a detailed global 
routing of GR by simply deleting the edges of those one edge trees representing 
the nets of size two in GB! which are obtained by combining the unequal nets of 
size 1 in GR. Therefore, to verify that a (k, Vl^)-design is hyper-universal, it is 
only necessary to show that each primitive global routing is detailed routable. 
This approach depends on a very nice decomposition property of global 
routings. Let GR be a {k, VF)-global routing and GR' be a sub-collection of 
GR. If GR' is a (A;, n)-global routing with n<W, GR' is called a sub-global 
routing of GR. GR is said to be minimal if it does not contain subglobal 
routings. The following result was proved in [FHL . 
Lemma 1 For any integer k with k > 2, there exists an integer f{k) such 
that any k-狐y global routing GR could be decomposed into minimal /c-way 
subglobal routings with densities at most f{k). Moreover, f{k) = A; - 1 for 
k = 2,3,4. 
In [FLWOO], a general reduction technique is developed for designing [k, W) 
S-boxes. 
I. Find f{k) and all k-way minimal global routings. The existence of f(k) 
and the finiteness of the number of minimal k-way global routings are 
guaranteed by L e m m a 1. 
II. Determine p and r such that W = pq + r, p and r are as small as possible 
so that any (/c, VF)-global routing is a union of q subglobal routings of 
density p and a subglobal routing of density r. (Note that as k is fixed 
and each W corresponds to a unique r, it may has more than one r as 
W varies but there are finitely many such r's for all W). 
III. Design a hyper-universal (k,p) S-box Si and a (k,r) S-box S2 with the 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 80 
number of switches as small as possible. Then a disjoint union of q copies 
of Si and an S2 is a hyper-universal (k, W) S-box. 
IV. Design an efficient detailed routing algorithm to detailed-route any A;-way 
global routing in the S-boxes designed in III. 
A n efficient algorithm for IV is given. N o w the 4-way S-box designs will be 
focused. 
For Step 1，Lemma 1 shows that /(4) = 3. For Step 2, [FLWOO] showed 
that any (4, VI^)-global routing can be decomposed as a union of (4,6)-global 
routings and at most 1 (4, r)-global routing for some r 二 1, 2，3,4, 5 and 7. 
The following are all primitive minimal (4, l^)-global routings. 
For Step III, [FLWOO] gave a hyper-universal (4, Vl^)-design F{W) with less 
than 6.71^ switches. 
The goal of this chapter is to further investigate Step III for obtaining 
better (4, i)-designs for i = 3,4 and 6 and a better (4, T^)-design than the 
(4, W)-design F{W) constructed in [FLWOO；. 
The following result was proved in [FLWOO] which will be used in this 
chapter. 
Lemma 2 Let G = ((^1,^2,^3,^4),^) be a hyper-universal (4, VF)-design. 
Then \E\ > 6W. If G is restricted on any two parts, it gives a hyper-universal 
(2, T^)-design. If G is restricted on any three parts, it gives a hyper-universal 
(3, VF)-design. The optimum (2, VK)-design is a perfect matching, and an op-
timum (3, VF)-design is a Hamilton cycle which includes each vertex in the 
graph and forms a circle. Moreover, the optimum (3,4)-design must be a 
Hamiltonian cycle. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 81 
Gi?卜{{1,2,3,4}} 
碑，1 = {{1,2}, {3,4}} 
碑,2 二 {{1,3}, {2,4}} 
碑 3 二 { { M } , {2, 3}} 
綱 , 1 = {{1},{2,3,4}} 
啤 2 = {{2}，{1,3，4}} 
Gi^“ = {{3},{l,2,4}} 
G 用 1 = {{1,2，3}, {1,2，4}, {3,4}} 
關’2 = {{1，2，3}, {2,3，4}, {1,4}} 
0用，3 二 {{1,2,4}，{2,3，4},{1，3}} 
Gi?f4 = {{M,4}，{2,3,4},{l,2}} 
Gi<5 = {{l,2,3}，{l,3,4},{2,4}} 
啤 6 = {{1，2，4},{1，3,4}，{2，3}} 
爛 , 1 二 {{1，2,3}, {1,4}, {2}，{3,4}} 
G 瑪 2 = {{1,2,4}, {1,3}, {2}, {3,4}} 
G丑• 3 二 {{2,3,4}，{1,4}，{2}, {1,3}} 
啤 4 = {{1,2，3}, {1，4}, {3}, {2,4}} 
5 二 {{1，4,3}, {1,2}, {3}, {2,4}} 
吨 6 二 {{2,4,3}, {1,2}, {3}, {1,4}} 
Gi?2^ = {{2,4,3},{l,2},{4},{l,3}} 
= {{1,4,3}, {1,2}, {4}, {2, 3}} 
啤 9 = {{1,2，4},{1,3},{4},{2,3}} 
Gi^li。= {{l,2,4},{4，3},{l}’{2，3}} 
碑 11 = {{M，3}，{4,2}, {1}，{2,3}} 
G丑•，12 二 {{2,1,3}, {4，2}，{1}, {4，3}} 
吨 1 = {{1，2}，{3,1},{2,3},{4}，{4}} 
綱2 二 {{1,2}, {4,1}, {2，4}, {3}, {3}} 
吨 3 二 {{M}，{4,1}，{3，4}, {2}, {2}} 
碑4 二 {{3,2},{4,3},{2,4},{1}，{1}} 
GRI 二 {{1，2,3}, {1, 2,4}, {3,4，1}, {2,3,4}} 
GRl 1 二 {{1，2，3}, {1,4}, {2,4}, {3，4}, {1, 2,3}} 
GRl^ = {{2,3，4}, {1，2}, {1,3}，{1,4}，{2,3,4}} 
GRi, 二 {{3,4，1}，{2,1}, {2,3}, {2,4}, {3,4，1}} 
GRl, = {{4,1,2}, {3,1}，{3, 2}, {3,4}，{4，1, 2}} 
Table 5.3: Primitive minimal 4-way global routings 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 82 
[ f . , \ 
V2，31 V3 i " V 3 ， 2 、 2 ^ ___^3，7 
卜’ 1 . |V2，i V^i’i ^ ^ 
V V Hi [ ^ A 
\3 、3 Vi’2 V2’2 < ——“ 
H3 ’ H2 ’ \ V2,6 / 
v.4. y j V 4 ， i < ^ ^ : : : ^ l _ ^ V i ， i Vi,6 
7vi4 \ , \ V2，5^^  \V3’5 
N V2’3'> 也 i … U s \ 
V3 3 V4’5 
ZTV If^ l “V3，4 “ 
U ^^  V V4’6 V3’6 
MV3,2 V4’3 广 v r x p ] 
' f ^ A V2’5i 丨 >3 i '丨 V3’4 V3’5 丨丨 V V nV2,6、…丄 ^ "、！ 
V'^^^V r T V N v i 1 1 L 
V4\ ^vn V3’5 V4,5 Vu V2’4 v' v,^  v：, 二 
‘ H4 ‘ H6 H7 
Figure 5.21: A family of hyper-universal 4-way designs. 
5.6 Hyper-universal (4, Ty)-design analysis 
To design a hyper-universal (4, VF)-design, Step III of the reduction design 
technique mentioned in Section 2 is processed. Figure 5.21 provides a family 
of (4, z)-designs Hi for z = 1,2, 3，4, 5, 6, 7 which are necessary for designing a 
(4, VF)-design for all W according to this technique. Figure 5.22 also shows 
the structure of (4,7)-design Bj on the S-box. In [FLWOO], it showed that Hi 
is an optimum (4, i)-design for i = 1,2. The following lemmas claim that Hi,s 
are hyper-universal or optimum (4, z)-designs for i = 3,4, 5, 6, 7. The prove of 
these lemmas is discussed in this later section. 
Lemma 3 Hi is an optimum (4, z)-design for z == 3,4, 5. 
Lemma 4 Hi is a hyper-universal (4，i)-design for i = 6, 7. 
N o w define G{W) as the following graph: 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 83 
,v,7 y2s |V.s y24 y23 y22 y^ i 
V.7 • ^ V3, L A； ；f；——TT^~,'j  
/ \ ,1、、 <1、、 'I、、 ‘\ 、、'--__:、. ’•、、 
/ V \ 、、、 'I � � � / '、v::-。-、<^-'---'、:��  �� 
/ , ,, \ ；、、二----、/:--”/、>—'''-、、、\、、、 、、、 、、、 
I < I ‘ ‘ \ I I - - - - V * / 、、 ,/ «• -々 、 、 、； 、、 ” 
V. ： —T-； / / 二-4'/'' 、、“ 、、 ------•為 
、、：、？—---'」—々 ->'!、、、！、、一、::-?、、、、； 
V “ C / ���� V �i ��� i 
v M — — 々 、、、、、丨着 � U - : � : : H ::S -、、、、、/ 
v . . _ _ _ ^ v . ‘ 甜 K — — W \ � � � � � � �K 今”3 
/ \ \ ^ / / n \ > 1 , 、、 、 • 》 、 • 、‘ • ‘ 
f V. \ 、：f / � \ �� � � � � i； � � � �L \ � � L /���:/ 
/ c \ V 一 入 r、、、、、\ ： 、、、、、r\��yLpji� 
^ _ ^ � � � � � ! \ � � � � �7 T … : - - • - ! -… - \ 
y > 广 、 、 - - - - - 广 
. 、、 I 、！ _ _ _••• 、 i \ I 、 山 - « / \ ‘ 、i 
v . , ^ 一 一 V32 V�…�J一-’K \ ��:��/"\：J：：：--^" 、、  ! � � � �\ \ \ ； 
\ \ I •“ % "^N • / 1/ \ 1/ \ » ‘ 
、 、 丨 r • � r �f 1 
V n V , 3 V i , V22 V 4 , V , 3 V44 V43 V , , 
Figure 5.22: Structure of H 7 
disjoint union of h Hq's iiW = 6h, 
disjoint union of (h - 1) Hq's and a i^ r iiW = 6h+l, 
disioint union of h HqS and a H2 if W = 6/z + 2, 
G{W)= 
disjoint union of h Hq's and a Hs if 二 + 3, 
disjoint union of h He's and a H^ if 二 6" + 4， 
disjoint union of h Hq's and a i/5 iiW = 6h-\-b. 
\ 
By the definition of i = 1,..., 7, it is easy to see that the number of edges 
of F{W) for VF > 1 is 
‘fW if = 
f W - l if W = l(mod 6), 
^ • W - ^ if = 2(mod 6), 
fW -1 if V^ = 3(mod 6), 
f W - l if V^ = 4(mod6)， 
f W - l if ly = 5(mod 6). 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 84 
Theorem 5 For W > 1, G{W) is a hyper-universal (4, VF)-design. 
Proof. 
If = 6/i + 1, then h>l and any (4,6h + l)-global routing GR can be 
decomposed into a union of {h — 1) (4,6)-global routings and a (4, 7)-global 
routing, (sometimes, GR can be decomposed into h (4, 6)-global routings and 
a (4, l)-design. But this does not always happen.) N o w it is easy to see that 
G{6h + 1) contains a detailed routing of GR as G{W) is a disjoint union of 
{h — 1) Hq's and an Hj, and Hq and Hj are hyper-universal (4,6)-design and 
(4, 7)-design, respectively by L e m m a 4. 
Let W = where i ^ 1. Since the densities of minimal 4-way global 
routings are 1, 2 or 3, any (4,6/i+i)-global routing GR can be decomposed into 
h (4，6)-global routings and a (4, z)-global routing for 2 < z < 5. Since G{6h-hi) 
is a disjoint union of h H^s and an Hi, and Hq and Hi are hyper-universal 
(4，6)-design and (4, i)-design, respectively by L e m m a 3, then G{6h-\-i) contains 
a detailed routing of GR. • 
Next the proof of the lemmas 3 and 4 are divided into subsections. For 
simplicity, each vertex of Hi is labeled by only the side label, where the corre-
sponding track belongs to. 
5.6.1 Hs is an optimum (4, 3)-design 
Note that H3 has 18 edges which is the lower bound. Therefore, it is sufficient 
to show that H3 is hyper-universal. It is suffice to show that H^ contains all 
possible (4, 3)-global routings obtained by combining the primitive minimal 
global routings in Table 5.3. All these primitive are depicted in Appendix E. 
Notice that the permutation a =(1,4)(2,3) is an automorphism of 丑3, thus 
it is only necessary to detailed-route GR in H3 where GR is a union of those 
global routings in Table 5.3 which are not isomorphic to each other under a. 
These global routings are shown in Table 5.4. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 85 
Gi^i 二 {{1，2,3，4}} 
碑 1 二 {{1，2},{3,4}} 
Gi^l2 = {{l,3}，{2,4}} 
0竭,3 = {{1，4},{2,3}} 
叫 1 二 {{1},{2，3，4}} 
碑 , 2 二 {{2}，{1,3,4}} 
Gi??i = {{l，2，3},{l,2,4},{3，4}} 
啤 2 二 {{1，2，3}, {2，3，4}, {1，4}} 
爛 : 3 = {{1,2,4},{2，3’4},{1,3}} 
Gi??,6 = {{l,2,4}，{l,3,4},{2,3}} 
綱 ’ 1 = {{1，2,3}，{1，4}，{2}，{3,4}} 
G均,2 = {{1,2,4}, {1，3}, {2}，{3,4}} 
綱 , 3 = {{2,3,4}, {1，4}，{2}，{1，3}} 
綱 , 7 二 {{2，4,3}, {1，2}, {4}, {1,3}} 
Gi?l8 = {{l,4,3},{l,2}’{4},{2,3}} 
G瑪,9 = {{1,2，4}，{1,3}，{4}, {2，3}} 
Gi^•，1 二 {{1,2}, {3,1}，{2, 3}，{4}, {4}} 
二 {{1,2}, {4，1}, {2，4}，{3}, {3}} 
GRI = {{1，2，3}，{1,2,4}, {3,4，1}，{2,3，4}} 
GRl 1 = {{1, 2，3}, {1,4}，{2,4}, {3,4}, {1, 2,3}} 
GRl, = {{3，4，1}, {2,1}, {2, 3}，{2,4}, {3,4,1}} 
Table 5.4: Primitive minimal 4-way global routings which are not a isomorphic 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 86 
1 4 。 。 .1 . 4 , ， . 1 A . ,3 
2i I ^ o 3 2i I • • I • 
3 4 1 2 3 4 1 2 3 4 ^ 1 2 
GR^ i GI^ 'i GP2.3 
Figure 5.23: Detailed Routing of GR (Case 1) 
Let GR be a (4,3)-global routing which is a union of global routings from 
Table 5.4. 
Case 1. GR G 
A detailed routing of GR is given by Figure 5.23. 
Case 2. GR consists of only density 1 global routings. 
Subcase 2.1. GR does not contain 
In this case, the graph H3 can be partitioned into three subgraphs Gi, G2 
and Gi consists of the lower level, G2 consists of the left of the top and 
second levels and G3 consists of the right of top and second level. Note that 
each subgraph can detailed-route any of {GR\, GR\ 2, GR\ i, GR\ 2}. 
Therefore, it is sufficient to detailed-route GR in H3. 
Subcase 2.2. GR = 3Gi?“s. 
A detailed routing is given in Figure 5.24(a). 
Subcase 2.3. GR contains only one GRl ^ . 
If GR contains U GR\ 2, then it is possible to detailed-route GRl -^ U 
GR\,2 as shown in Figure 5.24(b)，and if GR contains GR]^^ U GRI 2, and 
the detailed-route G R ^ ^  U G R I 2 are showed in Figure 5.24(c). Note that the 
unused part in Hs can detailed-route any of {GRl,GRl i, G R \ 2, G R \ 2}-
This proves that GR is routable in H^. 
For those GR,s which do not contain GRI 2 and GRI 2, detailed routings 
in H3 is shown in Figure 5.24(d),(e),(f). 
Subcase 2.4. GR contains two Gi^ “，s. 
If detailed-route of 2Gi?2,3 is performed as shown in Figure 5.24(g), then 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 87 
p ^ l ~ p ^ l 
——y__o3 : . ‘ [ l 1 4 _ , , 3 2 , , — — o 3 
3 4 1 2 3 4 1 2 3 4 1 2 
(a) (b) (c) 
" ” " “ ^ f ^ ' 
_ _ [ i _ U — 0 3 2,,_[i~L~•丨3 2 , . _ I _ o 3 
3 4 1 2 3 4 1 2 3 4 1 2 
G R ! + G R 2 3 + G R I 1 G R \ + G R 2 3 + G R I , 1 G R �3 + G R ^ ’ 1 + G R 2 . 1 
(d) (e) ffl 
2 、i I4 : [1 [4 … 2 0 — — y - — — " 3 
rTV^' 
3 4 1 2 3 4 1 2 1 1 
2 G R 5 3 2 G R i 3 + G R i 2 2 G R , . 3 G R 3 . 2 
(g) ’ (h) (i) 
Figure 5.24: Detailed Routing for Table 3.2 
the unused top level can detailed-route any of {GR\, GR\ i, GR\ i}. For GR = 
+ GR\,2 or GR 二 2GRl^ + a detailed routing is given by Fig-
ure 5.24(h),(i). 
Case 3. GR contains a minimal global routing of density 2. 
Figure 5.25 shows all detailed routings of density 2 global routings in Fa-
Note that the lower level of H2 is not used in all the routings in Figure 5.25 
and the lower level can be used to detailed-route GR\, GR\ i, GRl GRl i and 
GRl 2- Therefore, detailed-route GR can be done in H3 if GR is a union of a 
density 2 global routing and a density 1 global routing from {GR\, GR\ i, GRl 2, GR\ i, G均 2}. 
A detailed routing of GR in H3 is given in Figure 5.26 when GR is a union 
of density 2 global routing and 
This completes the verification that it is detailed routable in H3 for all 
(4,3)-global routings, and hence H3 is hyper-universal. 
Note that only when GR G {2G均,3 + 瑪 ， 2 , “ + GRl^. GRl^}, it is 
needed for the top level edge {1,4} in the detailed routing. This information 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 88 
2,. I L > 3 21, I I 113 I I -.3 2,. l ^ >3 
k s ^ 
3 4 2 1 2 3 4 2 1 2 3 4 之 1 2 3 4 ^ 1 2 
GRi’i GRl, 2 GRi’3 GRi,6 
M y II- [l 1 4「 "‘ _ y .>3 2 , 1 ‘ 4 ,,3 
3 4 1 2 3 4 2 1 2 3 4 2 1 2 3 4 2 1 2 
GR2^  GR2’2 GR2,3 GR2’7 
I ‘‘"‘•上4 o3 2 丨 I ^ I I' n3 2 .__1 •丨3 
k t W 
3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 
GRJs GR2,9 GR3,1 GR3’2 
Figure 5.25: Detailed Routing of density 2 Global Routing in Hs 
will be used later. 
5.6.2 H4 is an optimum (4,4)-design 
The first step is to show that H4 is hyper-universal. Let GR be any (4,4)-
global routing which is a union of global routings from Table 5.3. If GR is 
a union of a minimal (4, 3)-global routing and a (4, l)-global routing from 
Table 5.3, the five minimal (4, 3)-global routings can be detailed-routed as in 
Figure 5.27(a). Note that the unused part in H^ is a cycle 1,2,4,3 which 
can be used to detailed-route all the (4, l)-global routings except from 
Table 5.3. If GR contains a a detailed routing of GR in H^ is given in 
Figure 5.27(b). 
N o w assume that GR is a union of two (4, 2)-global routings, then GR is 
routable in H4 as H4 contains two disjoint H2S. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 89 
2, ) ,,3 1 4 2丨 I J , |3 2 > , , 4 (»3 
k ^ ^ 
3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 
GRi, 1 + GRI 3 GRI 2 + GRI 3 GRI 3 + GR^ 3 GR?, 5 + G^i 3 
2i 丨 I 14 1,3 ： , [1 I4 “”‘‘__I n3 2,,_1 … 
r ? ^ KTV i^ 
3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 
GR2,, + GRI 3 GR2,2 + GRI 3 GR2' 3 + GRI 3 GR2,7 + GR2, 3 
f^^ yr^^ 
r ^ T V ^ K W N k v W 
3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 
GR2, 8 + GR2, 3 GR2, 9 + GR2,3 GR3,1 + GR2, 3 GR3,2 + GR2! 3 
Figure 5.26: Detailed Routing of GR in H^ 
/ T Z l x / T Z ^ K / T z I n ^^Z^f^ 
2 3 2 3 2 I 2. 2 J I 3 2 t：：/ L：：/ t：/ tlZl t：/ 
/T—：\ % —： ^ /T^—：^ Z 3 . 
3 2 3 2 I 3 2 1 ^ 2 J 1 3 2 , L t t ： ^ 
GRi GR2. 1 GR2,2 GR2,3 GR2.4 
(a) 
/ p - K / T Z ^ r z ^ / T l l Z x / T T — ^ K 
丄 3 2 , 3 2 , . 3 2 3 2 I 3 2 
L ： / t ： / t l ： / U l： ^ / t^l 
%—/T—：：\ /T ：^ %—：^ 
3 2 1 3 2 1 3 2 1 3 2 a 2 
L J / t d / L I / t z l / 
GR2,3+GFLi 2 GRU+GR2.3 GI43+G4’4 
(b) 
Figure 5.27: Detailed Routing of GR in EU 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 90 
This proves H ^ is hyper-universal. The fact that H ^ is an optimum design 
follows from the following result which also indicates that the lower bound ^ W 
given in L e m m a 2 is not achievable in general. 
Theorem 6 There is no 3-regular hyper-universal (4,4)-design. A n optimum 
(4,4)-design must have at least 25 edges. 
Proof: Suppose that there is a 3-regular hyper-universal (4,4)-design G. Then 
is a 4-partite graph on (^1,^2,^3,^4)； there is a 4-matching between each 
pair of Fi, Vj for i + j and the induced subgraph of G on each set Vi U Vj U Vm 
� i . ] + m) is a cycle (see Lemma2). Based on these facts, all such 3-regular 
graphs can be constructed. 
It is easy to observe (by relabeling if necessary) that any graph described 
in the above paragraph contains the graph Gi as shown in the Figure 5.28(a). 
Begin from Gi, selection of a matching between Vi and Vs is done so that the 
induced subgraph on Vi U V2 U is a cycle. There are 6 different choices. 
For each of these choices, it is necessary to select a matching between Vi and 
V4 to obtain G so that the induced subgraphs of G on Vi U F2 U V^ and on 
Vi u V3 U V4 are Hamiltonian cycles. Total 23 such graphs are constructed. 
For each of these graph, a (4,4)-global routing is found and it is not detailed 
routable in the graph (see Figure 5.28). As a result, it showed that there is no 
3-regular hyper-universal (4,4)-design. 
5.6.3 Hi is a hyper-universal (4，i)-design for i = 5,6,7 
Note that any (4，5)-global routing is a union of a (4，3)-global routing and a 
(4, 2)-global routing, and H^ contains an H3 and a disjoint H2, therefore, H^ 
is hyper-universal. Also note that H^ has 30 edges which is the lower bound 
of an hyper-universal (4, 5)-design. Therefore,丑5 is an optimum design. 
H7 is a hyper-universal (4, 7)-design is similarly proved. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 91 
1 2 3 4 
穩 
1 2 3 4 (a) 
_ _ _ _ • •‘ 1 2 3 4 1 2 3 4 1 3 
3GR2 3 +GR2 1 2G< i +2GR2.3 2GR2.1 +2GR2’ 3 
_ _ _ 
1 2 3 4 ^ 1 2 3 1 2 3 4 
2GR1 i+2GR2’3 2GR2 1+2GR2.3 2GR2 2 + 2GR!’3 
'• A •  1 2 3 4 1 2 3 4 1 f 3 4 
2GR2 2 +2GR1 3 3GRU 1 2 +GR2.1 
_ _ _ _ _ 
'• 1 2 3 4 : 1 2 3 4 1 2 3 4 1 2 3 4^  1 2 3 4 
i+2GI^ ’3 2GR2 2+2GR2.3 2GI^ .I+2GI^ .3 I+2GI^ .3 _____ 
1•…2.•丄..：* 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 
2GR2 2+2GR2 3 i 3 2GR1. ! +2GR2.2 2GR1, 1 +2GR2,2 
_ _ _ _ 
1 2 3 4 1 2 3 4 1 2 3 4 
2GR2 1 +2GR“ 2G 边 1+2GR“ 2GR2’ i +2GR2,2 
_ _ _ _ 
……i-丄.丄-.4-……. 2GR“+2GR“ 2GI^,,+2GR2,2 2GRii+2GR“ 
Figure 5.28: (4,4)-Global Routing 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 92 
The first step to show Hq is a hyper-universal S-box is to prove Hq con-
taining a detailed routing for every (4,6)-global routing. Then, note that the 
permutation a = (1,4)(2,3) is an automorphism of Hq. Therefore, it is nec-
essary to check those global routings which are union of global routings from 
Table 5.4. Let GR be such a global routing. 
Case IGR 二 GRi-\-GR2, where both GRi and GRi are (4，3)-global routings. 
It is easy to see that H^ contains a detailed routing of GR as Hq contains 
two disjoint B^s. 
Case 2 GR = GRi + GR2 + GR3, where each GRi (i = 1, 2,3) is a primitive 
minimal global routings from Table 5.4. 
Let K(i, z + 1) be the subgraph of He which consists of the levels i and 
i + 1. There are three disjoint subgraphs i^(l，2)，i^(3,4) and i^(5,6) of 
Hq. In Figure 5.25, it shows that K{1,2) and i^(5,6) can detailed-route any 
minimal (4, 2)-global routings in Table 5.4. All detailed routing of minimal 
global routing in i^(3,4) are routable. Therefore, there is a detailed routing 
for GR in Hq, and hence He is hyper-universal. 
5.7 Experimental Results 
From the combinatorial analysis shown above, it seems a bit surprising to see 
that an optimum (hyper-universal) S-box can actually be built using only very 
few more switches beyond the widely believed lower bound of 6W. It is also 
interesting to see that there exist H U S B s with switch density of only 6W for 
some W's and the construction of optimum H U S B s seem to hardly possess 
regular scalability which can be observed in the construction of some 4-way 
U S B family. Besides the theoretical analysis, in order to get some experimental 
justification, the experiments adopted the currently known best F P G A router 
V P R [BR97], which is available on the Web. The logic block structure for the 
V P R runs is set to consist of one 4-input L U T and one flip-flop. The input or 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 93 
output pin of the logic block is able to connect to any track in the adjacent 
channels, Fc = W. Inside the switch box, each input wire segment can connect 
to three other output wire segments of other channels, Fs = 3. In appendix 
D , the H U S B s for z = 1,..., 9 are depicted. 
�-- � ���� 
！叫、 11 、爛、 
- 一一一 — T、 - -
—————i^..卞…、…卞 \\�\\�v•、一 If^—¥— 
--5 \ � 
‘、i«?t\、、 I ) 3:——::一--)E----J厂----)、、� �jttwj , 
、：、 、、、. )r-".) :-“••) —f—-) (—•) ：—•) <、——、--5 I 、 ： 、 \ � 1�.�ii �� 
Routing succeeded with a channel width 彳actor of 11. 
Figure 5.29: The structure of Disjoint S-Box, W = l l 
-inai - - - : ''、、、 .’、 、“ 、：、、、邮、：； 
叙 K 乂 ： 松 务 一 
勘 溶 職 、 f 目 
1 丽 
[imt�� T 、、： m 、 
、 、 、 、 ： 、 、 、 
、、、 、 ： ； t~~5:^ ; ~ ； ： ； ; ； ; ~ ~ ~ ； ！~i ；~|||、、 :��-��： 
“ RouUng succeeded with a channel width fador of 11. "‘ 
Figure 5.30: The structure of H，USB(4,11) (H'USB(4,7) and H，USB(4,4)) 
In order to have an even comparison (partially is also due to the limitation 
of V P R router limiting Fs to 3) with the well-known Disjoint structure, the 
experiments deliberately eliminate the "additional" switches of the H U S B s 
to make the H'USBs have density of 61^, which is the same as Disjoint S-
boxes. Figure 5.29- 5.36 show some S-box structures and routing results of the 
experiments. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 94 
0 1 2 3 4 0 1 2 3 4 
0 1 2 3 4 0 1 2 3 4 
Figure 5.31: Detailed Routing by using Disjoint S-Box 
As shown in Figure 5.31 and Figure 5.32, there is more degree of freedom for 
H U S B switch-box if detailed routing is carried out from the lower left switch-
box to upper right one. It is because H U S B switch-box allows the router to 
choose track 1 or 2 instead of Disjoint type which only allows routing on track 
1. In Figure 5.33 and Figure 5.34, it shows Disjoint S-box is insufficient to 
route GR\ while H^ provides higher routability. 
In Table 5.5, it showed the comparing results of the number of tracks 
required to route some larger M C N C benchmark circuits [Yan91] by Disjoint 
and the H'USB FPGAs. Overall, the H'USB F P G A s use about 10% less tracks 
than the Disjoint FPGAs. (Beware that since the V P R is a simulated annealing 
based non-deterministic router, the results which produced for Disjoint F P G A s 
could be a bit different to their other reported results.) 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 95 
0 1 2 3 4 0 1 2 3 4 
Figure 5.32: Detailed Routing by using H'USB S-Box 
5.8 Conclusions 
In this chapter, the reduction design technique developed in [FLWOO] in 4-way 
S-box designs is applied. The new (4, W) S-box has at most 6.3W switches 
compared with the previous one which has about 6.6VF switches. Note that, 
according to the new design, a complete database for detailed routings in Hi 
for i 二 2,3,4,5,6 and 7 is built and hence have an efficient detailed routing 
algorithm to detailed-route any (4，VF)-global routing in the S-box G{W). 
One example is provided to show that 6W is not a lower bound of the 
number of switches in an optimum (4, W) S-box for some W while which 
is widely believed to be. This suggests that the newly design G{W) is very 
close to an optimum. Currently, research works on applying this combinatorial 
analysis models for other F P G A routing architecture designs are conducting. 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 96 
j v k k i j v k | \ i M \ l M 
一 厂 jx 、、、： \! \ [/ -二— =； 1 --y-.y -、,ic -v 、一 
一 = 二一 — • / - • -V - 一 _ _ r i - ^ -V — “ 7 广 ,V % r C ^ % K K ^ k \\N 
一 j. - K； r r � - �…夕一 
Unroutable Unroutable Unroutable 
Figure 5.33: Detailed Routing by using Disjoint S-Box on G R l 
|Va3 4 |Va2 5 「21 6 薩 
V„ 12 11 V,, 10 
Figure 5.34: Detailed Routing by using Hyper-Universal S-Box on G R l 
一 Disjoint H,USB 
aiui 12 To 
apex2 12 11 
apex4 15 13 
bigkey 8 7 
des 9 8 
diffeq 9 8 
dsip 7 7 
elliptic n n 
ex5p 15 13 
misex3 13 12 
seq 12 12 
spla 16 14 
tseng 8 7 
e64 9 8 
一 Total 156 141 (-9.62%) 
Table 5.5: Channel widths required for different benchmark circuits Fc = 















































 ： 譯 
F 
冒腳隨疆膽隱臓
j ^  




 纖 諭 編 鐘 瞧 ^



















 ^ 「i 
Chapter 5 Optimum FPGA Switch-Box Designs - HUSB 98 
iTi 5 S m F m r n ffi m ffim^m m i m m 
Routing succeeded with a channel width factor of 8. 
Figure 5.36: Routing result of e64 by using H'USB S-Box, W = 8 
Chapter 6 
Conclusions 
6.1 Thesis Summary 
In this thesis, the physical design flow was first discussed, followed by intro-
ducing the newly proposed alternative wiring tool Graph-Based Alternative 
Wiring (GBAW). Logic optimization and the circuit analysis on performance 
by using G B A W were examined. In Chapter 3, an intensive comparison with 
famous rewiring tool R A M B O was conducted. Experimental results showed 
that G B A W is able to identify alternative wires efficiently and obtained a very 
good logic optimization solution. 
In Chapter 4, the GBAW-Partitioner (GP) was proposed with graph do-
main and logic domain. In the login domain, the logic gate information was 
used by the rewiring technique G B A W in order to alter the circuit so as to 
escape from local minima in graph domain. In the graph domain, the F M algo-
rithm can be replaced by any other partitioner. Experimental results showed 
that the near optimal partitioning result by state-of-the-art Multi-level parti-
tioner hMETIS-Kway can be further improved. Rewiring techniques can be 
applied in many different C A D and VLSI problems such as designs with timing 
constraints and layout-driven constraints. 
In Chapter 5, the architectures of different switch-box architectures and 
routing algorithms were presented. The routability of Hyper-Universal Switch 
99 
Chapter 6 Conclusions ICQ 
Box (HUSB) was studied and was proved its optimality. Extensive experi-
mental results were done by using the Disjoint S-Box and H'USB on the V P R 
F P G A router. 
6.2 Future work 
6.2.1 Alternative Wiring 
The alternative wires are usually closed to the wire to be removed. More pat-
terns should be located and put into G B A W in order to enhance the searching 
power. However, the performance of G B A W should be put into consideration. 
A good data structure and organization of pattern members should be built 
in order to reduce runtime of G B A W . As a result, careful experiments should 
be conducted and ensure the integrity and performance of G B A W . 
6.2.2 Partitioning Quality 
Partitioning is always the first step in physical design and most partitioning 
algorithms can be transformed or mapped with multi-level approaches. The 
bottleneck of the best partitioning result is the lack of using logic domain 
information. Thus, it is possible to integrate rewiring technique with other 
partitioning algorithms. Improved results are expected. 
6.2.3 Routing Devices Studies 
Higher routability means that the same circuit design can be put into the 
same F P G A with less physical wire segments. It also helps to reduce the heat 
dissipation and total path delay. Nowadays, XC4000 architecture F P G A are 
going to be replaced by newly released Virtex, Virtex-E and Virtex-II FPGAs. 
The next research step in F P G A routing should be the investigation on the 
methods which improve the routing of new FPGAs. 
Bibliography 
AHK97a] Charles J. Alpert, Jen-Hsin Huang, and Andrew B. Kahng. The 
ISPD98 circuit benchmark suite. In Proceedings International 
Symposium on Physical Design, pages 80-85, 1997. 
AHK97b] Charles J. Alpert, Jen-Hsin Huang, and Andrew B. Kahng. 
Multilevel circuit partitioning. In Proceedings 34th ACM/IEEE 
Design Automation Conference, 1997. 
AK95a] Charles J. Alpert and Andrew B. Kahng. Multiway par-
titioning via geometric embeddings, orderings, and dynamic 
programming. IEEE Transactions on computer-Aided Design, 
14(11):1342—1358, Nov 1995. 
AK95b] Charles J. Alpert and Andrew B. Kahng. Recent directions in 
netlist partitioning. Integration, the VLSI Journal, 19(1-2):1-
81, 1995. 
alt90] The Maximalist Handbook. Altera Corp., 1990. 
AR95] M.J. Alexander and Gabriel Robins. New performance F P G A 
routing algorithms. Proceedings of Design Automation Confer-
ence, pages 562-567, 1995. 
101 
AY95] Charles J. Alpert and S. Z. Yao. Spectral partitioning: The 
more eigenvectors, the better. In Proceedings 32nd ACM/IEEE 
Design Automation Conference, pages 195-200, 1995. 
Bar85] E. R. Barnes. An algorithm for partitioning the nodes of 
a graph. SI AM Journal on Algorithm and Discrete Method, 
3:541-550, Dec 1985. 
BB91] K. P. Belkhale and P. Banerjee. Parallel algorithms for VLSI 
circuit extraction. IEEE Transactions on Computer-Aided De-
sign of Integrated Circuits and Systems, 10(5):604—618, M a y 
1991. 
BBH+88] K. A. Bartlett, R. K. Brayton, G. D. Hachtel, R. M . Jacoby, 
C. R. Morrison, R. L. Rudell, A. Sangiovanni-Vincentelli, and 
A. R. Wang. Multilevel logic minimization using implicit don't 
cares. IEEE Transactions on Computer-Aided Design of Inte-
grated Circuits and Systems, 7(6):723-740, June 1988. 
BFRV92] S. Brown, R.J. Francise, J. Rose, and Z.G. Vranesic. Field-
Programmable Gate Arrays. Kluwer-Academic Publisher, 
Boston M A , 1992. 
BHSV90] R. K. Brayton, G. D. Hachtel, and A. Sangiovanni-Vincentelli. 
Multilevel logic synthesis. Proceedings of the IEEE, 78(2):264-
300, Feb 1990. 
BLSV92] M . Beardslee, B. Lin, and A. Sangiovanni-Vincent elli. Commu-
nication based logic partitioning. In EURO-DAC ,92. European, 
pages 32-37, 1992. 
.BM76] J.A. Bondy and U.S.R. Murty. Graph Theory with Applications. 
London: Macmillan Press, 1976. 
102 
BR97] Vaughn Betz and Jonathan Rose. A new packing, place-
ment and routing tool for F P G A research. In Sev-
enth International Workshop on Field-Programmable 
Logic and Applications (Available for download from 
http://www. eecg.toronto. edu/^jayar/s oftware Mml), pages 
213-222, 1997. 
I 
B R S V W 8 7 ] R. K. Brayton, R. L. Rudell, A. Sangiovanni-Vincentelli, and 
A. R. Wang. MIS: A multiple-level logic optimization system. 
IEEE Transactions on Computer-Aided Design of Integrated 
Circuits and Systems, 6(6):1062—1081, Nov 1987. 
BRV92] S. Brown, J. Rose, and Z. G. Vranesic. A detailed router 
for field-programmable gate arrays. IEEE Transactions on 
Computer-Aided Design of Integrated Circuits and Systems, 
11:620-628, 1992. 
BrySe] R. E. Bryant. Graph-based algorithms for boolean function 
manipulation. IEEE Transactions on Computer, pages 677-
691, Aug 1986. 
BS93] S. T. Barnard and H. D. Simon. A fast multilevel implementa-
tion of recursive spectral bisection for partitioning unstructured 
problems. In Proceedings 6th SIAM Conf. Parallel Processing 
for Scientific Computing, pages 771-718, 1993. 
BT91] C. L. Berman and L. H. Trevillyan. Global flow optimization in 
automatic logic design. IEEE Transactions on Computer-Aided 
Design of Integrated Circuits and Systems, 10(5):557-564, M a y 
1991. 
103 
CCMS93] D. 1. Cheng, S. C. Chang, and M . Marek-Sadowska. Pariti-
tioning combinational circuits in graph and logic domains. In 
Proceedings SASIMI-93, pages 404-412, 1993. 
:CCWM97] Shih-Chieh Chang, Kwang-Ting Cheng, Nam-Sung Woo，and 
Marek-Sadowska M . Postlayout logic restructuring using alter-
native wires. IEEE Transactions on Computer-Aided Design of 
Integrated Circuits and Systems, 16(6):587—596, June 1997. 
C C W M S 9 4 a ] S. C. Chang, K. T. Cheng, N. S. Woo, and M . Marek-Sadowska. 
Layout driven logic synthesis for F P G A . In Proceedings 31th 
ACM/IEEE Design Automation Conference, pages 308—313, 
June 1994. 
C C W M S 9 4 b ] Shih Chieh Chang, Kwang Ting Cheng, N a m Sung W o o , and 
M . Marek-Sadowska. Layout driven logic synthesis for F P G A s . 
In Proceedings 30th Design Automation Conference, pages 308— 
313，June 1994. 
CE93] Kwang Ting Cheng and Luis A. Entrena. Multi-level logic op-
timization by redundancy addition and removal. In Proceedings 
EDAC-93, pages 373-377, Feb 1993. 
CGMS96] S. C. Chang, L. P. V. Ginneken, and M . Marek-Sadowska. Fast 
boolean optimization by rewiring. In Proceedings of the Interna-
tional Conference on Computer-Aided Design, pages 262-269, 
1996. 
CKMOO] Andrew E. Caldwell, Andrew B. Kahng, and Igor L. Markov. 
Improved algorithms for hypergraph bipartitioning. In Proceed-
ings Asia South Pacific Design Automation Conference 2000, 
2000. 
104 
CLOOa] Jason Cong and Sung Kyu Lim. Edge separability based circuit 
clustering with application to circuit partitioning. In Proceed-
ings Asia South Pacific Design Automation Conference 2000, 
2000. 
CLOOb] Jason Cong and Sung Kyu Lim. Performance driven multiway 
partitioning. In Pwcccdiugs Asia South Pacific Design Automa-
tion Conference 2000, 2000. 
CLMS95] D. I. Cheng, C. C. Lin, and M . Marek-Sadowska. Circuit parti-
tioning with logic perturbation. In Proceedings of the Interna-
tional Conference on Computer-Aided Design, pages 650-655, 
1995. 
CLWL95] Ching-Dong Chen, Yuh-Sheng Lee, A.C.-H. W u , and Youn-
Long Lin. TRACER-fpga: a router for RAM-based FPGA's. 
IEEE Transaction on Computer-Aided Design of Integrated 
Circuits and Systems, 14(3):371—374, 1995. 
CMS94] S. C. Chang and M . Marek-Sadowska. Perturb and simplify: 
Multi-level boolean network optimizer. In Proceedings of the 
International Conference on Computer-Aided Design, pages 2-
4, Nov 1994. 
CMSC96] S. C. Chang, M . Marek-Sadowska, and K. T. Cheng. Per-
turb and simplify: Multilevel boolean network optimizer. IEEE 
Transactions on Computer-Aided Design of Integrated Circuits 
and Systems, 15(12):1494—1504, Dec 1996. 
•CSZ94] P. K. Chan, Martine D. F. Schlag, and J. Y. Zien. Spectral k-
way ratio-cut partitioning and clustering. IEEE Transactions 
on Computer-Aided Design of Integrated Circuits and Systems, 
13(9):1088—1096，1994. 
105 
;CvGLMS99] Shih-Chieh Chang, van Ginneken, L.P.RR, and M . Marek-
Sadowska. Circuit optimization by rewiring. IEEE Transac-
tions on Computer, 48(9):962-969, September 1999. 
C W W 9 6 ] Y. W . Chang, D. F. Wong, and C. K. Wong. Universal switch 
modules for fpga design. ACM Transactions on Design Au-
tomation of Electronic Systems (TODAES), 1(1):80—101, Jan 
1996. 
DD96] S. Dutt and W . Deng. VLSI circuit partitioning by cluster-
removal using iterative improvement techniques. In Proceed-
ings of the International Conference on Computer-Aided De-
sign, pages 194-200, 1996. 
ea88] H. Hsieh et al. A 9000-gate user-programmable gate array. In 
Proceedings of CICC, pages 15.3.1—15.3.7，1988. 
ea89] A. El Gamal et al. A n architecture for electrially config-
urable gate arrays. IEEE Transactions on Solid-State Circuits, 
24(2) :394-398, 1989. 
EC93] Luis A. Entrena and Kwang Ting Cheng. Sequential logic op-
timization by redundancy addition and removal. In Proceed-
ings of the International Conference on Computer-Aided De-
sign, pages 310-315, Nov 1993. 
EC95] Luis A. Entrena and Kwang Ting Cheng. Combinational and 
sequential logic optimization by redundancy addition and re-
moval. IEEE Transactions on Computer-Aided Design of Inte-
grated Circuits and Systems, 14(7):909-916, July 1995. 
106 
EEOU96] L.A. Entrena, J.A. Espejo, E. Olias, and J. Uceda. Tim-
ing optimization by an improved redundancy addition and re-
moval technique. In Design Automation Conference, 1996, with 
EURO-VHDL ,96 and Exhibition, Proceedings EURO-DAC，96, 
pages 342-347, 1996. 
EHS97] Morgan Enos, Scott Hauck, and Majid Sarrafzadeh. Replica-
tion for logic bipartitioning. In Proceedings of the International 
Conference on Computer-Aided Design, 1997. 
EMHB95] C. Ebeling, L. McMurchie, S. A. Hauck, and S. Burns. Place-
ment and routing tools for the triptych F P G A . IEEE Transac-
tions on Very Large Scale Integration, pages 473-482, 1995. 
FHL] H.B. Fan, P. Haxell, and J. Liu. The global routng - a combi-
natorial design problem (submitted). 
FLWOO] H. Fan, J. Liu, and Y. L. W u . General models for optimum 
arbitrary-dimension F P G A switch box designs. In Proceed-
ings of the International Conference on Computer-Aided De-
sign, pages 93-98, 2000. 
'FM82] C. M . Fiduccia and R. M . Mattheyses. A linear time heuris-
tic for improving network partitions. In Proceedings 19th 
A CM/IEEE Design Automation Conference, pages 175-181, 
1982. 
Hal70] K. M . Hall. An r-dimensional quadratic placement algorithm. 
Management Science, 17:219-229, 1970. 
HK91] L. Hagen and A. B. Kahng. Fast spectral methods for ratio cut 
partitioning and clustering. In Proceedings of the International 
Conference on Computer-Aided Design, pages 10-13, 1991. 
107 
HK92] L. Hagen and A. B. Kahng. New spectral methods for ratio cut 
partitioning and clustering. IEEE Transactions on Computer-
Aided Design, 11(9):1074-1085, Sept 1992. 
HL93a] B. Hendrickson and R. Leland. The chaco user's guide. Sandia 
Nat. Labs.，Albuquerque, NM，Technical Report SAND93-2339, 
1993. 
HL93b] B. Hendrickson and R. Leland. A multilevel algorithm for parti-
tioning graphs. Sandia Nat Labs., Albuquerque, NM, Technical 
Report SAND93-1301, 1993. 
HS96] Gary D. Hachtel and Fabio Somenzi. Logic Synthesis and Ver-
ification Algorithms. Khiwer-Academic Publisher, Boston M A , 
1996. 
KAKS97] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multi-
level hypergraph partitioning: Application in VLSI domain. In 
Proceedings 34th ACM/IEEE Design Automation Conference, 
pages 526-529, 1997. 
KBZ94] R. Kuznar, F. Brglez, and B. Zajc. Multi-way netlist partition-
ing into heterogeneous F P G A s and minimization of total device 
cost and interconnect. In Prvcccdiugs 31th ACM/IEEE Design 
Automation Conference, pages 238-243, 1994. 
Keu87] K. Keutzer. D A G O N : Technology binding and local optimiza-
tion by dag matching. In Proceedings 24th Design Automation 
Conference, pages 341-347, 1987. 
KK95a] G. Karypis and V. Kumar. Analysis of multilevel graph par-
titioning. Technical Report 95-037, Department of Computer 
Science, University of Minnesota, 1995. 
108 
KK95b] G. Karypis and V. Kumar. A fast and high quality multi-
level scheme for partitioning irregular graphs. Technical Report 
95-035，Department of Computer Science, University of Min-
nesota, 1995. 
KK95c] G. Karypis and V. Kumar. Multilevel graph partitioning 
schemes. In Proceedings of the 1995 International Symposium 
on Physical Design, pages 113-122, 1995. 
•KK95d] G. Karypis and V. Kumar. Multilevel k-way partitioning 
scheme for irregular graphs. Technical Report 95-064, Depart-
ment of Computer Science, University of Minnesota, 1995. 
KK96] G. Karypis and V. Kumar. Parallel multilevel graph partition-
ing. In Proceedings of International Parallel Processing Sympo-
sium^ 1996. 
KK99] G. Karypis and V. Kumar. Multilevel k-way hypergraph par-
titioning. In Proceedings 36th ACM/IEEE Design Automation 
Conference, 1999. 
KL70] B. W . Kernighan and S. Lin. A n efficient heuristic procedure 
for partitioning graphs. Bell System Technical Journal, 49:291-
307, 1970. 
KN91] C. Kring and A. R. Newton. A cell-replicating approach to 
mincut-based circuit partitioning. In Proceedings of the In-
ternational Conference on Computer-Aided Design, pages 2-5, 
1991. 
109 
KP94] W . Kunz and D. K. Pradham. Recursive learning: A n attrac-
tive alternative to the decision tree for test generation for dig-
ital circuit. IEEE Transactions on Computer-Aided Design of 
Integrated Circuits and Systems, 13(9):1143-1158, Sept 1994. 
•Kri84] B. Krishnamurthy. An improved min-cut algorithm for par-
titioning VLSI networks. IEEE Transactions on Computers, 
33(5) :438-446, 1984. 
KSM97] W . Kunz, D. StofFel, and P. R. Menon. Logic optimization 
and equivalence checking by implication analysis. IEEE Trans-
actions on Computer-Aided Design of Integrated Circuits and 
Systems, 16(3):266-281, March 1997. 
LB93] G. Lemieux and S. Brown. A detailed routing algorithm for 
allocating wire segments in field-programmable gate arrays. In 
Proceedings ACM/SIGDA Physical Design Workshop, Lake Ar-
rowhead, CA, pages 215-226, 1993. 
Lee61] C. Y. Lee. An algorithm for path connections and its applica-
tions. IRE Transactions on Electronic Computer, EC10:346-
365，1961. 
LW95] Y.S. Lee and Allen C.H. W u . A performance and routability 
driven router for F P G A s considering path delays. Proceedings 
32th ACM/IEEE Design Automation Conference, pages 557-
561, 1995. 
LWBOO] Wangning Long, Yu-Liang W u , and Jinian Bian. IBAW: A n 
implication-tree based alternative-wiring logic transformation 
algorithm. In Proceedings Asia South Pacific Design Automa-
tion Conference 2000, 2000. 
no 
•Mic94] Giovanni De Micheli. Synthesis and Optimization of Digital 
Circuits. McGraw-Hill, New York, 1994. 
MKLC89] S. Muroga, Y. Kambayashi, H. C. Lai, and J. N. Culliney. The 
transduction method-design of logic networks based on permis-
sible functions. IEEE Transactions on Computer, 38(10):1404-
1424, Oct 1989. 
M W 9 6 ] Wai-Kei M a k and D. F. Wong. Minimum replication min-cut 
partitioning. In Proceedings of the International Conference on 
Computer-Aided Design, 1996. 
PS79] B. N. Parlett and D. S. Scott. The lanczos algorithm with 
selective orthogonalization. Mathematics and Computations, 
33(ll):217-238, 1979. 
P W W Y 9 2 ] J.F Pan, Y.L. W u , C.K. Wong, and G. Yan. O n the optimal 
four-way switch box routing structures of F P G A greedy routing 
architectures. Integration, the VLSI Journal, 25:137—159, 1992. 
RB91] J. Rose and S. Brown. Flexibility of interconnection struc-
tures for field-programmable gate arrays. IEEE Transactions 
on Solid-State Circuits, 26(3):277-262, 1991. 
Ros90] J. S. Rose. Parallel global routing for standard cells. IRE 
Transactions on Computer-Aided Design, pages 1085-1095, Oct 
1990. 
RSV85] J. S. Rose, W . M . Snelgrove, and Z. G. Vranesic. A L T O R : An 
automatic standard cell layout program. In Canadian Confer-
ence on Very Large Scale Integration, pages 169—173, 1985. 
San89] L. A. Sanchis. Multi-way network partitioning. IEEE Transac-
tions on Computers, 38(1):62-81, 1989. 
Ill 
•She98] Naveed Sherwani. Algorithms for VLSI Physical Design Au-
tomation. Kluwer-Academic Publisher, Boston M A , 3 edition, 
1998. 
;SSLea92] E. M . Sentovich, K. J. Singh, L. Lavagno, and et. al. SIS: A 
system for sequential circuit synthesis. ERL Memorandum No. 
UCB/ERL, M92/41, 1992. 
SW96] M . Sarrafzadeh and C. K. Wong. An Introduction to VLSI 
Physical Design. M c G R A W - H I L L International Editions, Elec-
trical Engineering Series, 1 edition, 1996. 
WA98] Sverre Wichlund and Einar J. Aas. O n multilevel circuit par-
titioning. In Proceedings of the International Conference on 
Computer-Aided Design, pages 505-511, 1998. 
WC91] Y. C. Wei and C. K. Cheng. Ratio cut partitioning for hierar-
chical designs. IEEE Transactions on Computer-Aided Design 
of Integrated Circuits and Systems, 10(7):911—921，July 1991. 
WC94] Y丄. W u and D. Chang. O n NP-Completeness of 2-d F P G A 
routing architectures and a novel solution. In Proceedings of In-
ternational Conference on Computer-Aided-Design, pages 362-
366, 1994. 
WF99] Y. L. W u and Hongbing Fan. O n local configuration analysis 
of alternative wires in boolean networks. ITC-CSCC'99, pages 
868-871, 1999. 
WLFOO] Yu-Liang W u , Wangning Long, and Hongbing Fan. A fast 
graph-based alternative wiring scheme for boolean networks. 
In Proceedings IEEE/ACM International VLSI Design, 2000, 
2000. 
112 
W M S 9 7 ] Y.L. W u and M . Marek-Sadowska. Routing for array type FP-
GAs. IEEE Transactions on Computer-Aided Design of Inte-
grated Circuits and Systems, 16(5):506-518, 1997. 
W T M S 9 4 ] Y丄. W u , S. Tsukiyama, and M . Marek-Sadowska. O n com-
putational complexity of a detailed routing problem in two-
dimensional F P G A s . In Proceedings 4th Great Lakes Sympo-
sium VLSI, March 1994. 
W T M S 9 6 ] Y丄.Wu, S. Tsukiyama, and M . Marek-Sadowska. Graph based 
analysis of 2-d F P G A routing. IEEE Transaction on Computer-
Aided Design of Integrated Circuits and Systems, 15(l):33-44, 
1996. 
W Y C O O ] Y. L. W u , X. L. Yuan, and D. I. Cheng. Circuit partitioning 
with coupled logic restructuring techniques. In Proceedings Asia 
South Pacific Design Automation Conference 2000, 2000. 
YanOl] S. Yang. Logic synthesis and optimization benchmarks version 
3.0. Technical Report, Microelectronics Centre of North Car-
olina^ 1991. 
YCL92] C. W . Yeh, C. K. Cheng, and T. T. Y. Lin. A probabilistic 
multicommodity-flow solution to circuit clustering problems. In 
Proceedings of the International Conference on Computer-Aided 
Design, pages 428-431, 1992. 
Y W 9 8 ] Hannah Honghua Yang and D. F. Wong. Optimal min-area 
min-cut replication in partitioned circuits. IEEE Transactions 
on Computer-Aided Design of Integrated Circuits and Systems, 
17(11):1175—1183, November 1998. 
113 
:ZCS99] J. Y. Zien, Pak K. Chan, and Martine D. F. Schlag. Multilevel 
spectral hypergraph partitioning with arbitrary vertex sizes. 
IEEE Transactions on Computer-Aided Design, 18(9):1389-
1399, Sept 1999. 
:ZSC96] J. Y. Zien, Martine D. F. Schlag, and P. K. Chan. Multi-level 
spectral hypergraph partitioning with arbitrary vertex sizes. In 




5xpl - Berkeley Logic 
Interchange Format (BLIF) 
.model top 
• inputs i_5_ i_6_ i_3_ i-4_ i_l_ i_2_ i_0_ 
• outputs o 丄 0-2- o_0_ o_9_ o_7_ o_8_ o_5_ o_6_ o-3_ o_4_ 
• default Jnput-arrival 0.00 0.00 
.default-output-required 0.00 0.00 
• default Jnput-drive 0.30 0.30 
.default-output Joad 1.00 
.default-maxJnputJoad 999.00 
.names i_6_ [565] 
0 1 
• names i-0_ [570] 
0 1 
• names i_2_ [571] 
0 1 
• names [565] [570] [571] i_5_ [1079] 
0 — 1 
- 0 — 1 
- 0 - 1 
— 0 1 
.names i-3_ o_8_ 
0 1 
.names [565] [570] o_8- i_5_ [1044] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names i_l_ [576] 
0 1 
• names [570] [576] [571] [578] 
0 - 1 
- 0 - 1 
- 0 1 
• names i_4_ [563] 
0 1 
.names [563] i_5_ [579] 
0 - 1 
- 0 1 
115 
• names [578] [579] [1163] 
1- 1 
- 1 1 
• names [1079] [1044] [1163] [581] 
0 - 1 
- 0 - 1 
- 0 1 
.names i_5_ [564] 
0 1 
• names [564] i_4- [586] 
0 - 1 
- 0 1 
• names i_0_ i_2_ i_l_ [592] 
0 - 1 
- 0 - 1 
- 0 1 
.names [586] [592] [1250] 
1 - 1 
- 1 1 
• names i_6_ i-0_ [564] i_2_ [1246] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
• names i_6- i-0_ [564] i_l- [1248] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [1250] [1246] [1248] [598] 
0 - 1 
- 0 - 1 
- 0 1 
• names [570] o_8_ [576] [583] 
0 - 1 
- 0 - 1 
- 0 1 
• names [579] [583] n_nl01 
00 1 
• names [565] [570] [576] i_5- [1110] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names i_0_ i-3_ i-l_ [588] 
0 - 1 
- 0 - 1 
- 0 1 
.names [586] [588] [1183] 
1 - 1 
- 1 1 
.names [1110] [1183] [589] 
0 - 1 
- 0 1 
• names [581] [598] n_nl01 [589] n_n47 
0000 1 
• names [563] [565] i_5- [567] 
0 - 1 
- 0 - 1 
116 
- 0 1 
.names i-4_ [564] i_6_ [1234] 
0— 1 
- 0 - 1 
- 0 1 
.names n_n47 [567] [1234] o_l_ 
0 - 1 
- 0 - 1 
- 0 1 
• names i-6_ i_0_ i_4_ i_2_ [1197] 
0 — 1 
- 0 — 1 
- 0 - 1 
— 0 1 
• names [565] [570] [571] i-4_ [1094] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names i_6- i-0_ i-4- i_l- [1195] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [1197] [1094] [1195] [608] 
0 - 1 
- 0 - 1 
- 0 1 
.names i_0_ i-4_ i-5_ [620] 
0 - 1 
- 0 - 1 
—0 1 
• names [565] [620] n_n95 
00 1 
.names [563] i_6_ [621] 
0 - 1 
- 0 1 
.names [578] [621] n_n96 
00 1 
.names [583] [621] n_n97 
00 1 
.names n_n95 n_n96 n_n97 n_n57 
000 1 
.names [565] [570] o-8_ i-4_ [1097] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names i_0_ i - l - [571] n_n36 
000 1 
.names n_n36 i-4_ i_3_ i-5_ [1227] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names i_6_ i - l - i-4- i_5- [1229] 
0 — 1 
- 0 - 1 
- 0 - 1 
117 
— 0 1 
.names n_n57 [1097] [1227] [1229] [624] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [608] [624] n_n53 
00 1 
• names i_2_ i_3_ i_l_ [627] 
0 - 1 
- 0 - 1 
一 0 1 
.names [567] [627] n_nll7 
00 1 
.names [564] [565] [628] 
0 - 1 
- 0 1 
• names [628] i_2_ i_3_ [563] n_nll8 
0000 1 
• names i_6_ i-1- i-5_ [563] n_nll9 
0000 1 
.names n_nll7 n_nll8 n_nll9 n_n56 
000 1 
.names [564] i_6_ [635] 
0 - 1 
-0 1 
• names [635] i_2_ i_4_ i_l_ n_nl23 
0000 1 
.names [565] i.0_ i_5_ i_4_ n_nl24 
0000 1 
.names i_0_ [563] i_5_ [641] 
0 - 1 
- 0 - 1 
- 0 1 
.names i_6_ [641] n_til46 
00 1 
.names n_nl23 n_nl24 n_nl46 n_n54 
000 1 
.names i_6_ i_0_ i_5_ [563] n_nl20 
0000 1 
• names [565] [563] [646] 
0 - 1 
- 0 1 
.names [588] [646] n_nl21 
00 1 
• names [592] [646] n.nl22 
00 1 
.names n_nl20 n_nl21 n_nl22 n_n55 
000 1 
• names n_n53 n_n56 n_n54 n_n55 o_2_ 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [586] [1203] 
0 1 
• names [565] [576] i_4_ [564] n_nl30 
0000 1 
• names [565] [641] n_nl31 
118 
00 1 
.names i_6- i-0- i_l- [563] n_nl32 
0000 1 
• names [1203] n_nl30 n_nl31 n_nl32 [308] 
0000 1 
• names i_4_ o_8_ [318] 
00 1 
.names i_2_ [318] i-5_ i_6_ [1100] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [308] [1100] [1094] [1097] o_0_ 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
• names [627] [1276] 
0 1 
.names [1276] i_4_ i-5_ [1225] 
0 - 1 
- 0 - 1 
- 0 1 
.names i_4_ i_5_ i_6_ [1223] 
0 - 1 
- 0 - 1 
- 0 1 
.names [1225] [620] [1223] o_9-
0 - 1 
- 0 - 1 
- 0 1 
• names o_8- i-2_ [1038] 
0 - 1 
- 0 1 
.names [571] i-3. [1065] 
0 - 1 
- 0 1 
.names [1038] [1065] o_7_ 
0 - 1 
- 0 1 
.names i.0_ [576] [571] [667] 
0 - 1 
- 0 - 1 
- 0 1 
.names i-0_ o_8. [571] [669] 
0— 1 
- 0 - 1 
- 0 1 
• names [669] [1067] 
0 1 
• names [570] [571] i_l- [671] 
0 - 1 
- 0 - 1 
- 0 1 
.names o_8_ [671] n_nl39 
00 1 
• names [1067] n_nl39 [262] 
00 1 
• names [570] o_8- iJ2_ [1104] 
119 
0 - 1 
- 0 - 1 
- 0 1 
.names i_0. i_3_ i_2_ [1106] 
0 - 1 
- 0 - 1 
- 0 1 
.names [667] [262] [1104] [1106] o_5_ 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names i-2_ i_l_ [1212] 
0 - 1 
- 0 1 
• names o-8_ i_l_ [1208] 
0 - 1 
-0 1 
• names [571] [576] i_3- [1216] 
0 - 1 
- 0 - 1 
- 0 1 
.names [1212] [1208] [1216] o_6_ 
0 - 1 
- 0 - 1 
- 0 1 
• names i-5_ [667] [1115] 
1 - 1 
- 1 1 
.names i_6_ [570] [564] i_l_ [1113] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [1115] [1110] [1113] [682] 
0 - 1 
- 0 - 1 
- 0 1 
.names [628] [669] n_nll3 
00 1 
.names i_2_ i_l- [683] 
0 - 1 
- 0 1 
.names i-0_ [683] i_5_ o_8- n_nll4 
0000 1 
• names [635] i-0_ o_8_ [571] n_nll5 
0000 1 
.names [682] n_nll3 n_nll4 n_nll5 n_n60 
0000 1 
.names [564] [588] n_nlll 
00 1 
• names [564] [592] n_nll2 
00 1 
• names n_nlll n_nll2 [194] 
00 1 
• names [1044] [194] [698] 
0 - 1 
- 0 1 
• names [564] [583] n_n89 
120 
00 1 
.names i_6- [570] i-5- i - l . n_n90 
0000 1 
• names [1079] [1080] 
0 1 
.names [698] n_n89 n_n90 [1080] [172] 
0000 1 
.names i_6_ i.0_ i_5_ iJ2_ [1258] 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [565] [570] [564] [576] n_n87 
0000 1 
• names [564] [578] n_n88 
00 1 
.names n_n87 n_n88 [176] 
00 1 
.names n_n60 [172] [1258] [176] o_3_ 
0 — 1 
- 0 - 1 
- 0 - 1 
— 0 1 
.names [571] [576] i_6- [1218] 
0 - 1 
- 0 - 1 
- 0 1 
• names i_6_ [671] n_n84 
00 1 
.names i-6- i-0_ i-3- [576] n_n85 
0000 1 
.names i-6- [570] i - l - [571] n_nl06 
0000 1 
.names n_n84 n_n85 n_nl06 n_n70 
000 1 
-names [565] [588] n_nl07 
00 1 
.names [565] [592] n_nl08 
00 1 
.names [565] [627] n_nl09 
00 1 
.names n_nl07 n_nl08 n_nl09 n_n69 
000 1 
.names [565] [583] n_nllO 
00 1 
.names i-6- i-2_ i_3_ [576] n_nl42 
0000 1 
• names i_6_ [571] i - l - o_8- n_nl43 
0000 1 
.names n_nllO n_nl42 n_nl43 n_n68 
000 1 
• names [1218] n_n70 n_n69 n_n68 o_4_ 
0 — 1 
- 0 - 1 
- 0 - 1 




Proof of some 2-local patterns 
The construction of new minimal patterns is a joint work with 
another M. Phil, student, Cliff Chin-Ngai Sze, under the supervi-
sion of Professor Yu-Liang Wu, in the Department of Computer 
Science and Engineering. In this thesis，only partial patterns are 
shown for illustration. 
Pattern 1 
Q ^ ~ — — 
Figure B.l: N e w 2-local pattern 1 
In Figure B.l, a is the target wire, let gi = {a^x)' where x is the other 
inputs of gi. g2 二 where y is the other inputs of g2. Qs = (P2*之)'where 
z is the other inputs of 仍.Therefore, gs = (((a * xYvYz)' = [{a' + x')yyz)'= 
((a^y + x'yyzy = {{a'y)'{x'y)'z)' 二（(々4)(工'2/)么)， 
122 
0 
Figure B.2: New 2-local pattern 2 
Pattern 2 
In Figure B.2, gs = (((a * xYvYz)' 二（(a' + x'Wz)' 二 ((a/y + x'yYzY = 
((a'yYix'yyzy = ((a + y')(x'y)zy = ((P4)'(工之)' 
Pattern 3 
Figure B.3: New 2-local pattern 3 
In Figure B.3，gs = (((a + x)' + ")' + 2：)' = ((a + x)y' + zy 二、ay' + xy’ + zy = 
{{94) + ocy'-h z)' 
Pattern 4 
Figure B.4: New 2-local pattern 4 
In Figure B.4, "3 = (((arr) + y)' + z)' = ((ar)V + z)' = ((a' + x')y' + z)'= 
{a'y丨 + x'y' + z)' = ((a + y)' + (x + y)' + z)' 二 ( � + {x ^ y)' + z)' 
123 
Appendix C 
Illustrations of FM algorithm 
In Figure C.l and Figure C.2, it shows 2 pass of F M algorithm on 4 vertices 
with 6 weighted edges. The bucket structure is used to store the intermediate 
value for F M algorithm. The final cost is reduced to 6. 
124 
First pass 
I selected to move 
^^^^^^^^^^^ 2I ^^^ Bucket X Bu^Y 
M A X G A I N “ » r y i MAXGAIN~•王d | 
” I a I b I / | d T ^ 
Cost = 6 
/ a K cannot move selected to move 
丫 \ Bucket X y Bucket Y / 
^ ^ 2 y 
/ JL MAXGAIN~~"“H b I MAXGAIN~H ^ 
I a I b I cT I d, I 
Cost = 8 , f 
广—•^ 1 I 广 V selected to move 
( V ^ — — — I Bucket X Bucket Y ^ 
/ M A X G A I N ~ ~ M A X G A I N ~ ~ ~ H g =-2 
I a I b I c Id I 
Cost = 6 J f 
/ Y b V 1 Bucket X Bucket Y 
• CT 
M A X G A I N ~ M A X G A I N • 6 - = 2 
I t ^ 稀 J ‘ 
，， . V select to move 
^ \ X 
I a I、b I c Id I 
Cost = 10 ，r  
Y Bucket X Bucket Y 
M A X G A I N — M A X G A I N — ~ g 
X Y , |\ I G � 4 
125 
Figure C.l: Example of F M algorithm - First pass 
Second pass c�st = 6 ^ ^ ^ 
I ^ ) \ cannot move selected lo move 
入 Bucket X Bucket Y / 
I MAXGAIN~» 6 -——叫 h I MAXGAIN ~• - 2" H d I 
‘ —CO — \ — ^ s 
” I b I d,| 
Cost = 8 
1 • 广 “ ― s e l e c t e d to move 
I Bucket X Bucket Y I 
M A X G A I N ~ m a x g a i n ~ ~ » 2 - ~ H a j > \ c I 
rf] b I c I d I 
Cost = 6 y , 
1 Bucket X Bucket Y 
MAXGAIN» -2-~~•KX/t*----^MAXGAIN~» 6 - " 叫 I g =2 
^ ’ ― 
X . Y select to move / \ 
I a I b I c ^ ^ 
Cost =10 ” 
广 V I 广 \ select to move 
( V ^ - — y Bucket X Bucket Y ^ 
m a x g a i n ~ • K / A maxga i n~ > ~ T ~ - ~H c K g =_4 
卿 , 
l a I b,| c I d I 
Cost = 6 y『 
/ Y T V / Bucket X Bucket Y 
/ J ^ • • 
MAXGAIN~» -2- » 6 - „ =4 
X • Y select to move 
�a I b l^Trd I G=g,+g,=o 
126 
Figure C.2: Example of F M algorithm - Second pass 
Appendix D 
HUSB Structures 
In this chapter, the Hyper-Univeral Switch Box (HUSB) Hi for i from 1 to 9 
is shown. For i 二 8, the H U S B is actually built by Hj + Hi, and Hj + Ih 
build Hq. 
• , N • ‘ \ 
• I s 
/ 丨 、 • 、 • I \ • I N 
/ 丨 、 
/ 1 \\ 
Vn > - ^ 3 1 3 
\丨 / 
\ I , \ _ Z N _ Z \ I , 
N , , 
\ • , N I / 
S 丨 / 
Y  
•41 4 
Figure D.l: HI S-Box 
127 
3 4 
y22 V21   ：^^^ 
/ 、 \ 一 ‘ \ 
‘ 、 、 、 - ’ ’ / 、 、 
( - ‘ 、 、 、 / 、 、 、 
2 V 3 一 , 、 \ 、 、 、 、 / 、、；-Vu 5 
� > � � ' ’ \ � � � / / 
\ '、、、 、 /V Z > 
N 、、、 、 ‘ \ Z / 
‘N 、、、 \ ‘ 、^‘ / 
‘ \ 、 、 、 、 ‘ / 
‘ 、、 � � \ ‘ Z \、 / ‘ 、 "i.'.' 、、 / / \ z'A、、 \ / / 、、广‘、、、 \ , / 、、一‘ ‘ ^  、、/ 
/ Z � � �\ 、 \ 、 、 ： h � 
1 /、、、、、 / :、 : ,一 6 、、、 •‘ 、、、、、 
N / 、、 i'' / 
�� • \ ‘ \ / Z / � � . , � -" �Y 
V42 
8 7 
Figure D.2: H2 S-Box 
•23 4 V22 5 V21 6 
^ … … / � � � � 
、 ： 、 、 、 、 / 、、、 ‘一--广、 / / 、 / 、、 
一 * � � … ‘ ‘ � ‘ � T7 7 •3—【--- ‘ 、Y • 、、 ‘ —- 一Vu 7 ^^ ‘ � / � I \ / J 
) 、、 I V 、 i 、 / ^ -
、：、’ /、、、、、” 、、JZ' 
� • � � / � � : � � , ' - - - / � � � 
*、、 V N / X / s ； 
: > : � � � �� K� - - - - ！ � � � 
、、、、\-"A-、、 ’’ 、、、 y :、-'、,'、、、、 \ / 、、J y；、、、/ 、、、、、 、、/ •、、 
/ ' 、、/ 、、、 、 / / N TT Q 
V 一 ” < 乂 >、 8 
2 ！、、 、:、、、 /、、、 
二、 / 、、、 、、、、/ 、、-‘‘ 
/ \ / ��� / , V t V - - \ ‘ % ‘ 
、、 ‘ 、v 八 、、 I ； \ / Z 、、、、‘ 
• 、 • 一 Z N 、 、、‘ 
•‘ 、々‘‘’ ：>'''\ 、、丨、:、、 
‘‘---''''''、、、 Z/、、、、、 ，'、、\ „ , 
… - " / 、、、 / 、、、、、、 ； 9 
1 、、、 / >、’ 、、、、、 
、 ‘ Z 、 ‘ \ ‘ I 、、 ‘ Z 、、 / V、！ ^ ^ ^ ^ 
V.3 12 V,, 11 V “ 10 
Figure D.3: H3 S-Box 
128 
k����…一-/P:;:作：…rq 
4 、、、、l r : 外 - 、 . i 
! \ y / / 、：、---::，•::-〜->、\ I \ ： 
LZ��i.---—--- \ 、：卜-义 \j 
V - 对 W \ W 
�I » \ � � I� 1 / 
, �» '�� �  '�� / 
3 �� / V"-^� � � , �� > 
:、/ T ^ ^ v ^ ^ 、、、 ‘‘,’ 、、、 '! •'/��� i \ V � � � / 
；/ 、、、 \ 、、、 ^ ^ ^ ^ ^ ^ !\ > 
I� 、、、，' \ 、、 /、、； 
v〜:」 \\ 、、、、 � � � � � � 1 1 
x'---. \\  、、\ j.-'-i'' 
\ / r。-、〜----、:--/ / v M 
/ 、、、i \ K 
‘ ‘^^-‘ I � \ / » I \ 
气、 I 、、 z � � i 、•、， 、 
、、、、、\ 、、、、/ /、、、、口::::“》12 
1 、、、、 / 、、、\ 人一二-----\ ^ ； 
丄 、、 、、» ‘ ‘ 、、V I 
、、、丨, - - - - - -一、 z Y ^ ~ 
V„ 16 v„ 15 v„ 14 13 
Figure D.4: H4 S-Box 
6 J'' 9 y,. 10 
I k i\ /•••., I 
柳IM.. r ‘ 
i \ -一、、、、、 卜、 I 
！ \ 、、.:•' /\ \ \ • �� 
k二:\--一 ！ \ ！ 
‘ \ \ I \ 、、• 
V 一 斗 - / y : � � � / ���� \ � � � � / � - -f … 1 2 
4 H / \、 >、、、 、、、、\ K / •_�� /• \ \ » \ �� \ ‘ �� « ！ / \ \ / ���� \  \ ！ \ /;• 
y /、、、、； \、、、、/ 、、  \ \ / M 
V 一 / 、\J \ 、、、、/ \ 、、：i ！ 13 
3 /、、、、 \ /、、、、 \ i：—---厂77 
、、‘’-！ \ \ ！ \ -----； \ \ / ,r 
！、、卜、、\ \ ！ \ ; v • 
/、、、、、、i.一-::缚：：:……\ 、、、/ \产、1、、、、 
；\ ； 、、 ： . \ . 
I \ ！ 、、、、； \ � � \ I 
！ 、、/ ------\ N \、卞、 ； 广 、 \ •-、、,* --V. ！ \ ��� \ ��� / .. t�� , . � � ; � - � : � � V…------���� / Y // \ 15 
1 、 ‘ \ • * • ' • " 、 、 》 ' 、 、 I ‘ V.---T I 1 、、 ； 、、 .-t' 、、 、 、 •. 
、、、、、["气丨 二V一：？ i 
v‘，20 V ‘‘ 19 V ‘】18 v „ 17 V “ 16 
Figure D.3: H3 S-Box 
129 
7 8 9 10 11 12 
y^S 「24 |V23 「22 「21 An ,、、、,���、、、、 / \ \ /; 、、二•-->::'〜、、、；、、、 、、 
/“V-、(广？、广、、、.、、、；：、、、 、、V 13 
5 •一办讽一7、-、、：：:、、\K 一 4 一 
�f命 M�丨--X--:::•过-/ 
、、、;/'、、、； \、、、； 、、、•• \、、、； /'、、/ 
、< 、、、1、 \ \、：、// W / 
〜 糊 H i \ 一 丄 ： 麵 一 “ 
丄 、 錄 ； 翁 條 一 
、、、、丨 > ： ' ； ' 、 ： 辦 / 、 、 、 V 
、、丨z r" r I""“1—1 
V45 V 4 4 V43 V42 V41 
24 23 22 21 20 19 
Figure D.6: H6 S-Box 
V,, 8 V,, 9 10 V,, 11 12 V,, 13 V^, 14 
k — k — ' M 、 : 一 、 、 、 
；'.、、、/：、、、/V 二衫广、、 、、 、、 
：‘、W"'--'、、、-、、、 、、、 、、、 
I • 、 ‘‘ 一 、 、 » 、丄 、： 、 、 ” 1 C 
• iKUf^' K 、、：‘、 - - - - -
‘、、、、-T^ -i^ Vy" -----\、；、、、、、； 
》〜”：千〈:!；：:沪 \r\.J:—一16 
6 、、 V ( \ I 、、 、；、-‘“,、 • 、、 ‘  、、、；z/k -----W--X"' ？-----乂 V V 一存 hM-N:- \ 1\ 、、、、 f〈、、、丨、、17 
5、、俄 \ \1、、、、、、、H\\��m 
剩 二 W - : 稱 》 為 1 8 
I \ I N < 、» 、 山 * -‘- - i » ‘ 、、 
V—-----一一\、、 、、、！ \ 分 � � J / \二::::::、為 21 
1 、、、 \ 、 、 、 一 ; 7 \ ；/、、、\ ； 
、、丨一--〒---“、、丨？’十 、、丨’ 、-( ：：^ V,, 28 27 26 25 24 23 22 
Figure D.3: H3 S-Box 
130 
9 10 V,, 11 V,, 12 V^, X3�23 14 15 16 
k k ；k /k~,'k Z'iu^-':::}”--- !\ 
° 、、 r 卜 / ,、、 _ —J \ 、 
、、、、/i---"/::-fS^ v--—::广\��� 、、  \； 
v〜t“--f >乂'’ 、、、； ； \ ； 、、 、、 i 、、、；、、、，'；， 
6 \ I .-r/ \ 、 、、 、、、；'. 、、； \ ； / 
,本Jr����、卜、 \ K /1V 2。 
/；.\ I 、、、 、、 \ ftr •、、、 
3 rK-:絶::::織：：:十-：；；^^ 
V一一斗-气厂下：����J K、、、、！：；孰汽：知1 23 
\ \ r、、、j \ ！\\、、、邊-小 
I �� • � _ �v_ N I % « 、 、《_•/• ： V 
V3. 4 —上—二 ―上i 
1、、、i\ >、 •;、、、、 
\ I \ 、、 Z、、 '/ 、、“、、丄• 
« I I S . • , N I / « t f S If 、•_ » • 
\ 丨 T 、、f、、f i 
V,, V“ V,, v„ V,, V,, 
32 31 30 29 28 27 26 25 
Figure D.8: H8 S-Box 
10 11 12 13 14 15 16 17 18 
r" v“ 卜「“ r" r" r" 
^J\/Mv:;><iMfcK、、、、、、、、I 
〜:::::\--食:喷爽、、k、、、、：1 二丄-二一 
、、 卜 / I、、 ‘ .--X r 、 I 、、 I 、、 
、、|'#、、、、、1 丨 \ X、、、、、、、4、、、”��� 
、i' //' V 、、 二 —= \ N \ 、 / 
气 n � � � � \ i N 了 赛 护 一 / � � � I��； \ 1 •�� ’_���； �� ： -'；-'/ ； ,f、、、 、、A 、、、；、、： \ ；、、； 、、、；'；-/ / ！ \ 
2 4•一」、、、、、、j 、、,： r、、、丄 
、 件 上 ^ ^ ^ 彻 � a T 
Is I'； I3 Ix 30 Is 23 
Figure D.3: H3 S-Box 
131 
Appendix E 
Primitive minimal 4-way global 
routing Structures 
2 
3 O 1 
4 
Figure E.l: GR\ 
2 2 2 
3 1 3 1 3 1 





3 O • 1 3 9 1 3 _ 1 
4 
2 
3 • 1 
• 
4 
Figure E.3: GRl.^GRl^.GRl.^GRl, 
2 2 2 
1 1 •• 
3 1 3 1 3 1 
4>  
4 4 
2 2 2 
O i •• 
3 1 3 1 3 1 
• f f  
4 4 






























5 1 5 
2 2 ，
 2


























































































































































































































































































< > ； . 1 
F 






















“• 、 . ......
 .
 •







. . . .















































， ： - : 7 、 : .
 .














































f m ^ M w i
 












^ i t f 
？f
 












CUHK L i b r a r i e s 
•圓•illlllllll 
0D3fl71fiM3 
