Experiences with serial and parallel algorithms for channel routing using simulated annealing by Brouwer, Randall Jay
February 1988 UILU-ENG-88-2213
CSG-84 
COORDINATED SCIENCE LABORATORY
College of Engineering
EXPERIENCES WITH SERIAL AND PARALLEL ALGORITHMS FOR CHANNEL ROUTING USING SIMULATED ANNEALING
Randall Jay Brouwer 
(NASA-CR-182530) EXPERIENCES WITH SERIAL AND PARALLEL ALGORITHMS FOR CHANNEL ROUTING USING SIMULATED ANNEALING (Illinois Univ.) 54 p CSCL 09B N88-18289 Unclas G3/61 0126GC8
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN 
Approved for Public Release. Distribution Unlimited. 
https://ntrs.nasa.gov/search.jsp?R=19880008905 2020-03-20T08:27:06+00:00Z
EXPERIENCES WITH SERIAL AND PARALLEL ALGORITHMS
FOR CHANNEL ROUTING USING SIMULATED ANNEALING
BY
RANDALL JAY BROUWER
B.S., Calvin College, 1985
THESIS 
Submitted in partial fulfillment of the requirements 
for the degree of Master of Science in Electrical Engineering 
in the Graduate College of the 
University of Illinois at Urbana-Champaign, 1988
Urbana, Illinois
ABSTRACT 
Two algorithms for channel routing using simulated annealing are presented. Most channel routers of the past are based on greedy algorithms in which special heuristics are applied to generate monotonic improvement. These algorithms are called greedy because they suffer from inappropriate selections, getting stuck at suboptimal solutions. Simulated annealing is an optimization methodology which allows the solution process to back out of local minima that may be encountered through inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. Previous simulated annealing channel routers only permitted transformations which resulted in a routing without overlap between nonconnected wires. The algorithm presented here proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By removing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics while still retaining the pseudorandom nature of simulated annealing. The algorithm has been implemented as a serial program designed for a workstation and as a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions. A description of the Intel iPSC Hypercube is given, details on how the channel routing problem was partitioned onto the hypercube are discussed, and results for an example and some performance calculations are presented. Finally, some concluding remarks are made concerning the applicability of simulated annealing to the channel routing problem, and some possibilities for future research are discussed.
ACKNOWLEDGEMENTS 
I wish to especially thank Professor Prith Banerjee for his continual encouragement, ideas,
and support throughout the development of this work. I would also like to thank my family 
and my fiancee for their love, encouragement, and support. Finally, I would like to thank all of 
the members of the Computer Systems Group, past and present, for plenty of fruitful ideas as 
well as needed distractions. 
This work was supported by the National Aeronautics and Space Administration under 
contract number NAG-1413. 
TABLE OF CONTENTS
CHAPTER
1. INTRODUCTION
1.1. Motivation
1.2. Channel Routing Problem
1.3. Previous Work
1.4. Thesis Outline
2. SIMULATED ANNEALING
2.1. Simulated Annealing Methodology
2.2. Simulated Annealing Applied to Channel Routing
2.2.1. The first simulated annealing channel router
2.2.2. A new simulated annealing algorithm for channel routing
2.2.2.1. Channel routing
2.2.2.2. Extensions to the channel routing algorithm
3. SERIAL IMPLEMENTATION
3.1. Implementation Details
3.2. Heuristics
3.2.1. Initial placement
3.2.2. Move selection
3.2.3. Net selection
3.2.4. Track selection
3.3. Results
4. PARALLEL IMPLEMENTATION
4.1. Hypercube Architecture
4.2. Hypercube Software
4.3. Intel Hypercube Simulator
4.4. Implementation Details
4.4.1. Selected topology
4.4.2. Data partitioning
4.4.3. Parallel moves
4.4.4. Parallel updating
4.5. Heuristics
4.6. Algorithm Results
4.7. Performance Analysis
4.7.1. Computation costs
4.7.2. Communication costs
4.7.3. Speedup calculations
5. CONCLUSIONS
5.1. Summary of Results
5.2. Convergence Issues
5.3. Applicability of Simulated Annealing
5.4. Parallelizability of the Channel Routing Algorithm
5.5. Future research
REFERENCES
LIST OF FIGURES
FIGURE
1.1. Example Channel with Density 5
1.2. Doglegging Examples
1.3. Vertical Constraint Graph
1.4. Cyclic Constraint Problem
2.1. General Simulated Annealing Algorithm
2.2. Local and Global Minima in an Annealing Cost Function
2.3. Example of Illegal Move
2.4. Track Data Linked List Structure
3.1. n and u Shaped Subnets
3.2. Final 12 Track Solution - Serial
3.3. Annealing Cost vs. Temperature - Serial
3.4. Subnet Overlap vs. Temperature - Serial
3.5. Average Number of Tracks vs. Temperature - Serial
4.1. Four-Dimensional Hypercube
4.2. Parallel Algorithm for Channel Routing
4.3. Domain Map for Three-Dimensional Hypercube
4.4. Move Communication Requirements
4.5. Master/Slave Move Evaluation Steps
4.6. Vertical Constraint Graph Example
4.7. Final 12 Track Solution - Parallel
4.8. Annealing Cost vs. Temperature - Parallel
4.9. Subnet Overlap vs. Temperature - Parallel
4.10. Average Number of Tracks vs. Temperature - Parallel
LIST OF TABLES
TABLE
4.1. Approximated VCG Data
4.2. Computation Timing (msec)
4.3. Message Transmission Timing (msec)
CHAPTER 1
INTRODUCTION
1.1. Motivation
During the past few years, we have seen the complexity of VLSI circuit designs increase rapidly. One reason for this increase is technological advances in mask production and fabrication, which make it possible to use smaller and smaller devices. Another reason is the automation of the design process through the use of Computer-Aided Design (CAD) tools. Without the aid of computer programs in the design process, the complexity of the design would be far too great for any engineer to handle.
The design process can be divided into eight primary stages, as follows [1]:
1) System Specification (Architectural Design I)
2) Functional Design (Architectural Design II)
3) Logic Design
4) Circuit Design
5) Circuit Layout
6) Design Verification
7) Test and Debugging
8) Prototype Test and Manufacture
Stage five of the design process includes the placement and routing of components. Three steps are usually distinguished at this stage, namely:
1) Cell Placement
2) Global Routing of Wires
3) Detailed Routing of Wires
A great deal of research has been directed at these three areas over the last few years in an effort to develop algorithms that perform these complex tasks in a reasonable amount of time. All three of these problems are known to be NP-complete, which means that no known algorithm can solve any of them optimally in polynomial (nonexponential) time with respect to the size of the problem. For this reason, the heuristics and algorithms that have been developed can only be expected to produce near-optimal results.
The detailed routing step can be modeled in many different ways. Some of these ways 
include: 
1) River Routing 
2) Channel Routing 
3) Switchbox Routing 
The focus of this thesis is to discuss a new algorithm for channel routing. 
1.2. Channel Routing Problem
The general channel routing problem deals with placing wires that connect modules of a circuit within a surface area of the chip, using the connection layers provided by the given fabrication technology. The surface area can be thought of as a general rectilinear shape, an L shape, a rectangular shape, or any other maskable shape. The wires may be fabricated using any of the connection layers available.
In gate array and standard cell designs, the module placement step determines the positions of cell blocks in predetermined row sites on the chip. Space is provided between the rows of cells to connect terminals of cells to the respective terminals of other cells; these spaces are called channels. The global routing step then determines which wires are to be routed in each of the available channels. Finally, a detailed routing is performed on each channel to select the exact placement of conductors in the channel. These conductors are called nets.
In this thesis, it will be assumed that the channel boundaries form a rectangle and that the wire terminals are located at uniform spacings (grid based) along the top and bottom edges. Furthermore, only two layers will be used, such that all horizontal net segments are routed in one layer and all vertical net segments are routed in the other layer.
Under these assumptions, the goal is, given a sequence of net terminals along the top and bottom borders of a rectangular channel, to determine a placement of the net segments that minimizes the size of the channel and the length or resistance of all connections made. An example of a terminal assignment for a channel is shown in Figure 1.1(a). Figure 1.1(b) shows one possible routing of this channel. In this figure, horizontal segments are shown in solid lines and vertical segments in dashed lines.
The channel density is defined as the theoretical minimum number of tracks required to successfully and completely route a given channel. The density of any column is easily computed by counting the number of nets that must pass through that column. The channel density is then the maximum column density over all columns of the channel. Since this number is channel dependent, it must be calculated for each problem.
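To make the definition concrete, here is a minimal Python sketch. It assumes a hypothetical input format in which the top and bottom terminal rows are given as equal-length lists of net numbers, with 0 marking a column that has no terminal, and it treats each net as passing through every column from its leftmost to its rightmost terminal. The function names are illustrative and not taken from the thesis.

```python
from typing import Dict, List, Tuple


def net_spans(top: List[int], bottom: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each net to (leftmost column, rightmost column); net 0 means 'no terminal'."""
    spans: Dict[int, Tuple[int, int]] = {}
    for col, pair in enumerate(zip(top, bottom)):
        for net in pair:
            if net == 0:
                continue
            left, right = spans.get(net, (col, col))
            spans[net] = (min(left, col), max(right, col))
    return spans


def column_densities(top: List[int], bottom: List[int]) -> List[int]:
    """Count, for every column, how many nets must pass through it."""
    densities = [0] * len(top)
    for left, right in net_spans(top, bottom).values():
        for col in range(left, right + 1):
            densities[col] += 1
    return densities


def channel_density(top: List[int], bottom: List[int]) -> int:
    """Channel density is the maximum column density (0 for an empty channel)."""
    return max(column_densities(top, bottom), default=0)


top = [1, 2, 0, 3, 0]
bottom = [2, 0, 1, 0, 3]
print(column_densities(top, bottom))  # [2, 2, 1, 1, 1]
print(channel_density(top, bottom))   # 2
```

Computing the span of each net first keeps the per-column count a simple sweep; for large channels, a difference array over the spans would avoid the inner loop.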
[Figure 1.1 panels: (a) Terminal Assignment; (b) Possible Routing]
Figure 1.1. Example Channel with Density 5
Doglegging is a term used to describe nets that occupy two or more tracks of the channel. Each net consists of a set of horizontal segments and a set of vertical segments. There are two forms of doglegging: restricted and unrestricted. Restricted doglegging only allows a net to be split into two tracks at a column in which a terminal of the net is found. A simple way to model this is to break nets with more than two terminals (multiterminal nets) into two-terminal subnets, as shown in Figure 1.2(a). Each subnet is free to occupy any track of the channel, and changes of track will automatically occur at the columns in which terminals are found. Unrestricted doglegging allows a net to be split so that it occupies two tracks at any point along the channel, as shown in Figure 1.2(b).
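The subnet decomposition used to model restricted doglegging can be sketched as follows, assuming the same hypothetical list-of-net-numbers input (0 for an empty terminal position). Each pair of consecutive terminal columns of a net becomes one two-terminal subnet; a net whose terminals all lie in a single column produces no horizontal subnet, since it needs only a vertical wire. The function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_into_subnets(top: List[int], bottom: List[int]) -> List[Tuple[int, int, int]]:
    """Return (net, left column, right column) for each two-terminal subnet.

    Net 0 marks an empty terminal position.
    """
    columns: Dict[int, Set[int]] = defaultdict(set)
    for col, pair in enumerate(zip(top, bottom)):
        for net in pair:
            if net != 0:
                columns[net].add(col)
    subnets = []
    for net in sorted(columns):
        cols = sorted(columns[net])
        # Consecutive terminal columns bound each subnet, so any change of track
        # between two subnets of the same net happens at a terminal column.
        for left, right in zip(cols, cols[1:]):
            subnets.append((net, left, right))
    return subnets


print(split_into_subnets([1, 0, 1, 2], [0, 1, 2, 0]))
# [(1, 0, 1), (1, 1, 2), (2, 2, 3)]
```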
One effective graphical technique used to determine the relative positions of nets with respect to each other is the vertical constraint graph (VCG). Each net of the channel is represented by a node in the graph. A directed edge from vertex i to vertex j indicates that in some column c of the channel, a terminal pin for net i is located along the top of the channel and a terminal pin for net j is located along the bottom of the channel. In order to avoid overlap between the vertical segments of nets i and j, the track selected for net i must lie above the track selected for net j. Figure 1.3 shows the vertical constraint graph for the example channel of Figure 1.1.
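The edge rule just described translates directly into code. The sketch below assumes the hypothetical input convention of equal-length top and bottom lists of net numbers (0 for no terminal) and stores the VCG as a mapping from each net to the set of nets it must lie above; the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set


def vertical_constraint_graph(top: List[int], bottom: List[int]) -> Dict[int, Set[int]]:
    """Edge i -> j means net i must be assigned a track above net j."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for t, b in zip(top, bottom):
        # A column with a top pin of net t and a bottom pin of a different net b
        # forces t above b; 0 marks an empty pin position.
        if t != 0 and b != 0 and t != b:
            graph[t].add(b)
    return dict(graph)


print(vertical_constraint_graph([1, 2, 0, 3], [2, 3, 1, 0]))  # {1: {2}, 2: {3}}
```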
[Figure 1.2 panels: (a) Restricted; (b) Unrestricted]
Figure 1.2. Doglegging Examples
Figure 1.3. Vertical Constraint Graph
If a cycle exists in the vertical constraint graph, then it is impossible to successfully route the channel without allowing unrestricted doglegging. Figure 1.4(a) shows a channel example in which there is a cycle in the VCG, and Figure 1.4(b) shows how unrestricted doglegging is used to avoid the cyclic constraints.
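Whether a VCG contains such a cycle can be checked with a depth-first search. The sketch below uses the standard three-colour DFS, assuming the graph is stored as a mapping from each net to the set of nets it must lie above; the representation and the function name are illustrative assumptions.

```python
from typing import Dict, Set


def has_cycle(graph: Dict[int, Set[int]]) -> bool:
    """Return True if the directed constraint graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[int, int] = {}

    def visit(node: int) -> bool:
        color[node] = GRAY
        for succ in graph.get(node, ()):
            state = color.get(succ, WHITE)
            if state == GRAY:  # back edge: succ is an ancestor on the current path
                return True
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# Net 1 above net 2 in one column and net 2 above net 1 in another: a cycle.
print(has_cycle({1: {2}, 2: {1}}))  # True
print(has_cycle({1: {2}, 2: {3}}))  # False
```

The recursive form is the shortest to read; for very large channels an iterative version would avoid Python's recursion limit.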
The channel routing problem described above has been proven to be NP-complete.
1.3. Previous Work 
The channel routing problem, in all of its various formulations, has been a focus of much research interest for the past 15 to 20 years. Most of the earlier work was directed toward wire routing of multilayered printed circuit boards. After the introduction of LSI and VLSI fabrication methods, research intensity increased and many new ideas were presented.
[Figure 1.4 panels: (a) Example; (b) Solution]
Figure 1.4. Cyclic Constraint Problem
One of the first algorithms presented was Lee's Maze Router [2]. Lee based his algorithm on the idea of wavefront expansion of a single net and selection of the shortest path found between the source and sink terminals. Some of the problems with this method include the large amount of memory required, the often inadequate wiring of the last nets to be placed, and the tendency to use excessive numbers of vias. Originally, the algorithm was intended for PCB routing, but it was easily applied to rectangular channels in integrated circuits.
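The wavefront expansion at the heart of Lee's router is a breadth-first search over a routing grid, followed by a backtrace from the sink. The sketch below is a simplified single-layer version, assuming a hypothetical grid representation in which `grid[r][c] == 1` marks a blocked cell; the names are illustrative. The `label` dictionary, which may hold an entry for every grid cell, reflects the memory cost noted above.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column) on the routing grid


def neighbours(cell: Cell) -> List[Cell]:
    r, c = cell
    return [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]


def lee_route(grid: List[List[int]], source: Cell, sink: Cell) -> Optional[List[Cell]]:
    """Shortest rectilinear path from source to sink; grid[r][c] == 1 marks a blocked cell."""
    rows, cols = len(grid), len(grid[0])
    label: Dict[Cell, int] = {source: 0}
    wavefront = deque([source])
    # Wavefront expansion: label every reachable cell with its distance from the source.
    while wavefront and sink not in label:
        cell = wavefront.popleft()
        for r, c in neighbours(cell):
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0 and (r, c) not in label:
                label[(r, c)] = label[cell] + 1
                wavefront.append((r, c))
    if sink not in label:
        return None  # sink unreachable
    # Backtrace: from the sink, repeatedly step to a neighbour labelled one less.
    path = [sink]
    while path[-1] != source:
        path.append(next(n for n in neighbours(path[-1]) if label.get(n) == label[path[-1]] - 1))
    return path[::-1]


grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(lee_route(grid, (0, 0), (0, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
```

Routing several nets this way means marking each completed path as blocked before routing the next, which is why the last nets to be placed often end up poorly wired.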
The next major contribution came nearly ten years later, when Hashimoto and Stevens [3] introduced the Left-Edge Algorithm. This algorithm routes one track at a time, trying to maximize the use of the space in the current track. Nets are placed in a left-to-right fashion until the track is filled. The algorithm's performance is strongly dependent on the order in which nets are placed and on the presence of vertical constraints in the channel routing problem.
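A minimal sketch of the track-by-track Left-Edge idea follows. It assumes each net is a single horizontal interval (net name, left column, right column) and deliberately ignores vertical constraints, which the full algorithm must also respect. Nets are sorted by their left edge, and each track is filled left to right before a new track is opened. The names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (net name, left column, right column)


def left_edge(intervals: List[Interval]) -> List[List[Interval]]:
    """Pack horizontal net intervals into tracks, one track at a time."""
    remaining = sorted(intervals, key=lambda iv: iv[1])  # sort by left edge
    tracks: List[List[Interval]] = []
    while remaining:
        track: List[Interval] = []
        last_right = -1
        leftover: List[Interval] = []
        for iv in remaining:
            # Scan left to right, taking each net that starts after the last one placed.
            if iv[1] > last_right:
                track.append(iv)
                last_right = iv[2]
            else:
                leftover.append(iv)
        tracks.append(track)
        remaining = leftover
    return tracks


nets = [("a", 0, 2), ("b", 1, 4), ("c", 3, 5), ("d", 5, 7)]
print([[name for name, _, _ in track] for track in left_edge(nets)])  # [['a', 'c'], ['b', 'd']]
```

When there are no vertical constraints, this greedy packing uses exactly as many tracks as the channel density; the example has density 2 and needs 2 tracks.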
Deutsch made several improvements to the Left-Edge Algorithm in his Dogleg Channel Router [4], most notably the inclusion of doglegging. Through an effective use of doglegging and other improvements, he was able to achieve better results than with the simpler Left-Edge Algorithm.
A new approach taken by Yoshimura and Kuh [5] derived routing heuristics from graph theory concepts. Nets are first grouped according to the vertical constraint graph and an interval graph based on horizontal constraints. Next, groups of nets are merged so as to minimize the longest path in the modified vertical constraint graph. Their results demonstrated a large improvement over the Dogleg and Left-Edge Algorithms, especially in the minimum number of tracks required and overall processing time.
Around the same time, another heuristic-based router, called the greedy channel router, was developed by Rivest and Fiduccia [6]. This router applies the same principles as the Left-Edge and Dogleg routers do; however, the channel is scanned on a column-by-column basis instead of the track-by-track basis of the former. Unrestricted doglegging is allowed; however, it may be necessary to add extra columns at the end of the channel to complete the wiring.
Another approach, which combines aspects of both track-by-track and column-by-column routers, was presented by Sangiovanni-Vincentelli and Santomauro, called YACR2, for "Yet Another Channel Router 2" [7]. Instead of requiring extra columns at the end of the channel, this router may require extra columns in the middle of the channel.
A new approach, taken by Burstein and Pelavin [8], applies linear and dynamic programming to the channel routing problem, which is decomposed hierarchically. The results they presented show a further improvement over previous channel routers.
A far different approach was proposed by Joobbani and Siewiorek [9], who applied principles of artificial intelligence and expert systems to the channel routing problem. The task of channel routing is divided into subtasks which are assigned to subtask experts. The efforts of these experts are then coordinated to produce high-quality channel routings.
Shin and Sangiovanni-Vincentelli developed MIGHTY: A 'Rip-Up and Reroute' Detailed Router in 1986 [10]. Mighty is a very powerful router, able to route channels of various shapes, including switchboxes. Mighty is a two-layer router; however, vertical routing is not restricted to one layer and horizontal routing to the other. Heuristics are applied for placing nets one at a time, displacing some nets slightly to make room for blocked nets, and ripping up some nets currently placed to allow other nets to be placed first.
Finally, another approach was taken by Leong, Wong, and Liu [11] through the application of a new optimization technique called simulated annealing. Their routing program produced very good results; however, its run time was far too long.
The above papers were chosen because they represent the major research contributions and directions taken in channel routing over the past few decades. There have been many other papers, not mentioned here, that discuss improvements to previous algorithms, theoretical bounds on channel routing, and less restricted problem statements (including gridless and multilayered channel routing). For a good set of references on channel routing, see the introduction to the book by Hu and Kuh [1].
1.4. Thesis Outline
In the remainder of this thesis, research in serial and parallel algorithms for channel routing based on the simulated annealing methodology will be presented and discussed. This thesis is organized as follows. Chapter 2 discusses the simulated annealing methodology and how it is applied to channel routing. First, simulated annealing is presented as a recent approach to solving multivariate optimization problems. Next, a previous application of simulated annealing to channel routing is discussed in detail. Subsequently, our approach to channel routing is presented. Finally, we discuss how to apply this approach to the basic channel routing problem and also to extensions of channel routing, which include unrestricted doglegging, obstacle avoidance, and switchbox routing. Chapter 3 will first present the details of the serial implementation of the channel router. After discussing some of the heuristics used, the results of the serial version will be given. Chapter 4 will present the details of the parallel implementation of the channel router. The targeted parallel machine will be described, followed by descriptions of how the problem was partitioned onto the parallel architecture. Some of the heuristics will be discussed, and then the results will be presented. Finally, Chapter 5 will summarize the research accomplished and draw some conclusions from the work.
CHAPTER 2
SIMULATED ANNEALING
2.1. Simulated Annealing Methodology
In 1983, Kirkpatrick, Gelatt, and Vecchi [12] demonstrated the similarities between statistical mechanics and multivariate or combinatorial optimization and proposed a technique for optimization. Their technique, called simulated annealing, is analogous to the process of slowly cooling a bar of metal so that large uniform crystalline structures are formed. These crystalline structures represent the lowest possible energy states for that material. In thermal equilibrium, the probability of a given state $x_i$ with energy $E(x_i)$ is proportional to the Boltzmann factor

$$P(x_i) \propto \exp\left(-\frac{E(x_i)}{k_b T}\right)$$

where $k_b$ is Boltzmann's constant and $T$ is the absolute temperature.
To simulate the annealing process of metals, one must first determine how the state of a system is defined. A methodology for permuting one state into another must be outlined. The selection of components to move can be made either in a purely random fashion or by applying specific heuristics generated for the problem at hand. An approximation to the energy of the states must also be formulated, usually in the form of a cost function that accurately represents the criteria to be minimized. Finally, a simulated temperature range and a schedule for decrementing the temperature must be selected to achieve an optimal cost-to-temperature ratio throughout the annealing process.
In 1953, Metropolis et al. [13] outlined an efficient procedure for deciding whether or not a given state will exist at a given temperature. At each step of the annealing process, a pseudorandom move is made and the resulting change in cost $\Delta C$ from the previous state to the new state is calculated. Then, the probability of accepting the new state is

$$P(\text{accept}) = \begin{cases} 1, & \Delta C \le 0 \\ \exp\left(-\dfrac{\Delta C}{T}\right), & \Delta C > 0 \end{cases}$$

where $T$ is the current simulated temperature (Boltzmann's constant is absorbed into $T$). With this negative exponential function, it is very likely that new states causing a cost increase will be accepted at high temperatures, but not at low temperatures. Figure 2.1 shows the generalized simulated annealing algorithm, which can be applied to many different problems.
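As a worked illustration with arbitrarily chosen numbers (not taken from the thesis): under the Metropolis rule, an uphill move with cost change $\Delta C > 0$ is accepted with probability $e^{-\Delta C/T}$. For the same uphill move of $\Delta C = 2$ at a high and a low temperature,

$$e^{-\Delta C/T}\Big|_{\Delta C = 2,\ T = 10} = e^{-0.2} \approx 0.82, \qquad e^{-\Delta C/T}\Big|_{\Delta C = 2,\ T = 0.5} = e^{-4} \approx 0.018$$

so the move is accepted roughly four times in five at $T = 10$ but fewer than twice in a hundred at $T = 0.5$.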
It is precisely this aspect of simulated annealing that makes it attractive compared with other optimization methods. Nearly all of the channel routers presented in the first chapter apply a set of heuristics in solving the problem. The problem with simply using heuristics is that they can easily lead the optimization to a local minimum which could be far from the optimal solution of the problem. Once a local minimum is found, these algorithms are stuck. Simulated annealing allows one to get out of local valleys. Figure 2.2 graphically shows local and global minima in a typical optimization problem.
However, there is a price to be paid. Simulated annealing is basically an iterative trial-and-error algorithm, and calculating each cost change could be expensive in time and computing resources. It is critical, therefore, to choose the cost criteria carefully and to compute them efficiently.
Set Initial Temperature and State
WHILE (Stopping Criteria Not Satisfied) DO
    FOR Inner-Loop-Count = 1 TO MAX DO
        Select Elements to Move
        Select Move Operation
        Calculate Cost Change
        Evaluate Accept/Reject Based on Temperature and Cost
        IF (Accept) THEN
            Update State Information
    END Inner Loop
    Adjust Temperature
END While Stopping Criteria Not Satisfied
Display Final Results
Figure 2.1. General Simulated Annealing Algorithm
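The loop of Figure 2.1 can be written as a short, generic Python function. This is a sketch under stated assumptions, not the thesis's router: the stopping criterion is a temperature floor, the schedule is geometric cooling by a factor `alpha`, all parameter values are arbitrary, and the toy cost function and `step` neighbour move are hypothetical stand-ins for a channel state and its moves.

```python
import math
import random
from typing import Any, Callable, Tuple


def simulated_annealing(
    initial: Any,
    cost: Callable[[Any], float],
    neighbour: Callable[[Any, random.Random], Any],
    t_start: float = 10.0,
    t_end: float = 0.01,
    alpha: float = 0.9,
    moves_per_temperature: int = 100,
    seed: int = 0,
) -> Tuple[Any, float]:
    """Generic annealing loop; returns the best state seen and its cost."""
    rng = random.Random(seed)
    state, current_cost = initial, cost(initial)
    best_state, best_cost = state, current_cost
    temperature = t_start
    while temperature > t_end:  # stopping criterion: temperature floor (assumed)
        for _ in range(moves_per_temperature):  # inner loop at a fixed temperature
            candidate = neighbour(state, rng)  # select elements and move operation
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost  # cost change
            # Metropolis test: downhill always accepted, uphill with prob exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                state, current_cost = candidate, candidate_cost  # update state
                if current_cost < best_cost:
                    best_state, best_cost = state, current_cost
        temperature *= alpha  # adjust temperature (geometric schedule, an assumption)
    return best_state, best_cost


def step(x: int, rng: random.Random) -> int:
    """Hypothetical neighbour move: shift x by one, staying inside [0, 100]."""
    return min(100, max(0, x + rng.choice((-1, 1))))


best, best_cost = simulated_annealing(0, lambda x: (x - 42) ** 2, step)
print(best, best_cost)
```

Keeping the best state seen, rather than only the final one, costs one comparison per accepted move and guards against ending on an uphill excursion.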
[Figure 2.2 axes: Objective Function vs. Search Space]
Figure 2.2. Local and Global Minima in an Annealing Cost Function
In their original paper, Kirkpatrick, Gelatt, and Vecchi showed how to apply simulated annealing to the problems of chip partitioning, cell placement, global wiring, and the classical Traveling Salesman Problem. Since then, other researchers have applied simulated annealing to logic minimization [14], cell placement [15, 16], global routing [17], and detailed (channel) routing [11]. Furthermore, many simulated annealing algorithms have been parallelized, with some very interesting results. These include partitioning and routing [18], standard cell placement [19, 20, 21], macro cell placement [22], and floorplanning [23].
2.2. Simulated Annealing Applied to Channel Routing
As noted above, determining the optimal assignment of nets to the tracks of a channel has been proven to be an NP-complete problem. Although many people have reported good results from applying heuristics to the problem, we feel that far better results may in general be attained by applying simulated annealing. Heuristic algorithms are usually greedy in the sense that only downhill moves are accepted. If a local minimum is encountered anywhere, these algorithms will accept that state as the minimum, even though a better state may exist.
2.2.1. The first simulated annealing channel router
In 1985, Leong, Wong, and Liu presented the first channel routing algorithm based on simulated annealing [11]. Their algorithm borrows ideas from the net merging router of Yoshimura and Kuh [5]. All nets of a given channel are divided into subnets, and the vertical constraint graph G is formed. This graph is then partitioned into groups, where the subnets in one group are placed in the same routing track with no horizontal overlap incurred.
One of three different types of operations is then chosen randomly and applied to randomly chosen subnets to create a new channel state. These operations (or moves) are displacing one subnet from one group to another, exchanging two subnets in different groups, and extracting a subnet from a group to form a new group. Further, only legal moves are permitted; at no time is a move that creates overlap between two subnets allowed. Leong [24] has demonstrated that this set of moves is sufficient to perform all necessary permutations on the state of the channel.
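The three move types can be sketched as operations on a list of groups, where each group is a set of subnet names assigned to one track. This is a simplified illustration under stated assumptions: the names `fits` and `random_move` and the span representation are hypothetical, and legality is checked only for horizontal overlap; the vertical-constraint checks that the original router must also perform on the grouped graph are not modelled here.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # (left column, right column) of a subnet


def fits(group: Set[str], subnet: str, spans: Dict[str, Span]) -> bool:
    """True if `subnet` has no horizontal overlap with any other subnet in `group`."""
    left, right = spans[subnet]
    return all(spans[o][1] < left or right < spans[o][0] for o in group if o != subnet)


def random_move(groups: List[Set[str]], spans: Dict[str, Span],
                rng: random.Random) -> Optional[List[Set[str]]]:
    """Apply one random displace/exchange/extract move; return None if it would be illegal."""
    new = [set(g) for g in groups]
    src = rng.randrange(len(new))
    subnet = rng.choice(sorted(new[src]))
    kind = rng.choice(("displace", "exchange", "extract"))
    if kind == "extract":
        # Move the subnet into a brand-new group (a new track).
        new[src].remove(subnet)
        new.append({subnet})
    else:
        dst = rng.randrange(len(new))
        if dst == src:
            return None
        if kind == "displace":
            if not fits(new[dst], subnet, spans):
                return None
            new[src].remove(subnet)
            new[dst].add(subnet)
        else:  # exchange one subnet from each of two groups
            other = rng.choice(sorted(new[dst]))
            new[src].remove(subnet)
            new[dst].remove(other)
            if not (fits(new[src], other, spans) and fits(new[dst], subnet, spans)):
                return None
            new[src].add(other)
            new[dst].add(subnet)
    return [g for g in new if g]  # drop groups emptied by the move


spans = {"a": (0, 2), "b": (3, 5), "c": (1, 4), "d": (6, 8)}
groups = [{"a", "b"}, {"c", "d"}]
print(random_move(groups, spans, random.Random(1)))
```

Returning `None` for an illegal proposal mirrors the "legal moves only" rule: the caller simply draws another move instead of ever evaluating an overlapping state.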
The cost function applied to determine the acceptance or rejection of a move is a combina- 
tion of three characteristics of the current and new states of the channel. These are 
1) Channel Width 
2) Longest Path in G 
3) Track Sparsity 
The channel width requires no calculation, the longest path is found by searching the modified
vertical constraint graph G, and the sparsity of each individual track must first be calculated to
find the overall sparsity of the channel.
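As a rough illustration of the second cost term, the sketch below computes the length, in vertices, of the longest path in a vertical constraint graph. When every subnet occupies a single track, this length is a lower bound on the number of tracks any routing needs. The adjacency-matrix representation, the size limit MAX_V, and the function name are assumptions for illustration, not details of Leong, Wong, and Liu's router. A return value of -1 signals a cycle, which cannot be routed without doglegs.

```c
#include <assert.h>
#include <string.h>

#define MAX_V 64  /* hypothetical upper bound on the number of subnets */

/* Number of vertices on the longest directed path of the vertical
 * constraint graph, or -1 if the graph contains a cycle.
 * adj[u][v] != 0 means subnet u must be placed in a track above subnet v. */
int longest_path_vertices(int n, unsigned char adj[MAX_V][MAX_V])
{
    int indeg[MAX_V], depth[MAX_V], queue[MAX_V];
    int head = 0, tail = 0, best = 0;

    assert(n >= 0 && n <= MAX_V);
    memset(indeg, 0, sizeof indeg);
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            if (adj[u][v])
                indeg[v]++;

    /* Every vertex starts as a path of one vertex; sources go first. */
    for (int v = 0; v < n; v++) {
        depth[v] = 1;
        if (indeg[v] == 0)
            queue[tail++] = v;
    }

    /* Kahn's topological sort, relaxing path lengths along the way. */
    while (head < tail) {
        int u = queue[head++];
        if (depth[u] > best)
            best = depth[u];
        for (int v = 0; v < n; v++) {
            if (!adj[u][v])
                continue;
            if (depth[u] + 1 > depth[v])
                depth[v] = depth[u] + 1;
            if (--indeg[v] == 0)
                queue[tail++] = v;
        }
    }
    /* Vertices never dequeued lie on or behind a cycle. */
    return (head == n) ? best : -1;
}
```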
Since annealing takes a long time to complete, one option is to parallelize the process. If
moves are to be attempted in parallel, some mechanism must be used to prevent two separate 
moves from causing an illegal channel state. An example of how this might occur is shown in 
Figure 2.3. 
Without shared data, there is no easy way to have parallel selection of subnets and moves.
Also, after a set of moves is attempted in parallel, the modified vertical constraint graph, G,
must be updated to reflect each move. This is done before acceptance decisions are made, and if
a subset of moves is rejected, reformulation must take place. Furthermore, it is difficult to
incorporate avoidance of obstacles, such as power and ground wiring, into the vertical and
horizontal constraint checking.
Figure 2.3. Example of Illegal Move ((a) Before, (b) After)
For these reasons, we decided to investigate alternative approaches to applying simulated 
annealing to the channel routing problem. 
2.2.2. A new simulated annealing algorithm for channel routing
The algorithm presented here is less restrictive during the annealing process than the
algorithm of Leong, Wong, and Liu. First, an algorithm for channel routing is presented. Second,
extensions of the channel routing algorithm to include unrestricted doglegging, obstacle
avoidance, and switchbox routing are described.
2.2.2.1. Channel routing
Four aspects of our channel router will be discussed here. The first is the set of moves
permitted to operate on a given channel state. All moves of subnets are legal. We do not
distinguish between moving a subnet to an empty area of a track or moving it on top of another
subnet, creating overlap that must later be removed. Overlaps of subnets are handled during the
evaluation of the cost function. A similar idea was successfully used in the TimberWolf cell
placement program based on simulated annealing [15].
There are two basic move types used, displacement and exchange. Displacement moves
allow a subnet residing in a given track $T_i$ to be moved to track $T_j$. Track $T_j$ is either an
existing track or a new track. Displacing subnets to existing tracks is the source of the majority of
the improvements made to the channel state. It is possible through this move to eliminate
tracks completely by moving all subnets in the track to other tracks. If the annealing process
gets stuck at a local minimum, displacing subnets to a new track can be used to free up the
channel enough to get out of the local minimum.
The second set of moves permitted is exchange moves. These moves are also used to
provide more freedom to the annealing process. Although no tracks are ever freed up by this move,
exchanging two subnets does provide improvements in cases where a sequence of two moves is
necessary. All exchange moves can be subdivided into a sequence of two displacement moves.
The first part displaces one subnet into the track of the second subnet, usually causing horizontal
overlap between the two subnets. The second part displaces the second subnet to the original
track of the first. Since overlap is usually induced momentarily, the first displacement would
be accepted with an extremely low probability. Thus in situations exemplified by Figure 2.3, it
is far better to use the exchange move than two displacement moves.
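A minimal sketch of the two move types follows. It assumes a simplified state in which each subnet records only its track index (the router itself keeps per-track linked lists) and writes the exchange as the two displacements discussed above, applied together so that the intermediate overlapping state is never evaluated on its own. The ChannelState type and the function names are hypothetical, and emptied tracks are not compacted here.

```c
#include <assert.h>

/* Hypothetical simplified channel state: track[s] is the track of subnet s. */
typedef struct {
    int num_subnets;
    int num_tracks;
    int track[128];
} ChannelState;

/* Displacement: move subnet s to track t; t == num_tracks opens a new track. */
void displace(ChannelState *cs, int s, int t)
{
    assert(s >= 0 && s < cs->num_subnets && cs->num_subnets <= 128);
    assert(t >= 0 && t <= cs->num_tracks);
    if (t == cs->num_tracks)
        cs->num_tracks++;          /* new, initially empty track */
    cs->track[s] = t;
}

/* Exchange: swap the tracks of subnets a and b. It is two displacements,
 * but the cost is evaluated only once, after both have been applied. */
void exchange(ChannelState *cs, int a, int b)
{
    int ta = cs->track[a], tb = cs->track[b];
    displace(cs, a, tb);   /* a joins b's track (usually overlapping b) */
    displace(cs, b, ta);   /* b takes a's old track */
}
```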
The second aspect of our channel router is the cost function used for calculating the cost of
a new channel state after randomly selected moves have been applied to the current state. Since
the goal of our channel router is to provide a near-optimal routing of the given channel, the cost
for a given state of the channel is a function of the amount of overlap between unique nets
(OL), the length of the nets (NL), the width of the channel (WC), and the fraction of the track
not occupied by nets (FU). For each proposed move, the cost change incurred if the move were to
be accepted is calculated as follows and used to evaluate move acceptance:
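The equation itself did not survive in this copy of the text. Assuming the four terms are combined linearly, with the weights α, β, γ, and δ applied in the order OL, NL, WC, FU and any sign folded into the weight, the cost change would take the following form. This is a reconstruction for reading purposes, not the formula as printed in the thesis.

```latex
\Delta C = \alpha\,\Delta OL + \beta\,\Delta NL + \gamma\,\Delta WC + \delta\,\Delta FU
```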
It is necessary to adjust the values of the parameters α, β, γ, and δ to optimize the annealing
process. These values should be determined through a study of numerous trials on a variety of
problems.
The third aspect of our channel router is the data structure employed. Since overlap is an
important part of the cost function and requires the most computation, the design of the data
structure should concentrate on providing efficient calculation mechanisms. Each net occupying
a given track is given a structure in a linked list that specifies the grid points of the left and
right endpoints of the subnet segment found in the horizontal track. Each track list is linked
with the list of the preceding track to form a two-dimensional linked data structure. The subnet
structures in each track list are also sorted by leftmost gridpoint value so that searches may
be terminated early without traversing the entire linked list. Linked list structures are used for
the track data because the number of subnets in a track varies greatly from track to track, and
the total number of tracks also varies throughout the annealing process.
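The early-terminating overlap search can be sketched as follows. The node holds the left and right endpoints and a link to the next subnet, as described above. The net number field, the function name, and the rule that segments of the same net do not count as overlap are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One subnet segment in a track list, kept sorted by left endpoint. */
typedef struct Subnet {
    int net;               /* net number (assumed field) */
    int left, right;       /* column grid points of the horizontal segment */
    struct Subnet *next;   /* next subnet in the same track */
} Subnet;

/* Total horizontal overlap, in grid points, between segment [left, right]
 * of net `net` and the other nets already in a sorted track list. */
int track_overlap(const Subnet *head, int net, int left, int right)
{
    int total = 0;
    assert(left <= right);
    for (const Subnet *s = head; s != NULL; s = s->next) {
        if (s->left > right)
            break;                 /* sorted list: nothing later can overlap */
        if (s->net == net || s->right < left)
            continue;              /* same net, or entirely to the left */
        int lo = s->left > left ? s->left : left;
        int hi = s->right < right ? s->right : right;
        total += hi - lo + 1;      /* shared grid points */
    }
    return total;
}
```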
For the vertical segments of subnets placed in specific columns, there is no need for linked
lists (at least not in the case of channel routing), and so dynamically allocated column structure
arrays are used. The number of columns is always fixed, and each column has exactly two
endpoints where net terminals are located. The only other way to place more nets in a column is by
unrestricted doglegging. Since those numbers are very small, it is possible to use fixed-size
arrays. Figure 2.4 shows the linked list structure used for the track data.
Figure 2.4. Track Data Linked List Structure (each node: net number, left endpoint, right endpoint, next subnet pointer)
Finally, the fourth aspect of our channel router is the annealing schedule used. Many
researchers have investigated optimal and efficient cooling schedules for annealing processes.
The cooling algorithm can be modeled by Markov chains. One method has been developed to
approximate the optimal cooling schedule by analyzing fixed-length Markov chains in polynomial
time [25]. Another method attempts to control convergence by adjusting the temperature
so that the average cost decrease is uniform [26].
Initially we decided to take a simplified approach by applying a predefined temperature
adjustment schedule. The annealing temperature T is adjusted based on the following schedule:
$$T_{i+1} = \mathrm{ALPHA}(T_i) \times T_i$$
in which the function ALPHA(T) ranges from 0.8 for large values of T to 0.95 for small values
of T. This schedule allows more permutations at low annealing temperatures to make many
small improvements. To determine the initial temperature, 100 random moves with a positive
cost change are evaluated without accepting any of them. The average cost change, $\overline{\Delta COST}$,
for those moves is then calculated, and $T_{INIT}$ is solved for as follows:
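This equation is also missing from this copy. If an uphill move is accepted with the usual Metropolis probability exp(-ΔC/T), then requiring an average uphill move to be accepted with probability 0.90 gives the following relation (an assumed reconstruction):

```latex
e^{-\overline{\Delta COST}/T_{INIT}} = 0.90
\quad\Longrightarrow\quad
T_{INIT} = \frac{-\overline{\Delta COST}}{\ln 0.90} \approx 9.49\,\overline{\Delta COST}
```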
The value 0.90 is used because at the initial temperature we would like to accept 90% of all
moves attempted.
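The schedule can be sketched in a few lines of C. The initial temperature follows the 0.90 acceptance target above. The text gives only the end values of ALPHA(T) (0.8 hot, 0.95 cold), so the breakpoints t_lo and t_hi and the linear blend between them are assumptions, as are all function names.

```c
#include <assert.h>

#define NEG_LN_0_90 0.10536051565782628   /* -ln(0.90) */

/* Initial temperature from sampled positive cost changes, chosen so that
 * exp(-mean / T) = 0.90 for the mean uphill cost change. */
double initial_temperature(const double dcost[], int n)
{
    double sum = 0.0;
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        assert(dcost[i] > 0.0);        /* only uphill samples are used */
        sum += dcost[i];
    }
    return (sum / n) / NEG_LN_0_90;
}

/* ALPHA(T): 0.80 at or above t_hi, 0.95 at or below t_lo, and a linear
 * blend in between (the blend is an assumption). */
double alpha(double T, double t_lo, double t_hi)
{
    if (T >= t_hi) return 0.80;
    if (T <= t_lo) return 0.95;
    return 0.95 - 0.15 * (T - t_lo) / (t_hi - t_lo);
}

/* One cooling step: T_{i+1} = ALPHA(T_i) * T_i. */
double next_temperature(double T, double t_lo, double t_hi)
{
    return alpha(T, t_lo, t_hi) * T;
}
```

Cooling proceeds by calling next_temperature repeatedly until T falls below a small stopping value.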
2.2.2.2. Extensions to the channel routing algorithm
The algorithm presented above can easily be extended to include unrestricted doglegging,
obstacle avoidance, and switchbox routing.
To allow unrestricted doglegging it is necessary to add two more move types to the set
already used, one to split a selected subnet into two different tracks at a selected column, and
one to restore a separated subnet back into a single track. Furthermore, a penalty or cost should
be assessed to any move that creates unrestricted doglegs because of the additional vias required.
In cases where cycles are found in vertical constraint graphs, it is necessary to allow
unrestricted doglegging.
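The two dogleg moves might look like the sketch below, assuming a flat per-subnet record (Seg) rather than the router's linked lists and column arrays. After a split, both pieces touch the split column, which is where the vertical jog and its extra vias would go. The via penalty mentioned above would be charged by the cost function, not by these helpers. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical flat subnet record used only for this illustration. */
typedef struct {
    int net, track, left, right;
} Seg;

/* Split `s` at column `col` (strictly inside the segment): the left part
 * keeps its track and the returned right part goes to `new_track`. */
Seg split_dogleg(Seg *s, int col, int new_track)
{
    assert(s->left < col && col < s->right);
    Seg right_part = { s->net, new_track, col, s->right };
    s->right = col;
    return right_part;
}

/* Restore a doglegged subnet: extend `a` over `b`, which must be the
 * adjoining piece of the same net. */
void merge_dogleg(Seg *a, const Seg *b)
{
    assert(a->net == b->net && a->right == b->left);
    a->right = b->right;
}
```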
Since overlaps are allowed during the annealing process, the algorithm is also well suited
for extending to include obstacle avoidance. Obstacle avoidance is important to consider if some
sections of the routing area could be used for power or ground routing or any other element of
the chip that must be placed there. By applying a very high cost to any subnet occupying those
areas, it is possible to retain the necessary freedom for the subnets at high temperatures to be
placed almost anywhere, and then, as the temperature is reduced, those interferences can
gradually be eliminated.
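The obstacle penalty can be sketched as one more per-subnet cost term. The rectangular Obstacle type and the value of OBSTACLE_WEIGHT are assumptions, since the text asks only for a "very high cost". Because overlap is still permitted, a hot move onto an obstacle can be accepted, and cooling then pushes subnets off it.

```c
#include <assert.h>

/* Hypothetical obstacle (e.g., a power or ground strap) occupying
 * columns [left, right] of tracks [track_lo, track_hi]. */
typedef struct { int track_lo, track_hi, left, right; } Obstacle;

#define OBSTACLE_WEIGHT 1000.0  /* assumed "very high" cost per grid point */

/* Penalty for a horizontal segment on `track` spanning [left, right]. */
double obstacle_penalty(const Obstacle *obs, int n, int track,
                        int left, int right)
{
    double penalty = 0.0;
    assert(left <= right);
    for (int i = 0; i < n; i++) {
        const Obstacle *o = &obs[i];
        if (track < o->track_lo || track > o->track_hi)
            continue;                      /* obstacle not on this track */
        int lo = left > o->left ? left : o->left;
        int hi = right < o->right ? right : o->right;
        if (lo <= hi)
            penalty += OBSTACLE_WEIGHT * (hi - lo + 1);
    }
    return penalty;
}
```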
Switchbox routing is similar to channel routing, except nets are given terminals on all sides
of the rectangle instead of just two sides. Although this problem is much more difficult than the
channel routing problem, it is not as difficult to extend our algorithm to handle switchboxes.
Since there are many more constraints on the placement of subnets, it is even more important to
allow the subnets to overlap during high temperature annealing. In some sense, at high
temperatures it appears that each subnet is being placed in the best location independent of all other
nets around it, and as the temperature is reduced, the effect of the other nets is slowly increased.
The linked list representation of the track data could easily be replaced with a representation
similar to that used for the column data. Unrestricted doglegging would have to be included to
successfully route nearly all switchbox examples.
CHAPTER 3
SERIAL IMPLEMENTATION
3.1. Implementation Details 
The algorithm presented in the previous two chapters was implemented as a serial version
with approximately 7,000 lines of C code and was executed on a Sun Microsystems 3/50
workstation under Sun UNIX 4.2, Release 3.4.
3.2. Heuristics
In the following we will discuss various heuristics used for different characteristics of the
annealing algorithm. After a simple initial implementation of the simulated annealing
algorithm, it was clear that many more improvements on the algorithm would have to be made.
The initial placement of nets and the selection of moves, nets, and tracks for displacement were
originally made in a purely random fashion. It is necessary to include some intelligent heuristics
in all of the selection processes in order to achieve convergence within a reasonable amount of
time. In the following pages, we will attempt to describe those heuristics that were applied to
the uniprocessor implementation.
3.2.1. Initial placement
One simple heuristic was used for the initial placement of the nets into tracks. First, nets
with all terminals on the top border of the channel are placed in unique tracks. No horizontal
overlap is created because subnets in the same track always belong to the same net. Next, all
nets that have terminals along the top and bottom borders of the channel are placed, one per
track. Finally, all nets that have all terminals along only the bottom border are placed, one per
track.
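The three-pass placement above can be sketched as follows. The NetKind classification, the function name, and the convention that track 0 is the topmost track are assumptions. Every subnet of a net shares that net's track, so this initial state has no horizontal overlap between different nets.

```c
#include <assert.h>

/* Which channel borders carry a net's terminals. */
typedef enum { TOP_ONLY, TOP_AND_BOTTOM, BOTTOM_ONLY } NetKind;

/* Assign one track per net: top-only nets first, then nets with terminals
 * on both borders, then bottom-only nets. Track 0 is assumed to be the
 * top of the channel. Returns the number of tracks used. */
int initial_placement(int n, const NetKind kind[], int track_of[])
{
    const NetKind order[3] = { TOP_ONLY, TOP_AND_BOTTOM, BOTTOM_ONLY };
    int next = 0;
    for (int pass = 0; pass < 3; pass++)
        for (int i = 0; i < n; i++)
            if (kind[i] == order[pass])
                track_of[i] = next++;   /* a fresh track for each net */
    assert(next == n);
    return next;
}
```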
3.2.2. Move selection 
Sechen [15] reported that for a simulated annealing algorithm for standard cell placement,
the number of displacement moves should outweigh the number of exchange moves. The ratio
used was 5:1 in favor of displacements. After a series of tests, we found that for channel
routing, a ratio between 15:1 and 20:1 produced better results.
After further analysis of the moves selected at low temperatures, it was decided that
exchange-type moves should be eliminated for temperatures below a given threshold. The cost
function used is able to accurately predict overlap for a given subnet displacement, but due to
the complexity of the calculations, the overlap between exchanging subnets is only
approximated. Because of this, overlap could mistakenly be created at low temperatures, not allowing
enough time for the annealing to gradually clear it out.
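A sketch of the move-type choice described in this subsection follows. In use, it would be called with the 15:1 ratio and the exchange cutoff temperature of 20.0 that the Results section later reports. The function name and the use of the C library rand() are assumptions; any uniform random source would do.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { DISPLACE, EXCHANGE } MoveType;

/* Displacements outnumber exchanges `ratio`:1 on average. Below `t_cutoff`
 * only displacements are proposed, because the exchange overlap estimate
 * is only approximate. */
MoveType select_move(double T, double t_cutoff, int ratio)
{
    assert(ratio >= 1);
    if (T < t_cutoff)
        return DISPLACE;
    return (rand() % (ratio + 1) == 0) ? EXCHANGE : DISPLACE;
}
```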
3.2.3. Net selection
Net selection could be one of the most important aspects of the annealing process. If the
best-placed subnets are always selected to be moved, it will be impossible to make any progress.
Originally, the subnet to be selected was drawn at random from the set S of subnets. This
approach is analogous to walking a random path in a forest, hoping to find the way out.
One solution to the problem is to apply a weighting to each subnet in the set S, forming a weighted set S̃.
Subnets currently incurring some overlap should be weighted much higher than subnets with no
overlap. This can be reflected by adding a cost term proportional to the amount of overlap the
given subnet has. The subnets with overlap should be selected more often, and overlap should
be quickly eliminated. A similar idea was also used by Kling and Banerjee [27] for selecting the
queue ordering of modules in an evolution-based standard cell placement (ESP) program.
Another possible factor that could be included in the selection of subnets is the current
position of the subnet with respect to the best possible placement of that subnet taken
individually. This idea applies primarily to "n" and "u" shaped subnets as shown in Figure 3.1, or in
other words, subnets having either both terminals along the top border of the channel or both
terminals along the bottom border. There are two reasons for wanting these types of subnets
drawn to their respective borders. The first is that it frees up the central tracks so that other
subnets having both top and bottom terminals may use them. The second reason, more
importantly, is that it shortens the length of the conducting wires of those subnets, reducing the
resistance and propagation delay. We decided to add another term to the approximated subnet cost
that is proportional to the excess length. During the high temperature
ranges of the annealing process, the effect of the length is much less than the effect of the
overlap, so to save computation time, the length computation is only added below a given
temperature threshold.
One other subnet selection biasing technique is to increase the probability of selecting
subnets in nearly vacant tracks. If it is possible to displace a subnet out of an almost vacant track,
then it might be possible at the same time to eliminate that track and decrease the channel
width.
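The biases above can be combined into a single positive weight per subnet and sampled with roulette-wheel selection, as in the sketch below. The three terms follow the text (overlap, excess length below the temperature threshold, and vacancy of the subnet's track), but their coefficients and the function names are placeholders, not values reported in the thesis.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subnet weight; the coefficients are placeholders.
 * track_fill is the occupied fraction of the subnet's track, in [0, 1]. */
double subnet_weight(double overlap, double excess_len, double track_fill,
                     int below_threshold)
{
    double w = 1.0 + 10.0 * overlap;     /* favor subnets with overlap */
    if (below_threshold)
        w += 1.0 * excess_len;           /* length term only when cold */
    w += 2.0 * (1.0 - track_fill);       /* favor nearly vacant tracks */
    return w;
}

/* Roulette-wheel selection: index i is returned with probability
 * weight[i] / sum(weight). All weights must be positive. */
int pick_weighted(const double weight[], int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        assert(weight[i] > 0.0);
        total += weight[i];
    }
    double r = total * ((double)rand() / ((double)RAND_MAX + 1.0));
    for (int i = 0; i < n; i++) {
        if (r < weight[i])
            return i;
        r -= weight[i];
    }
    return n - 1;   /* guard against floating-point rounding */
}
```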
Figure 3.1. "n" and "u" Shaped Subnets
3.2.4. Track selection 
Selecting the track to displace a subnet to is also a very important decision. Purely random
selection is simply not enough to secure improvements quickly, especially at high temperatures
when bad track selections are accepted as readily as good track selections. Note, however,
that it is important not to eliminate "bad" moves because they are an integral part of the
annealing process.
The first method for biasing track selection to consider is to increase the probability of
selection of a track based on track vacancy. Since subnets are selected to vacate nearly empty
tracks, the track selected as the destination should be reasonably full, or few gains are made.
Another heuristic applied to "n" and "u" shaped subnets is to bias the displacement track
selection toward those tracks on the inside of the subnet. This approach should encourage such
subnets to move toward the border tracks to reduce wire length and congestion. Again, it is
important not to over-bias the selection because it is absolutely necessary that some subnets
move away so that better subnets can be moved inward.
In physical annealing, during very low temperatures, molecular movement is usually
limited to a very small area around the molecule's current position. This same idea has been
applied by many in standard and macro-cell placement by simulated annealing [20,22,28]. The
idea can take on two forms: one, a fixed-size window enabled for temperatures below a
threshold, and two, a variable-size window proportional to the temperature. The first is the easiest
to implement, but the second is better suited to annealing because of its gradual changes. A
thorough testing was not done to determine the feasibility of either approach; further research
in this area is necessary.
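Since neither window form was tested, the following is only a sketch of the second, temperature-proportional variant. The tuning constant `scale` and the function names are hypothetical. Displacement to a new track is left out for brevity, and the window never shrinks below one track on either side so that a move is always possible.

```c
#include <assert.h>
#include <stdlib.h>

/* Half-width, in tracks, of a window proportional to temperature,
 * clamped to [1, num_tracks]. */
int window_half_width(double T, double scale, int num_tracks)
{
    double w = scale * T;
    if (w >= num_tracks)
        return num_tracks;
    if (w < 1.0)
        return 1;
    return (int)w;
}

/* Choose a destination track different from `cur` inside the window,
 * clipped to the existing tracks [0, num_tracks - 1]. */
int pick_track_in_window(int cur, double T, double scale, int num_tracks)
{
    int w, lo, hi, t;
    assert(num_tracks >= 2 && cur >= 0 && cur < num_tracks);
    w = window_half_width(T, scale, num_tracks);
    lo = cur - w < 0 ? 0 : cur - w;
    hi = cur + w > num_tracks - 1 ? num_tracks - 1 : cur + w;
    do {
        t = lo + rand() % (hi - lo + 1);
    } while (t == cur);   /* a displacement must change tracks */
    return t;
}
```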
A more accurate way of determining which track to choose is to evaluate the anticipated
overlap and wire length changes that would occur for each track under consideration. This
estimated cost is then used to find the weighted probability of selection for each track.
Although this is one of the better heuristics, it is also very costly in computation time.
3.3. Results
Due to the large number of variations possible in heuristics, a thorough testing of each
heuristic independently was impossible. Many trial runs were performed combining many of
the heuristics together and adjusting the parameters and heuristics by analyzing the output of
each run. Instead of listing the results of every trial, this section will present the results of
applying some of the "better" heuristics to one channel example in particular.
The ratio of displacement moves to exchange moves was 15:1. The threshold
temperature for cutoff of exchange moves and including net length in the cost calculations was
20.0. At each temperature, 500 iterations were performed. The density of the channel was 12,
and there were 21 nets broken up into 39 subnets.
The weighting applied to each subnet was a function of the overlap, the current track
vacancy, and, if below the threshold, the excess length of the subnet. Subnets for displacement
and the first subnet for exchanging were selected randomly, biased by the calculated weighting.
The second subnet selected for exchanging was biased by precalculating the resulting overlap for
each eligible subnet.
The tracks for displacement were biased by calculating the expected overlap if the track
was selected and adding a constant factor to bias "n" and "u" shaped nets toward the appropriate
border. No windowing was used in selecting either the tracks or subnets.
Figure 3.2 shows the final solution for the 12-track example. Figure 3.3 shows the annealing
cost with respect to temperature for that example. Figure 3.4 shows the average overlap
with respect to temperature. Finally, Figure 3.5 shows the average number of tracks with
respect to the temperature.
Figure 3.2. Final 12 Track Solution - Serial
Figure 3.3. Annealing Cost vs. Temperature - Serial (cost vs. 1.0 / annealing temperature, log scale 0.001 to 10)
Figure 3.4. Subnet Overlap vs. Temperature - Serial (average overlap vs. 1.0 / annealing temperature)
Figure 3.5. Average Number of Tracks vs. Temperature - Serial (average number of tracks vs. 1.0 / annealing temperature)
CHAPTER 4
PARALLEL IMPLEMENTATION
Once we demonstrated the viability of our simulated annealing approach to solve the channel
routing problem, we decided to implement a parallel version, which was the original intent
of this thesis. The parallel algorithm would serve to cut down the run time of the algorithm.
The machine for which the parallel version is targeted is the Intel iPSC Hypercube. The iPSC
was chosen because one machine is readily available for use here at the University of Illinois for
testing. However, even though a system was available, due to lack of time and resources no
testing could be performed on it. Instead, simulations were carried out using the Intel Hypercube
Simulator, version 3.0, running on a Sun Microsystems 3/50 workstation.
4.1. Hypercube Architecture 
A hypercube computer is a collection of $P = 2^N$ processor nodes interconnected by a
binary N-cube topology. Each node of the hypercube is a self-contained computer with a CPU,
memory, and communication hardware. Each node can communicate directly with exactly N
neighbors through communications channels connecting adjacent nodes. Figure 4.1 illustrates a
four-dimensional (16 node) hypercube, showing the nodes and communication channels between
them. Each node is labeled with a unique N-bit binary number so that adjacent node numbers
differ in exactly one bit position.
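The labeling rule makes neighbor and distance computations simple bit operations. This sketch is illustrative only; the function names are hypothetical and not part of the iPSC software.

```c
#include <assert.h>

/* Node labels are N-bit integers; the neighbor across dimension d
 * differs from `node` only in bit d. */
int neighbor(int node, int d)
{
    assert(d >= 0 && d < 30);
    return node ^ (1 << d);
}

/* Shortest route length between two nodes: the number of bit positions
 * in which their labels differ, since each hop fixes one bit. Over all
 * pairs in a P = 2^N node cube the maximum is N = log2 P. */
int hops(int a, int b)
{
    int count = 0;
    for (unsigned x = (unsigned)(a ^ b); x != 0; x >>= 1)
        count += (int)(x & 1u);
    return count;
}
```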
The diameter of a network is defined as the maximum number of hops required to send a
message between any two nodes, and the node connectivity is the maximum number of
communication lines required for any single node. For the hypercube, the diameter and node
connectivity are both $\log_2 P$. The hypercube offers a good balance between node connectivity and
communication diameter. Furthermore, the topology of the hypercube allows a user to embed many
different communication mappings such as meshes, trees, linear arrays, and smaller dimensional
cubes.
Figure 4.1. Four-Dimensional Hypercube
Each of the sixteen nodes of the available iPSC contains an Intel 80286 CPU, an Intel
80287 numeric coprocessor, 4.5 MBytes RAM, and communication hardware based on the Intel
82586 Ethernet Controller Chip. It is possible to have up to a seven-dimensional iPSC
hypercube; however, such an array is difficult to draw and harder to visualize.
The iPSC cube nodes are connected to a System Manager computer through which a user
can interface. The cube manager is made up of a monitor, hard and floppy disk drives, and
Ethernet ports for connecting to both the cube and other computers on a local area network.
4.2. Hypercube Software 
A typical program to be run on the iPSC is made up of two separate executable parts. One
part, called the host program, is executed by the host, or System Manager, and provides the user
interface, file access, and downloading of the node program to each node. The second part, called
the node program, is executed by each node in parallel. Since the hypercube is a
message-passing-based architecture, special constructs and functions are used to establish communication for the
node with the host and for the node with other nodes.
The functions used for sending and receiving messages have the form: 
send(ci, type, buf, len, node, pid);
recv(ci, type, buf, len, &cnt, &node, &pid);
where
ci - Channel identifier for the channel to transmit the message on
type - Type of message being sent or waiting to be received
buf - Starting address of buffer to read message from or to write message into
len - Number of bytes to send or the size of the receive buffer
node - Number of node to send to or number of node received from
pid - Process id of process sending message
cnt - Number of bytes actually received
Other functions are available for reading the clock, checking the status of a channel, writing to a
logfile, and some diagnostic functions. 
4.3. Intel Hypercube Simulator
The Intel Hypercube Simulator is a tool distributed by Intel to provide the user with an
environment for developing and debugging programs written for the hypercube. The simulator
simulates the actual hypercube by forking a UNIX process for each node. Communication
between nodes is simulated by using UNIX pipes and signals. Aside from a few minor
limitations, Intel claims that programs successfully run on the simulator will run on the hypercube
with few to no changes.
The material for the preceding sections of this chapter was taken from
[29,30,31,32,33,34].
4.4. Implementation Details 
Before developing an algorithm for implementation on a hypercube, one should consider 
first the number of processors required. how the problem can best be partitioned, how to map 
that partition onto the hypercube, and what data structures would be most efficient for such an 
implementation [35]. Given a highly parallelizable problem like matrix multiplication. choosing 
the right partitioning, mapping, and data structure could greatly affect the performance of the 
implementation. For example. partitioning the data of a matrix according to the back diagonals 
of the matrix would not make any sense. For this reason. care must be taken in developing the 
parallel algorithm and implementation. The parallel algorithm implementation for channel 
routing is outlined in Figure 4.2, and will be discussed in more detail in the following sections. 
4.4.1. Selected topology
Since the hypercube topology can be used to embed many other topologies, we chose to
map the processors into a linear array as shown in Figure 4.3. The lines and arcs on the figure
show the communication channels for the three-dimensional hypercube as it is embedded into a
line. Adjacent processors in the array are chosen to be adjacent nodes of the hypercube.
Determine Initial Annealing Temperature and Parameters
Make Initial Track Assignment to Each Processor of Hypercube
WHILE (Temperature > ε) DO
  FOR Inner-Loop-Count = 1 TO W - 1 DO
    FOR Cube-Dimension = 0 TO log(P) - 1 DO
      Randomly Select One Subnet in Each Processor in Parallel
      Randomly Select P/2 Moves For Node Pairs of Cube-Dimension in Parallel
      Evaluate Cost Change for Each Move Between Pairs of Nodes in Parallel
      Evaluate Accept/Reject Based on Temperature and Cost in Parallel
      IF (Accept) THEN
        Update Local State Information
      Broadcast Updates to All Other Nodes
    END Dimensions of Cube
    Adjust Temperature in each Node
  END Inner Loop
END While Stopping Criteria Not Met
Display Final Results
Figure 4.2. Parallel Algorithm for Channel Routing
Figure 4.3. Domain Map for Three-Dimensional Hypercube (the channel tracks, Trk 1 through Trk 8k, are divided into eight blocks of k adjacent tracks, each block mapped to one processor)
The node numbering follows a pseudo-Gray code. The pattern is not a true Gray code since we chose not to have the
topmost and bottommost processors adjacent. This distributes the long-range connections more
evenly.
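For comparison, the standard binary-reflected Gray code below also embeds a linear array in a hypercube so that consecutive array positions are adjacent nodes. Because it is a true Gray code, the first and last of the 2^N positions also differ in one bit (gray(0) = 0 and gray(7) = 4 in a three-dimensional cube). It is shown only to illustrate the embedding idea; it is not the pseudo-Gray numbering used here.

```c
#include <assert.h>

/* Binary-reflected Gray code: linear-array position i maps to hypercube
 * node gray(i); consecutive positions differ in exactly one bit. */
int gray(int i)
{
    assert(i >= 0);
    return i ^ (i >> 1);
}
```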
4.4.2. Data partitioning 
After the initial placement of the subnets into tracks (similar to the serial
implementation), sets of adjacent tracks are assigned to corresponding nodes in the linear array of Figure 4.3.
The tracks are distributed as evenly as possible so that the work load of each node is as uniform
as possible. The channel area is divided into strips of tracks because the algorithm used assumes
that subnets are displaced or exchanged between different tracks.
Each node is given information about the horizontal space used by each subnet in each
track assigned to it. In other words, each node receives 1/P of the corresponding serial linked
list structure for the tracks. It is unnecessary for each node to know which sections of its
neighbors' tracks are occupied. However, a copy of the entire column data array is
maintained in each node because of faster accessing and the small amount of updating required for
the column data.
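One way to spread the tracks as evenly as possible is a standard block distribution, sketched below, in which the first (tracks mod P) processors in linear-array order receive one extra track. The function name and interface are assumptions; the thesis does not give its exact assignment rule.

```c
#include <assert.h>

/* Block distribution of num_tracks adjacent tracks over p processors in
 * linear-array order. Position `pos` receives tracks
 * [*first, *first + *count). */
void track_block(int num_tracks, int p, int pos, int *first, int *count)
{
    assert(p > 0 && pos >= 0 && pos < p);
    int base = num_tracks / p, extra = num_tracks % p;
    *count = base + (pos < extra ? 1 : 0);
    *first = pos * base + (pos < extra ? pos : extra);
}
```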
4.4.3. Parallel moves
The moves used to transform one channel state into another are still based on the
displacement and exchange moves of the serial algorithm. In this case, however, two nodes
cooperate as a pair to perform the desired transformation. During the evaluation of a
move, one processor of the pair acts as the master and the other as the slave. The following moves
can then be identified:
MOVE 0: Intra-Displace - each node of a pair performs a displacement move within 
its own sets of subnets and tracks 
MOVE 1: Inter-Displace - master node displaces a subnet from its domain to a track
within the domain of the slave node. 
MOVE 2: Intra-Exchange - each node of a pair performs an exchange move within its 
own sets of subnets and tracks 
MOVE 3: Inter-Exchange - master and slave nodes each select a subnet to exchange 
with each other 
By applying the Inter-processor moves, it is possible to utilize the connections to nodes not adja- 
cent on the linear array to move a subnet a large distance up or down the channel in a single 
move. 
It is important to select which node should be the master and which node should be the
slave for a given iteration. The node numbers of the two nodes of a pair always differ in
exactly one bit position. An algorithm specifying that the node with a one in that bit position
should always be the master and the other the slave would not work, because sooner or later all
of the slave's subnets would get displaced to the master. Instead, the mastership of a pair of
nodes should alternate after each iteration.
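One way to realize the alternation is sketched below. It assumes that iterations are numbered consistently on every node. The node whose bit d equals the parity of the iteration count acts as master. Both nodes of a pair can evaluate this locally without exchanging a message, and each node is master in every other iteration. The function names are hypothetical.

```c
#include <assert.h>

/* Partner of `node` for cube dimension d: the label with bit d flipped. */
int partner(int node, int d)
{
    assert(d >= 0 && d < 30);
    return node ^ (1 << d);
}

/* Nonzero if `node` is master for dimension d in iteration `iter`.
 * Exactly one node of each pair is master, and the role flips each
 * iteration. */
int is_master(int node, int d, int iter)
{
    return ((node >> d) & 1) == (iter & 1);
}
```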
The selection of the move is performed at the beginning of an iteration by the master
processor. The ratio of intraprocessor to interprocessor moves is 1:1. Intraprocessor moves improve
the performance and speedup, but interprocessor moves are equally necessary to be able to move
the subnets throughout the channel. The ratio of displacement moves to exchange moves ranges
between 15:1 and 20:1, the same as in the serial implementation.
Since the hypercube has no shared memory, it is necessary for the nodes of a pair to com- 
municate through messages while evaluating each current move. Figure 4.4 illustrates the com- 
munication requirements for each of the four types of moves discussed earlier, and Figure 4.5 
lists the steps performed by the master and slave processors in evaluating the move. Note that 
for MOVE 3, Inter-Exchange, it is possible to overlap the first message sent by the master with 
the first message sent by the slave to gain some parallelism. Furthermore, some calculations can 
be performed by each processor during the transmission of the first message. 
Figure 4.4. Move Communication Requirements (panels: MOVE 0, MOVE 1, MOVE 2, MOVE 3)
MASTER
Select Subnet mn1
Select Move
Calculate Cost Change for Removing mn1
Send Subnet mn1 Data to Slave (m1)
Case (MOVE)
  0: Select New Track mt1
     Calculate Cost for Placing mn1 in mt1
     If (Accept(Cost Change))
       Update mn1
     End Case 0
  1: Wait For Cost Data From Slave (s2)
     Receive Cost Data From Slave (s2)
     If (Accept(Cost Change))
       Update mn1
     End Case 1
  2: Select Subnet mn2 for Exchange With mn1
     Calculate Cost for Placing mn1 in mt2
     Calculate Cost for Placing mn2 in mt1
     If (Accept(Cost Change))
       Update mn1 and mn2
     End Case 2
  3: Wait for Subnet sn1 From Slave (s1)
     Receive Subnet sn1 Data From Slave (s1)
     Calculate Cost of Placing sn1 in mt1
     Wait for Cost Change Data From Slave (s2)
     Receive Cost Change Data From Slave (s2)
     If (Accept(Cost Change))
       Update mn1 and sn1
     End Case 3
End Case (MOVE)
Broadcast Updates
SLAVE
Select Subnet sn1
Calculate Cost Change for Removing sn1
Wait for Subnet mn1 Data From Master (m1)
Receive Subnet mn1 Data From Master (m1)
Case (MOVE)
  0: Select New Track st1
     Calculate Cost for Placing sn1 in st1
     If (Accept(Cost Change))
       Update sn1
     End Case 0
  1: Select New Track st1
     Calculate Cost for Placing mn1 in st1
     Send Cost Change to Master (s2)
     End Case 1
  2: Select Subnet sn2 for Exchange With sn1
     Calculate Cost for Placing sn1 in st2
     Calculate Cost for Placing sn2 in st1
     If (Accept(Cost Change))
       Update sn1 and sn2
     End Case 2
  3: Send Subnet sn1 Data to Master (s1)
     Calculate Cost of Placing mn1 in st1
     Send Cost Change Data to Master (s2)
     End Case 3
End Case (MOVE)
Broadcast Updates
Figure 4.5. Master/Slave Move Evaluation Steps
4.4.4. Parallel updating 
The data defining all aspects of a subnet are split up into two separate structures, the track 
linked lists and the column data array. If a move is evaluated favorably (using the same cost 
and acceptance evaluation functions as the serial implementation). the information in the data 
structures must be updated to reflect these changes. It should be noted. however, that with 
evaluating moves in parallel. the information is. for all purposm. out of date. Processor pair 
(i  , j )  evaluates their rnoveb). tacitly assuming the rest of the data on the channel is constant, 
while at  the same time processor pair (k I) is doing likewise. It is very possible that the moves 
may offset each other and result in the state of the channel being worse than expected. 
Jones and Banerjee [20] found that the convergence properties were nearly maintained despite the use of parallel moves similar to those we propose. Furthermore, they were able to apply those results to a uniprocessor (serial) implementation of their placement algorithm [36]. They applied what were termed pseudo-parallel moves, in which a series of moves is evaluated and accepted before the data structure is updated. By doing this, they were able to maintain the convergence of their algorithm while decreasing the computation time dramatically. Pseudo-parallel moves are possible because of the nondeterministic behavior of annealing. An offshoot of that idea, shown by Grover [16], is to use approximate cost calculations to save computation time. As others have pointed out [37, 22], it is important to carefully control parallel or approximated moves, especially at low temperatures.
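The pseudo-parallel idea can be illustrated with a toy model. This is an assumption-laden sketch, not the Jones and Banerjee implementation: the state maps hypothetical subnets to tracks, the cost simply counts pairs of subnets on the same track whose column spans intersect, and a batch of displacement moves is evaluated against one frozen snapshot before all accepted moves are applied together. The names `overlap_cost` and `pseudo_parallel_batch` are hypothetical.

```python
import math
import random


def overlap_cost(assign, spans):
    """Toy cost: pairs of subnets on the same track whose column spans
    [left, right] intersect."""
    names = sorted(assign)
    cost = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if assign[a] == assign[b]:
                (l1, r1), (l2, r2) = spans[a], spans[b]
                if l1 <= r2 and l2 <= r1:
                    cost += 1
    return cost


def pseudo_parallel_batch(assign, spans, tracks, batch, temperature, rng):
    """Evaluate `batch` displacement moves against the same snapshot and
    apply all accepted moves afterwards in a single deferred update."""
    snapshot = dict(assign)
    base = overlap_cost(snapshot, spans)
    accepted = {}
    for _ in range(batch):
        subnet = rng.choice(sorted(snapshot))
        trial = dict(snapshot)
        trial[subnet] = rng.choice(tracks)
        delta = overlap_cost(trial, spans) - base  # ignores other pending moves
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            accepted[subnet] = trial[subnet]
    updated = dict(assign)
    updated.update(accepted)  # one update for the whole batch
    return updated, accepted


spans = {"a": (0, 4), "b": (2, 6), "c": (5, 9)}
state = {"a": 0, "b": 0, "c": 0}
state, moved = pseudo_parallel_batch(state, spans, [0, 1], 3, 0.5, random.Random(7))
print(moved, overlap_cost(state, spans))
```

Because every cost change is computed against the stale snapshot, two moves that each look favorable on their own can collide (for example, two subnets moved onto the same previously empty track), which is why such moves need careful control at low temperatures.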
As shown in Figure 4.2, all data structures are updated after each pair of processors has executed its moves in parallel. Every node must receive the updated information, so some method must be used to broadcast it across the network. The simplest method, provided the hypercube network could support the function, is to have every node broadcast the information to every other node. Unfortunately, the iPSC hypercube does not have appropriate hardware for global communication; instead, a global send is performed by sending a copy of the message out on a tree embedding [38]. Another possible method is to embed a ring network into the hypercube and transmit update information around the ring until it returns to the originating node. The tree broadcast scheme is O(log P), but contention and congestion on the channels will likely slow its performance considerably. The ring scheme, on the other hand, is O(P), but it will not suffer from contention and congestion because of the uniform, unidirectional flow of data around the network.
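The two broadcast schemes can be compared with a short sketch. It assumes a binomial spanning tree (one standard tree embedding in a hypercube; the exact tree used by the iPSC global send is not specified here) and a ring embedded with the binary-reflected Gray code, whose consecutive entries differ in exactly one address bit and are therefore physical hypercube neighbors. All function names are hypothetical.

```python
def tree_broadcast_steps(dim, root=0):
    """Binomial-tree broadcast on a 2**dim-node hypercube: in step k every
    node holding the message forwards it across dimension k.
    Returns the list of (sender, receiver) pairs for each step."""
    have = {root}
    steps = []
    for k in range(dim):
        sends = [(node, node ^ (1 << k)) for node in sorted(have)]
        have.update(receiver for _, receiver in sends)
        steps.append(sends)
    return steps


def gray_code_ring(dim):
    """Ring order in which consecutive nodes (and last -> first) differ in
    one bit, so every ring hop uses a real hypercube link."""
    return [i ^ (i >> 1) for i in range(1 << dim)]


def ring_broadcast_hops(dim):
    """Hops for an update to travel around the whole ring and return to its
    originating node: one per node, i.e. P."""
    return len(gray_code_ring(dim))


DIM = 4  # 16 nodes
print(len(tree_broadcast_steps(DIM)))  # 4 steps = log2(16)
print(ring_broadcast_hops(DIM))        # 16 hops = P
```

A single tree broadcast finishes in log P steps; when many nodes broadcast updates at the same time, however, their trees can compete for the same links, whereas the ring keeps a uniform one-directional flow at the price of P hops.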
4.5. Heuristics
In general, nearly all of the heuristics discussed in the previous chapter for the serial implementation were also applied in the parallel implementation, modified slightly for the different moves and data structures. One problem with using the Intel Hypercube Simulator was the extremely long time needed for each trial run. It therefore became important to find additional heuristics to speed up the convergence of the algorithm.
The first idea was to improve the selection of the second subnet of an intraprocessor exchange move. Assuming that the first subnet i is already chosen, it is possible to evaluate the overlap that would result for each subnet in the set of candidates. This overlap can then be used to bias the selection in favor of the subnets causing the least damage, which are the ones most likely to provide some improvement.
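A minimal sketch of this biased selection, assuming a weight of 1 / (1 + overlap) for each candidate (the thesis does not give its exact weighting, and the function names are hypothetical). Every candidate keeps a nonzero chance of being chosen, which preserves the random character of annealing.

```python
import random


def partner_weights(overlaps):
    """Hypothetical weighting: the more overlap a candidate would create,
    the smaller its weight, but no candidate is ever excluded."""
    return [1.0 / (1.0 + ov) for ov in overlaps]


def pick_exchange_partner(candidates, overlaps, rng=random):
    """Choose the second subnet of an intraprocessor exchange, biased toward
    candidates that would cause the least overlap."""
    return rng.choices(candidates, weights=partner_weights(overlaps), k=1)[0]


print(partner_weights([0, 1, 3]))  # [1.0, 0.5, 0.25]
print(pick_exchange_partner(["j", "k", "l"], [0, 1, 3], random.Random(0)))
```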
The next heuristic applied was to adjust the constant factor γ on the change-in-channel-width term of the cost function to reflect whether tracks should be removed, added, or neither. For example, if the channel density for a problem is D, and at some point in the annealing process the current channel width is D + 2, then it is favorable to increase the chances of removing a track and decrease the chances of adding one. If, however, the current channel width is D, it is usually better neither to create new tracks nor to remove any of the current ones.
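One way to express this adaptive weighting is sketched below. The specific rule (γ grows linearly with the excess width above the density D, and any change in the track count is penalized once the width reaches D) is a hypothetical illustration of the behavior described, not the thesis's formula; `width_cost` and `gamma0` are assumed names.

```python
def width_cost(delta_tracks, width, density, gamma0=1.0):
    """Hypothetical width term of the cost change.

    delta_tracks: +1 if the move adds a track, -1 if it removes one, else 0.
    width: current channel width; density: channel density D.
    """
    excess = width - density
    if excess > 0:
        # Above density: a larger gamma makes removing a track more
        # rewarding (negative cost) and adding one more expensive.
        gamma = gamma0 * (1 + excess)
        return gamma * delta_tracks
    # At density: penalize any change in the number of tracks.
    return gamma0 * abs(delta_tracks)


print(width_cost(-1, width=14, density=12))  # -3.0
print(width_cost(+1, width=14, density=12))  # 3.0
print(width_cost(-1, width=12, density=12))  # 1.0
```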
The final heuristic is to approximate the vertical constraint graph G created from the original problem statement and to estimate track positions based on the structure of the graph. Figure 4.6 and Table 4.1 show the solution to a simple channel routing problem, the vertical constraint graph for the problem, and the data resulting from approximating the vertical constraint graph. Source nodes of G are nodes that have only directed arcs pointing away from the node. Sink nodes are nodes that have only directed arcs pointing into the node. Assuming for now that there are no cycles, for each node of the graph one can find a path through that node which starts at a source node and ends at a sink node. Let p_i be the longest path from source to sink passing through node i. For node i, the final track placement can be approximated by the position of the node along p_i. For channel width w, net number 7 of the example would be assigned to a zone ranging from track 0.25 × w to track 0.50 × w.

Figure 4.6. Vertical Constraint Graph Example: (a) Problem; (b) VCG

Table 4.1. Approximated VCG Data

Subnet   Position   Total Path   Percentage   Approximate
         in Path    Length       of Total     Track
1        1          3            0.25         2
2        4          4            0.80         5
3        3          3            0.75         5
4        1          3            0.25         2
5a       2          3            0.50         3
5b       2          3            0.50         3
6        3          4            0.60         4
7        2          4            0.40         2
8        3          3            0.75         5
9a       2          3            0.50         3
9b       2          3            0.50         3
10       1          4            0.20         1
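The position data of Table 4.1 can be computed with two passes over a topological order of the VCG, as in the sketch below (the function names are hypothetical, and the graph in the usage example is made up because the example's arcs are not recoverable here). In every row of Table 4.1 the Percentage column equals position / (path length + 1); scaling that fraction by an assumed channel width of 6 and rounding half up reproduces the Approximate Track column.

```python
import math
from collections import defaultdict, deque
from fractions import Fraction


def path_positions(nodes, arcs):
    """For each node i of an acyclic VCG, return (position of i on the
    longest source-to-sink path p_i, number of nodes on p_i).
    An arc (a, b) means net a must lie above net b."""
    succ, pred = defaultdict(list), defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for a, b in arcs:
        succ[a].append(b)
        pred[b].append(a)
        indegree[b] += 1
    # Kahn's algorithm gives a topological order (sources first).
    order, queue = [], deque(n for n in nodes if indegree[n] == 0)
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("VCG contains a cycle")
    above = {}  # nodes on the longest path from a source down to n
    for n in order:
        above[n] = 1 + max((above[p] for p in pred[n]), default=0)
    below = {}  # nodes on the longest path from n down to a sink
    for n in reversed(order):
        below[n] = 1 + max((below[s] for s in succ[n]), default=0)
    return {n: (above[n], above[n] + below[n] - 1) for n in nodes}


def approximate_track(position, length, width):
    """Scale position / (length + 1) to the channel width, rounding half up."""
    return math.floor(Fraction(position, length + 1) * width + Fraction(1, 2))


nodes = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("d", "b"), ("b", "c")]  # hypothetical VCG
print(path_positions(nodes, arcs))
# {'a': (1, 3), 'b': (2, 3), 'c': (3, 3), 'd': (1, 3)}
print(approximate_track(4, 4, 6))  # 5, as for subnet 2 in Table 4.1
```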
This approximation can then be applied to the cost evaluation at several points. One possible use is in subnet selection: subnets not placed in a track within their zone can be given a higher probability of selection than those inside their zone. Another application is in the selection of tracks for displacement: candidate tracks can be weighted in inverse proportion to their distance from the subnet's zone.
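Both uses can be sketched with simple weights. The weights below are hypothetical (a fixed boost for out-of-zone subnets, and 1 / (1 + d) for a track at distance d from the zone); the thesis states only the direction of the bias, not its form, and the function names are assumptions.

```python
import random


def zone_distance(track, zone):
    """Distance in tracks from `track` to the zone (lo, hi); 0 if inside."""
    lo, hi = zone
    if track < lo:
        return lo - track
    if track > hi:
        return track - hi
    return 0


def subnet_weights(current_tracks, zones, outside_boost=3.0):
    """Subnets lying outside their zone get `outside_boost` times the weight
    of subnets already inside it."""
    return [outside_boost if zone_distance(t, z) else 1.0
            for t, z in zip(current_tracks, zones)]


def track_weights(tracks, zone):
    """Weight candidate tracks inversely to their distance from the zone."""
    return [1.0 / (1 + zone_distance(t, zone)) for t in tracks]


def pick_track(tracks, zone, rng=random):
    return rng.choices(tracks, weights=track_weights(tracks, zone), k=1)[0]


print(track_weights([1, 2, 3, 4, 5], (2, 3)))
# [0.5, 1.0, 1.0, 0.5, 0.3333333333333333]
print(subnet_weights([1, 3], [(2, 3), (2, 3)]))  # [3.0, 1.0]
```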
4.6. Algorithm Results 
For reasons similar to those given in Chapter 3, we present a summary of the results for one channel routing problem, using the set of heuristics that produced the best results. The same channel routing problem, with a channel density of twelve, was used for testing.
The ratio of displacement to exchange moves was 15:1. The temperature was decreased according to the annealing schedule, and the number of iterations at each temperature was increased by 45% over the number at the previous temperature. The cost factor γ was changed dynamically to reflect the need to add or remove tracks in the channel. Finally, the VCG was approximated, and the information was used to bias subnet selection by giving subnets not found in their expected track range a higher probability of selection. The approximated track position was also used to bias the selection of tracks for displacement.
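The schedule parameters just listed can be captured in a small generator. Only the 15:1 move ratio and the 45% growth in iterations come from the text; the geometric cooling factor `alpha`, the starting and stopping temperatures, and the initial iteration count are placeholders, and the function names are hypothetical.

```python
import random


def annealing_schedule(t_start, t_stop, alpha, iterations, growth=1.45):
    """Yield (temperature, iterations) pairs: cool geometrically by `alpha`
    (assumed) and run 45% more iterations at each new temperature."""
    t, n = t_start, float(iterations)
    while t > t_stop:
        yield t, round(n)
        t *= alpha
        n *= growth


def pick_move_kind(rng=random):
    """Displacement and exchange moves in a 15:1 ratio."""
    return rng.choices(("displacement", "exchange"), weights=(15, 1), k=1)[0]


for temperature, n in annealing_schedule(10.0, 1.0, 0.5, 100):
    print(temperature, n)
# 10.0 100
# 5.0 145
# 2.5 210
# 1.25 305
```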
The final routing solution for the twelve-track example is shown in Figure 4.7. The plots of Cost vs. Temperature, Average Overlap vs. Temperature, and Average Number of Tracks vs. Temperature are shown in Figures 4.8, 4.9, and 4.10, respectively.
4.7. Performance Analysis 
Although the parallel implementation was run on the Intel Hypercube Simulator, we did analyze the expected speedup of the algorithm when run on the Intel iPSC Hypercube.
Figure 4.7. Final 12 Track Solution - Parallel

Figure 4.8. Annealing Cost vs. Temperature - Parallel (cost plotted against 1.0 / Annealing Temperature)

Figure 4.9. Subnet Overlap vs. Temperature - Parallel (average overlap plotted against 1.0 / Annealing Temperature)

Figure 4.10. Average Number of Tracks vs. Temperature - Parallel (plotted against 1.0 / Annealing Temperature)
4.7.1. Computation costs 
The time spent processing each new move was measured by applying the simulator's CLOCK() function to random moves repeated thousands of times. Since the simulator was run on a Motorola 68020 CPU and the hypercube uses Intel 80286 processors, some adjustment must be made for the difference in processing speed. The 68020 is rated at 2.7 MIPS, while the 80286 is rated at 0.78 MIPS, a ratio of approximately 3.5. Table 4.2 gives the computation times for the master and slave nodes for each of the four types of moves on both the 68020 and the 80286. The computation costs for updating subnets after each move are also given in Table 4.2.
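The projection amounts to one multiplication per table entry. The sketch below assumes, as the text does, that run time scales inversely with the MIPS rating; `project_to_80286` is a hypothetical name.

```python
MIPS_68020 = 2.7   # simulator host (rating quoted above)
MIPS_80286 = 0.78  # iPSC node processor (rating quoted above)


def project_to_80286(msec_on_68020, factor=3.5):
    """Scale a time measured on the 68020 to the slower 80286."""
    return msec_on_68020 * factor


print(round(MIPS_68020 / MIPS_80286, 2))  # 3.46, rounded to 3.5 in the text
print(project_to_80286(17.0))             # 59.5 (MOVE 0, master)
print(project_to_80286(22.0))             # 77.0 (MOVE 2, master)
```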
4.7.2. Communication costs
The simulator does not provide any mechanism for estimating the time needed to send a message from one node to one of its neighbors, so we use timing information reported in the literature for one-hop messages on the Intel iPSC Hypercube [39]. Table 4.3 summarizes the message timing for the different types of messages used in our implementation. The messages m1, s1, and s2 are from Figure 4.4. The update message is the packet transmitted around the broadcast ring for updating subnets after a move. Two other types of messages are used, one for sending the original net data from the host to the nodes and the other for sending the final routing data from the nodes back to the host. These, however, do not affect the speedup of the algorithm and are not discussed here.

Table 4.2. Computation Timing (msec)

Operation   MC68020 CPU (Measured)   80286 CPU (Projected)
            Master    Slave          Master    Slave
MOVE 0      17.0      15.8           59.5      55.3
MOVE 1       8.4       6.3           29.4      22.0
MOVE 2      22.0      17.1           77.0      59.9
MOVE 3       5.8       3.5           33.3      25.9

Table 4.3. Message Timing (msec)

Operation   m1          s1          s2          Update
            (48 bytes)  (48 bytes)  (16 bytes)  (48 bytes)
MOVE 0      1.83        -           -           -
MOVE 1      1.83        -           1.74        -
MOVE 2      1.83        -           -           -
MOVE 3      1.83        1.83        1.74        -
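Read row by row, the message timing gives the communication time of each move type as the sum of the messages it sends. A minimal sketch, assuming each message crosses exactly one hop and that messages are not overlapped with computation (the names are hypothetical):

```python
# One-hop message times in msec (m1, s1, s2 as in the message timing table).
MESSAGE_MS = {"m1": 1.83, "s1": 1.83, "s2": 1.74}

# Messages exchanged by a master/slave pair for each move type.
MOVE_MESSAGES = {0: ("m1",), 1: ("m1", "s2"), 2: ("m1",), 3: ("m1", "s1", "s2")}


def move_comm_ms(move):
    """Total one-hop message time for one move evaluation."""
    return sum(MESSAGE_MS[m] for m in MOVE_MESSAGES[move])


for move in range(4):
    print(move, round(move_comm_ms(move), 2))
# 0 1.83
# 1 3.57
# 2 1.83
# 3 5.4
```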
4.7.3. Speedup calculations
Assuming a 16-node hypercube, the ratio of moves is 15:15:1:1 for MOVE 0, MOVE 1, MOVE 2, and MOVE 3, respectively, which means that during every iteration in which pairs of nodes are evaluating a move, at least one pair will be performing either MOVE 0 or MOVE 1. MOVE 0 and MOVE 2 are bottleneck moves because of the computation they require. The average time for the bottleneck moves can then be found by weighting each move by its probability of occurrence, which gives an average move computation time per iteration of 60.6 msec. Since interprocessor and intraprocessor moves are equally likely to be selected, approximately half of the node pairs will evaluate a single move and the other half will evaluate two moves in parallel. Thus there are usually 0.75 × P moves at once, or, for the sixteen-node case, twelve parallel moves. Including the communication costs for messages m1, s1, and s2 gives a worst-case move time of 62.5 msec for one iteration. Using a tree-based broadcast strategy, the communication time is log P × 1.83 msec (log P = 4 for P = 16). The update computation time is 18.0 msec, giving a total time of

(4 × 1.83) + 18.0 + 62.5 = 87.8 msec

For the uniprocessor case, twelve moves would require

(11.25 × 59.5) + (0.75 × 77.0) + 14.0 = 745.6 msec

to complete, resulting in an overall speedup of 8.4 on a 16-processor hypercube.
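The parallel side of this estimate can be reproduced with a few lines of arithmetic. The sketch below simply encodes the assumptions stated in the text (15:1 weighting of MOVE 0 and MOVE 2 among the bottleneck moves, half of the pairs evaluating two moves, a log P-hop tree broadcast at 1.83 msec per hop); the function names are hypothetical.

```python
import math


def bottleneck_move_ms(t_move0=59.5, t_move2=77.0):
    """Average master time of the bottleneck moves, weighted 15:1."""
    return (15 * t_move0 + 1 * t_move2) / 16


def moves_per_iteration(p):
    """p/2 node pairs: half evaluate one interprocessor move, half evaluate
    two intraprocessor moves at once, giving 0.75 * p moves."""
    pairs = p / 2
    return pairs * (0.5 * 1 + 0.5 * 2)


def parallel_iteration_ms(p, move_ms, update_ms, hop_ms=1.83):
    """Worst-case move time + tree broadcast over log2(p) hops + update."""
    return move_ms + math.log2(p) * hop_ms + update_ms


print(round(bottleneck_move_ms(), 1))                   # 60.6
print(moves_per_iteration(16))                          # 12.0
print(round(parallel_iteration_ms(16, 62.5, 18.0), 1))  # 87.8
```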
CHAPTER 5
CONCLUSIONS
5.1. Summary of Results 
In this thesis, we have presented serial and parallel algorithms for channel routing using simulated annealing. Simulated annealing is a powerful optimization tool, and we have demonstrated its use in a new uniprocessor channel routing algorithm. This algorithm gives the nets in the channel maximal freedom in how they are assigned, in order to achieve near-optimal results.
The algorithm has been parallelized for implementation on a hypercube computer. The channel is partitioned horizontally by tracks, and adjacent nodes of the hypercube cooperate in parallel to gradually improve the state of the routing. The data have been partitioned to minimize the message-passing overhead, both between node pairs and for complete updates.
5.2. Convergence Issues
One important issue to consider carefully in the design of any simulated annealing algorithm is how quickly the algorithm will converge. There is always a tradeoff between the total number of moves attempted and the time taken to evaluate each move. At one extreme, analyzing the problem deeply enough to find a single move that solves it is intractable; at the other, solving the problem without analyzing any moves would take an extremely large number of moves. Between those extremes lies the optimal point for minimizing the total time to converge. Finding that point requires extensive testing of various strategies and parameters for selecting and evaluating moves.
One of the main features of the algorithm presented here is that it allows channel states at high temperatures that would be unacceptable as a final solution. These states usually include overlap between nonconnected wires. It has been shown that convergence is guaranteed for certain annealing algorithms, but since illegal intermediate states are allowed here, there is no longer any guarantee of convergence. For this reason, it is essential to evaluate all aspects of the algorithm carefully.
5.3. Applicability of Simulated Annealing 
Part of the question of the applicability of simulated annealing to channel routing involves convergence. If convergence is not achieved in a reasonable amount of time in nearly all runs, then the problem may by nature not be well suited to simulated annealing. For the algorithm presented, good convergence was achieved for small cases, especially for the serial version. For large examples, however, the quality of the results dropped off. This may be due in part to improper selection and evaluation heuristics.
Another aspect concerns the nature of the problem itself and how the current model affects it. For simulated annealing, the choice of neighboring states is very important. The algorithm of Leong, Wong, and Liu [11] only allowed legitimate solutions to be in the neighborhood of a current channel state, which greatly reduces the number of possible moves.
The algorithm we propose here allows any possible permutation of the current channel state to be in its neighborhood. It is then much harder to determine the best state to select next, so much more computation is needed. There is another tradeoff here between the benefits of the new algorithm's flexibility and the added work to determine the next state. This is an important area for future research.
5.4. Parallelizability of the Channel Routing Algorithm 
How to write parallel algorithms has been a lively topic over the past decade. There are many ways to look at the parallelization of the serial simulated annealing algorithm for channel routing, and the method presented in this thesis is the one we determined to be best suited to the hypercube facilities available. Our approach can be viewed as a parallelization of individual moves, that is, performing multiple moves at once. Another approach is to parallelize the computation of a single move, hopefully with a computation-to-communication ratio high enough to be effective. Sets of moves can also be parallelized: each node performs serial annealing on the entire channel for a fixed number of moves, and then all nodes combine their results and keep the best. This method may be an effective alternative, but it seems to sidestep parallel algorithm design.
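The "sets of moves" alternative mentioned above reduces to running independent annealers and keeping the best result. A minimal sketch, in which `anneal(rng)` is a hypothetical callable standing in for the serial router and returning (cost, solution); the runs execute sequentially here, whereas on a hypercube each would run on its own node.

```python
import random


def best_of_independent_runs(anneal, n_nodes, seed=0):
    """Run the serial annealer once per node, each with its own random
    stream, and keep the lowest-cost result."""
    results = [anneal(random.Random(seed + node)) for node in range(n_nodes)]
    return min(results, key=lambda result: result[0])


def toy_anneal(rng):
    # Stand-in for a real annealing run: a random "final cost".
    cost = rng.randint(12, 20)
    return cost, f"routing with {cost} tracks"


print(best_of_independent_runs(toy_anneal, n_nodes=8))
```

The only communication is the final comparison, which is why this approach is cheap but, as noted above, sidesteps the harder question of parallelizing the annealing process itself.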
For the algorithm presented, there are a few issues of concern. First, the cost of updating each node after every set of parallel moves is high because of the overhead. Research into partial or delayed broadcasting is necessary. Second, the overall parallelism is limited. Most channels routed in industry consist of fewer than 100 tracks, and it is impractical to distribute those tracks over more than 32 hypercube nodes. The theoretical speedup is therefore limited to 32, assuming linear speedup, which is unlikely. Other ways to partition the problem that allow higher speedups should also be investigated.
5.5. Future Research
Throughout this thesis work, a good, solid base algorithm for channel routing by simulated annealing has been developed. The current results indicate a need for more research into the areas mentioned above. Other applications of this algorithm should also be studied.
REFERENCES
[1] T. C. Hu and E. S. Kuh, VLSI Circuit Layout: Theory and Design. New York, NY: IEEE, Inc., 1985, pp. 3-18.
[2] C. Y. Lee, "An algorithm for path connection and its applications," IRE Trans. Electron. Comput., vol. EC-10, pp. 346-365, 1961.
[3] A. Hashimoto and J. Stevens, "Wire Routing by Optimizing Channel Assignment," Proc. 8th Design Automation Conf., pp. 214-224, June 1971.
[4] D. Deutsch, "A Dogleg Channel Router," Proc. 13th Design Automation Conf., pp. 425-433, June 1976.
[5] T. Yoshimura and E. S. Kuh, "Efficient algorithms for channel routing," IEEE Trans. Computer-Aided Design, vol. CAD-1, pp. 25-35, Jan. 1982.
[6] R. L. Rivest and C. M. Fiduccia, "A Greedy Channel Router," Proc. 19th Design Automation Conf., pp. 418-424, June 1982.
[7] J. Reed, A. Sangiovanni-Vincentelli, and M. Santomauro, "A new symbolic channel router: YACR2," IEEE Trans. Computer-Aided Design, vol. CAD-4, pp. 208-219, July 1985.
[8] M. Burstein and R. Pelavin, "Hierarchical Channel Router," Proc. 20th Design Automation Conf., pp. 591-597, June 1983.
[9] R. Joobbani and D. P. Siewiorek, "WEAVER: A Knowledge-Based Routing Expert," IEEE Design and Test of Computers, vol. 3, no. 1, pp. 12-23, Feb. 1986.
[10] H. Shin and A. Sangiovanni-Vincentelli, "Mighty: A 'Rip-Up and Reroute' Detailed Router," Proc. Int. Conf. Computer-Aided Design, pp. 2-5, Nov. 1986.
[11] H. W. Leong, D. F. Wong, and C. L. Liu, "A Simulated Annealing Channel Router," Proc. 22nd Design Automation Conf., pp. 226-228, June 1985.
[12] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, pp. 671-680, May 1983.
[13] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of State Calculations by Fast Computing Machines," J. Chem. Phys., vol. 21, pp. 1087-1092, 1953.
[14] J. Lam and J. M. Delosme, "Logic Minimization Using Simulated Annealing," Proc. IEEE Int. Conf. Computer-Aided Design (ICCAD-86), pp. 348-351, Nov. 1986.
[15] C. Sechen and A. Sangiovanni-Vincentelli, "TimberWolf3.2: A New Standard Cell Placement and Global Routing Package," Proc. 23rd Design Automation Conf., pp. 432-439, June 1986.
[16] L. K. Grover, "A New Simulated Annealing Algorithm for Standard Cell Placement," Proc. Int. Conf. Computer-Aided Design, pp. 378-380, Nov. 1986.
[17] M. P. Vecchi and S. Kirkpatrick, "Global wiring by simulated annealing," IEEE Trans. Computer-Aided Design, vol. CAD-2, pp. 215-222, Oct. 1983.
[18] M. J. Chung and K. K. Rao, "Parallel Simulated Annealing for Partitioning and Routing," Proc. Int. Conf. Computer Design, pp. 238-242, Oct. 1986.
[19] S. A. Kravitz and R. A. Rutenbar, "Multiprocessor-Based Placement by Simulated Annealing," Proc. 23rd Design Automation Conf., pp. 567-573, June 1986.
[20] M. Jones and P. Banerjee, "Performance of a Parallel Algorithm for Standard Cell Placement on the Intel Hypercube," Proc. 24th Design Automation Conf., pp. 807-813, June 1987.
[21] F. Darema and G. F. Pfister, "Multipurpose Parallelism for VLSI CAD on the RP3," IEEE Design and Test of Computers, vol. 4, no. 5, pp. 19-27, Oct. 1987.
[22] A. Casotto, F. Romeo, and A. Sangiovanni-Vincentelli, "A Parallel Simulated Annealing Algorithm for the Placement of Macro-Cells," Proc. Int. Conf. Computer-Aided Design, Nov. 1986.
[23] R. Jayaraman and R. A. Rutenbar, "Floorplanning by Annealing on a Hypercube Multiprocessor," Proc. Int. Conf. Computer-Aided Design, pp. 346-349, Nov. 1987.
[24] H. W. Leong, Routing Problems in the Physical Design of Integrated Circuits, Ph.D. dissertation, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, Jan. 1986.
[25] E. H. L. Aarts and P. J. M. van Laarhoven, "A New Polynomial-Time Cooling Schedule," Proc. Int. Conf. Computer-Aided Design, pp. 206-208, Nov. 1985.
[26] M. D. Huang, F. Romeo, and A. Sangiovanni-Vincentelli, "An Efficient General Cooling Schedule for Simulated Annealing," Proc. Int. Conf. Computer-Aided Design, pp. 381-384, Nov. 1986.
[27] R. M. Kling and P. Banerjee, "ESP: A New Standard Cell Placement Package Using Simulated Evolution," Proc. 24th Design Automation Conf., pp. 60-66, June 1987.
[28] L. K. Grover, "Standard Cell Placement Using Simulated Sintering," Proc. 24th Design Automation Conf., pp. 56-59, June 1987.
[29] Hypercube Simulator - Internal Product Description, Version 3.0, Intel Corporation, Oct. 1986, pp. 1-15.
[30] J. P. Hayes, T. N. Mudge, Q. F. Stout, S. Colley, and J. Palmer, "Architecture of a Hypercube Supercomputer," Proc. Int. Conf. Parallel Processing, pp. 653-660, Aug. 1986.
[31] iPSC Simulator Manual, Intel Corporation, Oct. 1986.
[32] iPSC System Overview, Intel Corporation, Nov. 1986.
[33] iPSC Programmer's Reference Guide, Intel Corporation, Mar. 1987.
[34] M. H. Jones, A Parallel Simulated Annealing Algorithm for Standard Cell Placement on a Hypercube Computer, M.S. thesis, Dept. of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Jan. 1987.
[35] L. M. Ni, C. King, and P. Prins, "Parallel Algorithm Design Considerations for Hypercube Multiprocessors," Proc. Int. Conf. Parallel Processing, pp. 717-720, Aug. 1987.
[36] M. Jones and P. Banerjee, "An Improved Simulated Annealing Algorithm for Standard Cell Placement," Proc. Int. Conf. Computer Design, Oct. 1987.
[37] R. A. Rutenbar and S. A. Kravitz, "Layout by Annealing in a Parallel Environment," Proc. Int. Conf. Computer Design, pp. 434-437, Oct. 1986.
[38] iPSC Software Internal Specification, Intel Corporation, Mar. 1987.
[39] T. H. Dunigan, "Hypercube Performance," Proc. SIAM 2nd Conf. on Hypercube Multiprocessors, pp. 178-192, 1986.
DD Form 1473
1a. Report Security Classification: Unclassified
1b. Restrictive Markings: None
3. Distribution/Availability of Report: Approved for public release; distribution unlimited
4. Performing Organization Report Number(s): UILU-ENG-88-2213 (CSG-84)
6a. Name of Performing Organization: Coordinated Science Lab, University of Illinois
6b. Office Symbol: N/A
6c. Address: 1101 W. Springfield Avenue, Urbana, IL 61801
7a. Name of Monitoring Organization: NASA
7b. Address: NASA Langley Research Center, Hampton, VA
8a. Name of Funding/Sponsoring Organization: NASA
8b. Office Symbol: N/A
8c. Address: NASA Langley Research Center, Building 1268A, Hampton, VA 23665
9. Procurement Instrument Identification Number: NAG-1-613
12. Personal Author(s): Brouwer, Randall Jay
13a. Type of Report: Technical
14. Date of Report: February 1988
18. Subject Terms: Parallel algorithms, VLSI computer-aided design, channel routing, simulated annealing, hypercube multiprocessors
20. Distribution/Availability of Abstract: Unclassified/Unlimited
21. Abstract Security Classification: Unclassified
