A parallel algorithm for channel routing on a hypercube by Banerjee, Prithviraj & Brouwer, Randall
Randall Brouwer and Prithviraj Banerjee
Computer Systems Group 
Coordinated Science Laboratory 
University of Illinois at Urbana-Champaign 
1101 W. Springfield Ave. 
Urbana, IL 61801 
(217) 333-6564 
In this paper we present a new parallel simulated annealing algorithm for channel routing on
a P processor hypercube. The basic idea is to partition a set of tracks equally among the processors
in the hypercube. In parallel, P/2 pairs of processors perform displacements and exchanges of
nets between tracks, compute the changes in cost functions, and accept moves using a parallel
annealing criterion. Through the use of a unique distributed data structure, we are able to minimize
message traffic and add versatility and efficiency in a parallel routing tool. The algorithm has been
implemented and is being tested on some of the popular channel problems from the literature.
(NASA-CR-180563) A PARALLEL ALGORITHM FOR CHANNEL ROUTING ON A HYPERCUBE (Illinois
Univ.) 12 p Avail: NTIS HC A02/MF A01 CSCL 09B
N87-26518 Unclas G3/61 0069610
Acknowledgement: This research was supported in part by the National Aeronautics and Space Administration under
contract NASA NAG 1-613. Please address all correspondence to the second author.
https://ntrs.nasa.gov/search.jsp?R=19870017085 2020-03-20T09:49:21+00:00Z
1. Introduction 
Over the past few years, much research has been directed toward ways to apply simulated
annealing, a multivariate optimization technique [1], to many difficult problems in computer-aided
design. Some of these include logic minimization [2], cell placement [3,4], global routing [5], and
detailed routing [6]. These research efforts have demonstrated that near-optimal results can be
achieved for NP-hard problems using simulated annealing. The major drawback to the use of
simulated annealing is the excessively long run times required to achieve good results. Some recent
work has applied simulated annealing to parallel architectures to reduce the long run times.
Extremely good results have been shown for parallel standard cell placement algorithms [7,8,9,10]
and parallel global routing algorithms [11].
The problem of channel routing deals with a rectangular wiring area called a channel with
pins on the top and bottom edges of the channel, and a collection of nets, which are sets of pins that
must be interconnected. Nets are routed with horizontal wire segments on one layer and vertical
wire segments on another. Connections between the two layers are made through via holes. The
objective of a channel router is to interconnect all the nets so as to minimize various criteria such as
the area of the channel and the number of vias used. Several algorithms for channel routing that
apply heuristics exist in the literature. Since they are greedy algorithms, they can get stuck
at local minima [12,13,14,15,16].
Recently, Leong, Wong, and Liu [6] have proposed a uniprocessor simulated annealing channel
routing algorithm; however, long run times are required to achieve good routings. The algorithm
they employed requires the detection of cycles in the vertical constraint graph; however, it is not
feasible to detect cycles in a parallel processing environment. Furthermore, their algorithm is
unable to handle switch-box routing, obstacle avoidance, and unrestricted doglegging, all of which
are important in any good routing tool.
In this paper, we present a new parallel simulated annealing algorithm for channel routing.
Parallelizing a uniprocessor algorithm should provide faster run times. Additionally, we hope to
achieve better convergence due to the parallel state changes, as was experimentally observed in a
parallel algorithm for cell placement [10]. The algorithm we propose is also more versatile, as it
can easily be extended to include switch-box routing, unrestricted doglegging, and obstacle
avoidance. Unlike the simulated annealing channel router mentioned earlier, our algorithm permits
overlap between distinct nets in early stages of the annealing process, allowing more freedom for
getting out of local minima and finding a global minimum.
2. Description
2.1. Parallel Architecture
The algorithm we present in this paper is targeted for implementation on the Intel iPSC
Hypercube Computer. A hypercube computer is a message passing architecture consisting of P = 2^d
processor nodes in which each node is directly connected to d other nodes. Communication between
the processor nodes is restricted to passing variable sized messages between adjacent nodes. Figure
1 shows a three-dimensional hypercube.
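The adjacency rule can be illustrated with the usual binary labeling of hypercube nodes, in which two nodes are neighbors exactly when their labels differ in one bit. The labeling convention and the function name `hypercube_neighbors` are assumptions made for this sketch; they are not taken from the iPSC system software.

```python
def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Return the d nodes adjacent to `node` in a d-dimensional hypercube.

    Assumes nodes are labeled 0 .. 2**d - 1 (an illustrative convention).
    """
    if not 0 <= node < (1 << d):
        raise ValueError("node label out of range for this dimension")
    # Flipping bit k of the label gives the neighbor along dimension k.
    return [node ^ (1 << k) for k in range(d)]


# In a three-dimensional hypercube every node has exactly three neighbors.
for node in range(1 << 3):
    assert len(hypercube_neighbors(node, 3)) == 3
```

Under this labeling, the neighbor along dimension k is obtained with a single XOR, which is why pairing processors "in dimension k" is inexpensive to compute on each node.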
To partition the data uniformly among all processors of the hypercube, adjacent tracks of the
channel are grouped together and assigned to a single processor. We define the processor domain
PD_i as the set of adjacent tracks, along with all nets currently assigned to those tracks,
over which processor P_i is given control. Consecutive domains are assigned to adjacent processor
nodes in the hypercube topology to provide a balance in communication distance. Figure 2 shows a
channel partitioning for a hypercube with dimension d = 3. This partitioning arrangement allows
for net displacement to adjacent domains and to other domains separated by large vertical distances
in the channel. At high temperatures in the annealing process, the algorithm can allow moves
across both small and large distances in the hypercube. At lower temperatures, moves along
dimensions that correspond to large vertical distances are inhibited.
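One way to place consecutive domains on adjacent nodes is a reflected Gray code, sketched below. This is an assumption for illustration: the text does not state which mapping Figure 2 uses, and the names `gray_code` and `assign_domains` are hypothetical.

```python
def gray_code(i: int) -> int:
    """Reflected binary Gray code of i; consecutive values differ in one bit."""
    return i ^ (i >> 1)


def assign_domains(num_tracks: int, d: int) -> dict[int, range]:
    """Map each hypercube node label to a block of adjacent track indices.

    Domain i (counting from the top of the channel) is placed on node
    gray_code(i), so consecutive domains sit on adjacent nodes.
    """
    p = 1 << d
    base, extra = divmod(num_tracks, p)
    domains: dict[int, range] = {}
    start = 0
    for i in range(p):
        # Spread any remainder over the first `extra` domains.
        size = base + (1 if i < extra else 0)
        domains[gray_code(i)] = range(start, start + size)
        start += size
    return domains
```

With d = 3 under this assumed mapping, the node holding domain 0 is adjacent to the nodes holding domains 1, 3, and 7, so different hypercube dimensions correspond to both small and large vertical distances in the channel.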
2.3. Moves
In the restricted doglegging version currently implemented, four move types are provided for
permuting the current state of the channel into a new state. Given that P_i and P_j are connected in
dimension k of the hypercube, the moves are as follows:
Move 1: P_i and P_j independently displace a net from one track in their respective domains to
another track in the same domain.
Move 2: P_i displaces a net from a track in PD_i to a track in PD_j of P_j.
Move 3: P_i and P_j independently exchange track assignments of two nets in their respective
domains.
Move 4: P_i exchanges a net from PD_i with a net from PD_j of P_j.
Sequences of these exchange and displace moves are sufficient for all permutations of the channel
state. These moves are chosen randomly, with relative frequencies of 4:4:1:1, respectively.
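The random choice among the four move types with relative frequencies 4:4:1:1 can be sketched as a weighted draw. The names `MOVE_TYPES`, `MOVE_WEIGHTS`, and `choose_move` are hypothetical, and the use of Python's `random.choices` is a convenience of this sketch rather than a description of the C implementation.

```python
import random

MOVE_TYPES = (1, 2, 3, 4)    # Move 1 .. Move 4 as listed above
MOVE_WEIGHTS = (4, 4, 1, 1)  # relative frequencies stated in the text


def choose_move(rng: random.Random) -> int:
    """Draw one move type according to the 4:4:1:1 weighting."""
    return rng.choices(MOVE_TYPES, weights=MOVE_WEIGHTS, k=1)[0]
```

Passing an explicit `random.Random` instance keeps each simulated processor pair's random stream reproducible, which is helpful when debugging a parallel run.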
2.4. Algorithm
The annealing algorithm we use is outlined in Figure 3. The value of inner_loop_count is
specified to be 100 times the number of nets in the given channel routing problem. During each
iteration of the inner loop, the algorithm steps through each of the d dimensions of the hypercube;
in dimension k, each of the P/2 processor pairs connected in that dimension attempts one of the
four types of moves in parallel.
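As an illustration of how the P/2 processor pairs arise in each dimension, the sketch below enumerates them under the assumed binary node labeling (nodes 0 to P-1, neighbors differing in one bit). The function names are hypothetical.

```python
def pairs_in_dimension(d: int, k: int) -> list[tuple[int, int]]:
    """List the P/2 node pairs connected along dimension k of a d-cube."""
    if not 0 <= k < d:
        raise ValueError("dimension index out of range")
    bit = 1 << k
    # Each pair is reported once, from the node whose bit k is 0.
    return [(i, i | bit) for i in range(1 << d) if not i & bit]


def inner_loop_count(num_nets: int) -> int:
    """Inner-loop length as stated in the text: 100 times the number of nets."""
    return 100 * num_nets
```

For d = 3, dimension 0 pairs nodes (0,1), (2,3), (4,5), (6,7), while dimension 2 pairs (0,4), (1,5), (2,6), (3,7); every node takes part in exactly one pair per dimension.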
2.5. Annealing Schedule
The annealing temperature is adjusted based on the following schedule:
    $T_{NEW} = \mathrm{ALPHA}(T) \times T_{OLD}$,
in which the function ALPHA(T) ranges from 0.8 for large values of T to 0.95 for small values of
T. Because the temperature falls more slowly when T is small, this schedule allows more
permutations at low annealing temperatures to make many small improvements.
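The text gives only the two end values of ALPHA(T), so the sketch below fills in the middle with an assumed interpolation that is linear in log T. The breakpoints `t_low` and `t_high` and the interpolation itself are hypothetical choices for illustration, not the schedule used by the authors.

```python
import math


def alpha(t: float, t_low: float = 1.0, t_high: float = 1000.0) -> float:
    """Cooling factor: 0.95 at or below t_low, 0.8 at or above t_high.

    t_low, t_high, and the log-linear blend between them are assumptions.
    """
    if t <= t_low:
        return 0.95
    if t >= t_high:
        return 0.8
    frac = (math.log(t) - math.log(t_low)) / (math.log(t_high) - math.log(t_low))
    return 0.95 - 0.15 * frac


def next_temperature(t: float) -> float:
    """One cooling step: multiply the current temperature by ALPHA(T)."""
    return alpha(t) * t
```

Any monotone function with the same end values would serve equally well here; the point is only that cooling is fast at high temperatures and slow at low ones.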
To determine the initial temperature, 100 random moves with a positive cost change are
evaluated without accepting any of them. The average cost change ΔCOST_AVG for those moves is
then calculated, and we solve for T_INIT as follows:
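A common way to turn an average uphill cost change into a starting temperature, assumed here purely for illustration, is to choose the temperature at which a move of average cost would be accepted with a target probability p0 under an exponential acceptance rule: exp(-ΔCOST_AVG / T_INIT) = p0. The value p0 = 0.9 and the function name are hypothetical and should not be read as the authors' expression.

```python
import math


def initial_temperature(positive_deltas: list[float], p0: float = 0.9) -> float:
    """Solve exp(-avg / T) = p0 for T, given sampled positive cost changes.

    p0 is an assumed target acceptance probability, not a value from the text.
    """
    if not positive_deltas or any(x <= 0 for x in positive_deltas):
        raise ValueError("expected a non-empty list of positive cost changes")
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    avg = sum(positive_deltas) / len(positive_deltas)
    return -avg / math.log(p0)
```

A larger p0 yields a hotter start, so that nearly all of the sampled uphill moves would initially be accepted.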
2.6. Cost Function
The cost for a given state of the channel is a function of the amount of overlap between
distinct nets (OL), the length of the nets (NL), the width of the channel (WC), and the fraction of the
track not occupied by nets (FU). For each move, the cost change that would be incurred if the move
were accepted is calculated as follows and used to determine move acceptance.
Since move costs are calculated in parallel, the calculated cost change is only an estimate because it
does not account for interactions on the channel state by other processor pairs accepting moves.
Jones and Banerjee [10] have shown, experimentally, that this property of parallel simulated
annealing improves the overall convergence for the cell placement problem. We are expecting to see
the same benefits in the channel router problem.
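Acceptance based on an exponential function of the estimated cost change is conventionally the Metropolis rule of simulated annealing [1]. The sketch below assumes that standard rule; the function name and signature are hypothetical, and `delta_cost` stands for the locally estimated change described above.

```python
import math
import random


def accept_move(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis test: always accept improvements, and accept a cost
    increase with probability exp(-delta_cost / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)
```

Because each processor pair applies this test to its own estimate, two pairs may accept moves whose combined effect differs from the sum of the estimates; this is the source of the error discussed in the paragraph above.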
2.7. Distributed Data Structure
Since a hypercube computer is a message passing local memory parallel architecture, there is
no shared memory, and one cannot assume the use of a central data structure for storing all of the
channel state information. We therefore propose a distributed data structure among processors in
the hypercube such that each processor only stores the information that it needs for performing its
computations. The data structures we propose help minimize the amount of message passing
required, reduce the memory space used for storing the necessary data, and take advantage of the
fact that the cost of a message is almost independent of the message size. For each net n in PD_i of
P_i, the positions of the horizontal and vertical segments of net n, along with the positions of all
other vertical segments of nets also occupying the columns of net n, must be stored. All of this
data is necessary for calculating the expected overlap, channel width, and net length changes for a
given move. The data structures used for storing the track and column data for one processor node
are shown in Figure 4.
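A minimal sketch of per-node storage along these lines is given below. Figure 4 is not reproduced in the text, so every class, field, and method name here is hypothetical; the sketch only mirrors the stated requirement that a node keeps horizontal segments by track and vertical segments by column, and it shows how overlap on one track could be computed from that local data.

```python
from dataclasses import dataclass, field


@dataclass
class HorizontalSegment:
    net: int
    track: int
    left: int    # leftmost column covered
    right: int   # rightmost column covered


@dataclass
class VerticalSegment:
    net: int
    column: int
    top: int     # upper track index (or channel edge)
    bottom: int  # lower track index (or channel edge)


@dataclass
class NodeState:
    """Local view held by one processor: its tracks plus the columns its nets touch."""
    tracks: dict[int, list[HorizontalSegment]] = field(default_factory=dict)
    columns: dict[int, list[VerticalSegment]] = field(default_factory=dict)

    def add_horizontal(self, seg: HorizontalSegment) -> None:
        self.tracks.setdefault(seg.track, []).append(seg)

    def add_vertical(self, seg: VerticalSegment) -> None:
        self.columns.setdefault(seg.column, []).append(seg)

    def horizontal_overlap(self, track: int) -> int:
        """Count columns shared by segments of different nets on one track."""
        segs = self.tracks.get(track, [])
        total = 0
        for idx, a in enumerate(segs):
            for b in segs[idx + 1:]:
                if a.net != b.net:
                    total += max(0, min(a.right, b.right) - max(a.left, b.left) + 1)
        return total
```

Keeping both indexes on the node means that the overlap and length terms for a candidate move can be estimated without sending a message.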
2.8. Net Location Updating
To ensure straightforward and accurate updating of net positions in the new channel state, the
position data for those nets is passed from node to node along a Hamiltonian cycle through every
node of the hypercube. A Hamiltonian cycle in a graph is defined as a cycle which
traverses every node of the graph exactly once. A Hamiltonian cycle in a 3-dimensional hypercube
is shown in bold lines in Figure 1. Each node updates the data it has and then forwards the message
to the next node along the cycle. All updating completes within P time steps.
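A Hamiltonian cycle in a hypercube can be generated from the reflected Gray code, and the ring update can then be simulated step by step. The sketch below assumes that construction and an idealized model in which every node forwards one message per time step; the function names and the use of strings as update records are hypothetical.

```python
def hamiltonian_cycle(d: int) -> list[int]:
    """Node labels of a d-cube in reflected Gray code order.

    Consecutive labels, including last back to first, differ in one bit.
    """
    return [i ^ (i >> 1) for i in range(1 << d)]


def circulate_updates(d: int, updates: dict[int, list[str]]) -> dict[int, set[str]]:
    """Simulate passing each node's update records once around the cycle.

    Returns, for every node, the set of records it has seen afterwards.
    """
    cycle = hamiltonian_cycle(d)
    p = len(cycle)
    seen = {node: set(updates.get(node, [])) for node in cycle}
    # Message currently held by each node, forwarded unchanged at each step.
    in_flight = {node: set(updates.get(node, [])) for node in cycle}
    for _ in range(p - 1):
        forwarded = {}
        for pos, node in enumerate(cycle):
            forwarded[cycle[(pos + 1) % p]] = in_flight[node]
        for node, message in forwarded.items():
            seen[node] |= message
        in_flight = forwarded
    return seen
```

In this model every record reaches all other nodes after P - 1 forwarding steps, which is consistent with the bound of P time steps stated above.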
3. Implementation
We have implemented the above algorithm in 3500 lines of C code using an Intel Hypercube
Simulator (Version 3.0) running on a Sun 3/50 workstation operating under Sun Unix 4.2.
The initial version of our program was debugged one week ago. We are presently carrying out tests
of our parallel algorithm on various test cases. Figure 5 shows an example solution of a problem
found in the literature [13].
Figure 6 shows a plot of the channel cost as a function of annealing temperature. We will be
reporting the results of our algorithm for many of the other conventional channel routing test cases
in the final paper at the conference, and we plan to implement this version on an actual hypercube
and report on the performance (i.e., speedup, etc.) at the conference.
4. Conclusions 
In this paper we have proposed a new parallel algorithm for simulated annealing channel
routing for implementation on a hypercube computer. By the use of a novel distributed data
structure and partitioning of the channel, we have a versatile algorithm for channel routing that is
easily extensible to switch-box routing and obstacle avoidance routing.
References
[1] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing,"
Science, vol. 220, pp. 671-680, May 1983.
[2] J. Lam and J. M. Delosme, "Logic Minimization Using Simulated Annealing," Proc. IEEE Int.
Conf. Computer-Aided Design (ICCAD-86), pp. 348-351, Nov. 1986.
[3] C. Sechen and A. S. Vincentelli, "TimberWolf3.2: A New Standard Cell Placement and
Global Routing Package," Proc. 23rd Design Automation Conf., pp. 432-439, Jun. 1986.
[4] L. K. Grover, "A New Simulated Annealing Algorithm for Standard Cell Placement," Proc.
Int. Conf. on Computer-Aided Design (ICCAD-86), pp. 378-380, Nov. 1986.
[5] M. P. Vecchi and S. Kirkpatrick, "Global Wiring by Simulated Annealing," IEEE
Transactions on Computer-Aided Design, vol. CAD-2, no. 4, pp. 215-222, October 1983.
[6] H. W. Leong, D. F. Wong, and C. L. Liu, "A Simulated Annealing Channel Router," Proc.
22nd Design Automation Conf., pp. 226-228, June 1985.
[7] A. Casotto, F. Romeo, and A. Sangiovanni-Vincentelli, "A Parallel Simulated Annealing
Algorithm for the Placement of Macro-cells," Proc. IEEE Int. Conf. Computer-Aided Design
(ICCAD-86), pp. 30-33, Nov. 1986.
[8] R. A. Rutenbar and S. A. Kravitz, "Layout by Annealing in a Parallel Environment," Proc.
IEEE Int. Conf. on Computer Design (ICCD-86), pp. 434-437, Oct. 1986.
[9] P. Banerjee and M. Jones, "A Parallel Simulated Annealing for Standard Cell Placement on a
Hypercube Computer," Proc. IEEE Int. Conf. Computer-Aided Design (ICCAD-86), Nov.
1986.
[10] M. Jones and P. Banerjee, "Performance of a Parallel Algorithm for Standard Cell Placement
on the Intel Hypercube," Proc. 24th Design Automation Conf., June 1987.
[11] M. J. Chung and K. K. Rao, "Parallel Simulated Annealing for Partitioning and Routing,"
Proc. IEEE Int. Conf. Computer Design (ICCD-86), pp. 238-242, Oct. 1986.
[12] A. Hashimoto and J. Stevens, "Wire Routing by Optimizing Channel Assignment," Proc. 8th
Design Automation Conf., pp. 214-224, June 1971.
[13] T. Yoshimura and E. S. Kuh, "Efficient Algorithms for Channel Routing," IEEE Trans.
Computer-Aided Design, vol. CAD-1, pp. 25-35, Jan. 1982.
[14] R. L. Rivest and C. M. Fiduccia, "A Greedy Channel Router," Proc. 19th Design Automation
Conf., pp. 418-424, June 1982.
[15] D. Deutsch, "A Dogleg Channel Router," Proc. 13th Design Automation Conf., pp. 425-433, June
1976.
[16] M. Burstein and R. Pelavin, "Hierarchical Channel Router," Proc. 20th Design Automation
Conf., pp. 591-597, June 1983.
Figure 1. 3-Dimensional Hypercube Showing a Hamiltonian Cycle
Figure 2. Channel Map onto Hypercube of 3 Dimensions 
STEP 1. Perform track assignments to P processors.
STEP 2. Determine initial annealing temperature.
STEP 3. While "stopping criterion" (temperature < 1) not reached
STEP 4. Generate new temperature according to annealing schedule.
STEP 5. For inner_loop_count = 1 to USER_PARAMETER
STEP 6. For each dimension k = 0 to log(P)-1 do
STEP 7. Randomly select P/2 moves (exchange or displacement of nets) in parallel among pairs of
PEs connected in dimension k.
STEP 8. Evaluate change in cost for each move between pairs of PEs independently.
STEP 9. Accept/reject moves based on exponential function independently.
STEP 10. Broadcast new net locations to all other processors using Hamiltonian cycle.
STEP 11. ENDFOR; ENDFOR; ENDWHILE;
Figure 3. Parallel Algorithm for Channel Routing 
Figure 4. (a) Track Data Structure (b) Column Data Structure
Figure 5. Example Routing Solution 
Figure 6. Temperature vs. Cost (horizontal axis: Annealing Temperature, logarithmic scale from 0.1 to 10000)
