Abstract
Introduction
Recent advances in VLSI technology allow the fabrication of highly complex circuits on single chips. Time-consuming software tools are needed to design such chips successfully. Parallel processing has become an essential approach for handling the rapidly growing computational complexity. During the past few years, many parallel algorithms have been developed for various VLSI physical design problems. In this paper we present a parallel algorithm for global routing in general cell layouts.
Most physical design systems divide the whole process into three phases: floorplanning (or placement), global routing, and detailed routing. Floorplanning generates a relative position for all cells based on the connectivity and the dimensions of the cells. Global routing then determines the rough topology of the interconnections (nets) among the cells so as to achieve a uniform and low wiring density. Finally, detailed routing transforms the cells and rough wires into a geometric layout. The methods used for global routing depend greatly on the layout structure. In standard cell or gate array layouts, cells are placed in rows or in a matrix pattern, while in general cell layouts, cells of arbitrary sizes are placed freely in the layout area. Given the many degrees of freedom, the routing problem for general cell layouts is more difficult to solve.
The first generation of parallel routing algorithms was designed for special-purpose machines, such as 2-dimensional array processors [9, 12]. The basic scheme is that each processor routes one net at a time. The same scheme was also applied to algorithms designed for hypercubes [10, 13] and shared-memory machines [11]. This "one at a time" scheme has a data dependency problem: the cost of a route depends on the routes already chosen for other nets. To relax this dependency, one has to rely on some inaccurate data, which can degrade the routing quality.
Hierarchical approaches evaluate all nets simultaneously and take global information into account to decompose the routing problem into smaller subproblems. Hierarchical decomposition can be done in such a way that the subproblems have independent routing areas and netlists. This approach therefore offers a natural route to parallelization. Hierarchical global routing was first proposed by Burstein and Pelavin for gate array layout [1]. Luk [a]. Our parallel algorithm adopts the method of [7] to reduce each routing problem into up to four subproblems. We also modify the methods of [4, 8] to decompose nets among the subproblems.
Section 2 provides an overview of our hierarchical routing approach. Section 3 discusses the design of the parallel algorithm in a shared-memory environment. Experimental results are given in Section 4. Conclusions are presented in Section 5.
Overview of the Hierarchical Approach
The global routing problem is given by a floorplan, which describes the physical locations of cells in the chip, and a netlist, which defines the interconnections among the cells. The floorplan represents a partition of the chip into a set of rectangles, with each rectangle containing a cell. It also defines a routing graph: each cell is represented by a vertex, and two vertices are connected by an edge if the corresponding cells share a common boundary. Each edge has a capacity, which is the maximum number of wires allowed to cross the boundary.
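The construction of the routing graph can be illustrated with a minimal Python sketch. The rectangle encoding `(x0, y0, x1, y1)`, the function names, and the rule that the capacity equals the shared boundary length divided by a wire `pitch` are assumptions made for illustration only; the text above does not specify how capacities are derived.

```python
from itertools import combinations

def shared_boundary(a, b):
    """Length of the common boundary of two axis-aligned rectangles (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax1 == bx0 or bx1 == ax0:      # the rectangles touch along a vertical line
        return max(0, min(ay1, by1) - max(ay0, by0))
    if ay1 == by0 or by1 == ay0:      # the rectangles touch along a horizontal line
        return max(0, min(ax1, bx1) - max(ax0, bx0))
    return 0

def routing_graph(cells, pitch=1):
    """Return {(cell_a, cell_b): capacity} for every pair of adjacent cells."""
    edges = {}
    for (name_a, rect_a), (name_b, rect_b) in combinations(sorted(cells.items()), 2):
        length = shared_boundary(rect_a, rect_b)
        if length > 0:
            # Assumed capacity rule: one wire per `pitch` units of shared boundary.
            edges[(name_a, name_b)] = length // pitch
    return edges

# A hypothetical three-cell floorplan: A on the left, B below C on the right.
cells = {"A": (0, 0, 4, 6), "B": (4, 0, 10, 3), "C": (4, 3, 10, 6)}
graph = routing_graph(cells)
print(graph)
```

In this example every pair of cells is adjacent, so the routing graph is a triangle; cells that only touch at a corner would not be connected.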
Routing a net means finding a Steiner tree in the routing graph that connects all pins of the net. The goal of the global routing problem is to find a set of tree connections for all nets that satisfies the constraints and, if possible, minimizes the total wire length. The main constraint is that the number of wires using any edge of the routing graph is limited by the capacity of that edge.
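The capacity constraint and the wire length objective can be made concrete with a short Python sketch. It assumes that each routed net is given as a list of routing-graph edges and that a per-edge length table is available; both representations, and all names, are hypothetical.

```python
from collections import Counter

def edge_usage(routes):
    """Count how many nets use each routing-graph edge."""
    usage = Counter()
    for tree_edges in routes.values():
        usage.update(set(tree_edges))    # a net occupies an edge at most once
    return usage

def overflow(routes, capacity):
    """Return {edge: excess wires} for every edge whose capacity is exceeded."""
    usage = edge_usage(routes)
    return {e: n - capacity[e] for e, n in usage.items() if n > capacity[e]}

def total_length(routes, length):
    """Sum the (assumed) edge lengths over all trees."""
    return sum(length[e] for tree in routes.values() for e in set(tree))

capacity = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 2}
length = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1}

# Two nets compete for edge (A, B), whose capacity is 1.
routes = {"n1": [("A", "B")], "n2": [("A", "B"), ("B", "C")], "n3": [("A", "C")]}
print(overflow(routes, capacity))

# Rerouting n2 through (A, C) removes the violation at the same length.
rerouted = dict(routes, n2=[("A", "C"), ("B", "C")])
print(overflow(rerouted, capacity), total_length(rerouted, length))
```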
For hierarchical routing, we consider a special class of floorplans known as slicing structures. A floorplan is a slicing structure if it is a single rectangle or if it can be cut by a line into two slicing structures. A slicing structure can be viewed as the result of recursive partitioning with a simple pattern, the 4-partition, and its degenerate forms. A 4-partition is obtained by first applying a horizontal or vertical cut and then applying a perpendicular second cut on each side of the first cut. There are two degenerate cases: in a 3-partition only one of the two second cuts is applied, while in a 2-partition neither of the second cuts is applied. This recursive partitioning can be represented by a slicing tree in which each node represents a partition (cf. Figure 1). A 4-partition (and its degenerate forms) can be considered as a floorplan with up to four rectangles. The cells in each of these rectangles form a super-cell. The corresponding routing graph is called a super-graph. It has up to five super-edges, whose capacities are the sums of the capacities of the corresponding member edges.
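The rule that a super-edge capacity is the sum of its member edge capacities can be sketched as follows. The cell names, the super-cell labels (`LT`, `LB`, `RT`, `RB`) and the example capacities are hypothetical; the sketch only assumes that each cell is mapped to the super-cell containing it.

```python
def super_edges(edges, block_of):
    """Sum the capacities of routing-graph edges that join different super-cells."""
    result = {}
    for (a, b), cap in edges.items():
        sa, sb = block_of[a], block_of[b]
        if sa != sb:                             # edges inside a super-cell are ignored
            key = tuple(sorted((sa, sb)))
            result[key] = result.get(key, 0) + cap
    return result

# Six cells grouped into the four super-cells of a 4-partition.
block_of = {"a": "LB", "b": "LB", "c": "LT", "d": "RB", "e": "RT", "f": "RT"}
edges = {
    ("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 3, ("b", "d"): 5, ("c", "d"): 1,
    ("c", "e"): 2, ("d", "e"): 3, ("d", "f"): 2, ("e", "f"): 4,
}
supergraph = super_edges(edges, block_of)
print(supergraph)
```

With this input the super-graph has five super-edges, the maximum for a 4-partition: four between neighbouring super-cells and one diagonal edge (`LT` to `RB`) across the first cut.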
Our hierarchical algorithm uses the slicing tree to guide the routing procedure in a top-down manner. At each level of the slicing tree, a node corresponds to a routing task, which consists of (1) routing in the super-graph and (2) decomposition of nets among the super-cells. Routing in a super-graph is also a global routing problem, which, due to the simple graph structure, can be solved efficiently by integer programming. The result of the super-graph routing determines which super-edge each net will take. In order to decompose the nets into independent subnets, nets also have to be assigned to individual edges of the super-edges. We apply a modified min-cost flow algorithm to solve the net assignment problem. After the net decomposition, the global routing problem is divided into up to four totally independent subproblems, each of which is represented by a node at the next level of the slicing tree. The top-down procedure proceeds until the bottom level of the slicing tree is reached and all super-cells become actual cells of the chip.
Routing in a Super-Graph
Super-graphs only have three different structures: A routing graph of a 2-partition b. nz is the total number of route patterns for net type i.
We then assign the nets of type i to the different route patterns.
Net Decomposition
For every net, the route pattern obtained during the super-graph routing determines which super-edges the net will take. We have yet to assign the nets to the individual edges contained in the super-edges. Luk et al. [7] proposed a top-down refinement approach, which assigns nets to super-edges of the next level and hence requires considering the entire routing problem at each level of the hierarchy. Here we follow a different strategy: we assign nets directly to the routing graph edges of the original floorplan. Once a net is assigned to the edge that crosses the boundary between cells A and B, we can create a pseudo-pin on both cells (hollow circles) and divide the net into two subnets, which can be routed independently. Therefore, assigning nets directly to individual cell boundaries divides routing problems into independent subproblems.
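The pseudo-pin idea can be illustrated by a small Python sketch. It assumes a net that crosses a single boundary between two super-cells; the data layout (pins identified by the cell they lie in, a `side_of` map from cells to super-cells) and all names are hypothetical.

```python
def split_net(name, pin_cells, side_of, crossing):
    """Split a net at the boundary between cells `crossing = (a, b)`.

    A pseudo-pin is created on each of the two cells, and every real pin is
    placed in the subnet of the super-cell that contains it.
    """
    a, b = crossing
    subnets = {side_of[a]: [("pseudo", a)], side_of[b]: [("pseudo", b)]}
    for cell in pin_cells:
        subnets[side_of[cell]].append(("pin", cell))
    return {f"{name}/{side}": pins for side, pins in subnets.items()}

# Cells A, P, Q lie in the left super-cell; cells B, R lie in the right one.
side_of = {"A": "left", "P": "left", "Q": "left", "B": "right", "R": "right"}
subnets = split_net("n1", ["P", "Q", "R"], side_of, ("A", "B"))
print(subnets)
```

After the split, the two subnets share no routing area, so each can be handed to a different subproblem.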
The goal of the assignment problem is to minimize the total wire length of the nets without violating the capacity constraints. We follow the ideas of Lauther [4] and Marek-Sadowska [8] and formulate the problem as a min-cost flow problem. We build a network that contains the source s, the sink t, and two sets of nodes: one set Vn = {ui} representing nets, the other set Vb = {vj} representing cell boundaries (cf. Figure 4.1). Remember that the cell boundaries are dual to the edges of the original routing graph. Edges (s, ui) and (vj, t) have non-zero capacities and zero costs, while edges (ui, vj) have non-zero costs c[i, j] and unlimited capacities.
First we assume that each net crosses the super-cell boundary only once. Then the capacity of (s, ui) is set to 1 and the capacity of (vj, t) is equal to the capacity of the cell boundary, while c[i, j] is the minimum length needed to connect the two parts of net ui via cell boundary vj. The min-cost maximal flow on this network delivers an assignment of nets to cell boundaries that is optimal under the above assumption. The issue of allowing multiple crossings was discussed in [4, 8] without mentioning the problem that increasing the number of crossings for some nets may affect the routability of other nets. Our solution to the problem is as follows. First we solve the assignment problem under the constraint of single crossings. Then we construct for each net a minimal spanning tree connecting its pins and its crossing point. For each edge e of the spanning tree that crosses the boundary, we replace it by another edge e' that does not cross the boundary (cf. Figure 4.2). The difference of their lengths is the gain of e. The crossing edges of all nets are sorted by their gains, and each of them is given a priority number accordingly. Then we update the network functions: cap[ui] is the number of crossing edges of net ui; cap[vj] is the free capacity of boundary vj after the first assignment; c[i, j] is the priority number of the spanning tree edge of net ui that crosses boundary vj. The second round of min-cost flow on the network then gives an assignment of nets with multiple crossings without changing the routability.
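The first round (single crossings) can be sketched in Python as follows. This is a plain successive-shortest-path min-cost flow, not the modified algorithm used in our implementation; the function names, the input layout, and the use of the number of nets as a stand-in for "unlimited" capacity on the (ui, vj) edges are assumptions made for illustration.

```python
from collections import deque

def min_cost_flow(n, arcs, s, t):
    """Min-cost max-flow by successive shortest paths; arcs are (u, v, cap, cost)."""
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in arcs:                  # arc 2k is forward, arc 2k+1 its reverse
        graph[u].append(len(to))
        to.append(v), cap.append(c), cost.append(w)
        graph[v].append(len(to))
        to.append(u), cap.append(0), cost.append(-w)
    total = 0
    while True:
        dist, prev = [float("inf")] * n, [-1] * n
        dist[s] = 0
        queue, queued = deque([s]), [False] * n
        while queue:                         # queue-based Bellman-Ford
            u = queue.popleft()
            queued[u] = False
            for e in graph[u]:
                v = to[e]
                if cap[e] > 0 and dist[u] + cost[e] < dist[v]:
                    dist[v], prev[v] = dist[u] + cost[e], e
                    if not queued[v]:
                        queue.append(v)
                        queued[v] = True
        if prev[t] == -1:                    # no augmenting path is left
            return total, cap
        path, v = [], t
        while v != s:                        # walk back along the shortest path
            path.append(prev[v])
            v = to[prev[v] ^ 1]
        push = min(cap[e] for e in path)
        for e in path:
            cap[e] -= push
            cap[e ^ 1] += push
        total += push * dist[t]

def assign_nets(nets, boundaries, length):
    """Assign each net to one cell boundary; length[(net, boundary)] plays the role of c[i, j]."""
    net_id = {u: 1 + i for i, u in enumerate(nets)}
    bnd_id = {v: 1 + len(nets) + j for j, v in enumerate(boundaries)}
    s, t = 0, 1 + len(nets) + len(boundaries)
    arcs = [(s, net_id[u], 1, 0) for u in nets]                  # capacity 1 per net
    arcs += [(bnd_id[v], t, c, 0) for v, c in boundaries.items()]
    middle = [(u, v) for u in nets for v in boundaries]
    arcs += [(net_id[u], bnd_id[v], len(nets), length[u, v]) for u, v in middle]
    total, cap = min_cost_flow(t + 1, arcs, s, t)
    first = len(nets) + len(boundaries)      # position of the first (ui, vj) arc
    # A (ui, vj) arc carries flow exactly when its reverse arc has residual capacity.
    chosen = {u: v for k, (u, v) in enumerate(middle) if cap[2 * (first + k) + 1] > 0}
    return chosen, total

nets = ["n1", "n2", "n3"]
boundaries = {"b1": 1, "b2": 2}              # boundary -> capacity
length = {("n1", "b1"): 2, ("n1", "b2"): 5, ("n2", "b1"): 3,
          ("n2", "b2"): 4, ("n3", "b1"): 6, ("n3", "b2"): 1}
assignment, wire_length = assign_nets(nets, boundaries, length)
print(assignment, wire_length)
```

In the example, boundary b1 can take only one wire, so the flow sends n1 through b1 and pushes n2 to the longer connection through b2. The second round described above would reuse the same solver with the updated capacities and priority-number costs.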
The Parallel Routing Algorithm
The important aspects of our parallel algorithm, such as slicing tree guided task generation and hierarchical net decomposition, have already been presented in the last section. In this section we describe the organization of our algorithm in more detail and further exploit parallelism within individual routing tasks.
The Overall Structure
The parallel algorithm takes a top-down approach. For each node in the slicing tree a routing task is generated. Each task includes performing global routing in a super-graph by means of integer programming and solving the corresponding net assignment problem by finding min-cost flows. The parallel execution of the node tasks follows a breadth-first scheme: tasks at the same level of the slicing tree are carried out in parallel by the available processors. Because of our decomposition strategy, these tasks (at the same level) are completely independent of each other. All information exchange goes through global (shared) data structures. Each processor reads the global data before starting a task and updates the global data after completing the task. There is no need for data exchange among processors during task execution. After all tasks at a slicing tree level are completed, a new set of independent tasks corresponding to the nodes of the next level is generated and is ready to be assigned to the available processors for parallel execution.
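The breadth-first scheme can be sketched in Python with a thread pool. The sketch illustrates only the level-by-level scheduling with a barrier between levels; `route_node` is a hypothetical placeholder for the real routing task, the tree is a toy example, and Python threads will not reproduce the speedups of a shared-memory multiprocessor.

```python
from concurrent.futures import ThreadPoolExecutor

def route_level_by_level(root, run_task, children, workers=4):
    """Run one task per slicing-tree node, one tree level at a time."""
    levels = []
    level = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            # pool.map returns only after every task of this level has finished,
            # which acts as the barrier between two levels of the slicing tree.
            levels.append(list(pool.map(run_task, level)))
            level = [child for node in level for child in children(node)]
    return levels

# A toy slicing tree: node -> child nodes (the subproblems it generates).
tree = {"root": ["L", "R"], "L": ["LL", "LR"], "R": ["RL"]}

def route_node(node):
    """Placeholder task: super-graph routing and net decomposition would go here."""
    return f"done:{node}"

levels = route_level_by_level("root", route_node, lambda n: tree.get(n, []))
print(levels)
```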
Parallelization within a Routing Task
Exploiting parallelism within individual routing tasks is important for further speed-up of the overall routing process. It is particularly effective at the higher levels of the slicing tree, where there are not enough tasks for parallel processing. On the other hand, we have to take into consideration the overheads caused by parallel task generation and management. If the task size is too small, the overheads may outweigh the benefits gained by parallelization. After experimenting with several options, we chose the net assignment for parallel processing, because solving the min-cost flow problem is fairly time-consuming. An H4 super-graph has five assignment tasks, one for each super-edge. These tasks can be executed in parallel. If the number of processors is not large, we apply this parallelism only at the first few levels of the slicing tree, because at lower levels there are enough nodes to be assigned to the available processors.
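One possible way to decide where this inner parallelism is applied is sketched below. The criterion used here (split a task into per-super-edge assignment items only when a level has fewer tasks than processors) is an assumption for illustration, as are the function name and the task layout; the text above only states that the inner parallelism is applied at the first few levels.

```python
def assignment_work_items(level_tasks, num_processors):
    """Choose the granularity of the net assignment work at one tree level.

    level_tasks is a list of (task_name, super_edges). When the level already
    offers enough tasks, each task stays whole (edge = None); otherwise each
    super-edge assignment becomes a separate work item.
    """
    if len(level_tasks) >= num_processors:
        return [(task, None) for task, _ in level_tasks]
    return [(task, edge) for task, edges in level_tasks for edge in edges]

# The root task alone cannot keep eight processors busy, so it is split.
top = assignment_work_items([("root", ["e1", "e2", "e3", "e4", "e5"])], 8)

# A lower level with eight tasks is left at task granularity.
low = assignment_work_items([(f"t{i}", ["e1", "e2"]) for i in range(8)], 8)
print(len(top), len(low))
```

The sketch covers only the net assignment part of a task; the super-graph routing of a node still has to finish before its assignment items can start.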
The basic structure of our parallel algorithm is as follows. 
Algorithm Parallel Global Routing

Experimental Results
We have implemented the parallel routing algorithm on a BBN GP1000 machine under the Mach 1000 operating system. The program contains about 7000 lines of C code. Experiments were performed on two MCNC general cell benchmarks, ami33 and ami49, and two randomly generated larger examples. Testing was done on one, four and eight processors. Table 1
Conclusions
We have presented a parallel algorithm for global routing in general cell layouts. The algorithm takes a hierarchical approach to decompose the global routing problem into independent subproblems for parallel processing. It achieves very good speedup without compromising the routing quality. We are currently developing parallel algorithms for integrated floorplanning and global routing. These two design phases interact closely with each other. Research efforts to perform them simultaneously by hierarchical decomposition have shown very promising results [3, 6]. To achieve full integration, global routing must be invoked whenever changes to the floorplan are made. This approach requires much more intensive computation, and therefore parallelization of the process becomes necessary.
