Multi-chip modules (MCMs) are now required to achieve higher system speed and greater density than traditional single-chip packages mounted on printed circuit boards. This paper reviews algorithms for placing bare dies on MCM substrates and for routing their interconnections. Comparisons are given to point out the strengths and weaknesses of each approach. This information can help researchers identify the areas that need improvement, and help application designers select the most appropriate algorithm for a specific application.
Introduction
MCM physical design consists of two operations: placement and routing. Placement refers to positioning unpackaged, or bare, dies on the MCM substrate such that one or more figures of merit are optimized. Typical criteria include netlength minimization, proper heat distribution and overall system timing performance. Routing is the task of interconnecting the dies on the substrate such that one or more figures of merit are optimized. For routing, typical optimization criteria include netlength, via count, layer count or crosstalk minimization.
These two operations are mutually dependent, but the size and complexity of practical problems generally dictate that each be solved separately. Each operation is also order-dependent: placing one die or routing one interconnection leaves fewer options for the dies and interconnections that remain. Thus, the placements or interconnections considered to have the greatest impact on the optimization criteria are performed first.
MCM physical design requires consideration of many issues that cannot be handled satisfactorily by computer-aided design tools developed for integrated circuits or printed circuit boards [49, 42, 12, 16]. Factors such as the number of layers in the MCM substrate, the type of bonding and the type of substrate (ceramic, laminate or deposited) must be an integral part of the optimization procedure [39]. Considering packaging issues early in the design cycle leads to better physical design and superior overall system performance [12].
An overview of the MCM physical design process is presented in the next section. In Section 3, MCM placement algorithms are detailed along with a comparison of their strengths and weaknesses. In Section 4, MCM routing approaches are presented and compared in a similar manner. The final section presents some general conclusions and outlines areas for future research.
2 Physical Design of MCMs

Figure 1 illustrates the operations involved in MCM physical design with respect to the overall design flow. The top-level hardware description (VHDL or Verilog) is synthesized into a structural netlist of library components and their interconnections. For a very large design, the netlist can then be partitioned into a set of different dies based on constraints such as area, thermal behavior, delay and testability. This step is known as MCM partitioning or system partitioning. The goal of partitioning is to improve the functionality and routability of the design. After the layouts of the individual dies have been obtained, the dies are placed and then routed on the MCM substrate.
The physical design process proper does not include system partitioning; it begins with placement. System partitioning is skipped when off-the-shelf bare dies are used, as is the case in most of today's MCM designs. The first main step in the physical design process is therefore chip placement. MCM placement positions the dies on the MCM substrate while taking into account proper heat distribution, minimization of interchip delays and noise, and tolerant load distance. When fabrication issues are considered, additional constraints in the form of net separation and via constraints become important, because fabricating densely routed designs may result in low yield [16] or a non-manufacturable design.
The next step in the physical design process is pin redistribution [19], which uses chip layers to improve the substrate routability of the design. The pin redistribution problem can be stated as follows: after the dies have been placed on the MCM substrate, redistribute the pins using chip layers such that the number of pin redistribution layers, the crosstalk and the maximum signal delay are minimized. Another objective of pin redistribution is to maximize the number of nets that can be routed in a planar fashion. During pin redistribution, each pin is assigned to the nearest point on a grid in the pin redistribution layers. Pins are redistributed uniformly with sufficient spacing so that the connections between the nets and the pins can be made without any design rule violations.
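The nearest-grid-point assignment described above can be sketched as a greedy procedure. This is a minimal sketch under simplifying assumptions: pins are integer (x, y) points on a single redistribution layer, candidate grid points are spaced `pitch` apart, distance is rectilinear, and pins are processed in input order. The function names are hypothetical, and the crosstalk, delay and layer-count objectives of [19] are not modeled.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def redistribute_pins(pins: List[Point], pitch: int,
                      width: int, height: int) -> Dict[int, Point]:
    """Greedily snap each pin to the nearest unused point of a uniform grid.

    Grid points are spaced `pitch` apart, so any two assigned pins are at
    least `pitch` apart. Returns a map from pin index to grid point.
    """
    grid = [(x, y) for x in range(0, width + 1, pitch)
            for y in range(0, height + 1, pitch)]
    if len(grid) < len(pins):
        raise ValueError("not enough grid points for all pins")
    used = set()
    assignment: Dict[int, Point] = {}
    for i, pin in enumerate(pins):
        # Nearest free grid point; ties broken by coordinates for determinism.
        best = min((g for g in grid if g not in used),
                   key=lambda g: (manhattan(pin, g), g))
        used.add(best)
        assignment[i] = best
    return assignment
```

For example, `redistribute_pins([(1, 1), (2, 1), (9, 9)], pitch=5, width=10, height=10)` returns `{0: (0, 0), 1: (5, 0), 2: (10, 10)}`: the second pin cannot take (0, 0) and moves to the next-closest free point. Because the result depends on processing order, a practical tool would use a matching or cost-driven assignment instead of this greedy pass.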
The final step in the physical design process is routing, which usually consists of global routing, layer assignment and detailed routing [42]. The global routing step assigns nets or wires to different routing regions, whereas the detailed router actually finds a geometric path for each net and connects the terminals of the net. The primary goal of global routing is to reduce the length of the global routing trees, since this reduces area and delay. Layer assignment, which is itself an NP-complete problem, is the step in which each net is assigned to an x-y layer pair subject to the feasibility of routing the nets on a grid. Layer assignment is important because it determines the number of layers in an MCM and hence its cost and the required cooling mechanism.
The MCM routing problem differs from the VLSI routing problem primarily because of the larger number of layers available in an MCM (e.g., 60 versus 4). The total number of nets and the density of interconnections in an MCM are also much larger, which necessitates a different set of CAD tools for the MCM design process. At present, however, most CAD tools for MCM physical design are repackaged PCB or IC design tools. MCM routing has been treated as a three-dimensional routing process in [41], because routing takes place not only in the two-dimensional plane but also in the vertical direction across the multiple substrate layers. The number of nets in the design, the substrate size, the number of chips mounted on the substrate and the number of layers will continue to grow, so MCM routing algorithms must be efficient. The main objectives of MCM routing are reducing the number of routing layers and reducing or eliminating crosstalk, which ultimately improves overall system performance.
MCM Placement Algorithms
The MCM die placement problem can be defined as follows:
Given a set of chips (dies) $C$ and a set of chip sites $S$, find a mapping $\phi: C \rightarrow S$, subject to timing (delay), thermal and area constraints, that minimizes the number of layers needed for routing.
Hence, important goals for an MCM placement tool are to minimize the total substrate area, ensure proper heat distribution, minimize the total wire length needed for routing, and ensure that the design is routable in the minimum number of routing layers.
MCM placement is a difficult combinatorial problem, since a good placement involves a tradeoff between these mutually conflicting objectives. Several placement algorithms and approaches are described next.
Performance-Driven Placement for MCMs
In MCMs, some of the interconnections between bare dies can be so long that their resistance is comparable to that of the driver. Such resistances cannot be neglected when net delays are estimated during placement. Relatively few approaches perform delay-oriented, or performance-oriented, placement. Performance-driven placement is very important, since interconnect delays form a major part of the system cycle time. Several performance-driven algorithms based on delay models have been developed [44, 1].
Timing constraints fix upper bounds on the netlengths, and these upper bounds guide the placement process. The delay of a multi-terminal net depends on the net topology as well as on the net length. Because the net topology is difficult to estimate at the placement stage, accurate delay estimation is difficult. One way to overcome this problem is to perform global routing and placement simultaneously, as proposed in [43, 5]. A resistance-driven placement algorithm for multichip modules has been developed in [42]. In this approach the net delay (cost) is modeled as the combination of the delay contributed by the wire that forms the net and the delay contributed by the sink capacitances; the second contribution is calculated from all driver-to-sink capacitances.
Some researchers have treated system partitioning and placement as one composite problem [42, 6]. Partitioning-based placement methods tend to spread the wiring across the layout surface and thus produce very routable placements. However, it is difficult to model a truly integrated partitioning and placement approach that deals with all the issues involved in system partitioning and routing. This approach applies to cases in which the design of the dies has not yet been finalized, so that MCM routing considerations can still be incorporated.
Conventional MCM Placement Approaches
Traditional placement methods fall into two groups: constructive and iterative. Constructive methods take a partial placement as input and produce a complete placement as output. Iterative methods, on the other hand, begin with an initial placement and refine it, taking certain constraints into consideration, to obtain a better placement. A detailed description of these techniques is found in [51]. Iterative placement often uses probabilistic search algorithms such as simulated annealing or genetic algorithms, and the process is repeated until some stopping criterion is satisfied. In each iteration, components are moved or rotated, and if the new configuration is better than the previous one, it is accepted; simulated annealing also accepts some worse configurations, with a probability that decreases as the search proceeds, in order to escape local minima.
A cost function is used to evaluate the relative merit of each configuration. Some of these iterative methods are described in [51, 13, 17, 7]. Esbensen and Mazumder [15] have combined genetic algorithms and simulated annealing to speed up the search and obtain better placements than either algorithm achieves alone. This approach has been tested on macrocell placement and can also be extended to placing individual dies on an MCM substrate.
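The iterative improvement loop described above can be illustrated with a small simulated-annealing placer. In this minimal sketch (names and parameters are hypothetical, not taken from the cited works), each die occupies one site, the cost function is the half-perimeter wirelength (HPWL) of the nets, a move relocates a die to a random site (swapping with its occupant if the site is taken), and worse moves are accepted with probability exp(-Δ/T) under a geometric cooling schedule.

```python
import math
import random
from typing import List, Sequence, Tuple

Site = Tuple[float, float]


def hpwl(nets: Sequence[Sequence[int]], placement: List[int],
         sites: Sequence[Site]) -> float:
    """Half-perimeter wirelength; placement[d] is the site index of die d."""
    total = 0.0
    for net in nets:
        xs = [sites[placement[d]][0] for d in net]
        ys = [sites[placement[d]][1] for d in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(n_dies, sites, nets, t0=10.0, alpha=0.95,
                     moves_per_t=100, t_min=0.01, seed=0):
    """Simulated-annealing die-to-site assignment minimizing HPWL."""
    if len(sites) < n_dies:
        raise ValueError("need at least one site per die")
    rng = random.Random(seed)
    placement = list(range(n_dies))          # die d starts on site d
    occupant = [-1] * len(sites)             # site -> die, or -1 if empty
    for die, s in enumerate(placement):
        occupant[s] = die
    cost = hpwl(nets, placement, sites)
    best, best_cost = placement[:], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            d = rng.randrange(n_dies)
            s_old, s_new = placement[d], rng.randrange(len(sites))
            if s_new == s_old:
                continue
            other = occupant[s_new]
            # Move d to s_new; if s_new was occupied, swap the two dies.
            placement[d] = s_new
            if other >= 0:
                placement[other] = s_old
            new_cost = hpwl(nets, placement, sites)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                occupant[s_new], occupant[s_old] = d, other
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = placement[:], cost
            else:  # reject: undo the move
                placement[d] = s_old
                if other >= 0:
                    placement[other] = s_new
        t *= alpha
    return best, best_cost
```

The cost is recomputed from scratch after every move for clarity; a practical placer would update only the nets touching the moved dies, and HPWL could be replaced by any of the criteria discussed in this section.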
Placement Using Multiple Design Criteria
Placing multiple dies on an MCM substrate is a non-trivial task in which multiple criteria must be considered simultaneously to achieve true multi-objective optimization. Most researchers have considered only one criterion or objective in their MCM placement algorithms; in most cases it is netlength minimization, which leads to overall area and delay minimization. An ideal placement tool should produce the smallest layout while conforming to the electrical and thermal requirements. When placement considers the thermal and netlength criteria simultaneously, the final netlength after optimization is slightly higher, but the heat spread on the MCM substrate is more uniform than when netlength minimization is performed alone.
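One simple way to combine the netlength and thermal criteria discussed above is a weighted-sum cost. The sketch below is an assumption for illustration, not the cost function of any tool cited here: the thermal term is a hypothetical heuristic that sums P_i P_j / d_ij over die pairs, so placing two high-power dies close together is expensive.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def wirelength(nets: Sequence[Sequence[int]], pos: Sequence[Point]) -> float:
    """Half-perimeter wirelength with pos[d] the center of die d."""
    total = 0.0
    for net in nets:
        xs = [pos[d][0] for d in net]
        ys = [pos[d][1] for d in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def thermal_penalty(power: Sequence[float], pos: Sequence[Point]) -> float:
    """Hypothetical heuristic: sum of P_i * P_j / distance over die pairs."""
    pen = 0.0
    for i, j in combinations(range(len(pos)), 2):
        dist = abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1])
        pen += power[i] * power[j] / max(dist, 1e-9)
    return pen


def placement_cost(nets, power, pos, w_len=1.0, w_th=1.0) -> float:
    """Weighted sum of the netlength and thermal criteria."""
    return w_len * wirelength(nets, pos) + w_th * thermal_penalty(power, pos)


nets = [[0, 1]]
power = [10.0, 10.0, 1.0]           # dies 0 and 1 are the hot ones
close = [(0, 0), (1, 0), (5, 0)]    # hot dies adjacent
far = [(0, 0), (5, 0), (1, 0)]      # hot dies spread apart
```

With `w_th=0`, the netlength-only cost prefers `close` (wirelength 1 versus 5); with the thermal term included, the preference flips to `far`, reproducing in miniature the tradeoff described above: a slightly longer netlength in exchange for a more even heat spread.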
Comparison of Different Placement Approaches
MCM placement is much more complicated than conventional IC placement in VLSI.
Even though the number of dies in an MCM is generally well below 100, many interrelated factors determine the final layout quality. Table 1 compares different MCM placement approaches. The netlength minimization approach followed by many researchers is not sufficient; accurate delay estimation that accounts for the resistance of the nets is necessary.
MCM Routing Algorithms

Introduction to the MCM routing problem
The MCM routing problem can be defined as follows:
Given a set of placed chips (dies) and a netlist interconnecting different pins on the chips, route all the nets so as to use the minimum number of routing layers while satisfying a given set of constraints.
The objectives to be optimized include netlength minimization, crosstalk minimization, via minimization and meeting the manufacturability constraints that ensure high yield and routable designs.
As MCM technology develops, the number of chips on an MCM substrate and the number of layers will increase dramatically, creating the need for very efficient routing algorithms. For example, an MCM with 100 chips and 63 layers has been reported in [50]. The main performance constraints in MCM routing are delay, noise and manufacturability. Delay constraints are in most cases converted to netlength constraints, and they must be considered to ensure that the MCM functions properly at the desired clock frequency. Noise constraints are converted to path separation and parallelism constraints, and they must be considered to avoid unwanted logic switching. In [41] the author describes in detail the fabrication constraints that should be considered for MCM routing.
As the complexity of MCM routing increases, more computation time will be required, since the solution space of the combinatorial optimization problem grows. A distributed computing environment may be needed to obtain a good solution in a shorter time. A variety of routing approaches have been developed in recent years; they are described next.
Maze Routing
The most commonly used MCM routing approach is maze routing [42, 13]. This algorithm, originally developed by Lee [32], is conceptually very simple but suffers from several inadequacies. The quality of the maze routing solution is very sensitive to the ordering of the nets, and there is no effective general algorithm for determining a good net ordering; nets routed late in the ordering require a large number of vias, even when many signal layers are available. The algorithm also requires a large amount of memory. It is impossible to consider performance constraints such as delay and noise, or fabrication constraints, in the basic maze routing algorithm. The high memory requirement for large problems and the long execution times make the maze router unsuitable for most MCM applications. Nevertheless, maze routing is the backbone of many commercial routers available for MCM designs.
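Lee's algorithm expands a breadth-first wavefront from the source, labeling each reachable grid cell with its distance, and then backtraces from the target along decreasing labels. The sketch below is a minimal single-layer version on a grid where `1` marks a blocked cell; the function name is hypothetical. A multi-layer MCM router would add a layer coordinate and charge a cost for vias.

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def lee_route(grid: List[List[int]], src: Cell, dst: Cell) -> Optional[List[Cell]]:
    """Shortest rectilinear path from src to dst, or None if dst is unreachable.

    grid[r][c] == 1 marks a blocked cell; src and dst are assumed free.
    """
    rows, cols = len(grid), len(grid[0])
    label = [[-1] * cols for _ in range(rows)]
    label[src[0]][src[1]] = 0
    frontier = deque([src])
    # Wave expansion: breadth-first search labels each cell with its distance.
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == dst:
            break
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and label[nr][nc] < 0):
                label[nr][nc] = label[r][c] + 1
                frontier.append((nr, nc))
    if label[dst[0]][dst[1]] < 0:
        return None
    # Backtrace: from dst, repeatedly step to a neighbor labeled one less.
    path = [dst]
    r, c = dst
    while (r, c) != src:
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and label[nr][nc] == label[r][c] - 1):
                r, c = nr, nc
                break
        path.append((r, c))
    return path[::-1]
```

Routing several nets means calling the function repeatedly and marking each routed path as blocked, which is exactly where the sensitivity to net ordering comes from: an early net can occupy cells that a later net needed. The label array also illustrates the memory cost, since one label is stored per grid cell (per layer, in a 3-D router).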
Multiple-Stage Routing
In multi-stage routing, the problem is decomposed into several smaller subproblems. The routing cycle is broken up into pin redistribution, layer assignment and detailed routing, steps explained earlier in Section 2. Pins on the die layer (top layer) are first redistributed evenly, with enough spacing that connections between nets and pins can be made without any design rule violation on the signal distribution layers. Layer assignment is then performed so that each net is assigned to an x-y pair of layers, subject to the feasibility of routing the nets on a global routing grid in each plane pair. The layer assignment problem is NP-complete and is discussed in detail in [25, 42, 51, 26, 19]. After layer assignment, the nets are routed on the signal distribution layers. This step, known as detailed routing, depends on the layer assignment step and may use a single layer or a layer pair. Detailed routing, introduced earlier in this paper, is discussed further in [28].
Integrated Pin Redistribution and Routing
In this approach, instead of redistributing the pins before routing, the algorithm redistributes the pins while routing each layer. After routing on one layer, the terminals of the unrouted nets are propagated to the next layer. This concept was developed by Khoo and Cong [9, 30], as follows.
In [9], Khoo and Cong present an integrated routing approach called SLICE for MCM routing.
Instead of the usual approach of distributing pins prior to routing, the SLICE router redistributes pins along with routing in each layer, following a layer-by-layer planar routing approach. The algorithm tries to connect as many nets as possible in each layer; nets that cannot be completely routed in a layer are partially routed and then carried over to the next layer after the routing region has been scanned from left to right. After planar routing in a layer is completed, the terminals of the unconnected nets are distributed so that they can be propagated to the next layer without causing local congestion.
In [30], Khoo and Cong describe a general area multi-layer router for MCMs called “V4R”, so named because it uses no more than 4 vias to route each two-terminal net and no more than $4(k-1)$ vias for each $k$-terminal net. The routing approach is somewhat similar to that of SLICE, but V4R operates on x-y plane pairs instead of considering one layer at a time. Unlike other MCM routers, V4R has no topological routing step; the physical routing is generated directly. The 2-D routing grid is divided into vertical channels by vertical columns, where a vertical column is defined by a grid line that contains at least one net terminal; horizontal channels are defined similarly. Every net is routed using one of two types of routing topology: Type-1 topologies, which have three vertical segments and two horizontal segments, and Type-2 topologies, which have three horizontal segments and two vertical segments. In each column of the grid, V4R executes the following steps: horizontal track assignment of the right terminals, horizontal track assignment of the left terminals, routing in the vertical channel, and extension of the routing to the next column. V4R achieves significant improvements over a 3-D maze router; for example, it uses 44% fewer vias and 2% less wirelength [30].
V4R runs much faster than the maze router and is well suited to today's increasingly dense MCM designs. The inputs to this router are a set of terminals, a set of obstacles and a set of wiring rules.
The global routing approach followed by the authors of SURF uses two principles from artificial intelligence, namely the least commitment principle and the notion of maximal use of information. Local routing is done one net at a time within the limits of each bin. Both the global router and the local router rely heavily on a data structure built on the constrained Delaunay triangulation. SURF generates a topological specification for flexible rubber-band routing. This approach is used for routing across entire MCM substrates that have no channels. The SURF router builds on previous approaches in more than one way. Flexible bin interface specifications allow the global routing to be adjusted as more detailed information becomes available. Because layer assignment is done during the partitioning step and the results are not restricted by the one-layer-one-direction constraint, fewer vias are used. This approach represents a good balance between the need to make global decisions and the need to satisfy the constraints imposed at the more detailed local levels.
Performance-Driven Global Routing
Performance-driven global routing has recently become one of the most important approaches to MCM routing, since the role of interconnect in determining the performance of high-speed digital systems has been steadily increasing. It is estimated that interconnect delays account for 50% of the cycle time today, and that this figure will soon increase to 80%. This is especially true at the MCM level, since wire lengths are longer, and the off-chip drivers are bigger and contribute less to the overall delay. Thus, it is very important to consider interconnect delay in MCM routing.
In simplified terms, the delay of an interconnection net is a function of its topology, wire width and metal layer. Ideally, a performance-driven routing algorithm would optimize all three simultaneously to achieve maximum performance. In practice this is difficult because of other constraints, such as routability, and because of problem complexity. Typically, the MCM routing problem is solved in three steps: (1) a global routing step determines the topology of each net on a coarse grid, after which wire widths may be adjusted to minimize delays; (2) a layer assignment step assigns nets, or segments of nets, to different layers to optimize routability, sometimes also considering crosstalk; (3) a detailed area router completes the routing on each layer, following the topology and width suggestions of the global router.
This section considers the global routing step, and some aspects of the layer assignment problem. Before discussing the algorithms, it is helpful to gain an idea of the models used for the delay of an interconnection.
Delay Models
VLSI interconnect delay models use either the lumped capacitance model or the RC tree model. The traditional lumped capacitance model, in which the delay of a net is proportional to its length, is not accurate enough at the MCM level. Because the effective resistance of off-chip drivers is comparable to the resistance of the interconnect wires, wire resistance significantly affects delay and cannot be ignored. Moreover, the delay of a multi-terminal net depends not only on the netlength but also on the net topology. MCMs therefore need more accurate and elaborate delay models than general VLSI designs. The next few paragraphs outline this in greater detail; for a detailed description of interconnect delay models for MCMs, see [42].
For reasonable accuracy, interconnects at the MCM level must be modeled as distributed RLC trees. In this model, each wire segment in the tree is represented by a distributed RLC section, consisting of a distributed resistance and inductance in series and a distributed capacitance to ground. For delay estimation, each distributed section is typically approximated by splitting it into several lumped RLC sections. A first-order model for the delay of an RLC tree is the Elmore model [14]. This model ignores inductance, as well as higher-order effects such as “resistance shielding”. However, it has been demonstrated in [3] that this delay model has high “fidelity”, i.e., optimization based on this model tends to lead to high-performance routes. Figure 2 shows the different delay models used. In [42] the author develops a second-order delay model for multi-terminal interconnects that accounts for the effect of line inductance.
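The Elmore delay of a tree of lumped RC sections can be computed in two linear passes: accumulate downstream capacitance from the leaves to the root, then sum, along each root-to-node path, every edge resistance times the capacitance downstream of it. The sketch below assumes nodes are numbered so that each parent precedes its children, and models the driver as a single resistance `r_driver`; these conventions and names are illustrative.

```python
from typing import List, Optional


def elmore_delays(parent: List[Optional[int]], r_edge: List[float],
                  cap: List[float], r_driver: float = 0.0) -> List[float]:
    """Elmore delay from the driver to every node of an RC tree.

    Node 0 is the root (driver output). For i > 0, parent[i] < i is its parent,
    r_edge[i] is the resistance of the edge parent[i] -> i, and cap[i] is the
    lumped capacitance to ground at node i.
    """
    n = len(parent)
    # Downstream capacitance of each node, accumulated from the leaves upward.
    down = list(cap)
    for i in range(n - 1, 0, -1):
        down[parent[i]] += down[i]
    delays = [0.0] * n
    delays[0] = r_driver * down[0]          # driver charges the whole tree
    for i in range(1, n):
        # Each edge's resistance times all capacitance downstream of it.
        delays[i] = delays[parent[i]] + r_edge[i] * down[i]
    return delays
```

Splitting a wire of length $L$ into $N$ lumped sections, each with resistance $rL/N$ and capacitance $cL/N$, gives a delay of $rcL^2(N+1)/(2N)$ in this model, which approaches the distributed value $rcL^2/2$ as $N$ grows.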
Higher-order models for interconnect delay can be computed efficiently using techniques such as AWE [37]. Programs such as RICE [38] are efficient enough to be used inside optimization loops.
Topology Optimization
When interconnect resistance becomes significant compared to the effective resistance of the driver, the interconnect delay of a multi-terminal net becomes a function of the routing topology. One of the first algorithms to optimize topology for delay was presented in [34]. The algorithm iteratively constructs a tree for a multi-terminal net by starting from a partially constructed tree (initially just the driver node) and connecting it to a sink node. The connection path is found using the A* search algorithm, guided by a benefit value on each vertex of the routing graph based on estimates of the Elmore delay to all sinks in the tree and of the total wire length of the tree. An advantage of this algorithm is that it works on arbitrary routing graphs, so it can be used on channel graphs or for area routing in the presence of obstacles. Many other performance-driven algorithms are restricted to the Manhattan plane.
Another class of algorithms minimizes Elmore delay directly during construction of the routing tree for the net. The Elmore Routing Tree (ERT) and Steiner Elmore Routing Tree (SERT) algorithms [4] construct spanning and Steiner trees, respectively, using a greedy strategy to minimize Elmore delay.
Using a slightly different delay model, the dominant time-constant model, a near-optimal algorithm for minimum-delay routing is presented in [10]. The algorithm is based on arborescence trees (A-trees), which have the property that every driver-to-sink path is a shortest path. An A-tree may in general have a longer total wire length than a minimum Steiner tree, but it reduces delay when wire resistance is significant. It is shown in [10] that minimizing delay is equivalent to finding an A-tree with minimum wire length. Figure 3 compares A-trees with the Steiner tree.
An algorithm for finding near-optimal solutions is presented, and experimental results demonstrate delay reductions of up to 40% based on typical MCM parameters. The A-tree algorithm can also be extended to handle obstacles in the Manhattan plane.
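The defining property of an A-tree, that every path from the source is a shortest rectilinear path, is easy to check. The sketch below assumes each tree edge is realized as a monotone rectilinear route, so its length equals the Manhattan distance between its endpoints; function names are hypothetical, and this only verifies the property rather than constructing a minimum-length A-tree as the algorithm of [10] does.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_a_tree(points: Sequence[Point], edges: List[Tuple[int, int]],
              root: int = 0) -> bool:
    """True if every root-to-node path in the tree is a shortest rectilinear path."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    path_len = {root: 0}
    stack = [root]
    while stack:  # depth-first walk accumulating path length from the root
        u = stack.pop()
        for v in adj[u]:
            if v not in path_len:
                path_len[v] = path_len[u] + manhattan(points[u], points[v])
                stack.append(v)
    if len(path_len) != len(points):
        return False  # the edges do not span all points
    return all(path_len[v] == manhattan(points[root], points[v])
               for v in path_len)
```

For example, with source (0, 0), a Steiner point at (1, 1) and sinks at (2, 1) and (1, 2), the tree with edges 0-1, 1-2 and 1-3 is an A-tree, whereas the chain (0, 0) to (2, 0) to (0, 2) is not, since the last sink is reached by a path of length 6 instead of 2.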
Some performance-oriented algorithms do not directly minimize delay, but instead focus on objectives related to delay. For example, close examination of the Elmore delay model reveals that delay is related to the total wire length as well as the square of the source-sink path lengths.
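To make this concrete, consider the simplest case (an illustrative assumption): a single uniform line of length $L$ with resistance $r$ and capacitance $c$ per unit length, driven through a driver resistance $R_d$ and terminated by a load capacitance $C_L$. Its Elmore delay is

$$
T = R_d\,(cL + C_L) + rL\left(\frac{cL}{2} + C_L\right)
  = R_d C_L + (R_d c + r C_L)\,L + \frac{rc}{2}\,L^2 .
$$

The driver term grows linearly with the wire it must charge, which for a multi-terminal net is the total wire length of the tree, while the wire's own distributed RC contributes a term that grows with the square of the source-sink path length.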
This observation is used in [8] to derive the Bounded Radius Bounded Cost (BRBC) algorithm, which takes a user-defined parameter $\epsilon$ and constructs a tree whose total wire length is at most $2(1 + 2/\epsilon)$ times the optimal Steiner length and whose longest source-sink path is at most $(1 + \epsilon)$ times the maximum source-sink distance. A performance-oriented minimum rectilinear Steiner tree (POMRST) algorithm [33] minimizes total wire length subject to constraints on the lengths of individual source-sink paths.
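The BRBC construction can be sketched in a few steps: build a minimum spanning tree (MST), walk a depth-first tour of it, add a direct edge from the source whenever the tour length accumulated since the last such shortcut reaches $\epsilon R$ (with $R$ the largest source-sink distance), and finally take a shortest-path tree of the augmented graph. This sketch assumes the complete rectilinear graph over the terminals, so a shortest path from the source is simply a direct edge, and it follows the usual textbook statement of the algorithm rather than the exact code of [8]; function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_edges(pts: Sequence[Point]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete rectilinear graph, rooted at pts[0]."""
    n = len(pts)
    best, parent, done = [math.inf] * n, [-1] * n, [False] * n
    best[0] = 0
    for _ in range(n):
        u = min((i for i in range(n) if not done[i]), key=lambda i: best[i])
        done[u] = True
        for v in range(n):
            if not done[v] and manhattan(pts[u], pts[v]) < best[v]:
                best[v], parent[v] = manhattan(pts[u], pts[v]), u
    return [(parent[v], v) for v in range(1, n)]


def brbc(pts: Sequence[Point], eps: float):
    """Bounded-radius bounded-cost tree with source pts[0].

    Returns (tree edges, source-to-node path lengths in the tree).
    """
    n = len(pts)
    mst = mst_edges(pts)
    radius = max(manhattan(pts[0], p) for p in pts)
    adj = [[] for _ in range(n)]
    for u, v in mst:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first tour of the MST; every tree edge is traversed twice.
    tour: List[int] = []

    def walk(u: int, p: int) -> None:
        tour.append(u)
        for v in adj[u]:
            if v != p:
                walk(v, u)
                tour.append(u)

    walk(0, -1)
    # Add a direct source edge whenever the tour length since the last
    # shortcut reaches eps * radius.
    edges = set(mst)
    s = 0.0
    for prev, cur in zip(tour, tour[1:]):
        s += manhattan(pts[prev], pts[cur])
        if cur != 0 and s >= eps * radius:
            edges.add((0, cur))
            s = 0.0
    # Shortest-path tree (Dijkstra) of the augmented graph.
    g = [[] for _ in range(n)]
    for u, v in edges:
        w = manhattan(pts[u], pts[v])
        g[u].append((v, w))
        g[v].append((u, w))
    dist, par = [math.inf] * n, [-1] * n
    dist[0] = 0
    pq = [(0, 0)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in g[u]:
            if d + w < dist[v]:
                dist[v], par[v] = d + w, u
                heapq.heappush(pq, (dist[v], v))
    return [(par[v], v) for v in range(1, n)], dist
```

Under these assumptions the returned tree has radius at most $(1+\epsilon)R$ and cost at most $(1 + 2/\epsilon)$ times the MST cost; bounding the MST against the optimal Steiner tree contributes the extra factor of 2 quoted above. Small $\epsilon$ pushes the result toward a shortest-path tree, and large $\epsilon$ toward the MST.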
In addition to delay, it is sometimes important to minimize the skew of a tree, i.e., the maximum difference between the RC delays at different sink nodes. This is important in clock trees, since the clock signal should arrive at all clock nodes at the same time for correct operation of a synchronous digital circuit; clock skew can be a significant part of the total cycle time. An algorithm for constructing a tree with zero skew under the Elmore delay model was presented in [48]. The algorithm uses a recursive bottom-up strategy, building a zero-skew tree (ZST) by recursively merging zero-skew subtrees. Figure 4 shows zero-skew tree construction. The original ZST algorithm has since been improved to minimize wire length [47] and to guarantee planarity of the resulting tree [27].
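The merging step of bottom-up zero-skew construction can be sketched as follows. Two zero-skew subtrees with Elmore delays $t_1, t_2$ and capacitances $C_1, C_2$ are joined by a wire of length $l$ with unit resistance $r$ and unit capacitance $c$; equating the Elmore delays from a tapping point at distance $xl$ from the first subtree gives $x = \dfrac{t_2 - t_1 + r l\,(C_2 + c l/2)}{r l\,(c l + C_1 + C_2)}$. This is a minimal sketch of that formula (derived here, not quoted from [48]) with hypothetical names; when $x$ falls outside $[0, 1]$ a full implementation would elongate the wire, which this sketch only reports.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Subtree:
    delay: float  # Elmore delay from the subtree root to each of its sinks
    cap: float    # total capacitance of the subtree


def zero_skew_merge(s1: Subtree, s2: Subtree, length: float,
                    r: float, c: float) -> Tuple[float, Subtree]:
    """Join two zero-skew subtrees with a wire of the given length.

    Returns (x, merged): the tapping point lies x * length from the root of
    s1, and merged describes the new zero-skew subtree rooted there.
    """
    x = ((s2.delay - s1.delay + r * length * (s2.cap + c * length / 2))
         / (r * length * (c * length + s1.cap + s2.cap)))
    if not 0.0 <= x <= 1.0:
        # Delays too unbalanced for this wire: the shorter side must be elongated.
        raise ValueError("tapping point outside the wire")
    a = x * length  # wire length from the tapping point to s1
    delay = r * a * (c * a / 2 + s1.cap) + s1.delay
    return x, Subtree(delay, s1.cap + s2.cap + c * length)
```

Two identical subtrees are joined at the midpoint of the wire; when one subtree is slower, the tapping point shifts toward it so that both sides see the same Elmore delay.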
Wire sizing
Choosing the wire width for a net's route requires a careful tradeoff. If minimum design rules are used for a long wire, the wire resistance causes large delays. On the other hand, the wire cannot be made arbitrarily wide, since its capacitance would then dominate the delay. The best results are obtained when the wire width is allowed to vary along the net, being wider near the driver and narrower near the sinks. This approach, known as tapering or wire sizing, can reduce RC delays significantly while also reducing wire area (and hence power dissipation).
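The width tradeoff described above can be explored numerically. This minimal sketch assumes a simple per-segment model (resistance `r_sq * seg_len / w`, capacitance `(c_area * w + c_fringe) * seg_len`, one π section per segment) and uses a plain coordinate-descent search over a few discrete widths; it is neither the optimal algorithm of [10] nor the sensitivity-based method of [40], and all names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def wire_delay(widths: Sequence[float], seg_len: float, r_drv: float,
               c_load: float, r_sq: float, c_area: float,
               c_fringe: float) -> float:
    """Elmore delay of a wire split into segments of the given widths.

    Segment 0 is next to the driver. Each segment is a pi section: its
    resistance sees half its own capacitance plus everything downstream.
    """
    downstream = c_load
    delay = 0.0
    for w in reversed(widths):  # walk from the load back toward the driver
        res = r_sq * seg_len / w
        cap = (c_area * w + c_fringe) * seg_len
        delay += res * (cap / 2 + downstream)
        downstream += cap
    return delay + r_drv * downstream


def size_wire(n_seg: int, choices: Sequence[float],
              **params) -> Tuple[List[float], float]:
    """Greedy coordinate descent over a discrete set of widths."""
    widths = [min(choices)] * n_seg
    best = wire_delay(widths, **params)
    improved = True
    while improved:
        improved = False
        for i in range(n_seg):
            for w in choices:
                trial = widths[:i] + [w] + widths[i + 1:]
                d = wire_delay(trial, **params)
                if d < best:  # accept only strict improvements
                    widths, best, improved = trial, d, True
    return widths, best


params = dict(seg_len=1.0, r_drv=50.0, c_load=0.5,
              r_sq=10.0, c_area=0.2, c_fringe=0.1)
```

Calling `size_wire(5, [1.0, 2.0, 3.0, 4.0], **params)` never does worse than the uniform minimum-width wire it starts from, and the widths it returns tend to decrease from the driver end toward the sink, mirroring the tapering described above.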
An optimal algorithm for wire sizing under the dominant time-constant model is presented in [10], together with a much more efficient greedy algorithm.
A combination of the two produces an efficient optimal algorithm that can reduce RC delays by as much as 50%.
For the Elmore delay model, a wire sizing algorithm is presented in [40]. A significant feature of this algorithm is that, instead of simply minimizing delay, it lets the user specify delay constraints at the sink nodes. The algorithm then constructs a wire sizing solution with a sensitivity-based approach such that the delay constraints are met and wire area is minimized. This yields significantly better engineering solutions, since it is usually unimportant to make the delay smaller than the constraint, and attempting to do so causes the wire area to increase rapidly. With delay targets just 15% above the minimum delay, area savings of as much as 46% are observed.
Wire sizing can also be used to solve the skew minimization problem, with an approach similar to ZST construction. Zero-skew algorithms based on wire length adjustment tend to generate solutions that are very sensitive to process variations: although the tree may have nominally zero skew, the actual skew under process variations may be large. The algorithm of [35] instead varies wire widths to achieve a reliable minimum-skew solution.
Crosstalk Minimization Routing
Crosstalk refers to the parasitic coupling between neighboring wires due to mutual inductance and capacitance effects. Crosstalk can increase delays or cause glitches which can cause [...]

The researchers who developed this approach claim that the general objective of performance optimization using as few layers as possible has been satisfied by this three-dimensional approach.
Comparison of the Different Routing Algorithms
Many different algorithms for multi-layer MCM routing are available, each with its own pros and cons. A comparative evaluation is given in Table 2 and Table 3.
Conclusions and Scope for Future Work
Much work has already been done on MCM placement and routing.
Most work on MCM placement has focused on netlength minimization alone, though some researchers have also considered proper heat distribution when placing dies on the substrate. An automatic placement tool that considers both netlength minimization and thermal criteria has also been developed. However, wire resistance must be considered in delay estimation, since net delay is not just a function of wirelength. More accurate algorithms are needed that extend resistance-driven placement to RLC-driven placement to further reduce interchip signal delays. Thermal constraints also have to be taken into account more carefully. Moreover, it is no longer sufficient to perform substrate and IC placement for multi-chip modules individually in a standalone fashion [16]. Optimizing an MCM system requires close attention to the physical design of the ICs and the substrate simultaneously. Such an approach relaxes the routing constraints that are usually imposed on the substrate when the IC designs are completed before the substrate design.
It is becoming clear to the IC and MCM design community that interconnects are no longer just “parasitics”; they are an integral part of the circuit design. Taking interconnects into account during circuit design can significantly improve system performance. To this end, new research, such as [18], focuses on the problem of simultaneous driver and wire sizing; [18] presents efficient optimal algorithms for simultaneous driver and wire sizing for delay minimization, or for combined delay and power minimization. Figure 5 shows a tapered interconnect.
Since clock distribution trees on an MCM can be very large and heavily loaded, it is very difficult to drive them with a single driver. Typically, buffers are inserted at various nodes in the tree to avoid a single very large driver and to keep the slope of the received waveform reasonably steep. Such multi-stage clock trees introduce interesting new optimization problems, such as finding the optimal locations for the buffers, minimizing the number of buffers and finding optimal buffer sizes. An algorithm for minimizing the number of buffers in a given clock tree subject to a clock period constraint is presented in [46]. The basic algorithm is extended to handle upper-bound constraints on skew and to allow area and delay tradeoffs. Figure 6 shows a multi-stage clock tree distribution.
The disadvantage of constructing a matching-based hierarchical zero-skew clock tree is that the wires dictated by matching edges at higher levels of the hierarchy are relatively long, and they may introduce severe crosstalk noise due to buffer congestion in the generated tree topology. This local congestion problem can be eliminated by distributing the buffers over the plane with minimum impact on wirelength; reducing buffer congestion also significantly reduces crosstalk delay. The authors of [24] therefore investigated the problem of reducing congestion, wirelength and clock skew during the growth of the clock tree. To take these three performance constraints into account simultaneously, buffer distribution was formulated as a minimum-length degree-distributed spanning tree problem, and an efficient solution was proposed in [24]. The proposed algorithm can be applied to MCM clock net routing. An H-tree is preferable for clock distribution in the transmission line mode, since reflections and crosstalk can be minimized [2]. A clock tree routing scheme for a hierarchical packaging system (e.g., an MCM) would thus be as follows. An H-tree is used for inter-chip clock routing, generating a set of clock sub-sources, one inside each chip. The pins in each chip are then interconnected from the sub-source, inside the chip's routing subregion only, using the clock tree construction scheme described above.
The past few years have produced a large body of important results on interconnect optimization. However, many areas remain unexplored. For example, performance-driven tree construction and wire sizing algorithms do not handle nets with more than one driver. This situation occurs frequently on global signals such as data and control buses, where performance is critical. Other important areas requiring further work are more accurate modeling of transmission line effects and of driver non-linearities. Currently, most routing algorithms use distributed RC delay models, which do not account for the inductive behavior of MCM interconnects. Furthermore, the driver is usually modeled as a single linear resistor, which can be very inaccurate. Finally, the various performance-optimization approaches (topology optimization, wire sizing and layer assignment) need to be unified into a single algorithmic framework and extended to handle additional physical constraints such as congestion.
