Introduction
Multi-Chip Modules (MCMs) provide a medium for integrating several bare dies on a multi-layer substrate. MCM technology has gained popularity in recent years with the promise of orders-of-magnitude reductions in inter-chip delay and power dissipation over single-chip packaging [2]. By virtue of a faster interconnect, MCMs aim to alleviate, to a large extent, the bottleneck offered by conventional packaging. However, the interconnection medium of a typical MCM substrate is characterized by a significant inductance and routing distances of several centimeters. Accurate analysis of MCM interconnect requires the use of lossy transmission line models. In comparison, on-chip interconnect design has traditionally been performed by modeling lines as lumped resistance-capacitance (RC) networks. New methods are therefore needed that take into account the requirements of the MCM scenario while remaining comparable in efficiency to traditional IC design techniques. This paper addresses the problem of designing the clocking network for an MCM.
The design of the clock distribution network of an MCM is critical from the point of view of achieving the desired system speed along with reliable operation. The clock distribution network on an MCM substrate can be considered to be a tree of lossy transmission lines delivering the clock signal to the various dies placed on the substrate, with buffers inserted into the network to maintain performance constraints. The interconnect typically shows considerable inductive effects, and it is not enough to model its behavior using Elmore time constants, as has been done for ICs, for example in [17]. Therefore, the problem of constructing a clock tree for an MCM has three primary considerations: (a) that the lines are either critically damped or the overshoot is acceptably constrained; (b) that the clock skew is minimized; (c) that constraints on the slew rate of the clocking waveform are met at each clock pin.
The design of zero-skew clock-trees for ICs using an Elmore delay equalization algorithm was proposed by Tsay in [17]. This algorithm hierarchically merges zero-skew subtrees by selecting a tapping point on the line interconnecting the two trees such that the delay to the leaf nodes of the trees is equal. Subsequent refinements such as [3, 5, 6, 8, 9, 13] have attempted to improve the results by adding criteria for selecting the zero-skew subtrees to be merged in each step, using wire width optimization, finding planar routings, etc. However, all of these have restricted themselves to the IC environment and have not considered inductive effects. In this paper, we generalize the zero-skew design methodology in [17] to a higher-order approximation of the voltage waveform, so as to facilitate the design of zero-skew clock-trees for an MCM scenario while meeting the above-mentioned objectives for MCM clock-tree design. We also use a completely distributed model for the interconnect here, rather than a cascade of lumped sections as has been done previously, and we find that this gives more accurate estimates of delay values. Further refinements to this work, such as those in references [3, 5, 6, 8, 9, 13], are possible under the same framework but are not addressed here.
Some related research in [18] presents a method for the design of interconnect exhibiting transmission line behavior using the S-parameter model. The method involves the selection of appropriate widths for the various branches of the given network such that the delay to the various sinks is minimized and the waveforms exhibit a specified damped behavior. This wire-sizing strategy formulates the minimization of the maximum delay and the damping ratio error over all the sink nodes as a nonlinear programming problem. The approach presented in [18] was aimed at the design of general signal networks and did not encompass the objective of clock-tree design, where the minimization of clock skew requires that the delays to all sinks be exactly equal. The design of a clock-tree under a lossy transmission line model was considered in [20]. This involves a strategy for minimizing the clock skew by wire-sizing using a modified Gauss-Marquardt least-squares estimation technique. General RLC lines and transmission lines were modeled as a part of this method, but it did not address the problem of ensuring a specific damping condition for the RLC lines of the clock-tree.
This paper uses a second-order distributed-parameter transmission-line model [15] to construct a zero-skew tree. The computational expense of the approach presented here is very low. In comparison to the methods mentioned in the previous paragraph, the work presented in this paper aims both at minimizing the skew and at ensuring appropriate damping for clock trees. It is important to control ringing in the clocking network because this could lead to undesirable cross-talk, and because high overshoots or undershoots could also cause devices to switch incorrectly. However, we consider the possibility of allowing a small overshoot of controlled magnitude, as this can ensure an improved signal slew rate [1, 10, 19]. The specific contributions of this work are the use of higher-order moment matching to reduce the skew, the design of the clock-tree with a controlled damping criterion to ensure signal integrity while improving the timing performance, and a strategy for the use of buffered clock-trees in the MCM scenario. Our method has the advantage of low computation while being able to meet the design objectives. The accuracy of the models used has been validated by SPICE simulations of the clocking networks obtained from the algorithm.
A brief introduction to the second-order distributed model and its use to design zero-skew trees is presented in Section 2 and Section 3, respectively. Section 4 presents the concept of extending the first-order delay moment matching to higher-order moments. A method for damping clock trees is presented in Section 5, and involves the selection of a suitable series termination. The procedure for selecting a suitable termination resistance to control the damping of the waveforms at the sinks of the clock-tree is also presented there. Next, in Section 6, the construction of buffered clock-trees is considered with the goal of meeting delay requirements and satisfying the damping criterion. Section 7 provides an overview of the method, and the results of the procedure on typical MCM examples are presented in Section 8.
Evaluation of Interconnect Response
The concept of asymptotic waveform evaluation (AWE) [12] has been used widely in recent years to simulate and design interconnect. AWE approximates the exact transfer function of a circuit by a lower-order transfer function. This process consists of two steps: (a) moment computation, to represent the original transfer function as a Maclaurin series; (b) moment matching of the series to a lower-order transfer function by Padé approximation. [...] H(s) to the Maclaurin series. This proves useful in the computation of admittance, as described below. It is also advantageous because, for a transfer function of the type in Equation 1, the Padé approximation step reduces to taking the reciprocal of the Maclaurin series. REX is a good choice for the kind of higher-order approximation we use for the purpose of clock tree construction.
Approximation of Transfer Function

Distributed Parameter REX
A second-order approximation to the transfer function of RLC lines, presented in [15], is used in the design of the clock-tree in the following sections. The conductance to ground is assumed to be zero. It should be pointed out that the frequency of operation of contemporary MCMs is [...]. It should also be pointed out that, while Equation 1 is a [0/n] Padé approximant, it is possible to perform moment matching to obtain an [m/n] Padé approximant using REX.
Admittance Computation
Considering an infinitesimal section dx of the line at position x, the admittance at x + dx is as follows: [...] Hence, given Y(0) at the end of a line of length l, the second-order approximation of the looking-in admittance Y(l) can be determined from Equations 8 and 9.
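The admittance update described above can be illustrated with a short sketch. It is not a reproduction of Equations 8 and 9, which are not available here; it is derived from the standard relation for a line with zero shunt conductance, dY/dx = sC - (R + sL)Y^2, truncated at second order in s with Y(x) = y1(x) s + y2(x) s^2. The function name, argument names and units are assumptions made for illustration. Under this truncation the inductance does not appear, since it first enters the admittance at third order.

```python
def propagate_admittance(y1_0, y2_0, length, r, c):
    """Second-order input admittance of a line terminated by a known load.

    The load admittance is Y(0) = y1_0*s + y2_0*s**2 and the result is
    Y(length) = y1*s + y2*s**2. r and c are the per-unit-length
    resistance and capacitance (hypothetical argument names).
    """
    # First order: dy1/dx = c, so capacitance simply accumulates.
    y1 = y1_0 + c * length
    # Second order: dy2/dx = -r * y1(x)**2, with y1(x) = y1_0 + c*x,
    # integrated in closed form from 0 to length.
    y2 = y2_0 - r * (y1_0 ** 2 * length
                     + y1_0 * c * length ** 2
                     + c ** 2 * length ** 3 / 3.0)
    return y1, y2
```

For an unloaded line this gives y1 = C l and y2 = -R C^2 l^3 / 3, which agrees with the series expansion of the exact open-circuit admittance.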
Coefficient Computation
Hence, given the voltage V(0), the second-order approximation to V(l) can be computed using Equations 17 and 18.
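As an illustration of this step, the sketch below propagates the coefficients of the reciprocal transfer function, V(x)/V_leaf = 1 + a1 s + a2 s^2, from the far end of a line to its near end. It is derived from dV/dx = (R + sL) Y(x) V(x) with zero shunt conductance and is not a transcription of Equations 17 and 18; the function and argument names are assumptions. The inputs y1_0 and y2_0 are the first- and second-order coefficients of the load admittance at the far end.

```python
def propagate_voltage(a1_0, a2_0, y1_0, y2_0, length, r, ind, c):
    """Coefficients (a1, a2) of V(length)/V_leaf = 1 + a1*s + a2*s**2.

    a1_0, a2_0: coefficients at the far end of the line (0, 0 at a leaf).
    y1_0, y2_0: load admittance coefficients at the far end.
    r, ind, c: per-unit-length resistance, inductance and capacitance
    (hypothetical argument names).
    """
    # Integral of y1(x) = y1_0 + c*x over the line: the charge-like
    # term that multiplies both r (first order) and ind (second order).
    q = y1_0 * length + c * length ** 2 / 2.0
    # First order: da1/dx = r * y1(x). For a leaf load this is the
    # Elmore delay of the line.
    a1 = a1_0 + r * q
    # Second order: da2/dx = r*y2(x) + ind*y1(x) + r*y1(x)*a1(x),
    # integrated in closed form.
    a2 = (a2_0
          + r * y2_0 * length
          + ind * q
          + r * a1_0 * q
          + r ** 2 * (y1_0 * c * length ** 3 / 6.0
                      + c ** 2 * length ** 4 / 24.0))
    return a1, a2
```

For an unloaded line starting at a leaf, the result is a1 = R C l^2 / 2 and a2 = L C l^2 / 2 + R^2 C^2 l^4 / 24, matching the expansion of cosh of the propagation constant times l.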
3 Zero-Skew Clock Tree Construction for RLC Lines
The process used here to construct a zero-skew clock tree involves a bottom-up recursive process, as in [17]. The input consists of a description of the location and loading capacitance of each clock pin. We begin by assigning each clock pin to a separate subtree; initially, each subtree consists of just one node corresponding to the clock pin, and this node is considered to be the root of the subtree. At each step, we merge two subtrees. The merging process requires the determination of a point on the line interconnecting the trees such that the delay to all the leaf nodes is the same; this point is taken to be the root of the merged subtrees, and the process continues recursively. We describe one step of the recursive process here, involving the determination of a zero-skew merging point for two subtrees. The procedure comprises three steps:
[...] can be determined using the distributed-parameter REX. The first- and second-order coefficients of the voltages at the other end of the lines are given by Equations 17 and 18.
We will first consider the case where the first-order moments of 1/H(s) are matched; the generalized case is considered in the next section.
Substituting Equations 24 and 23 and solving for r,
The admittance of the left and right branches of the tree formed by this merging can be calculated by using Equations 8 and 9. The looking-in admittance of the new tree formed after the merge is the sum of the admittances of the left and right branches of the tree. The voltage at the root of this tree is given by averaging the coefficients corresponding to the left and right subtrees. Hence, starting from the leaf nodes (clock pins), at the end of each stage of merging a set of subtrees with their second-order admittance and voltage estimates is available for the next stage of the merging process.
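The first-order merge can be sketched as follows. The closed form used here is the familiar tapping-point expression obtained by equating the first-order moments of the two branches, which coincides with Elmore delay balancing as in [17]; it is offered as an illustration, not as the expression the text obtains when solving for r. The function names, and the convention that the fraction is measured from the left subtree's root, are assumptions.

```python
def zero_skew_tap(a1_left, y1_left, a1_right, y1_right, length, r, c):
    """Fraction of the connecting wire, measured from the left root,
    at which the first-order moments to both sets of leaves are equal.

    a1_*: first-order moment from each subtree root to its leaves.
    y1_*: first-order admittance (total capacitance) of each subtree.
    r, c: per-unit-length wire resistance and capacitance.
    """
    if length <= 0.0:
        raise ValueError("wire length must be positive")
    num = a1_right - a1_left + r * length * (y1_right + c * length / 2.0)
    den = r * length * (y1_left + y1_right + c * length)
    frac = num / den
    if not 0.0 <= frac <= 1.0:
        # The balance point lies beyond one end of the wire, so extra
        # wire would be needed on the faster branch.
        raise ValueError("no tapping point on the wire")
    return frac


def merge_first_order(a1_left, y1_left, a1_right, y1_right, length, r, c):
    """Return (frac, a1, y1) for the merged subtree rooted at the tap."""
    frac = zero_skew_tap(a1_left, y1_left, a1_right, y1_right, length, r, c)
    d = frac * length  # wire length on the left branch
    a1 = a1_left + r * d * (y1_left + c * d / 2.0)
    y1 = y1_left + y1_right + c * length  # branch admittances add
    return frac, a1, y1
```

Two identical subtrees give a fraction of 0.5; a heavier right subtree moves the tapping point toward the right root.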
Selection of Merging Point based on Higher Order Moments
In the previous section, the selection of a merging point was based on matching the first-order moments. If a tapping point could be selected by matching the higher moments up to some order n, the waveforms at the leaf nodes would be identical to a higher degree of accuracy. This, in principle, should minimize the skew. Consider the waveforms at the root of the left and right subtrees which are to be merged. [...] larger than the next. Weights w_0, ..., w_n are introduced so that the contribution from each moment is reflected equally, or in a desired proportion, in the objective function. For typical parameter values from [7], it was found that the first-order moments dominate the delay characteristics of the waveform in the final subtree. Increasing the weight of the higher-order moments, with the intention of matching them, caused the first-order moments to differ slightly, and this difference manifests itself as a noticeable skew. Therefore, we prefer to use the first-order moment for matching purposes.
An explanation for this can be found in the fact that our procedure suitably damps the final subtree using a terminating resistance, so that the response is close to monotone. For a monotone response, it is known that the first-order moments dominate the response, and hence matching these moments contributes to ensuring near-zero skews. However, this does not mean that our computation of the second-order moments has no purpose; we use this information in ensuring that the network is suitably damped.
Damping Of Zero-Skew Trees
Due to non-negligible inductance, the routing constructed on the MCM substrate results in under-damped behavior. The zero-skew trees are damped appropriately here by series termination at the source end. The clock tree construction as described in the previous section provides a second-order approximation of the waveform at all the sink nodes. This second-order estimate is used to calculate the value of the series termination for this tree of transmission lines. The series termination could be implemented either by sizing the clock driver for a specific output resistance or by including a resistance in series with the driver. This process is illustrated in Figure 3.

Figure 3: Damping of a zero-skew tree
Consider a driver with output resistance R_b driving a tree whose root node is labeled u. The downstream admittance at u is Y_u and the voltage is V_u. The second-order polynomials corresponding to these quantities, computed as a part of the zero-skew-tree construction, are given by [...] (conductance to ground is assumed zero). Substituting the value for ζ in Equation 46, the clock-tree can be designed for any acceptable overshoot. In general, an under-damped condition implies a lower driver resistance, which in turn can be interpreted as meaning either that longer lines can be driven or that narrower wires can be used to increase the packing density. It can be seen that the delay t_d, defined as the time to reach 50% amplitude, is not affected significantly by ζ. However, the rise time t_r, defined as the time taken for the signal to rise from 10% to 90% of the final amplitude, is strongly dependent on the damping ratio, as illustrated in the section on results. [...] results in a substantial decrease in t_r while the overshoot is minimal 4. Hence the choice of driver using this method can improve the rise-time characteristics of the tree while maintaining an acceptable waveform at the clock-tree sink nodes.
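A sketch of how a termination resistance might be chosen from the second-order estimates is given below. It does not reproduce Equation 46. It assumes the root quantities are available as Y_u = y1 s + y2 s^2 and V_u/V_leaf = 1 + a1 s + a2 s^2, so that the driver-to-leaf reciprocal transfer function is (1 + R_b Y_u)(V_u/V_leaf) = 1 + b1 s + b2 s^2 to second order, and it uses the standard second-order relations ζ = b1 / (2 sqrt(b2)) and overshoot = exp(-πζ / sqrt(1 - ζ^2)). All function and argument names are hypothetical.

```python
import math


def driven_coefficients(r_b, a1, a2, y1, y2):
    """Second-order coefficients (b1, b2) seen from behind the driver."""
    b1 = a1 + r_b * y1
    b2 = a2 + r_b * (y2 + y1 * a1)
    return b1, b2


def damping_ratio(b1, b2):
    """Damping ratio of 1 / (1 + b1*s + b2*s**2)."""
    return b1 / (2.0 * math.sqrt(b2))


def series_termination(zeta, a1, a2, y1, y2):
    """Driver resistance giving damping ratio zeta, or 0.0 if the
    undriven tree is already at least that well damped."""
    # b1**2 = 4*zeta**2*b2 is a quadratic qa*R**2 + qb*R + qc = 0 in R_b.
    qa = y1 ** 2
    qb = 2.0 * a1 * y1 - 4.0 * zeta ** 2 * (y2 + y1 * a1)
    qc = a1 ** 2 - 4.0 * zeta ** 2 * a2
    if qc >= 0.0:
        return 0.0
    # qc < 0 < qa guarantees exactly one positive root.
    return (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)


def overshoot(zeta):
    """Fractional peak overshoot of a second-order step response."""
    if zeta >= 1.0:
        return 0.0
    return math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
```

Choosing a target ζ slightly below 1 yields a smaller R_b than critical damping, in line with the observation above that an under-damped design implies a lower driver resistance.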
Insertion of Buffer Stages
The phase delay of large clock-trees can be significant and may limit the system clock speed. In order to reduce the phase delay, buffer stages are introduced as part of the clock-tree construction. Buffers also perform the role of regenerating the signals, by supplying the current necessary to drive the sub-trees. In addition to achieving these advantages, we consider sizing of buffers to hierarchically damp the clock-tree. It should be pointed out that in the MCM scenario the buffers can be fabricated only on the dies placed on the substrate. We assume the availability of two buffers per die and include the detour to a buffer as part of our delay computation.
A linear model for a buffer, as shown in Figure 4, is assumed. The buffer resistance is computed as shown in the previous section. The appropriately sized buffers are introduced at all nodes at a particular level of the clock-tree, as shown in Figure 5. Even though a level-by-level insertion scheme may not be globally optimal, it is known to work well for full binary trees [21], and zero-skew merging will always result in full binary trees. Further, techniques such as those shown in [3] can be adopted to reduce the imbalance in the subtrees being merged. Hence a level-by-level buffer-insertion solution will not be far from optimal in practice. The effect of the level at which buffers are inserted is presented in Section 8.
At any stage of the zero-skew tree construction, there is a set of n subtrees which are zero-skew-merged to construct n/2 trees. A difference in the input admittance of two sub-trees is expected as part of the merging process. This difference in the admittance of the subtrees implies an unequal loading of the buffers, which will lead to skew due to the non-linearity of the buffers. We attempt to minimize this effect by ensuring that the admittance seen by all buffers at a stage is the same, thereby ensuring similar switching behavior for all buffers. The concept presented below is similar to buffer relocation [4] or delay equalization [13]. This procedure was experimentally seen to be very effective when the subtree admittances were very different from Y_u^T, and was not applied when the admittance value was close to Y_u^T.
Summary of the Algorithm
The following pseudo-code summarizes the procedure for clock-tree synthesis. The routine Zero-Skew Merge finds a tapping point which minimizes the skew for two given sub-trees. The routines Compute Admittance and Compute Voltage calculate the input admittance and the voltage at the root of the new sub-tree formed as a result of the Zero-Skew Merge procedure. To meet delay specifications for the clock-tree, buffers are inserted at the root of sub-trees at appropriate levels by the Insert Buffers routine. This routine calculates the size of the buffer for the specified damping criterion. The Equalize Load routine calculates the appropriate length and width of the wire feeding the root of a sub-tree so that the loads of the buffers inserted at a level are nearly equal. The recursive application of these routines on a set of pins results in the desired zero-skew clock-tree. The termination at the root of the tree is calculated by the Series Termination routine. In the case where the number of pins is not a power of two, this procedure is applied recursively.
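The overall flow described in this section can be sketched in a few lines. This is not the authors' pseudo-code: it is a reduced illustration that keeps only the bottom-up merging loop with a first-order (Elmore-style) merge, pairs subtrees in the order they are supplied, and carries an unpaired subtree up to the next level when the count is odd. Buffer insertion, load equalization, second-order coefficients and series termination are omitted. The class, function and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    x: float      # root position
    y: float
    cap: float    # total downstream capacitance (first-order admittance)
    delay: float  # first-order moment from the root to every leaf


def zero_skew_merge(a, b, r, c):
    """Merge two subtrees at the point that equalizes first-order delay."""
    length = abs(a.x - b.x) + abs(a.y - b.y)  # Manhattan wire length
    den = r * length * (a.cap + b.cap + c * length)
    if den == 0.0:
        raise ValueError("subtree roots coincide or r is zero")
    frac = (b.delay - a.delay
            + r * length * (b.cap + c * length / 2.0)) / den
    if not 0.0 <= frac <= 1.0:
        raise ValueError("tapping point falls outside the wire")
    d = frac * length  # wire length from a's root to the tap
    return Subtree(
        # Interpolating both coordinates keeps the Manhattan distance
        # from a's root to the tap equal to d.
        x=a.x + frac * (b.x - a.x),
        y=a.y + frac * (b.y - a.y),
        cap=a.cap + b.cap + c * length,
        delay=a.delay + r * d * (a.cap + c * d / 2.0),
    )


def synthesize(pins, r, c):
    """pins: list of (x, y, load_capacitance). Returns the root Subtree."""
    level = [Subtree(x, y, cap, 0.0) for x, y, cap in pins]
    if not level:
        raise ValueError("at least one clock pin is required")
    while len(level) > 1:
        merged = [zero_skew_merge(level[i], level[i + 1], r, c)
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd one out moves up unmerged
        level = merged
    return level[0]
```

For four equally loaded pins at the corners of a square, the sketch places the root at the center, as expected by symmetry.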
Experimental Results
The procedure described above was tested on a set of examples which portray a typical MCM routing scenario. A substrate with an area of 10 cm × 10 cm was assumed. The distribution of the clock pins and the loading capacitance at each pin were generated randomly. The number of clock pins varies from 8 to 128 in the examples MCM-1 through MCM-5. The examples were tested by constructing the clock-trees using MCM-L process parameters obtained from [7]: resistance 0.24 Ω/cm, inductance 7.2 nH/cm, and capacitance 0.76 pF/cm. The quarter wavelength for a line with these parameters is about 2 meters, and therefore our assumption about electrically short lines is a valid one.
A constant width of 10 µm and a Manhattan geometry are assumed for the routing. The clock-trees constructed were studied with SPICE to verify the accuracy of the routing procedure. The wires were represented by multiple RLC segments to model the distributed nature of the lines. Some of the SPICE simulations are presented in the figures in the following section.
Clock Tree Construction
The characteristic parameters of the clock-trees constructed using our approach are presented in Table 1, Table 2 
Damping of Clock Trees
The damping of the clock-tree was performed with series termination at the source end. The effect of a range of damping conditions (ζ = 0.5 to 1.0) was studied. Figure 13 shows the step response at a particular sink node of MCM-2 for this range of ζ. Table 3 summarizes the variation of rise time t_r, delay t_d (to 50% of V_dd), peak overshoot V_peak, and driver resistance R_b with the damping factor ζ. The variation of these parameters is depicted in Figure 14. The figure also shows the strong dependence of t_r on ζ as compared to t_d. A significant improvement in delay and rise time is observed under under-damped conditions.
Buffer Insertion
The improvement in delay and slew rate was studied by inserting buffers as described in Section 6. The effect of the level at which the buffers are inserted is shown in Figure 15 for the example MCM-3. The sharpest slew rate is observed for the level closest to the leaf nodes, and as the level of insertion is moved toward the root the slew rate and delay become worse. There is a trade-off between the delay characteristics and the number of buffers required. For example, the lowest level in this case requires 16 buffers, whereas the highest level requires only two. A choice of one or more buffer levels for a tree could be made depending on the timing constraints and criteria such as buffer availability.
The module has dimensions of 7.52 cm × 3.68 cm. The general placement of the dies is as shown in Figure 16. Assumptions about the exact pin locations and loading capacitances have been made, since these data were not available in [14]. The module has a supply voltage rated at 3.3 V. The clock distribution network specifications are as specified in [14]. Figure 17 symbolically illustrates the process of clock-tree construction for this example; however, it should be noted that this is not the final physical routing. The SPICE simulations are shown in Figures 18 and 19 for critically damped and under-damped conditions, respectively. These figures depict the clock signal at the output of the clock-tree driver and the waveforms at the clock pins of the various dies of the MCM. It may be noted that at this resolution the skew at the clock pins is indistinguishable in both cases. The details of these characteristics are shown in Table 4. The skew is the maximum difference in delay to the clock pins of the 20 dies. The parameters skew, propagation delay, and transition time were measured as per their definitions described in Section 8.1. The simulations show that the specifications are met with sufficient margin. It may be pointed out that the design of the clock-tree has been dealt with in [14] by a procedure involving repeated SPICE simulation and analysis.
MCM-Pentium
The execution times for the construction of clock-trees of examples studied here are tabulated in Table 5 .
Conclusion
This work considers the design of clock-trees for the MCM environment. A distributed-parameter AWE-based technique was used to estimate the response of clock lines for the construction of zero-skew trees. A second-order approximation was found to provide sufficient accuracy for selecting a series termination for the clock-tree. The series termination could be selected so that the responses at all sinks were critically damped. An alternative to critical damping, the idea of designing the clock-tree with controlled overshoot with the aim of improving slew rates, was also investigated. The procedure also incorporates the introduction of buffered clock trees to meet constraints on the slew rate of the clock. The algorithms presented here are computationally very efficient; the experimental results exhibit low values of skew for the clock-trees constructed, and the damping of the sink waveforms was achieved as desired.
