We present a technique for rapidly calculating printed circuit board (PCB) 
Introduction 1
Interconnect delay estimation plays an important role in timing verification and performance driven layout. Because interconnect is taking up more and more of the clock budget, the ability to calculate interconnect has become an important computer aided design problem.
But interconnect on printed circuit boards (PCBs) is fundamentally different from interconnect on integrated circuits (ICs). Whereas IC interconnects are highly damped; PCB nets usually are not. In the former, R dominates; in the latter, L. In the one, signals propagate by diffusion of charge, described by a 'heat' equation; in the other, signals propagate by traveling waves, described by a wave equation. ICs tend to have monotonic waveforms; PCB nets have ringing, plateaux, and other signal integrity problems.
Accordingly, PCBs and ICs present distinct delay estimation problems. In the IC domain, formulas for estimating the delay--the Elmore delay [Elmore] or the Rubinstein, Penfield and Horowitz bounds [Rubinstein] --have been used liberally. The Asymtotic Waveform Evaluation method (AWE) has also been applied successfully to ICs.
For PCBs, quick delay estimation has been more elusive. One popular way of estimating delay (especially across backpanels) is the loaded time of flight formula [Baxter, Blood] ,
but its applicability is unjustified for topologies other than the daisy chain (and even then, it applies only for firstincident switching). To fill the void of formulas for interconnect performance on PCBs, precharacterized curve fits of SPICE delays have been proposed [Mehrotra] .
When time allows, SPICE or behavioral simulators based on the method of characteristics can be used for accurate delay numbers [Branin] . Losses can be handled by convolution of the impulse response [Djordjevic] ; ladder methods have also gained popularity, especially when attempting to model the skin effect [Kim] .
What we seek is a method midway between very rough formulas like (1) and the more accurate but also more time consuming calculations of delays from SPICE.
To this end, we describe in this paper an AWE method that has proved effective in practice for quickly estimating delays of printed circuit board interconnect. The method is based on a tree traversal algorithm that computes moments using ABCD matrices and, as in [Sriram] , truncated polynomial arithmetic. A key asset of the method, which distinguishes it from some other AWE methods [Pillage, Ratzcliff] , is that it is not limited to lumped RLC nets; it Figure 1 . Representative node in a tree network.
can treat distributed transmission lines directly, without substituting iterated ladder networks (see also [Liao] ).
We restrict our discussion to tree networks with branches that are either transmission lines or ladders of lumped RLGC components. We also include in our formulation an admittance G sC v v + at each node to model pin parasitics.
Background
The AWE method begins by computing the initial terms of the Maclaurin series for the transfer function from the driver to each receiver of interest:
The coefficients in this expansion are related to the moments of the impulse response h(t):
which is why the method is often described as a moment matching method. 
The Pade approximation [L,M] leads to a finite pole macromodel for the transfer function. The elegance of the method is somewhat tempered by the fact that unstable poles often crop up in the denominator in (4), rendering the approximation useless.
AWE methods differ in the way the moments are calculated, and in their method of taming the stability problem.
Figure 2. Definition of ABCD matrix

Moments Using ABCD Matrices
For a tree topology, moment expansion (2) can conveniently be calculated by doing two traversals of the tree: first, a postorder traversal to calculate the admittance Y v looking into the subtree rooted at each node v (see Fig. 1 ), then, a second traversal to do a preorder calculation of the transfer functions from the driver to each node. These calculations are done in terms of the ABCD matrices of the two port networks composing the branches of the tree (see Fig. 2 ).
Step 1. Input Admittance Y From (5) we get the operation ⊗ which relates the admittance at the input to the admittance at the output of an ABCD matrix:
where
We use transformation (6) to compute the input admittance at a node v of a tree network in terms of the input admittances of v's children u u p 1 , , Λ (see Fig. 1 ):
This calculation requires the input admittance Y ( ) u i of each child be known before Y(v) of the parent node can be computed; hence, Y is a synthesized attribute computed by depth-first traversal of the interconnect tree.
Step 2. Transfer Function Propagation From (5), we also get a relation between the input and output voltages of an ABCD network, once the load admittance at the output port is known (from Step 1):
Given the transfer function H(v) from the source to a node v, we can compute the transfer function from the source to
This calculation requires H of the parent to be known before calculating H for the children; hence, it is done during a preorder traversal. A slight simplification, and better stability properties [Sriram] , is had by computing 1/H instead of H in (9).
Iterated Ladder and Transmission Line Expansions
All calculations in equations (6)-(9) are done with truncated polynomial arithmetic; that is, all the factors in equations (6)- (9) are to be regarded as polynomials (truncated power series) of some order q, which are added, subtracted, multiplied, or divided to produce another truncated series of order q. These calculations require us to know the moment expansions for the elements of the ABCD matrices. We limit our discussion to iterated ladder networks and distributed transmission lines, these being the most important in practice.
Uniform RLGC Ladders
The ABCD matrix of a ladder of 2 p π sections, each composed of a series impedance Z(s) flanked by a shunt admittance Y(s)/2 on each side, is readily computed by: 
where R, L, G and C are the resistance, inductance, conductance, and capacitance of the line per unit length, and d is the length of the line. This iterated product method, while quite efficient, still leaves us guessing as to how many sections we need to accurately model the line. Direct expansion of the ABCD matrix of a transmission line obviates this consideration, and is discussed next.
Distributed Transmission Line
The ABCD matrix for a transmission line of length d [Ghausi] is cosh sinh sinh cosh
where the impedance and propagation constant are given, respectively, by
and
Using the method of undetermined coefficients, the expansions of Zo and γd in powers of s is readily computed:
Then, using composition of power series, the expansion for γd is substituted into the well known Taylor series for cosh(x) and sinh(x). So that this process terminates, we must first isolate the constant term in γd: 
Stability Tactics
From the moment expansion of the transfer function H(s) we next compute a Pade approximation that matches the first N moments and can be used as a reduced order macromodel of the system. The sticky point in the process is
Step Input
Step Response Figure 3 . Assumed form of step response. avoiding unstable poles in the Pade approximation.
The waveform at a receiver of a transmission line network will generally have a step response that consists of a delay T f , a step d, and additional ringing or exponential movement (see Fig. 3 ).
The transfer function corresponding to such a response has the form
The number of poles k required depends on the feature complexity of the response; using too many poles can be as bad as too few. The intuition behind our strategy is that the Pade approximation will tend to be stable if the assumed functional form fits the actual waveform reasonably well. Since (18) is expected to fit common responses that occur in practice (provided k is chosen to be the correct order), we expect such a Pade fit to be stable. This conjecture has been born out in our tests.
Specifically, our methodology is
, where the sum is over all transmission lines on the path from the source to the receiver in question.
2) Remove the transport delay T f from the moment ex-
3) Set k=(q-1)/2, where q is the highest power of s in (2).
4) Attempt a Pade approximation of ∃ ( ) H s with denomi-
nator and numerator both of degree k. An improper Pade form (numerator of same degree as denominator) is chosen to include the direct transfer term d. 5) As long as the Pade approximation from step (4) is unstable, reduce the order of the approximation by 1 (set k=k-1) and try step (4) again. This process is guaranteed to stop at k=0, if not sooner, since a Pade approximation of order 0 has no poles and therefore is trivially stable--it is just the multiplicative constant d. 6) The plant described by the Pade approximation is simulated numerically (a rational function in s can be viewed as a system of first-order linear constant coefficient differential equations) to get the delay T plant of the plant.
The delay of the interconnect network, finally, is given as
We do not recommend looking for Pade approximations whose numerators are of a different degree than the denominator ([L,M] with L≠M). Our argument is twofold. On the one hand, physical reasoning leads us to expect the direct transmission term d normally to be present. Secondly, choosing L=M allows us to match the maximum number of moments for a given order of plant.
Our experience shows that the combined strategies of extracting time of flight and descending to the correct order for the plant together result in a very stable method.
Accuracy and Speed
It is traditional in a paper like this to demonstrate the effectiveness of a new method by applying it to several test circuits; we are particularly interested, however, in the method when applied to circuits en masse. In this section, therefore, we give delay results for an entire PCB design. 
Printed Circuit Board Delays (ns)
-
Figure 4. Comparison of delays by two methods
The test case we choose was a rather large (14"x16") multilayer PCB that had a mix of ECL, ABT, and CMOS components comprising a total of 588 nets--including clock nets, tri-state busses, memory banks, ASICS, and glue logic. The large dimensions of the board and the fast switching speeds (1-3ns) ensured that transmission line effects, inductance, propagation delays, ringing, and reflections would be significant. The signals would be traveling waves, not voltage from charge diffusion through an RC network.
The accuracy of the AWE method can be judged from the scatter plot of Fig. 4 . Delays computed by the AWE method are plotted along the ordinate against corresponding delays computed by a transmission line simulator based on the cascaded method of characteristics (CMC) [Kim] . Delays for the rising edge are plotted as positive numbers (1st quadrant), delays for the falling edge as negative numbers (3rd quadrant). For signals with multiple timing-threshold crossings, delays are computed as the first threshold crossing (predicting last threshold crossing is considerably more difficult).
For both the AWE and comparison simulations, device drivers were modeled behaviorally with IBIS non-linear devices models 2 . 10 moments (e.g., terms up to m s 10 10 in (2)) were used in the AWE calculations. Trial-anderror convinced us that using few moments significantly degraded accuracy while carrying more moments gave diminishing returns. It seems that at least 5 poles are needed to describe adequately the variety of waveforms seen on PCBs.
A histogram of delay discrepancies for the same design is shown in Fig. 5 . 
Figure 5. Histogram of delay errors
The AWE method simulated the entire board--588 nets, 4496 individual delays--in 12 seconds on a Pentium 90MHz PC. This does not count the time to load and parse the file describing the design and write the results to disk; with the overhead of these supporting tasks, the simulation took about 1 min 20 sec. The cascaded method 2 IBIS, or IO Buffer Information Spec., is an industry standard for describing buffers using IV curves, rise/fall times, and other behavioral data that does not reveal vendor processes.
of characteristics simulations took 5 min 20 sec, not counting supporting tasks.
Conclusion
The moment matching method described in this paper is very fast. Its speed qualifies it for use in performancedriven layout systems that may require repeated evaluation of layout alternatives.
One and two moment analytic delay models [Sriram2], while they may have some validity when applied to ICs, seem inadequate for predicting PCB delays. Our studies indicate that at least 4 or 5 pole macro-models are required to capture the complexities of PCB waveforms (ringing, plateau, etc.).
