Abstract
Introduction
As fast timing analysis is needed for today's complex interconnect structures, the moment matching technique is widely used to approximate a higher order linear network using waveforms generated from its lower order moments [11, 14, 16]. These approaches can also improve the efficiency of existing time-domain circuit simulators, either by incorporating macromodels of interconnect networks with recursive convolution [12, 14, 17] or by generating an equivalent circuit based on a numerical integration method [9].
The macromodels generated by the moment matching technique approximate the port behavior of the interconnects. To incorporate these macromodels into SPICE-like simulators, they are usually described by admittance matrices (Y-matrices) [17, 9]. The Y-matrices are generally full. When an interconnect network has many external ports, the number of matrix entries may exceed the number of circuit elements in the original interconnect network. For example, a clock network with 500 fanouts yields a Y-matrix with more than 250,000 entries. Such a huge full matrix may even increase the computational burden on circuit simulators.

* Haifang Liao is now with Ultima Interconnect Technology, Inc., Santa Clara, CA.
On the other hand, modeling such a huge full matrix with a low order moment matching approximation is impractical, while computing high order moments produces extremely ill-conditioned matrices and severe numerical problems. The complex frequency hopping method [2] and PVL (Padé approximation via the Lanczos process) [4] compute high order moments while avoiding the ill-conditioning problem. However, increasing the approximation order increases the modeling and simulation time. In particular, when a large circuit with many ports is terminated with nonlinear drivers and loads, there is no efficient method to obtain reduced-order approximate models of the linear networks.

The difficulty of reducing a large linear network with many ports can be dealt with by partitioning the network into smaller subnetworks. In this paper, we present a new reduced-order macromodel of RC interconnect networks for transient analysis. Instead of modeling a single large interconnect network with a high order approximation, we develop a linear-time algorithm that partitions a complex interconnect network into subnetworks to reduce the total number of entries of the Y-matrices. We synthesize a lower order equivalent circuit for each subnetwork. Since all parameters of the reduced circuits are non-negative, the circuits are always stable. Simulators only need to analyze the synthesized low order equivalent circuits, which greatly reduces the running time. Because the networks are partitioned into smaller subnetworks, high accuracy can be achieved with lower order models of those subnetworks.
Network reduction
Scattering parameters (S-parameters) are a powerful way to describe and model interconnects. They can be measured directly at high frequencies, and they exist for all distributed-lumped circuit elements, including open and short circuits. Scattering parameters represent the interrelationship of a set of incoming and outgoing power waves of a multiport system. To develop the macromodel, we begin by describing all elements of the distributed-lumped network in terms of their scattering parameters. A few basic elements are used to characterize a general interconnect network: the one-port impedance, the two-port impedance, the single transmission line, and the multiport interconnect node. These basic elements are shown in Fig. 1. The S-parameters of the first three elements can be found in [3].
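To make the element descriptions concrete, the sketch below builds S-matrices for three of the basic elements. It rests on assumptions the text does not state: every port shares one real reference impedance `Z0`, the two-port impedance is a series element, and the interconnect node of Fig. 1(d) is an ideal junction where all ports meet at one point. The function names are hypothetical; the formulas are the standard textbook ones, and [3] may present them in a different convention.

```python
import numpy as np

def one_port_impedance(Z, Z0=50.0):
    """1x1 S-matrix (reflection coefficient) of an impedance Z to ground."""
    return np.array([[(Z - Z0) / (Z + Z0)]], dtype=complex)

def series_two_port(Z, Z0=50.0):
    """S-matrix of an impedance Z in series between port 1 and port 2 (assumed topology)."""
    d = Z + 2.0 * Z0
    return np.array([[Z / d, 2.0 * Z0 / d],
                     [2.0 * Z0 / d, Z / d]], dtype=complex)

def interconnect_node(n):
    """Ideal n-port junction (all ports tied to one node): S = (2/n) * ones - I."""
    return (2.0 / n) * np.ones((n, n), dtype=complex) - np.eye(n, dtype=complex)

# Example: a 1 pF capacitor at 1 GHz, modeled as the one-port impedance 1/(j*w*C).
w, C = 2 * np.pi * 1e9, 1e-12
print(one_port_impedance(1 / (1j * w * C)))
```

A lossless element has a unitary S-matrix, so checking `S @ S.conj().T` against the identity is a quick sanity test for both the capacitor and the node model.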
The interconnect node shown in Fig. 1(d) can be used to connect elements described in Combining these basic elements with any multiport elements described by S-parameters, one can represent a variety of distributed-lumped network topologies, including capacitive cutsets, inductive loops, and lossy transmission lines. Given the scattering parameters of the individual components, a systematic reduction algorithm has been developed in [11]. Two merging rules are employed to reduce an arbitrary distributed-lumped linear network into a single multiport component (see Fig. 2). The Adjoined Merging Rule (see Fig. 3) merges all internal components, and the Self Merging Rule (see Fig. 4) eliminates the self loops introduced by the Adjoined Merging process. Each merging step amounts to contracting an edge in a circuit graph G(V, E), where the vertex set V consists of the circuit elements shown in Fig. 1 and the edge set E represents the adjacency between elements in the circuit.
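Both merging rules can be expressed with the standard formula for joining two ports of an S-matrix. The sketch below is an assumption about how the rules could be coded, not the implementation of [11]: the hypothetical `connect_ports` joins ports k and l of one component (a self merge), and `adjoin` places two components side by side and joins one port of each (an adjoined merge). A common real reference impedance is assumed, and the join fails when I - S_cc P is singular, for example on a lossless self loop.

```python
import numpy as np

def connect_ports(S, k, l):
    """Join ports k and l of one S-matrix; the other ports keep their order.

    The join enforces a_k = b_l and a_l = b_k (common real reference impedance).
    """
    S = np.asarray(S, dtype=complex)
    ext = [p for p in range(S.shape[0]) if p not in (k, l)]
    c = [k, l]
    See, Sec = S[np.ix_(ext, ext)], S[np.ix_(ext, c)]
    Sce, Scc = S[np.ix_(c, ext)], S[np.ix_(c, c)]
    P = np.array([[0, 1], [1, 0]], dtype=complex)   # a_c = P b_c
    # b_c = Sce a_e + Scc P b_c  =>  b_c = (I - Scc P)^{-1} Sce a_e
    bc = np.linalg.solve(np.eye(2) - Scc @ P, Sce)
    # b_e = See a_e + Sec P b_c
    return See + Sec @ P @ bc

def adjoin(SA, SB, ka, kb):
    """Merge component A and component B by joining port ka of A with port kb of B."""
    SA, SB = np.asarray(SA, dtype=complex), np.asarray(SB, dtype=complex)
    nA = SA.shape[0]
    n = nA + SB.shape[0]
    S = np.zeros((n, n), dtype=complex)
    S[:nA, :nA] = SA          # block-diagonal placement: no coupling yet
    S[nA:, nA:] = SB
    return connect_ports(S, ka, nA + kb)
```

As a check, joining two ideal 3-port junctions yields an ideal 4-port junction, and cascading 20-ohm and 30-ohm series elements (with Z0 = 50 ohms) yields a 50-ohm series element.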
Circuit partitioning
The description of an N-port component is generally a full S-matrix or Y-matrix. When N is very large, the synthesized equivalent circuit could have a very large number of elements. To minimize the number of elements of the synthesized circuit, we propose to partition a multiport component into several subcomponents with fewer ports (see Fig. 5). Since the number of elements of the reduced circuit is proportional to the number of entries of the S-matrix, the objective of the partitioning problem is to minimize the total number of S-matrix entries. More precisely, we minimize

$$\sum_{i=1}^{x} N_i^2,$$

where x is the number of subcomponents and N_i is the number of ports of component i. We propose a linear-time heuristic algorithm to partition the interconnect network into several multiport components.
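The objective is simple to evaluate. The sketch below computes it for a hypothetical 500-port component, once unpartitioned and once split into 10 leaf components (each holding 50 of the external ports plus one cut port) and a 10-port root; the split itself is an illustrative assumption, not one produced by the algorithm.

```python
def smatrix_entries(port_counts):
    """Partitioning objective: total S-matrix entries, the sum of N_i**2."""
    return sum(n * n for n in port_counts)

# Hypothetical example: one 500-port component versus a two-level split.
unpartitioned = smatrix_entries([500])
partitioned = smatrix_entries([51] * 10 + [10])
print(unpartitioned, partitioned)   # 250000 26110
```

Each cut adds a port on both sides, so partitioning is not free, but the quadratic term dominates and the total drops by almost an order of magnitude here.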
Consider the circuit graph of a network G(V,E).
Define the weight W(e) of an edge e ∈ E as the number of ports of the resultant component after contracting e with Adjoined Merging (Fig. 3) or Self Merging (Fig. 4)
where N_x^2 + N_y^2 is the number of entries of the S-matrices before contracting e, (N_x + N_y - 2)^2 is the number after contracting e, and p is the contracting factor. There are now two problems: how to efficiently select an edge with minimum weight, and how to update the edge weights after contracting an edge. Both operations can be done in constant time with the bucket list structure shown in Fig. 6. This structure includes a bucket
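The description of Fig. 6 is cut off here, so the sketch below is a generic bucket structure in the spirit of the Fiduccia-Mattheyses gain buckets, not necessarily the one in the paper. It assumes integer edge weights bounded by a known maximum, such as the port count of the component a contraction would create. Insert, remove, and update touch one bucket; `pop_min` advances a pointer past empty buckets, so its cost depends on how far the minimum moves.

```python
class BucketList:
    """Edges grouped by integer weight in [0, max_weight] for fast minimum selection."""

    def __init__(self, max_weight):
        self.buckets = [set() for _ in range(max_weight + 1)]
        self.weight_of = {}                 # edge -> current weight
        self.min_ptr = max_weight + 1       # lowest bucket that may be nonempty

    def insert(self, edge, weight):
        self.buckets[weight].add(edge)
        self.weight_of[edge] = weight
        self.min_ptr = min(self.min_ptr, weight)

    def remove(self, edge):
        self.buckets[self.weight_of.pop(edge)].discard(edge)

    def update(self, edge, new_weight):
        """Move an edge to a new bucket after a neighboring contraction."""
        self.remove(edge)
        self.insert(edge, new_weight)

    def pop_min(self):
        """Remove and return (edge, weight) with the smallest weight, or None if empty."""
        while self.min_ptr < len(self.buckets) and not self.buckets[self.min_ptr]:
            self.min_ptr += 1
        if self.min_ptr == len(self.buckets):
            return None
        edge = self.buckets[self.min_ptr].pop()
        del self.weight_of[edge]
        return edge, self.min_ptr

# Example with hypothetical edge names and weights.
bl = BucketList(max_weight=10)
bl.insert("e1", 5)
bl.insert("e2", 2)
bl.insert("e3", 7)
first = bl.pop_min()          # ("e2", 2)
bl.update("e3", 1)            # e3's weight drops after a neighboring merge
second = bl.pop_min()         # ("e3", 1)
```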
Circuit synthesis
Given the scattering matrix S of a multiport component, the admittance matrix is

$$Y = \frac{1}{Z_0}\,(E + S)^{-1}(E - S),$$
where Z_0 is the reference impedance and E is the identity matrix. The synthesis of a general Y-matrix without transformers is still an open problem [1, 8, 15]. Here we propose a new synthesis method that uses no transformers. The admittance to ground at the ith port is given by the sum of the ith row (or column) of the Y-matrix, and the admittance connecting port i and port j is -y_ij. Thus, the circuit synthesis problem amounts to synthesizing, with lumped R and C elements, the admittances between each port and ground and between each pair of ports. Unlike the method proposed in [18], where the connection between any two ports is synthesized with a fixed circuit configuration whose element values are determined from the admittance at two frequency points, we use the moment matching technique to synthesize the admittance matrices of an RC network. With a Taylor series expansion, y_ij can be described as
$$y_{ij} = m_0^{(ij)} + m_1^{(ij)} s + m_2^{(ij)} s^2 + \cdots \qquad (5)$$
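The conversion from S to Y and the split into port-to-ground and port-to-port admittances can be sketched as follows. It assumes the standard conversion Y = Z0^{-1}(E + S)^{-1}(E - S) with one real reference impedance `Z0` for all ports; the function names are hypothetical.

```python
import numpy as np

def s_to_y(S, Z0=50.0):
    """Y = (1/Z0) (E + S)^{-1} (E - S) for a common real reference impedance Z0."""
    S = np.asarray(S, dtype=complex)
    E = np.eye(S.shape[0])
    return np.linalg.solve(E + S, E - S) / Z0

def decompose(Y):
    """Split Y into admittances to ground (row sums) and between port pairs (-y_ij)."""
    Y = np.asarray(Y)
    n = Y.shape[0]
    to_ground = Y.sum(axis=1)                       # y_i0 = sum over j of y_ij
    branches = {(i, j): -Y[i, j] for i in range(n) for j in range(i + 1, n)}
    return to_ground, branches

# Example: a 100-ohm series element, whose S-matrix with Z0 = 50 ohms is all 0.5.
Y = s_to_y([[0.5, 0.5], [0.5, 0.5]], Z0=50.0)
ground, branch = decompose(Y)
```

For this element there is no path to ground, so both row sums vanish and the only branch is the 0.01 S conductance between the two ports.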
To synthesize the circuit between each pair of ports, the T-model shown in Fig. 7(a) is used, and its parameters are determined by matching the moments of y_ij. The T-model, together with the capacitors at the two ports (synthesized from the diagonal entries y_ii), forms a 2π model that accurately describes the connection between the pair of ports. The admittance matrix of the T-model is
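$$Y_T = \frac{1}{R_{ij1} + R_{ij2} + sC_{ij}R_{ij1}R_{ij2}} \begin{bmatrix} 1 + sC_{ij}R_{ij2} & -1 \\ -1 & 1 + sC_{ij}R_{ij1} \end{bmatrix}$$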
To determine the values of R_ij1, R_ij2, and C_ij, let
We match the ratio m_1^(ii)/m_1^(jj) rather than m_1^(ii) and m_1^(jj) directly because m_1^(ii) and m_1^(jj), the second moments of y_ii and y_jj, reflect the total contribution of the whole circuit, not just the branch between port i and port j.
Solving the Eqs. (6), (7) The above formulas can be used to synthesize all off-diagonal elements of the Y-matrix with m_1^(ij) ≥ 0. However, if there are floating capacitors, m_1^(ij) may be negative. In this case, a floating capacitor can be used to model y_ij between port i and port j (see Fig. 7(b)).
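Since Eqs. (6) and (7) are not reproduced here, the sketch below shows one consistent way to carry out the matching just described; it is an assumption, not the paper's exact equations. It takes the T-model as R1 from port i to an internal node, C from that node to ground, and R2 to port j. Expanding that model's Y-matrix gives m0_ij = -1/(R1+R2), m1_ij = C R1 R2/(R1+R2)^2, and a diagonal ratio m1_ii/m1_jj = (R2/R1)^2, which are inverted in closed form. Only the m1_ij ≥ 0 case is handled.

```python
import math

def t_model_moments(R1, R2, C):
    """First two moments of the assumed T-model Y-matrix (R1 at port i, C to ground, R2 at port j)."""
    Rs = R1 + R2
    return {
        "m0_ij": -1.0 / Rs,
        "m1_ij": C * R1 * R2 / Rs**2,
        "m1_ii": C * R2**2 / Rs**2,
        "m1_jj": C * R1**2 / Rs**2,
    }

def synthesize_t_model(m0_ij, m1_ij, m1_ii, m1_jj):
    """Recover (R1, R2, C) by matching m0_ij, m1_ij and the ratio m1_ii / m1_jj."""
    if m0_ij >= 0 or m1_ij < 0 or m1_ii <= 0 or m1_jj <= 0:
        # m1_ij < 0 is the floating-capacitor case, which this T-model cannot match.
        raise ValueError("moments outside the range a grounded-C T-model can match")
    Rs = -1.0 / m0_ij                      # R1 + R2 from the DC term
    r = math.sqrt(m1_ii / m1_jj)           # R2 / R1 from the diagonal ratio
    R1 = Rs / (1.0 + r)
    R2 = Rs - R1
    C = m1_ij * Rs**2 / (R1 * R2)          # from m1_ij = C R1 R2 / Rs^2
    return R1, R2, C

# Round trip on a known T-model: R1 = 2, R2 = 3, C = 0.5 (arbitrary units).
moments = t_model_moments(2.0, 3.0, 0.5)
R1, R2, C = synthesize_t_model(**moments)
```

In a real network, m0_ij and m1_ij come from the full Y-matrix, while only the ratio of the diagonal moments is used, which is the reason given in the text for not matching m1_ii and m1_jj directly. All three recovered values are non-negative whenever the check at the top passes.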
Stability of synthesized circuits
A sufficient condition for the reduced circuits to be stable is that all synthesized circuit parameters are non-negative [8, 15, 18]. In this section, we show that the reduced circuits are indeed stable. See [13] for the proofs of the lemmas and theorems.
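The following is a numerical illustration of the sufficient condition, not a proof (see [13] for that). With non-negative R and C values, a resistive path to ground, and a grounded capacitor at every node, the nodal matrices G and C are symmetric positive definite, so every root of det(G + sC) = 0 is real and negative. The helper names and the random test network are hypothetical.

```python
import numpy as np

def stamp(M, i, j, value):
    """Add a two-terminal element between nodes i and j; j = -1 denotes ground."""
    M[i, i] += value
    if j >= 0:
        M[j, j] += value
        M[i, j] -= value
        M[j, i] -= value

def rc_poles(n, resistors, capacitors):
    """Natural frequencies of an RC network: roots of det(G + sC) = 0."""
    G = np.zeros((n, n))
    C = np.zeros((n, n))
    for i, j, r in resistors:
        stamp(G, i, j, 1.0 / r)
    for i, j, c in capacitors:
        stamp(C, i, j, c)
    # With C nonsingular, det(G + sC) = 0  <=>  s is an eigenvalue of -C^{-1} G.
    return np.linalg.eigvals(-np.linalg.solve(C, G))

# A 5-node RC ladder with one floating capacitor, all values non-negative.
rng = np.random.default_rng(1)
resistors = [(k, k + 1, rng.uniform(1.0, 10.0)) for k in range(4)] + [(0, -1, 2.0)]
capacitors = [(k, -1, rng.uniform(0.1, 1.0)) for k in range(5)] + [(1, 3, 0.2)]
poles = rc_poles(5, resistors, capacitors)
```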
Efficiency and accuracy of synthesized circuits
The usual way to increase the accuracy of modeling a large interconnect network with moment matching techniques is to increase the approximation order [2, 4, 9]. This increases the modeling and simulation time, and the problem becomes evident when the number of ports is very large. Only recently, an improved PVL method [5] was proposed to compute reduced-order approximate models of linear networks with multiple inputs and outputs. However, a huge full Y-matrix with a high order approximation will still be a heavy burden on simulation.
Instead of modeling a single large interconnect network with a higher order approximation, we partition the network and use lower order models to approximate multiple small subnetworks. High accuracy can be achieved with lower order models of all the subnetworks. The partitioning reduces the simulation time significantly. Furthermore, since partitioning chooses the optimal reduction order and does not merge large components, the reduction time actually decreases.
In addition, partitioning greatly reduces the number of elements in the reduced circuits. When a Y-matrix is used to describe the port behavior of a network, the number of Y-matrix entries is O(N^2), where N is the number of external ports. However, as the experimental results show, for typical clock networks the number of circuit elements in the reduced circuits with partitioning is O(N).
Experimental results
To illustrate the efficiency and generality of the proposed approach, several industrial circuits were tested. We compare the results of the original circuits and the reduced circuits using SPICE3e2 [7] on a Sun Sparc 20 workstation.
The first circuit is a clock network that consists of 26747 RC elements and has 547 external ports. The number of RC elements is reduced to 2084, less than 10% of that of the original circuit. Consequently, the simu- Fig. 9. The figure shows that the number of reduced circuit elements is O(N), where N is the number of external ports. The simulation time versus the number of external ports is plotted in Fig. 10. Simulation results at the same port of the reduced circuits with different numbers of ports are almost identical (see Fig. 11). The error of the simulation results is less than 2%.
The reduction results for a loop circuit and a mesh circuit are shown in Table 2. After reduction, the simulation times of the loop circuit and the mesh circuit were reduced by factors of 7.7 and 20, respectively. The error of the simulation results is less than 2% compared with the original circuits.
Circuit partitioning plays a very important role in circuit reduction. Another clock circuit consists of 7159 resistors and 6875 capacitors with 1953 external ports (7.19 elements per port). With only a simple multiport macromodel, the synthesized circuit would have far more elements than the original circuit. However, thanks to the efficient combination of partitioning and reduction, the reduced circuit, which keeps all 1953 ports external, has only 5546 resistors and 2460 capacitors.

Circuit reduction also provides a new strategy for handling nonlinear circuits. A seven-port clock network with nonlinear drivers and loads is shown in Fig. 12. Its interconnect consists of 9427 RC elements. After reduction, the circuit has only 73 lumped elements. The simulation time for the reduced circuit is 0.6 seconds, compared with 43.0 seconds for the original circuit. Again, the response waveforms show excellent agreement (see Fig. 13).
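A quick arithmetic check puts the reported figures side by side with the size of a single unpartitioned Y-matrix. The numbers are taken from the experiments above; the helper name is hypothetical.

```python
def summarize(ports, original, reduced):
    """Compare a full Y-matrix, the original netlist, and the reduced circuit."""
    return {
        "full_y_entries": ports ** 2,           # entries of one unpartitioned Y-matrix
        "elements_per_port": original / ports,
        "reduced_fraction": reduced / original,
    }

# Figures reported above for the two clock networks.
big = summarize(547, 26747, 2084)
wide = summarize(1953, 7159 + 6875, 5546 + 2460)
speedup = 43.0 / 0.6   # seven-port clock network with nonlinear drivers and loads
```

The 1953-port case shows why partitioning matters: a single full Y-matrix would have 3,814,209 entries, while the original circuit has only 14,034 elements and the reduced circuit 8,006.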
Conclusions
The efficient combination of circuit partitioning and reduction provides an efficient circuit analysis strategy. Based on the scattering parameter macromodel, reduced-order models of linear networks with multiple inputs and outputs can be obtained in one pass of reduction. Circuit partitioning speeds up the reduction process, decreases the total number of circuit elements, and achieves the same accuracy with lower order approximations. The proposed circuit synthesis method guarantees that the reduced circuit is stable. The resulting equivalent circuits are fully compatible with general-purpose simulators. Since no assumption is made about port voltages and currents, the proposed approach can be applied to the simulation of analog and mixed-signal circuits.
