Abstract
1: Introduction
Industry trends aimed at integrating higher levels of circuit functionality have triggered a proliferation of analog and digital subsystems fabricated side-by-side on the same die. The combined requirements for both high speed digital and high precision analog circuitry produce unique challenges to mixed-A/D circuit designers. Specifically, monolithic mixed-signal ICs are often characterized by parasitic analog-digital interactions which can cripple the operation of high-performance designs. Noise coupling through the common chip substrate has been identified as a significant contributor to this important problem [l] .
Modeling the electrical behavior of non-ideal semiconductor substrates is of key interest to the mixed-signal design community. For state-of-the-art circuits, chip-level verification which excludes the effects of substrate coupling is of questionable validity. As a result, substrate modeling for circuit simulation has been the focus of much research in recent years. Early work in this area [ 2 ] used a box integration technique to construct 3-D rectangular RC mesh networks as equivalent circuit representations of the modeled substrates. The mesh topology could be correlated to the circuit's physical design by distributing grid points according to the layout features on relevant fabrication photomasks. Unfortunately, layout-driven rectangular grid generation is prone to substrate "overpartitioning", which yields unnecessarily dense grid crowding in many regions of the chip. The strategy produces enormous circuit networks, even for moderately sized layouts. Since a primary objective of equivalent circuit macromodeling is to build simulation-ready networks, the inordinate complexity of the generated models is self-defeating -subsequent simulation on conventional CAD workstations becomes virtually impossible.
To address the complexity issues, intermediate processing is required to approximate the generated linear RC networks by smaller circuits which exhibit similar electrical properties. Since a typical mesh is dense and three-dimensional, only a small percentage of network nodes, called ports, are physically connected to the external circuit (at the top surface of the modeled substrate). In theory, an "equivalent" network can be formulated by eliminating a substantial fraction of the internal nodes. The resulting network is appropriate for simulation if its port characteristics remain consistent with those of the original mesh. This technique is generally referred to as network reduction.
Asymptotic Waveform Evaluation (AWE) [3] has been proposed as a method to reduce RC networks for mixed-signal switching noise analysis [4] . The AWE algorithm approximates a network's multiport behavior by recursively calculating the moments of the port characteristics and then fitting these moments to pole-residue functions via the Pad6 approximation. A well-known problem with this technique is that calculation of the higher moments is inherently ill-conditioned-increasing the number of poles used to model a given network does not guarantee a better approximation. Heuristic methods have been developed to address this issue (e.g., see references in [5]), but only at the cost of increased computational complexity. Another problem with AWE relates to the stability of the network approximation. While asymptotic stability is maintained by eliminating positive poles, absolute stability is not easily ensured. Consequently, non-physical, artificial oscillations may appear during subsequent transient simulations.
Network models detailed enough to accurately predict the chip-level impact of substrate coupling are, by necessity, very complex. It is not surprising that tangible results have been demonstrated only for small device-scale examples, as existing methods possess inherent limitations which render them impractical for even moderately sized circuits. The mixed-signal design process can be greatly enhanced by the development of software tools which can efficiently extract accurate chip-level simulation substrate models directly from a physical design specification. Reliable verification of circuit functionality obviously reduces the length of the overall design cycle and promotes the likelihood of first-time silicon success. Perhaps more importantly, robust new noise reduction techniques can be developed more rapidly if capabilities exist to accurately assess and analyze the impact of switching noise in proposed design methodologies.
We propose a substrate modeling strategy which addresses the mesh complexity issue at bofh the model generation and network reduction levels. For initial mesh generation, a well-known geometric construct can be efficiently applied to overcome the single most important drawback of rectangular mesh formulation methods-that is, localized mesh refinement, often required in regions of dense switching activity, is propagated to distant layout regions where a coarser mesh might otherwise be adequate. By using a non-rectangular gridding method based on Voronoi tessellation, we extract a mesh which automatically and locally adjusts itself to the density of substrate features as inferred from the layout specification. Mesh extraction based on this approach generates substrate circuit networks containing orders of magnitude fewer circuit nodes than those of conventional gridding techniques. A description of the substrate model formulation is given in Section 2.
In spite of the improved model generation technique, extracted full-chip substrate networks still promise to be exceedingly complex. For model reduction, we demonstrate a new multiport algorithm which fully exploits the pure-RC property of our formulated networks and directly generates reduced equivalent circuit models in a well-conditioned manner. Using congruence transformations, full-network conductance and susceptance matrices are transformed to reduced equivalents which can be directly realized with resistors and capacitors. The approximated networks are guaranteed to be passive, and thus well-behaved in subsequent simulations. Proper formulation of the transformation ensures that the networks possess a minimal number of intemal nodes, and yield a specified accuracy from DC to a maximum frequency of interest. The requisite transforms are generated using a symmetric Lanczos method which exploits the specialized structure of the extracted substrate networks. Required matrix inversions are performed using efficient methods which also profit from the problem symmetry. Since matrix inversion often accounts for a substantial network reduction bottleneck, this strategy can be significantly faster than general AWE methods, which employ nonsymmetric techniques. Section 3 provides the theoretical details and an implementation description of the proposed substrate model network reduction methodology. The non-rectangular gridding strategy and the congruence-transformation-based network reduction algorithm combine to form a unified, efficient strategy for developing parasitic substrate models for mixed-signal circuit simulation and design verification. The overall approach has been applied to several mixed-signal design examples, which we present in Section 4.
2: Substrate model extraction via Voronoi tessellation
A popular and physically-based approach to parasitic substrate modeling employs an equivalent mesh representation of the modeled substrate [2] , [4], [6] . A common drawback of previously reported modeling strategies, however, is that the derived networks ultimately contain circuit nodes in substrate regions where they are not required to obtain accurate simulation results. As emphasized in our introduction, subsequent mesh processing typically involves network reduction, and, ultimately, simulation. Since the computational efficiency of these procedures is directly impacted by the complexity of the generated network, it is of enormous advantage to constrain the size of the original mesh by adopting techniques to formulate more compact models. For the mesh generation phase of substrate modeling, we employ a non-rectangular substrate discretization based on geometric methods called Voronoi tessellation and Delaunay triangulation [7] . The derived mesh efficiently conforms to the substrate feature topology as dictated by the physical layout of the circuit. This section summarizes the modeling approach.
Our strategy is motivated by the observation that most real chips contain areas of intricate complexity surrounded by comparatively large regions with little structural detail. In gate anay, standard-cell, and most custom designs, large chip areas contain no active devices but are dedicated to routing channels. Since transistors and contacts are the primary sources and collectors of noise current, it makes sense to partition the chip according to the "localized" densities of relevant substrate features.
Voronoi tessellation is a geometric method which subdivides the Euclidean plane according to a distribution of point sites. Applied to substrate modeling, the sites are determined during layout extraction by the locations of the transistors, well edges, and substrate tie-downs. For each site, the tessellation assigns a convex polygon which spans the region of the plane that is nearer to the enclosed site than to any other site. The polygons partition the substrate surface into a convex net called the Voronoi diagram. Fig. l (a) illustrates the Voronoi diagram for a random collection of tessellation sites. Circuit nodes corresponding to the sites are used to define a nodal mesh topology in the (x,y) plane, which we term a nodeplane. Multiple nodeplanes are then stacked on top of each other and interconnected site-to-site to extend the mesh in the z-direction. The extent of physical separation between adjacent planes largely depends on the substrate doping profiles, and the total mesh thickness corresponds to the thickness of the modeled substrate.
Connecting each pair of sites that share a common Voronoi edge is called Delaunay triangulation. Fig. 1 (b) shows the triangulation for the sites in (a). The edges of the triangulation physically connect adjacent sites in each nodeplane and can be used to represent the branches of a corresponding network. The model- ing strategy combines information about the substrate properties and the geometries of the individual Voronoi polygons and associated Delaunay triangulation edges to derive the values of the resistors and capacitors which comprise the network. To accommodate the depletion capacitance associated with the well junctions, special sites are introduced in pairs which straddle the well boundaries. As a result, the corresponding triangulation is always perpendicular to the well edges, and modeling the junction capacitance is simplified. Fig. 2 shows a CMOS D flip-flop layout and a pictorial representation of the triangulated network topology used to model the top ( x y ) substrate nodeplane.
The tessellation procedure is computationally efficient, with an O(N1ogN) time complexity and O(N) memory requirement. Since the (x,y) mesh topology is duplicated plane-to-plane, N is the number of sites (i.e., circuit nodes) in a single nodeplane. Strictly speaking, it is the Delaunay triangulation which forms the basis for the network topology. But since the tessellation and triangulation form mathematical duals, the triangulation is derived at no additional expense. In [8] it was shown that the quantity of circuit nodes in models generated by Voronoi tessellation are orders of magnitude fewer than in networks derived by rectangular discretization techniques. The accuracy of the approach has been verified by both detailed 2-D device simulation, and, for small circuits, by SPICE circuit-level simulation. Extensive comparisons were made to networks obtained by conventional gridding methods. The simulation results and details of the Voronoi-tessellated network formulation appear in [8].
3: Efficient RC network reduction using congruence transformations
In spite of the reduced complexity achieved by improved model generation, the extracted substrate networks are still too large for conventional circuit simulation. Some method of mesh reduction is required to approximate the networks using simpler equivalent circuit models. To maintain accuracy, efficiency, and flexibility, an acceptable network reduction algorithm should meet the following requirements:
The simulation macromodels must accurately approximate the multiport admittance of the original RC network from DC to a specified maximum frequency.
To obtain accurate, physically-based simulation results, the reduced substrate networks must be passive so that absolute stability is preserved.
For efficient simulation, the reduced network models must contain a near-minimum quantity of internal nodes and branches.
To maintain compatibility with an assortment of circuit simulation and timing analysis software, the reduced models must be realizable with standard SPICE-compatible circuit elements.
As mentioned in Section 1, Asymptotic Waveform Evaluation has been proposed as a technique for reducing the computational expense incurred by the simulation of substrate model circuit networks. Some fundamental drawbacks of AWE (ill-conditioning, no guarantee of absolute stability, etc.) were discussed ear- ,-. lier. Feldman and Freund recently introduced a new multiport linear circuit approximation technique called Matrix Pad6 Via a Lanczos-type process (MPVL) [9] . In contrast to AWE, this method avoids using ill-conditioned recursive moment calculations for pole-residue formulation, but the issue of reduced network stability and passivity has not yet been addressed. And because MPVL is a general technique, the algorithm does not capitalize on the unique properties of pure RC networks -as a result, relatively inefficient matrix techniques must be used for the required matrix inversions.
In this work, we propose a multiport algorithm which generates passive circuit approximations of the extracted substrate networks. The network poles are retained from DC to a specified upper frequency limit, within a specified error bound. Using congruence transformations, the full-network admittance matrices are transformed to simpler equivalents, which are used to generate SPICE-compatible RC netlist representations of the reduced circuits. Proper formulation of the transformation ensures that the networks possess a minimal number of internal nodes. The approach capitalizes fully on the inherent properties of the substrate models. This section summarizes the reduction strategy.
3.1: Multiport admittance matrix formulation
The admittance of an RC network with m ports and n intemal nodes can be represented by the conductance and susceptance matrices, G and C. These matrices have 1 = m+n rows and columns and relate voltage and current in the frequency domain by
Here, x and b are column vectors with I rows representing nodal voltages and injected currents, respectively, and s is complex frequency. Since the network contains only resistors and capacitors, both G and C are symmetric. If the resistors and capacitors are real and positive, then G and C are non-negative definite, which means that none of the eigenvalues of either matrix is negative.
A logical partitioning of the voltage vector, x, in (l), orders the entries which correspond to the port nodes first, followed by the internal node entries, i.e., 2 Y 2 + ..., are found by first expanding (D+sE)-' of (5) as a Taylor series about s = 0, i.e.,
Substituting ( where Note that (8)-(11) can be used to recursively generate the moments of Y ( s ) in a manner similar to AWE. However, due to symmetry, two moments can be matched for each set of m forward and backward substitutions. In contrast, AWE matches only one moment for the same amount of computational effort. Also, the matrix to be inverted, D, is symmetric positive definite with negative off-diagonal elements. This property is significant because techniques more efficient than sparse LU factorization (e.g., Cholesky factorization [lo] ) can be used for inversion. Finaliy, we do not apply (8)-(11) directly, as this would yield the same ill-conditioning problem from which AWE suffers. Instead, we utilize the congruence transformation, which can be used to efficiently reduce G and C in a well-conditioned manner and, at the same time, preserve the moments, Y,, of @)-(lo).
3.2: Accurate moment matching by congruence transformations
A congruence transformation applied to A is defined as the transformation B = VTAV, where V is square and non-singular. V is referred to as the congruence transform, and the matrices A and B are said to be congruent. A fundamental property of congruence transformations is that they preserve the eigenvalues of the generalized eigenvalue problem demonstrated in (6) above (see [Ill, chapter 8) . The eigenvalues of (where V is square and non-singular) are identical to those of (6). This property is useful since, if eigenvalues are preserved, the poles of Y(s) are also retained. In this work, we make use of network transformations using the transform, V, which has n rows and k linearly independent columns, with k 5 n. If k < n the transform is an incomplete congruence transform, but we henceforth refer to all cases as congruence transforms and identify the special cases where V is square.
In 
Y'(s) = (A+SB)-(Q'~+SR'~)(D'+~E')-~ (Q'+~R'). (18)
Substituting (14)-( 17) in (13) yields
The transformations using V, shown in (14)- (17), can be represented as congruence transformations on G and C, i.e., are reduced in size (relative to G and C, respectively), which is our primary objective. In this case, however, VV-' # I (although V-'V = I) so that Y'(s) # Y(s) and the transformed network has port characteristics which differ from those of the original network -non-square congruence transforms do not retain all the eigenvalues of (6), so the poles are not preserved.
To circumvent this difficulty, we note that, even if k < n , a build block of V I matrices, G and C. The last n rows and columns of G form the D matrix, which is factorized. The congruence transform, V, is then built iteratively using a symmetric Lanczos procedure. The process terminates when the "dominant" poles of the reduced network multiport admittance converge within the specified error tolerance. All pole frequencies less than the specified maximum frequency are automatically included. Program output is a SPICE-compatible RC netlist containing the reduced network.
In general, the derived G' and C' matrices contain unwanted (and non-converged) high frequency poles. To ensure that the final network is as simulation-efficient as possible, the unnecessary outlying poles are removed in a post-processing step. A square congruence transform is formulated and applied on E' and D'. The transformation is chosen to diagonalize the resulting congruent matrices without affecting the behavioral properties of Y'(s). Since the two matrices are diagonal, each intemal node is associated with a single pole of Y'(s). The undesired poles and their associated network intemal nodes can then be removed via transformation with a non-square congruence transform. The poles within the range of interest are unaffected, and the first two moments of Y(s) are preserved. Since every post-processing transform is representable as a congruence transformation on (G'+sC'), network passivity is retained.
The algorithm employs a positive definite symmetric sparse matrix solver, which uses nested dissection [lo] to pre-order the nodes for efficient factorization. With minimal computational expense, nested dissection provides near-optimal ordering for mesh-like networks, and the resultant matrix factorization time complexity is O(n'3. Here, n is the number of intemal nodes in the original substrate network. The single most time-consuming operation is the factorization of D. By applying positive definite symmetric sparse matrix techniques, we determine the factorization an order of magnitude faster than we would using general non-symmetric methods. Cumulatively, the forward-backward substitutions account for the next most expensive computational operations. Our formulation requires m of these for each pair of moments that are matched, where m is the number of port nodes.
A few remarks are in order regarding the similarities and differences between our approach and other linear network approximation techniques. On many accounts, the congruence transformation method offers substantial advantages over both AWE and MPVL. The first relates to performance. Our algorithm is specifically tailored for purely RC networks, whereas the AWE and MPVL methods are general (i.e., they are able to accommodate any linear network). With our strategy, problem formulation ensures that the D matrix is positive definite and, consequently, more efficient specialized sparse matrix techniques are applied to determine D-'. It is important to note that we do not forsake accuracy for speed. The ill-conditioned higher order moment matching problems associated with AWE are well-known. Like MPVL, we overcome this pitfall by employing the Lanczos method. However, our formulation exploits symmetry in a way that published MPVL methods do not. Since the matrices D and E are symmetric, the span constraint of (23) is related to the generalized symmetric eigenvalue problem. Consequently, we employ a symmetric Lanczos algorithm. MPVL employs a non-symmetric Lanczos process, which requires twice as many forward-backward substitutions. As mentioned above, our exewell doping (n-type) channel stop doping cution profiles revealed that these seemingly benign computational operations are responsible for a significant portion of the total CPU time. Finally, we highlight the more fundamental differences. Our method directly generates admittance matrices which contain minimal quantities of poles and branches. The G' and C' matrices are easily "un-stamped" to directly realize the reduced RC network in a SPICE-compatible netlist format. More importantly, the reduced network is guaranteed to be passive, so subsequent circuit simulations are well-behaved.
4: Applications to mixed-signal circuits
ing strategy applied to four mixed-signal circuit layouts:
In this section we report on the results of the proposed model-1) A three-stage CMOS ring oscillator adjacent to a sensitive "analog" NMOS transistor (ro3+FET). The ring oscillator circuitry injects noise into the substrate, and we monitor the simulated voltage signal at the body terminal of the analog transistor.
2) A digital frequency divider and nearby analog current source (FreqDiv+lsrc) associated with a CMOS PLL. The subcircuits are separated by approximately 150pm. The divider circuitry switching transients are coupled through the substrate to degrade the operation of the current source. We monitor the simulated current source signal to study the impact of the switching noise on its designed value.
3) A CMOS operational amplifier completely surrounded by a ring oscillator (ro+OpAmp). We examine the impact of substrate coupling by monitoring the simulated op amp output signal resulting from a small-signal sinusoidal input.
4) The same analog op amp shielded by an ohmic guard ring (ro+OpAmpGua). We monitor the simulated noise on the amplifier output signal relative to the noise generated with no shielding (i.e., the result using ro+OpAmp, circuit (3)).
All substrate models were extracted directly from layout data using the process technology and mesh properties listed in Table  1 . Level 3 transistor parameters derived from MOSIS parametric tests were used for SPICE simulations. The extracted and reduced network parameters for the four circuits are tabulated in Table 2 , which also specifies the computational cost and network pole retention data associated with the mesh reduction runs. Mesh generation times are also listed. The frequency below which all poles are retained is determined by the error bound at the specified maximum frequency. We use a tolerance of 5% at 1 from the effects of the digital subcircuit. In our final examples, ro+OpAmp and ro+OpAmpGua employ a linear amplifier circuit completely surrounded by a ring oscillator, and in the latter case, the amplifier is shielded by an ohmic guard ring. Figures 6 and 7 show four simulation waveforms, all representing the output signal of the amplifier with a 5 MHz small-signal sinusoidal input. For the first simulation (Fig. 6, curve A) , we eliminated the source of noise by de-activating the ring oscillator. In the next simulation, we induced substrate-coupled switching noise by allowing the oscillator to run freely. The impact on the amplifier output is readily apparent in Fig. 6, curve B. Fig. 7 shows the results for the same simulations, except the substrate-coupling is inhibited by the presence of the guard ring.
Simulation CPU measurements for our test circuits with coupled substrate models are shown in Table 3 . Excepting ro3+FET, the non-reduced substrate networks are far too complex for post-layout circuit analysis. For that single circuit, however, the speedup obtained with the reduced network is substantial. The simulations for the amplifier circuits are inordinately lengthy since we enforced very small transient timesteps to obtain the level of waveform detail shown in Figures 6 and 7 . algorithm exploits the matrix symmetries which characterize our formulation. Table 4 shows empirically-obtained data for the CPU times required to factorize the D matrix for each substrate mesh network discussed in this section. We first employed sparse Cholesky factorization, which is standard for our system. Additionally, we implemented a version of the mesh reduction program which called the non-symmetric sparse LU factorization routines in Sparse [14] . The symmetric methods are faster by an order of magnitude. By comparing Table 4 data to the mesh reduction CPU times in Table 2 , it is also evident that the D factorization accounts for a substantial fraction of the overall run time. Tailoring the reduction algorithm to exploit the pure-RC properties of the extracted substrate networks improves the computational performance significantly.
5: Conclusions
We report on two techniques that greatly reduce the computational limitations associated with equivalent-circuit-based substrate coupling analysis in mixed-signal integrated circuits. RC substrate mesh networks are generated by applying Voronoi tessellation to layout-derived substrate feature data. The networks are efficiently approximated to any level of accuracy using congruent admittance matrices in conjunction with a well-conditioned Lanczos moment-matching process. Due to their enormous complexity requirements, previous approaches have been rigorously demonstrated only on small device-scale examples. With the aim of applying global substrate modeling to realistically large circuits, the proposed techniques make significant strides towards the development of a software methodology capable of analyzing mixed-signal noise behavior in a cognizant, quantitative fashion. 
