Abstract-As on-chip circuits have scaled into the deep submicron regime, electromagnetics-based analysis has increasingly become essential for high-performance integrated circuit (IC) design. Not only fast, but also high-capacity electromagnetic solutions are demanded to overcome the large problem size facing on-chip design community. In this paper, we present a novel, high-capacity, and fast approach to the full-wave modeling of 3-D on-chip interconnect structures. In this approach, the interconnect structure is decomposed into a number of seeds. In each seed, the original wave propagation problem is represented as a generalized eigenvalue problem. The resulting eigenvalue representation can comprehend both conductor and dielectric losses, arbitrary dielectric and conductor configuration in the transverse cross section, and arbitrary material. A new mode-matching technique applicable to on-chip interconnects is developed to solve large-scale 3-D problems by using 2-D-like CPU time and memory. A junction matrix acceleration technique is proposed to speed up the mode matching process. A fast frequency sweep technique is employed to obtain the response over the entire frequency band by solving at one or a few frequency points only. An extraction technique is developed to obtain S-parameters from the solution of the eigenvalue system. The entire procedure is numerically rigorous without making any theoretical approximation. Experimental and numerical results demonstrate its accuracy and efficiency.
I. INTRODUCTION
W ITH continued breakthrough in processing technology, over one billion transistors can now be integrated on a single chip. All these transistors are connected via interconnect lines. Interconnect design becomes one of the biggest design challenges in today's and next generation very large scale integration (VLSI) design. To address the increased complexity, over the past three decades, interconnect design methodology has experienced a series of transitions: from lumped resistance (R), lumped resistance and capacitance (RC), distributed RC, Manuscript to distributed resistance, inductance, and capacitance (RLC) models. As the clock frequency of microprocessors entered the gigahertz regime, since it is necessary to analyze the chip response to harmonics 5 times the clock frequency, semiconductor industry started to validate RLC-based interconnect extraction at tens of gigahertz. Good agreement has been observed between measured data and static RLC-based modeling on purely 2-D test chip structures [1] . However, significant mismatch was observed at multigigahertz frequencies on 3-D interconnect structures [1] . In contrast, full-wave electromagnetics-based modeling accurately captured the measured behavior over the entire frequency band [1] . The mismatch between RLC models and measurements was shown to be attributed to the decoupled and in static modeling, and the capacitance and inductance were extracted in an independent fashion. This finding demonstrated the importance of electromagnetic (EM) analysis in high-frequency on-chip interconnect design. However, on-chip interconnect structures present many modeling challenges that are less pronounced in board and package level interconnects, which hinders the direct use of many traditional full-wave modeling techniques in realistic on-chip design. These challenges include high conductor loss, large aspect ratio, large number of conductors, the presence of silicon substrate, and strong nonuniformity [2] . A few techniques have been developed in recent years to address the full-wave modeling of on-chip interconnects. A 2-D finite difference time domain (FDTD) approach [3] - [9] has been developed for the full-wave modeling of on-chip transmission line structures. For 3-D interconnect structures, an FDTD solver was recognized as computationally expensive. This is not only because the large number of unknowns resulted from 3-D volumetric discretization, but also because the small geometrical sizes of the on-chip problems and the rapid field variation inside conductors. Different from FDTD, the partial element equivalent circuit (PEEC) method is an integral equation based method [10] - [16] . PEEC was first utilized to solve quasi-static problems. It was then extended to full-wave analysis [15] . Recently, surface-based PEEC methods have also been developed which reduce significantly the number of unknowns involved in a volume integral equation based PEEC method [13] , [14] . Since integral equation based methods deal with dense matrices, a direct solver would require operations and storage in dealing with number of unknowns, which is computationally expensive. The surface-based PEEC is under way to be accelerated by fast algorithms such as the fast multipole method, fast Fourier transform (FFT)-based method, adaptive integral method, and fast QR-based methods [17]- [21] . Another surface integral equation based formulation was developed in [24] . A fast precorrected FFT scheme was formulated to accelerate the iterative solution of the dense matrix equation. The multilayered dielectric has not yet been accounted for. Integral equation based methods have also been developed in [22] - [26] for the application to on-chip problems. It is expected that an integral equation based solution that takes all the on-chip intricacies into consideration, and accelerated by the fast algorithms will soon be completed.
However, ultra large scale integrated circuit (IC) design results in numerical problems of ultra large scale, requiring billions of parameters to describe them accurately. To solve number of parameters, the optimal computational complexity one can hope for is linear complexity . However, even is inadequate in practice since is too large in practical integrated systems. Therefore, it is of paramount importance to develop high-capacity electromagnetic solutions that can overcome the problem size to achieve with , also in a rigorous fashion. In this paper, we propose a fast frequency-domain eigenvalue-based method for fullwave modeling of large-scale 3-D on-chip interconnect structures. Initial study of this method has been briefed in [27] . In this method, the complexity of 3-D interconnects is overcome by seeking the full-wave solution of a few 2-D problems, which are then postprocessed to obtain the solution of the original 3-D problem. In this paper, we will introduce segmentation of the interconnect structure and identification of the structure seeds, detail the eigenvalue-based formulation, present a new on-chip mode-matching technique that was not described in [27] , introduce a junction matrix acceleration technique, present a fast frequency sweep analysis, and demonstrate the accuracy and high capacity of the proposed method by a number of numerical and experimental results.
II. FORMULATION

A. Segmentation of the Interconnect Structure and Identification of the Structure Seeds
To model a 3-D interconnect structure, first we slice it into segments. Each segment has a constant cross section. The segmentation direction, i.e., the longitudinal direction, is chosen from the -, -, and -directions to minimize the number of unknowns in the transverse cross section. This is because the transverse cross section is numerically solved while the longitudinal direction is analytically processed in the proposed method.
After the segmentation, a set of unique structure seeds are identified. Take a typical Manhattan-type bus structure made of six metal layers as an example. Fig. 1(a) -(c) depicts its top view, end view (cross-sectional view), and side view respectively. Without loss of generality, assuming the bus is segmented along -direction. This results in a large number of segments in a large-scale on-chip structure. However, the number of unique structure seeds is only a few. For a typical bus structure shown in Fig. 1 , the number of structure seeds is only eight. This can be seen more clearly from Fig. 1(b) . In each cross section, conductors in parallel layers (M2, M4, and M6 which are parallel to ) remain unchanged whereas the orthogonal lines in M1, M3, and M5 can be present or absent. This generates eight structure seeds. If we use three digits to describe each structure seed, the eight structure seeds can be represented by , respectively. The first, second, and third digits correspond to M5, M3, and M1 layers, respectively. For each digit, value 0 denotes the absence of the orthogonal lines in that layer, whereas value 1 denotes its presence. If the orthogonal lines are aligned in each layer, the total number of unique structure seeds is only 2. One refers to the presence of all of the orthogonal conductors. The other denotes their absence. The structure seeds are repeated in different lengths along the longitudinal direction, constructing the entire structure.
Note that the width (along ) and the spacing (along ) of the M2, M4, and M6 wires can be arbitrary. The width (along ) and the spacing (along ) of the M1, M3, and M5 wires can also be arbitrary. The width and spacing are not required to be equal to generate eight structure seeds. If M2, M4, and M6 wires have different lengths (along ), or M1, M3, and M5 wires have different lengths (along ), the number of structure seeds will be increased. However, the number of seeds is generally much smaller than that of segments in realistic on-chip interconnect structures.
The aforementioned scheme of segmentation and identification of structure seeds applies to interconnects with vias. For simplicity, consider a two-metal-layer power grid structure. This structure involves an orthogonal layer in M1 (orthogonal to ), and a parallel layer in M2 (parallel to ). Different from a bustype structure, the orthogonal line in a power grid can be a VCC (power), a VSS (ground), or can be absent. As shown in Fig. 2 , this structure involves only 3 structure seeds regardless of the width and spacing of M2-layer wires. In Fig. 2 , VCCs are represented by shaded wires. The unshaded ones are VSSs. Vias can only exist between like wires.
B. Eigenvalue-Based Solution
Inside the interconnect structure, the electric field satisfies the second-order vector wave equation (1) subject to certain boundary conditions such as (2) In (1),
, and denote the relative permeability, relative permittivity, and conductivity, respectively, is the computational domain which is the cross section of a structure seed including both dielectric and conducting regions, is the boundary where the Dirichlet boundary condition is applied,
is the boundary where the Neumann boundary condition is applied. The truncation boundary is generally placed far away from the structure to minimize the impact of the truncation on the fields inside the computational domain. Either Dirichlet or Neumann boundary condition can be imposed on the truncation boundary without loss of accuracy when the truncation boundary is placed far away. The distance of the truncation boundary to the structure can be adaptively determined by varying the distance until the field solution does not change with the distance.
The incorporation of conductivity in (1) allows for the accurate modeling of both conductor and dielectric losses. This is of great importance for the accurate simulation of on-chip interconnects in which skin effects are dominant. By applying variational principle [28] , it can be demonstrated that solving the boundary-value problem defined by (1) and (2) is equivalent to seeking the stationary point of the following functional [28] , [29] (3) where refers to the complex relative permittivity which comprises and . Each structure seed has a constant cross section, and hence, wave propagating along longitudinal direction is analytical. Therefore, the -dependence of all field components is , in which is the propagation constant. It is a complex number due to the conductor loss. 
where denotes the number of basis functions per element, and denotes the corresponding expansion coefficient. The longitudinal field is represented by node basis functions [28, pp. 95-97] as (6) in which is the corresponding expansion coefficient. The discretization of the variational problem (4) by using (5) and (6) results in a generalized eigenvalue problem (7) in which and are complex-valued matrices. Their matrix elements are given by (8) Obviously, the eigenvalues of (7) correspond to the propagation constants, whereas the eigenvectors characterize the transverse and longitudinal fields. Once (7) is solved, the electric field in each structure seed can be obtained as (9) which is a superposition of all of the forward and backward propagation modes that can be supported by the structure. It should be noted that the field in (9) has all three components , and . Equation (7) can be denoted as (10) It is solved by using an Arnoldi-based krylov subspace projection method [30] . To expedite the convergence, (10) is divided by at both sides. In addition, it is transformed to the following eigenvalue problem through a shift-invert operation:
(11) in which is chosen from the propagation constants of typical on-chip interconnect structures.
It is worth mentioning that the eigen-system (7) only needs to be solved for each unique structure seed, the number of which is many orders of magnitude less than that of the segments. This, in part, contributes to the efficiency of the proposed method.
C. On-Chip Mode-Matching Technique
The unknown coefficients and in (9) are determined by imposing field continuity condition at each junction. As an example, consider an arbitrary junction along in Fig. 3 , which separates regions I and II. Fig. 3 is a -plane view of this junction. In region I, there are M1, M2, M4, and M6 layers, whereas in region II, M2, M4, and M6 are present. The electric field in region I can be represented as (12) in which , and denotes the number of modes in region I. Similarly, the electric field in region II can be written as (13) in which , and denotes the number of modes in region II. To impose field continuity condition at each junction, we develop a mode matching technique that is effective for on-chip interconnects as follows.
First, we construct the following two continuity conditions:
in which (16) Equation (14) enforces the tangential continuity of the electric field at the junction, whereas (15) enforces the normal continuity of the current. Next, we test (14) by , we obtain equations (17) Then we test (15) by , we obtain the other equations (18) Equations (17) and (18) yield equations at the junction. Combining this set of equations at each junction with the loading conditions, the unknown coefficients , and can be determined, and hence the field anywhere inside the interconnect structure is solved.
One thing worth mentioning is that we can also test (14) by eigen-modes in region II, ; and (15) by eigen-modes in region I, . However, this testing is necessary but not sufficient for the specific junction shown in Fig. 3 . To be sufficient, when testing , we choose testing modes from the segment that involves more conductors; when testing , we choose them from the one that involves fewer conductors. This is because the field continuity condition at the junction needs to be satisfied at both low and high frequencies. At high frequencies, on-chip wires behave more like conductors in traditional full-wave applications because the skin depth approaches to zero. It is known that the tangential must vanish on a perfect conducting surface. Although the conductor in real applications is not perfect, the zero tangential is still a very good approximation at high frequencies. Therefore, this type of hard boundary conditions needs to be enforced at the junction by choosing testing modes from the segment that involves more conductors when testing .
The number of modes is chosen based on the convergence of the numerical solution. If only dominant modes (quasi-TEM) are important (i.e., other evanescent modes are attenuated very rapidly) in the frequency band of interest, (14) and (15) in which is the path from the th conductor to the ground, and is the cross-sectional area of the th conductor. The ground is the physical ground present in the structure, for example, the bottom of the silicon substrate. For the realistic on-chip interconnect structures we have tested, it is observed that the number of propagation modes is orders of magnitude smaller than the number of unknowns in a transverse cross section. In addition, the number of propagation modes that can be observed in the current operating frequency band is limited.
D. Junction-Matrix Acceleration Technique
Matching the field continuity at the junction will result in a matrix, from which the unknown coefficients , and are determined. We denote this matrix as a junction matrix. Due to the large number of junctions in a realistic on-chip interconnect structure, the dimension of the junction matrix can be very large. Take an interconnect of segments as an example, its junction matrix resulted from (17) and (18) is shown in Fig. 4 , in which denotes the number of modes in the th segment, and denote the transmission, and reflection coefficients, respectively, in the th segment. The dimension of as well as is . Each boxed matrix in Fig. 4 corresponds to the matrix equations formed at one junction by using (17) and (18) . The matrices formed at the first and the last junction only involve unknown coefficients of one segment. Therefore, their dimensions are different from those of the other boxed matrices. In a boxed matrix denotes the submatrix associated with which are the transmission coefficients in the ( -1)th segment; is the submatrix associated with which are the reflection coefficients in the ( -1)th segment;
is the submatrix associated with ; and is the submatrix associated with . Take the second boxed matrix in Fig. 4 as an example, the eight submatrices can be written as (20a) The junction matrix is of dimension . It is sparse. Though sparse, its computation remains expensive when the size is large. To solve this problem, the following technique is developed.
Instead of solving the matrix as a whole, we reduce it to a matrix that only involves unknown coefficients and in the first and last segment, where the circuit parameters are extracted. Consider a submatrix formed at the -and th junctions, which involves unknowns in the ( -1)-, -, and th segments. This matrix can be reduced to involve unknowns in the ( -1)th and th segments only, as shown in Fig. 5 . The reduction cost is simply the inversion of the circled matrix. Its dimension is , in which is the number of modes in the th segment.
The reduction is achieved by first solving and in terms of , and ; and then transforming the original matrix accordingly by eliminating and . Mathematically, it can be performed as follows. First, we evaluate the following matrices: (21) in which satisfies (21a)
We then denote and as (22) The reduced matrix can then be obtained from the original one as the following: (23) By repeatedly applying the aforementioned reduction procedure, we obtain a matrix that only involves unknowns in the first and last segment, which can be readily solved.
E. Fast Frequency Sweep Technique
Equation (7) can be denoted as (24) where is the wave number, and denotes a vector. The dimension of is the size of matrices and . Because (24) is frequency dependent, it has to be solved at each discrete frequency point, which is time consuming. Here, we employ asymptotic waveform evaluation (AWE) method [31] - [35] to alleviate this problem. AWE has been applied to accelerate the methodof-moment and finite-element solution of scattering, radiation, waveguide, and circuit problems. It has not been applied to the fast frequency sweep analysis of on-chip interconnect problems yet. Here, we first expand and into Taylor series (25) where is the expansion point, and denote the unknown coefficients of and , respectively, and denotes the order of the expansion. Matrix and are also expanded into Taylor series with the same order . It is worth mentioning that to expedite the convergence of the eigenvalue solution, (10) is divided by at both sides. As a result, higher-order derivatives with respect to can exist.
Substituting (25) into (24) and matching the coefficients of the equal powers of on both sides lead to a set of equations relating and . By imposing the orthogonality condition (26) we obtain the following recursive formulation for and , shown in (27) at the bottom of the page, where and denote the th derivative of and at the expansion point . Since the bandwidth of Taylor expansion is limited, we convert Taylor series into a Padé rational function, which has a larger radius of convergence. The Padé expansion is as follows: (28) where . From (25) and (28), the unknown coefficients can be obtained by matching the coefficients of the equal powers of [32] .
F. Circuit-Parameter Extraction Technique
Solution of the fields enables the calculation of circuit parameters. When the transmission line model is valid, the RLGC parameters can be extracted in the following fashion. First, we obtain the eigen-voltage and eigen-current of each conductor for each mode by performing the following line or area integrals: (29) where denotes the index of mode; denotes the index of conductor; and represent the tangential, and longitudinal electric field of the th eigen-mode respectively. When transmission line model holds true, in the transverse plane that is perpendicular to the wave propagation direction, the potential is almost the same at each point on the conductor cross section. This is the reason the capacitance per unit length of a transmission line can be extracted by solving Poisson's equation. Therefore, when performing the line integral of (29) to evaluate voltage, one end point is selected at the physical ground, and the other end point can be arbitrarily selected on the conductor cross section.
Substituting (29) and propagation constant into the telegraph equation (30) we obtain the RLGC matrices. Note that in contrast to the RLGC matrices obtained via quasi-static solvers that employ a decou-(27) pled electric and magnetic field model, here, any coupling between and , no matter how weak, is captured accurately.
For 3-D interconnect structures or interconnects working at high frequencies, S-parameters become a necessity for an accurate description of their electrical behavior. To extract them, we first construct the total voltage and current of each conductor at the ports by performing the following integrals: (31) Since in circuit applications, S-parameters of interest are those of dominant modes, i.e., quasi TEM modes. For these modes, the distribution in the transverse plane obeys that of a static field, and hence the voltage obtained by using the line integral of (31) is not sensitive to the choice of integration path.
After extracting the total voltage and current of each conductor, we load each conductor by the reference impedance (usually in industry standard), and excite the conductor in turn, which can be mathematically represented as (32) The loading condition in (32) together with the boundary condition at each junction yield the solution of unknown coefficients , and , and hence, the total voltage and current of each conductor. As a result, the S-parameters can be obtained as follows: (33) 
III. NUMERICAL RESULTS
To test the accuracy of the proposed algorithm, we simulated a set of interconnect structures that were fabricated on a test chip using conventional silicon processing techniques. High resolution cross-sectional scanning electron microscopy and optical microscopy were used to measure the relevant dimensions of the fabricated structures. Parasitics signals were removed from the measured S-parameters using a de-embedding approach [1] , [36] . In Fig. 6 , we compare measured and simulated S-parameters of an on-chip three-metal-layer interconnect structure. It involves a 10-m-wide strip in metal 2 (M2) layer, one ground plane in metal 1 (M1) layer, and one ground plane in metal 3 (M3) layer. There are returns in M2 layers, but they are far from the strip. The strip is of a length of 2000 m. The S-parameters were measured at the near and far end of the strip line. As can be seen from Fig. 6 , the agreement of S-parameters is excellent in both magnitude and phase. Fig. 7(a) shows the simulated crosstalk of another 2000-m long on-chip interconnect versus the measured data. The crosstalk was measured at the near ends of the two centered interconnect lines in M2 layer as shown in Fig. 7(b) . Again, it can be observed that the S-parameters are in very good agreement. Fig. 7 (b) also shows the current distribution on a cross section at 40 GHz with the excited signal on the left in the M2 layer. The current induced on surrounding conductors exhibits a complex pattern, which reflects the mutual coupling among these conductors. In addition, we observe current induced in the lossy silicon substrate. The nulls observed in the substrate are due to the destructive interaction between reflected and transmitted waves.
Next, we simulated a crosstalk structure with 2000 orthogonal interconnects in M1 and M3 layers. The crosstalk was measured at the near ends of two adjacent lines residing in the M2 layer. Due to a nondisclosure agreement, the detail of the structure is not reported here. Fig. 8 depicts the phase of S12 obtained by the proposed full-wave modeling method in comparison with the measured data and that obtained by a static solver. Clearly, the static solvers exhibit a significant disagreement with the measured data especially at high frequencies, whereas the full-wave data produced by the present method exhibits an excellent agreement. In Fig. 8 , raw measured data without de-embedding is also given, which indicates the difficulty of on-chip measurements. The true signal can be significantly contaminated by parasitic signals which need to be de-embedded correctly.
To demonstrate the efficiency and accuracy of the proposed fast frequency sweep technique and junction matrix acceleration technique, we considered a three-metal-layer on-chip interconnect structure. The structure is of 300 m width. It involves a 10-m-wide strip in M2 layer, one ground plane in M1 layer, and one ground plane in M3 layer. The distance of this strip to the M2 returns at the left and right hand sides are 50 m, respectively. The strip is 0.285 m thick and 2000 m long. With a frequency step of 1 GHz, it takes the direct method 34.9 s to obtain the solution over 50 MHz 50 GHz frequency band. With two expansion points at 12.5 and 37.5 GHz, and a sixth-order Padé expansion , the proposed method yields an accurate solution over the whole band in 2.9 s. The speedup is 12 times. Fig. 9 shows the S-parameters simulated by the proposed techniques in comparison with the direct method, which reveals an excellent agreement. Since this structure only involves one segment along the longitudinal direction, the speedup is solely attributed to the fast frequency sweep technique. Next, we consider a 3-D on-chip interconnect that involves 2000 segments along the longitudinal direction. The M1 and M3 metal layers are The spacing between these two wires is 1.95 m. The distance to the M2 returns of both wires is 10.3 m. Fig. 10 depicts the simulated S-parameters in comparison with the results obtained from the direct method. With a frequency step of 1 GHz, the direct simulation takes 981 s to simulate the entire frequency band, whereas the proposed fast techniques speed it up by more than 39 times. The measured crosstalk is also given in Fig. 10 . Clearly, the proposed method accurately predicted the measured behavior.
IV. CONCLUSION
This paper presents a fast and high-capacity full-wave modeling technique for large-scale 3-D on-chip interconnects. This technique formulates a generalized eigenvalue representation of the original wave propagation problem that can comprehend arbitrary dielectric and conductor configurations in the transverse cross section, both conductor and dielectric losses, and arbitrary materials. A new on-chip mode matching technique is developed to solve large-scale 3-D problems with 2-D-like computational expenses. If one has computational resources to solve one 2-D structure seed, with the proposed method, he is able to solve the entire 3-D interconnect structure. The solution obtained from the proposed scheme satisfies both Maxwell's equations and boundary conditions, and hence constitutes a rigorous solution. The boundary condition in the transverse cross section is satisfied by the finite element procedure; while the boundary condition along the longitudinal direction is satisfied by the mode-matching process. A junction matrix acceleration technique is developed to speed up the numerical mode matching process. A fast frequency sweep technique is employed to avoid repeating the computation at each discrete frequency point. A comprehensive characterization of the interconnect is conducted, which includes S-parameters, full-wave RLGC, propagation constants, characteristic impedances, voltage, current, and field distributions.
Experimental and numerical results demonstrate its validity.
