An Algorithmic Framework for Efficient Large-Scale Circuit Simulation
  Using Exponential Integrators by Zhuang, Hao et al.
An Algorithmic Framework for Efficient Large-Scale Circuit
Simulation Using Exponential Integrators
Hao Zhuang1, Wenjian Yu2, Ilgweon Kang1, Xinan Wang1, and Chung-Kuan Cheng1
1Department of Computer Science & Engineering, University of California, San Diego, CA, USA
2Department of Computer Science & Technology, Tsinghua University, Beijing, China
hao.zhuang@cs.ucsd.edu, yu-wj@tsinghua.edu.cn, {igkang, xinan, ckcheng}@ucsd.edu
Abstract—We propose an efficient algorithmic framework for time-
domain circuit simulation using exponential integrators. This work
addresses several critical issues exposed by previous matrix exponential
based circuit simulation research, and makes it capable of simulating stiff
nonlinear circuit system at a large scale. In this framework, the system’s
nonlinearity is treated with exponential Rosenbrock-Euler formulation.
The matrix exponential and vector product is computed using invert
Krylov subspace method. Our proposed method has several distinguished
advantages over conventional formulations (e.g., the well-known back-
ward Euler with Newton-Raphson method). The matrix factorization is
performed only for the conductance/resistance matrix G, without being
performed for the combinations of the capacitance/inductance matrix
C and matrix G, which are used in traditional implicit formulations.
Furthermore, due to the explicit nature of our formulation, we do not
need to repeat LU decompositions when adjusting the length of time
steps for error controls. Our algorithm is better suited to solving tightly
coupled post-layout circuits in the pursuit for full-chip simulation. Our
experimental results validate the advantages of our framework.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - simulation
General Terms
Algorithms, Design, Performance
Keywords
Circuit Simulation, Transient Simulation, Exponential Integrators
I. INTRODUCTION
SPICE-like transistor-level circuit simulation [1]–[3] of integrated
circuits is indispensable during the cycle of very large scale integra-
tion (VLSI) designs. It is crucial to cost-efficient production of VLSI
chips. Smaller process geometries, larger designs as well as tighter de-
sign margins translate to the need for more accurate large-scale circuit
simulation, e.g., post-layout simulations with more detailed parasitic
couplings [4]–[6]. With advancing technologies, three dimensional
IC structures, and increasing complexities of system designs, the
numbers of chip components are easily more than millions [7], [8],
which make simulation tasks very time-consuming. Finding effective
algorithms still remains challenging and has always been an important
research topic for several decades [1]–[3].
Time-domain circuit simulation algorithm relies on numerical
integration to solve differential algebraic equations (DAE). To capture
circuit’s transient behaviors, numerical integration is computed step
by step till the end of whole simulation time span. Conventional
numerical integration formulations are forward Euler (FE), backward
Euler (BE), Trapezoidal (TR), Gear’s and multi-step methods [1]–
[3]. Despite explicit formulation, such as FE, avoids solving linear
system, the time steps are usually restricted for ensuring numerical
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
DAC ’15, June 07 - 11, 2015, San Francisco, CA, USA
Copyright 2015 ACM 978-1-4503-3520-1/15/06$15.00
http://dx.doi.org/10.1145/2744769.2744793
stability [2], [3], [9], [10]. In general, VLSI circuits form stiff and
nonlinear dynamical systems. Therefore, the implicit formulations,
e.g., BE, are much more stable, robust and widely deployed in VLSI
circuit simulators. Based on the implicit formulation of such nonlinear
system, Newton-Raphson (NR) iterations are typically adopted to
obtain solutions. Each NR iteration needs to linearize system and
then solves the linear system. During this process, direct solvers
[11], [12] have been favored and applied because of their robustness
and ease of use. However, it is known that direct solvers, e.g.,
LU decomposition, have super-linear computational complexities and
very expensive to simulate large-scale and strongly coupled circuit
systems (the cost of LU decomposition can approach the worst case
O(n3) [4], [5]). It is crucial to devise efficient algorithms to reduce
the numbers of expensive LU operations and NR iterations in the
circuit simulation. For example, leveraging the explicit formulations
is a research direction [9], [10].
Above formulations are all categorized into the low order approx-
imation of DAE, which is with the order typically lower than ten
[3]. The error of those formulations are proportional to the step size
due to local truncation error (LTE). Beyond those conventional low
order schemes, a high order matrix exponential based time integration
kernel [3] has been recently triggering academic researchers’ interests
because of the progress of efficient matrix function computation
using Krylov subspace methods [13]–[16]. In VLSI CAD and EDA
research, this exponential integration kernel has been applied in
time domain electromagnetic and technology semiconductor device
simulation [17], power distribution network analysis [18], [19] as
well as general circuit simulation [20], [21]. Those frameworks
provide analytical solution for transient simulation of linear system,
and better properties in numerical accuracy, stability and time step
controlling, etc [3], [16], [18], [19]. However, one drawback of
previous works for matrix exponential based circuit simulation [20] is
the slow convergence rate for matrix exponential and vector product
(MEVP) by using standard Krylov subspace [17], [20]. In addition,
the nonlinear formulation still requires NR iterations [17], [20], which
does not fully utilize the explicit nature from matrix exponential
formulation [14].
In this work, two major processes are devised to address above
challenges. First, we propose a new matrix exponential based frame-
work using exponential Rosenbrock-Euler integration formulation
[15], [16], which well suits nonlinear dynamical differential equation
problems and preserves explicit features. Then, we also design error
control scheme for adaptive time stepping. Second, we deploy invert
Krylov subspace method [19] to compute MEVP in an efficient
manner. The major contributions are listed as follows.
• By invert Krylov subspace strategy, we improve the convergence
rate for the MEVP computation, the key operation within the
matrix exponential based method. Besides, this approach also
removes the regularization process [20], [22], which would be
impractical for large designs with singular matrix C. Therefore,
this approach enables the application for the simulation of large
stiff VLSI designs.
• Proposed invert Krylov subspace strategy removes C from
1
ar
X
iv
:1
51
1.
04
51
5v
1 
 [c
s.C
E]
  1
4 N
ov
 20
15
matrix factorization processes. Building invert Krylov subspace
bases only needs to factorize G, which is much sparser and
simpler than C in strongly coupled post-layout circuits. In con-
trast, conventional implicit methods require LU decomposition
of the combinations of G and C. Therefore, our method is better
suited to handle those strongly coupled post-layout circuits in
the pursuit for full-chip simulation.
• We devise exponential Rosenbrock-Euler formulation for stable
explicit DAE solver, as well as correction term on top of that
to further improve the accuracy. The stability of the explicit
formulation is preserved by the high-order approximation of
exponential operator [13], [15], [16]. Thus, it takes only one
LU decomposition per time step, while conventional methods,
e.g., BE with NR (BENR), require at least two times of LU to
verify the convergence.
• Moreover, our method does not need to repeat LU when we
adjust the length of time steps for error controls. It is because our
matrix exponential based explicit formulation and (time-step)
scaling-invariant property of Krylov subspace [13], [20]. On the
contrary, the low order approximation schemes force time step
embedded in the linear matrix and conduct matrix factorization.
Once the time step is adjusted, LU is unavoidable in order to
solve the new linear system.
• We test our framework with baseline BENR against our large-
scale circuits. For some challenging test cases, our framework
can achieve speedups by magnitude, and even manage to finish
them when conventional counterpart fails.
The remainder of this paper is organized as follows. Sec. II
introduces the background of nonlinear circuit simulation. Sec. III
presents our new circuit simulation framework that utilizes exponen-
tial Rosenbrock-Euler method and related formulations. Sec. IV illus-
trates the MEVP computation uses invert Krylov subspace method.
Sec. V shows experimental results and validates our framework. Sec.
VI concludes this paper.
II. BACKGROUND
Given a circuit netlist and device models, time-domain simulation
of general nonlinear electronic circuit is formulated as
d~q(~x(t))
dt
+ ~f(~x(t)) = B~u(t), (1)
where ~x(t) ∈ Rn×1 denotes nodal voltages and branch currents and
n is the length of vector. ~q ∈ Rn×1 and ~f ∈ Rn×1 represent the
charge/flux and current/voltage terms, respectively. ~u(t) is a vector
representing all the external excitations at time t; B is an incident
matrix that inserts those signals to the system.
A. Circuit Simulation Using Low Order Integration Schemes
The conventional numerical approaches discretized Eq. (1) with
time step size hk = tk+1−tk. To compute the solution ~xk+1 at tk+1,
we start from an initial solution ~xk. The well-known BE, which is a
stable implicit integration scheme, leads to
~q(~xk+1)− ~q(~xk)
hk
+ ~f(~xk+1) = B~u(tk+1) (2)
Usually NR is a widely adopted method of solving this implicit
formulation Eq. (2). Each NR iteration, direct solver (e.g., LU
decomposition) is applied to solve
∂ ~T (~xi)
∂~x
(~xi+1 − ~xi) = −~T (~xi) (3)
until it is convergent, which means the difference of the solu-
tion from i-th iteration ~xi and ~xi+1 is “small enough”. ~T (~x) =
~q(~x)−~q(~xk)
hk
+ ~f(~x)−B~u(tk + hk) corresponds to Eq. (2). Then we
have ∂
~T (~xi)
∂~x
=
(
C(~xi)
h
+G(~xi)
)
∈ Rn×n is the Jacobian matrix of
~T , where C(~xi) ∈ Rn×n is a matrix of capacitances and inductances
linearized at ~xi; G(~xi) ∈ Rn×n represents the linearized resistances
and conductances as well as the incidence between voltages and
currents. This approach usually serves a de facto method in industrial
tools and academic research, which is called as BENR in this paper.
To solve Eq. (1), we can also deploy other low order implicit
discretized forms, such as TR and Gear’s, with NR.
Due to the low order implicit integration scheme, Eq. (3) embeds
time step hk in ~T and ∂
~T (~xi)
∂~x
(following the practice of SPICE).
If the estimated LTE violates numerical error budget, hk will be
reduced. Then new NR iterations for ~xk+1 are re-launched with the
updated hk. Besides, C always exists in Jacobian matrix ∂
~T (~xi)
∂~x
.
Large volume of non-zeros of C are introduced from the post-
layout parasitics extraction [23]–[25], resulting to huge computational
challenges for the capability of numerical integration algorithms [6]
and model order reductions [26]. In addition, the non-zero fill-ins of
off-diagonal terms in C and G are usually mutually exclusive in VLSI
circuits. For example, the matrices in Fig. 1 are from post-extraction
of a design FreeCPU [23]. We notice that the distribution of non-
zeros in C (Fig. 1(a)) is much more complicated than G (Fig. 1(b)).
After LU decompositions of those matrices, the factorized matrices
Fig. 1(c) and Fig. 1(d) of C, Fig. 1(g) and Fig. 1(h) of
(
C
h
+G
)
also contain much larger nnz (number of non-zeros) than Fig. 1(e)
and Fig. 1(f) of G.
To address above challenges, we use exponential integrators [14]
and architect a framework with stable explicit formulation. The next
sub-section briefs previous progresses of circuit simulation using
related exponential integration methods and the major problems,
which keep them from the large-scale circuit simulation.
B. Circuit Simulation Using Exponential Integrators
Recently, many works are proposed to solve time-domain simu-
lation problem using exponential integrators [17]–[21]. Chen et al.
[17] and Weng et al. [20] adopted a technique developed in [27],
which decouples the linear and nonlinear terms by approximating
the integrand in Eq. (4) with Lagrange polynomial.
~xk+1 = e
hkJ(~xk)~xk + (4)∫ hk
0
e(hk−τ)J(~xk)C−1(~xk)
(
~F (~xk + τ) +B~u(tk + τ)
)
dτ
where J(~x) = −C−1(~x)G(~x), and ~F (x) collects the nonlinear
dynamics at ~x from transistor models, such as BSIM3. The second-
order approximation yields ~xk+1 as the solution of a nonlinear
equation, which again can be solved by NR using Eq. (3). The
Jacobian matrix in Eq. (3) is C(~xi) + h
2
∂ ~F (~xi)
∂~x
. More details can be
found in [17], [21]. Therefore, it is similar to conventional method in
Sec. II-A. During NR iterations, the matrix factorization involves C in
the Jacobian matrix. Then it will suffer from the runtime degradation
when C is relatively large and complicated.
Another problem comes from the key operation of eJ~x, the matrix
exponential-and-vector product (MEVP), where J is a n×n matrix.
MEVP is computed by standard Krylov subspace [13], [17], [20],
[21]. Such Krylov subspace is constructed as
Km(J, ~x) := span{~v1, ~v2, · · · , ~vm} = span{~x, J~x, · · · , Jm−1~x}. (5)
An n×m orthonormal basis and an (m+ 1)×m upper Hessenberg
matrix H for the Krylov subspace Km are first constructed by Arnoldi
process [17], [20], [21]. Then MEVP is computed via
ehJ~v ≈ ‖~v‖VmehHm~e1 (6)
where Hm = H(1 : m, 1 : m), e1 ∈ Rm×1 and e1 is the first
unit vector. When A is mildly stiff, Hm is usually much smaller
2
Fig. 1: Visualization of post-extraction matrices’ non-zero elements distributions from a design FreeCPU [23], the sizes of matrix are 11417× 11417,
which are obtained from SPEF extracted by industrial tool Synopsys Star-RCXT. nnz is the number of non-zeros in the matrix. (a) Extracted capacitance
matrix C (non-zero entries distribute widely in the matrix). (b) Extracted conductance matrix G (there are many off-diagonal non-zeros in the matrix, but
the bandwidth is much smaller than C). (c) Lower triangular matrix LC and (d) Upper triangular matrix UC of LU decompose(C); (e) Lower triangular
matrix LG and (f) Upper triangular matrix UG of LU decompose(G); (g) Lower triangular matrix LC
h
+G
and (h) Upper triangular matrix UC
h
+G
of
LU decompose(C
h
+ G). The function of LU decompose uses MATLAB2013a UMFPACK. LG and UG contain much smaller nnz than LC , UC ,
LC
h
+G
and UC
h
+G
.
than original A [20]. However, when the values in C vary in
magnitudes caused by stiff circuit system, standard Krylov subspace
demands large dimension of subspace to approximate MEVP, then
degrades performance of matrix exponential-based circuit simulation.
Such phenomenon has been observed in power distribution network
simulation using matrix exponential framework [18], [19]. Besides,
C should not be singular during the process of standard Krylov
subspace strategy. Otherwise, related regularization process [20], [22]
is required, which is time-consuming for large-scale designs.
III. CIRCUIT SIMULATION USING EXPONENTIAL INTEGRATORS
BASED ON ROSENBROCK-EULER FORMULATION
In contrast to standard low order and previous matrix exponential
integration schemes, our framework adopts exponential Rosenbrock-
Euler method, and leads to a stable explicit method, which does not
require NR iterations [15]. To simplify the derivations from expo-
nential Rosenbrock-Euler to SPICE-like formulation, we consider a
non-autonomous ordinary differential equation (ODE) system,
d~x(t)
dt
= ~g(~x, ~u, t). (7)
Exponential Rosenbrock-Euler method is derived to compute ~xk+1
with step size hk as follows,
~xk+1 = ~xk + hkφ1(hkJk)~g(~xk, ~u, tk) + h
2
kφ2(hkJk)~bk (8)
where Jk denotes the Jacobian matrix of ~g, and ~bk = ∂~g∂t (~xk, ~u, tk).
[14]–[16]. φi(z) describes ODE’s time evolution, where φi(z) =∫ 1
0
ez(1−s) s
i−1
(i−1)!ds (i ≥ 1). We have φ0(hkJk) = ehkJk and In ∈
Rn×n is the identity matrix, then
φ1(hkJk) =
ehkJk − In
hkJk
, φ2(hkJk) =
ehkJk − In
h2kJ
2
k
− In
hkJk
. (9)
More details can be found in the works of Hochbruck and Ostermann
[14], [15]. The advantage of this formulation is the explicit nature and
the superior stable region (in the entire complex plane) than the class
of low order integrations schemes. Therefore φi functions permit a
large value for the step size hk with guaranteed stability [14]–[16].
To utilize above formulation in SPICE-like circuit simulation, we
apply the chain rule to Eq. (1),
d~q(~x(t))
d~x
· d~x(t)
dt
= B~u(t)− f(~x). (10)
Eq. (10) is linearized at the state ~xk, when d~q(~xk)d~x = C(~xk) = Ck.
We obtain the equivalent format for the non-autonomous dynamical
system
d~x(t)
dt
= ~g(~x, ~u, t) = Jk~x(t) + C
−1
k
(
~F (~x(t)) +B~u(t)
)
(11)
where Jk = −C−1k Gk and Gk = G (~xk). The circuit system
response is computed by high order function hkφ1(hkJk)~gk, where
~gk = Jk~xk + C
−1
k
(
~Fk +B~u(tk)
)
, (12)
and ~Fk = F (~xk). Furthermore, we assume ~u(t) is a piecewise-linear
function for t ∈ [tk, tk+1] in VLSI designs, so the contribution from
external excitations can also be captured by h2kφ2(hkJk)~bk [14]–[16],
where
~bk = C
−1
k B
~u(tk+1)− ~u(tk)
hk
(13)
The next time step solution ~xk+1 with hk is
~xk+1(hk) = ~xk + hkφ1(hkJk)~gk + h
2
kφ2(hkJk)~bk. (14)
Note the denominator of φi in Eq. (9) always cancels out C−1k in
Eq. (12) and Eq. (13). Hence, we avoid the matrix factorization of
Ck.
A. Local Nonlinear Error Estimator
The error estimator is an important ingredient of adaptive time-
stepping for the nonlinear systems. Based on the notion proposed by
Caliari and Ostermann [16], we devise a formula to estimate local
nonlinear truncation error to control the step size,
err(~xk+1, ~xk) = hkφ1(hkJk)C
−1
k ∆~Fk = −
(
ehkJk − In
)
G−1k ∆~Fk, (15)
where ∆~Fk = ~Fk+1− ~Fk. Intuitively, Eq. (15) represents the response
changes inside the nonlinear system along time evolutions. When the
strong nonlinear phenomenon is detected, the absolute value of err
becomes large and shrinks the step size to obtain an accurate enough
solution.
B. Modified Correction Term for Exponential Rosenbrock-Euler
Method
Furthermore, we can reuse the intermediate results ∆~Fk (after
device evaluations at ~xk+1) of Eq. (15) to improve the accuracy of
solution [16] The correction strategy is designed by utilizing φ2 [16],
~Dk = γhkφ2(hkJk)C
−1
k ∆~Fk, (16)
where γ is constant and has several choices [14], [16]. Within the time
step, by adding this correction term, the corrected solution ~xk+1,c is
~xk+1,c = ~xk+1 − ~Dk. (17)
In order to obtain such more accurate ~xk+1,c by reusing ∆~Fk, we
need extra computations, including Krylov subspace generations.
3
IV. MEVP USING INVERT KRYLOV SUBSPACE
Krylov subspace method enables the time-domain circuit simula-
tion algorithm to use exponential integrators. The key of efficient
Krylov subspace based matrix function evaluation is keeping the
size of the Krylov basis m small so that calculation of Eq. (6) is
cheap. However, using standard Krylov subspace for MEVP eJ~v
with stiff J = −C−1G is not efficient enough due to the demand
of large dimensional subspace. Our interpretation of the inefficiency
is standard Krylov subspace method tends to oversample eigenvalues
with large magnitudes in the spectrum of J , which are not crucial
players to define dynamical behaviors [18], [19].
Another drawback of standard Krylov subspace comes from matrix
factorization of C for subspace generations. To generate basis vi+1,
we need to solve C~vi = ~b, where ~b = −G~vi−1, 1 ≤ i ≤ m, in Eq.
(5). C may be singular, relatively complicated or denser than G (Fig.
1) [23]–[25]. If C is singular, we should apply regularization process
to make a non-singular C. It is time-consuming and impractical for
large designs [20], [22].
A. Invert Krylov Subspace Method
We adopt a technique called as invert Krylov subspace [19] to
avoid the matrix factorization of C. It also helps capture important
eigenvalues for exponential matrix function and accelerate MEVP
evaluation. In [19], invert Krylov subspace method is the second on
convergent rate and the length of step-size h after rational Krylov
subspace method. However, its basis generation can be cheaper for
general nonlinear circuit simulation problems. Besides, the properties
fit well with nonlinear dynamical systems where the step-size is
limited by the nonlinearity of the devices. Invert Krylov subspace
is constructed as follows,
Km(J
−1, ~v) = span{~v, J−1~v, · · · , J−(m−1)~v} (18)
= span{~v,−G−1C~v, · · · , (−G−1C)(m−1)~v}.
During the process, only G is required for matrix factorization.
Therefore, we are able to gain computational advantages when C
contains large number of non-zeros to describe many parasitics and
strongly coupling effects.
J−1Vm = VmHm + hm+1,m~vm+1~e
T
m (19)
MEVP of x(t) = etJv = e−tC
−1G~v is computed as
xm(t) = ‖~v‖VmetH
−1
m ~e1. (20)
Error estimation of MEVP is derived to determine dimension size of
Krylov subspace for MEVP. We have the derivative of ~xm(t)
d~xm(t)
dt
= ‖~v‖VmH−1m etH
−1
m ~e1. (21)
The residual is the difference between d~xm(t)
dt
and J~xm(t). Then, we
use the following residual to check Kirchhoff’s current and voltage
laws (KCL/KVL).
rm(t) = C
(
d~xm(t)
dt
− J~xm(t)
)
= ‖~v‖C
(
VmH
−1
m e
tH−1m ~e1 − JVmetH
−1
m ~e1
)
= −‖~v‖hm+1,mG~vm+1~eTH−1m etH
−1
m ~e1 (22)
We design MEVP IKS in Algorithm 1 and embed error estimator
above as a convergence criteria to terminate Arnoldi process. Then,
given the initial state vector ~v, time step size hk and error budget ,
φ1~v and φ2~v are computed, respectively, as φ
(e)
1 and φ
(e)
2 , where
φ
(e)
1 (Ck, Gk, ~v, , hk) =
1
hkJk
~mevp − 1
hkJk
~v
Algorithm 1: MEVP IKS: Arnoldi Algorithm of Invert Krylov
Subspace Method for MEVP e−hC
−1G~v
Input: C,G,~v, , h,mc
Output: ~mevp, Vm, Hm,m
1 ~v1 =
~v
‖~v‖ ;
2 for j = 1 : mc do
3 Solve −G~w = C~vj and obtain ~w;
4 for i = 1 : j do
5 hi,j = ~w
T~vi;
6 ~w = ~w − hi,j~vi;
7 end
8 hj+1,j = ‖~w‖;
9 ~vj+1 =
~w
hj+1,j
;
10 if ||rm(h)|| <  using Eq. (22) then
11 m = j;
12 break;
13 end
14 end
15 ~mevp = ‖~v‖VmehH−1m ~e1 ;
φ
(e)
2 (Ck, Gk, ~v, , hk) =
1
h2kJ
2
k
~mevp − 1
h2kJ
2
k
~v − 1
hkJk
~v (23)
where ~mvep is obtained from MEVP IKS(Ck, Gk, ~v, , hk). Note the
denominator Jk always helps cancel out C−1 when forming the
vector ~v during the whole simulation, and leaves only G−1, which
is factorized by LU decomposition once per step and reused. The
estimated nonlinear truncation error is
e(e)rr (~xk+1, ~xk) = hkφ
(e)
1 (Ck, Gk, C
−1
k ∆~Fk, , hk). (24)
This requires another MEVP via Krylov subspace. When e(e)rr is larger
than given error budget, we reduce the time step hk = αhk (α < 1).
In Sec III-B, we add a correction term using φ(e)2 and computed ∆~Fk,
then the corrected solution is
~xk+1,c = ~xk+1 − γhkφ(e)2
(
Ck, Gk, C
−1
k ∆~Fk, , hk
)
. (25)
B. Overall Circuit Simulation Framework
Algorithm 26 shows our circuit simulation framework. The pro-
posed method takes only one LU decomposition per time step h
(line 5), while BENR makes at least two LU decompositions (one
for integration and NR’s convergence check). When the solutions are
not converged, BENR updates the time step and repeats iterations
including LU operations. Moreover, based on the explicit formulation,
our method does not need to repeat LU decomposition when we
adjust the length h of time steps for error controls. When the error is
beyond the error budget Err , our framework easily adjusts time step
without extra LU operations to reduce the nonlinear error (from line
8 to 20). Note that there is no need to perform LU decomposition of
G. There is no matrix factorization with C. In addition, invert Krylov
subspace method deals with a much simpler matrix G, while BENR
method uses the format of
(
G+ C
h
)
. If C contains more detailed
parasitics, BENR is affected more than our method (more non-zeros
in C indeed increase the operations of sparse matrix and vector
multiplication when we construct Krylov subspace in our framework).
If the option of correction term Opt c is enabled, line 12 to 14
will perform extra computations to provide more accurate solution,
leading to ER-C, the method of exponential Rosenbrock-Euler with
correction term (i.e., Eq. (17)). Without this option, the method is
plain ER (i.e., Eq. (14)). Line 23 shows that fast convergence rate
triggers step-size increase.
V. RESULTS
The algorithms are implemented in MATLAB and C/C++, where
all devices are evaluated by BSIM3. The interactions between C/C++
4
Algorithm 2: ER and ER-C Methods: Circuit Simulation Using
Exponential Rosenbrock-Euler Integrators
Input: Circuit netlist; Err is the error budget for circuit simulator,
 criteria for the convergence within MEVP IKS; Corrected
option Opt c is enabled for ER-C, otherwise it is ER.
Output: Voltage/current solution for time period [0, T ]
1 Initialization phase: (a) load the netlist; (b) build linear matrices.
Set t = 0, k = 0, hk = hinit;
2 x(0) = xk = DC solution;
3 while t ≤ T do
4 Derive Ck, Gk, ~Fk, ~gk,~bk from device models at ~xk;
5 LU decompose (Gk);
6 Compute φ(e)1 , φ
(e)
2 using MEVP IKS (Algorithm 1) with
above matrices and factorized ones. Obtain
(~mevp, Hm, Vm, ~v,m) for φ
(e)
1 and φ
(e)
2 , respectively;
7 Set i = 0 ; // the iteration index
8 while true do
9 Obtain ~xk+1 by Eq. (14) with the given/updated step size
hk and previous sets of (~mevp, Hm, Vm, ~v,m) from line 6;
10 Derive ~Fk+1 from device models at ~xk+1 and obtain
∆~Fk = ~Fk+1 − ~Fk;
11 Compute
e
(e)
rr (~xk+1, ~xk) = hkφ
(e)
1 (Ck, Gk, C
−1
k ∆
~Fk+1, , hk) by
MEVP IKS;
12 if Opt c is enabled then
13 ~D = γhkφ
(e)
2 (Ck, Gk, C
−1
k ∆
~Fk, , hk) by MEVP IKS;
14 ~xk+1 = ~xk+1 − ~D ; // ER-C, e.g., γ = 0.1
15 end
16 if ‖e(e)rr (~xk+1, ~xk)‖∞ ≤ Err then
17 ~x(t+ h) = ~xk+1;
18 break;
19 end
20 i = i+ 1, h = αh; // update hk, i.e., α = 1/2
21 end
22 t = t+ hk, k = k + 1;
23 if i is small enough then
/* e.g., update hk when i < 5 */
24 hk = βhk; // e.g., β = 2
25 end
26 end
and MATLAB2013a are through MATLAB Executable (MEX) exter-
nal interface with GCC 4.7.3. We test our algorithms in a Linux
workstation (Intel CPU i7 3.4GHZ and 32GB memory). All of
algorithms and procedures utilize single thread mode. All of test cases
in our experiment are stiff designs with singular C matrices. Hence,
we do not use previous matrix exponential method in [20], [21], since
it requires extra time-consuming regularization processes at each time
step just to make C to be non-singular. We compare our framework
to the baseline circuit simulator using the standard backward Euler
with Newton-Raphson method (BENR).
A stiff nonlinear circuit containing a inverter chain is used to
demonstrate the favorable characteristics of our proposed method. We
extract waveforms from one observed node of that circuit to compare
the accuracy of BENR, exponential Rosenbrock-Euler method (ER)
and ER with correction term (ER-C). In Fig. 2, compared to the
reference solution (REF) obtained by BENR with smaller step size
(10−14s), our ER and ER-C are more accurate than BENR.
In Table I, we list the relatively large test cases with their specifi-
cations. We compare the runtime performance and capability of those
large test cases among BENR and our proposed algorithms ER, ER-C.
We set  = 10−7 in Algorithm 1 for MEVP convergence condition.
We try test cases ckt1 to ckt3, which have capacitances matrices
C with extremely sparse non-zeros distributions. Our framework
achieves speedups, that is 1.8× by ER and 1.4× by ER-C. The
Fig. 2: Accuracy comparisons of transient simulation solutions obtained by
BENR, ER and ER-C. REF is the reference solution obtained from BENR
with step size 10−14s. BENR and ER both use step size 10−13s. ER-C uses
2X step size as BENR and ER and still maintain better accuracy.
small speedups are because of their relatively simple C matrices.
LU decomposition of
(
C
h
+G
)
is dominated by the part of G. ER
and ER-C need LU decomposition of G for invert Krylov subspace
constructions as well. Therefore, our framework is not fully beneficial
here. However, we notice the reduced numbers of LU operations
and time steps improve the runtime performance and compensate the
portion of inverted Krylov subspace generations.
Cases from ckt4 to ckt8 are more challenging since they have more
complicated capacitance matrices. For the available speedup numbers,
we achieve speedups by magnitude, 23.2× by ER and 20.7× by ER-
C. For the speedup numbers not available, BENR cannot complete
the simulation tasks under our limited computing resources in a
reasonable time range. However, our framework processes those tasks
efficiently. Note the design of ckt5 contains interconnect structure of
FreeCPU (Fig. 1), and there are corresponding 40 drivers similar to
ckt3. In such simulation with stronger parasitic couplings, BENR
needs 54K seconds to finish the whole simulation. In contrast, the
runtime performance of our methods is more stable and achieves over
35× speedups. The increasing runtime compared to our methods on
ckt3 is partially due to the matrix and vector multiplication, e.g., C~v
(line 3 in Algorithm 1) during the invert Krylov subspace generations
since C has larger nnz. ckt4 has more nonlinear MOSFET devices
but simpler coupling capacitances than ckt5. The over 4× speedups
are still observed from our methods. Cases ckt6 to ckt8 are even
more challenging ones with many parasitics in certain parts of C. LU
decompositions in BENR exhaust our machine’s memory (32GB).
Thus, we mark them as “Out of Memory” in Table I. NA (no
available) denotes the speedups number are not available, which
represents our methods’ capability of handling those challenging
cases while standard BENR cannot. Our methods only need to
factorize G. In these three cases, the peak memory consumption of
our framework is smaller than 12GB reported by Linux top command
for corresponding process of MATLAB instance (BENR costs at least
32GB). Our methods can run through all the simulation tasks and the
solutions converge to the version of ER-C with smaller step-sizes.
The benchmark performance shows our algorithmic framework’s
advantages on memory usage and runtime performance.
VI. CONCLUSION
We propose a new and efficient algorithmic framework for time-
domain large-scale circuit simulation using exponential integrators.
We also devise a correction term on top of this formulation and
further improve the accuracy. By virtue of the stable explicit nature
of our formulation, we remove Newton-Raphson iterations and reduce
the number of LU decomposition operations. In this framework,
matrix exponential and vector product is computed by efficient invert
5
TABLE I: Simulation Results.
#N: the number of unknowns; #Dev.: the number of nonlinear devices. nnzC and nnzG: the number of non-zero elements in linear capacitance/inductance
matrix C and conductance/resistance matrix G. #step: the number of steps for transient simulation; #NRa: the average number of Newton-Raphson iterations
for each time step. #ma: the average dimension number of invert Krylov subspace for each time step. RT(s): the runtime of transient simulation. SP: the
runtime speedup of proposed method over BENR (backward Euler formulation with Newton-Raphson iterations).
Designs & Specifications BENR ER ER-C
Case #N #Dev. nnzC nnzG #step #NRa RT(s) #step #ma RT(s) SP #step #ma RT(s) SP
ckt1 84K 40K 19K 188K 1950 2.8 1983.6 1398 30.2 1314.6 1.5× 1404 31.6 1461.6 1.4×
ckt2 3.3M 0.1M 1.6M 9.8M 2375 3.1 122926.2 1995 28.4 78042.2 1.6× 2005 32.7 96406.3 1.3×
ckt3 54K 40 19K 181K 1928 2.8 1497.4 1380 27.4 659.0 2.3× 1418 31.1 888.9 1.7×
ckt4 84K 40K 37K 188K 1950 2.8 6153.0 1372 27.7 1138.6 5.4× 1422 30.3 1411.4 4.4×
ckt5 54K 40 82K 181K 1917 2.8 53873.5 1353 27.8 1298.2 41× 1365 31.5 1447.4 37×
ckt6 84K 40K 157K 188K Out of Memory 1282 27.0 1447.0 NA 1299 31.2 1917.7 NA
ckt7 0.2M 0.1M 0.2M 0.6M Out of Memory 1395 26.7 8016.6 NA 1399 30.4 9373.8 NA
ckt8 3.3M 0.1M 1.7M 9.8M Out of Memory 2310 29.3 86386.7 NA 2360 34.2 98681.7 NA
Krylov subspace. This approach can keep capacitance/inductance
matrix from matrix factorization and avoid the time-consuming
regularization process when there are singular capacitance/inductance
matrices during the simulation. Moreover, this method does not
need to repeat LU decompositions when we adjust the length of
time steps for error controls. Compared to conventional methods,
our new framework has several distinguished features. The proposed
method can handle many parasitics, strongly coupled or post-layout
circuits, even when conventional methods are not applicable. We test
our proposed framework against standard backward Euler method
with Newton-Raphson iterations, called as BENR. Our framework
does not only accelerate the simulation, but also manages to finish
the challenging test cases when BENR processes extremely slow
or even fails. Many aspects are worthwhile to investigate within
this new circuit simulation framework. For instance, matrix and
vector multiplication plays an important role in our framework.
Parallel processing this kind of operation by multi-core/many-core
architectures could be beneficial to enhance the runtime performance.
ACKNOWLEDGMENT
We acknowledge the support from NSF CCF-1017864. Hao
Zhuang is supported by the UCSD Powell Fellowship and Qualcomm
FMA Fellowship. Xinan Wang is supported by the UCSD Jacobs
Fellowship. This work is also partially supported by NSFC (grant
No. 61422402). We thanks Prof. Mike Botchev, Dr. Quan Chen, Mr.
Ryan Coutts, Prof. Marlis Hochbruck, Mr. Chia-Hung Liu, Dr. John
Loffeld, Dr. Jingwei Lu, Prof. Alexander Ostermann, Prof. Mayya
Tokman, Dr. Lining Zhang and Mr. Xiang Zhang for the helpful
discussions.
REFERENCES
[1] L. Nagel, SPICE2: A computer program to simulate semiconductor
circuits. Ph.D. dissertation, 1975.
[2] F. N. Najm, Circuit simulation. Wiley, 2010.
[3] L. O. Chua and P.-M. Lin, Computer Aided Analysis of Electric Circuits:
Algorithms and Computational Techniques. Prentice-Hall, 1975.
[4] Z. Li and C.-J. Shi, “SILCA: SPICE-accurate iterative linear-centric
analysis for efficient time-domain simulation of vlsi circuits with strong
parasitic couplings,” IEEE TCAD, vol. 25, no. 6, pp. 1087–1103, 2006.
[5] J. R. Phillips and L. M. Silveira, “Simulation approaches for strongly
coupled interconnect systems,” in ICCAD, pp. 430–437, 2001.
[6] Z. Zhu, H. Peng, C. K. Cheng, K. Rouz, M. Borah, and E. S. Kuh, “Two-
stage Newton-Raphson method for transistor-level simulation,” IEEE
TCAD, vol. 26, no. 5, pp. 881–895, 2007.
[7] J. Lu, H. Zhuang, P. Chen, H. Chang, C.-C. Chang, Y.-C. Wong,
L. Sha, D. Huang, Y. Luo, C.-C. Teng, and C. K. Cheng, “ePlace-MS:
Electrostatics based placement for mixed-size circuits,” IEEE TCAD,
2015.
[8] J. Lu, P. Chen, C.-C. Chang, L. Sha, D. Huang, C.-C. Teng, and C.-
K. Cheng, “ePlace: Electrostatics based placement using Nesterov’s
method,” in DAC, 2014.
[9] Q. He, H. Gan, and D. Jiao, “Explicit time-domain finite-element method
stabilized for an arbitrarily large time step,” IEEE TAP, vol. 60, no. 11,
pp. 5240–5250, 2012.
[10] W. Dong and P. Li, “Parallelizable stable explicit numerical integration
for efficient circuit simulation,” in DAC, 2009.
[11] T. A. Davis, Direct Method for Sparse Linear Systems. SIAM, 2006.
[12] X. Chen, Y. Wang, and H. Yang, “NICSLU: An adaptive sparse matrix
solver for parallel circuit simulation,” IEEE TCAD, vol. 32, no. 2,
pp. 261–274, 2013.
[13] Y. Saad, “Analysis of some krylov subspace approximations to the matrix
exponential operator,” SIAM J. Numer. Anal., vol. 29, no. 1, pp. 209–228,
1992.
[14] M. Hochbruck and A. Ostermann, “Exponential integrators,” Acta Nu-
merica, vol. 19, pp. 209–286, 2010.
[15] M. Hochbruck, A. Ostermann, and J. Schweitzer, “Exponential
Rosenbrock-type methods,” SIAM J. Numer. Anal., vol. 47, no. 1,
pp. 786–803, 2009.
[16] M. Caliari and A. Ostermann, “Implementation of exponential
rosenbrock-type integrators,” Appl. Numer. Math., vol. 59, no. 3,
pp. 568–581, 2009.
[17] Q. Chen, W. Schoenmaker, S.-H. Weng, C. K. Cheng, G.-H. Chen, L.-J.
Jiang, and N. Wong, “A fast time-domain em-tcad coupled simulation
framework via matrix exponential,” in ICCAD, pp. 422–428, 2012.
[18] H. Zhuang, S.-H. Weng, and C. K. Cheng, “Power grid simulation using
matrix exponential method with rational krylov subspaces,” in ASICON,
2013.
[19] H. Zhuang, S.-H. Weng, J.-H. Lin, and C. K. Cheng, “MATEX: A
distributed framework of transient simulation of power distribution
networks,” in DAC, 2014.
[20] S.-H. Weng, Q. Chen, and C. K. Cheng, “Time-domain analysis of large-
scale circuits by matrix exponential method with adaptive control,” IEEE
TCAD, vol. 31, no. 8, pp. 1180–1193, 2012.
[21] S.-H. Weng, Q. Chen, N. Wong, and C. K. Cheng, “Circuit simulation
via matrix exponential method for stiffness handling and parallel pro-
cessing,” in ICCAD, pp. 407–414, 2012.
[22] Q. Chen, S.-H. Weng, and C. K. Cheng, “A practical regularization
technique for modified nodal analysis in large-scale time-domain circuit
simulation,” IEEE TCAD, vol. 31, no. 7, pp. 1031–1040, 2012.
[23] C. Zhang and W. Yu, “Efficient space management techniques for large-
scale interconnect capacitance extraction with floating random walks,”
IEEE TCAD, vol. 32, no. 10, pp. 1633–1637, 2013.
[24] H. Zhuang, W. Yu, G. Hu, Z. Liu, and Z. Ye, “Fast floating random
walk algorithm for multi-dielectric capacitance extraction with numerical
characterization of Green’s functions,” in ASP-DAC, pp. 377–382, 2012.
[25] W. Yu, H. Zhuang, C. Zhang, G. Hu, and Z. Liu, “RWCap: A floating
random walk solver for 3-D capacitance extraction of very-large-scale
integration interconnects,” IEEE TCAD, vol. 32, no. 3, pp. 353–366,
2013.
[26] R. Ionutiu, J. Rommes, and W. H. Schilders, “SparseRC: Sparsity
preserving model reduction for RC circuits with many terminals,” IEEE
TCAD, vol. 30, no. 12, pp. 1828–1841, 2011.
[27] Q. Nie, Y. Zhang, and R. Zhao, “Efficient semi-implicit schemes for stiff
systems,” J. Comput. Phys., vol. 214, no. 2, pp. 521–537, 2006.
6
