Abstract-We propose an efficient algorithmic framework for timedomain circuit simulation using exponential integrators. This work addresses several critical issues exposed by previous matrix exponential based circuit simulation research, and makes it capable of simulating stiff nonlinear circuit system at a large scale. In this framework, the system's nonlinearity is treated with exponential Rosenbrock-Euler formulation. The matrix exponential and vector product is computed using invert Krylov subspace method. Our proposed method has several distinguished advantages over conventional formulations (e.g., the well-known backward Euler with Newton-Raphson method). The matrix factorization is performed only for the conductance/resistance matrix G, without being performed for the combinations of the capacitance/inductance matrix C and matrix G, which are used in traditional implicit formulations. Furthermore, due to the explicit nature of our formulation, we do not need to repeat LU decompositions when adjusting the length of time steps for error controls. Our algorithm is better suited to solving tightly coupled post-layout circuits in the pursuit for full-chip simulation. Our experimental results validate the advantages of our framework.
I. INTRODUCTION SPICE-like transistor-level circuit simulation [1] [2] [3] of integrated circuits is indispensable during the cycle of very large scale integration (VLSI) designs. It is crucial to cost-efficient production of VLSI chips. Smaller process geometries, larger designs as well as tighter design margins translate to the need for more accurate large-scale circuit simulation, e.g., post-layout simulations with more detailed parasitic couplings [4] [5] [6] . With advancing technologies, three dimensional IC structures, and increasing complexities of system designs, the numbers of chip components are easily more than millions [7] , [8] , which make simulation tasks very time-consuming. Finding effective algorithms still remains challenging and has always been an important research topic for several decades [1] [2] [3] .
Time-domain circuit simulation algorithm relies on numerical integration to solve differential algebraic equations (DAE). To capture circuit's transient behaviors, numerical integration is computed step by step till the end of whole simulation time span. Conventional numerical integration formulations are forward Euler (FE), backward Euler (BE), Trapezoidal (TR), Gear's and multi-step methods [1] [2] [3] . Despite explicit formulation, such as FE, avoids solving linear system, the time steps are usually restricted for ensuring numerical Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. stability [2] , [3] , [9] , [10] . In general, VLSI circuits form stiff and nonlinear dynamical systems. Therefore, the implicit formulations, e.g., BE, are much more stable, robust and widely deployed in VLSI circuit simulators. Based on the implicit formulation of such nonlinear system, Newton-Raphson (NR) iterations are typically adopted to obtain solutions. Each NR iteration needs to linearize system and then solves the linear system. During this process, direct solvers [11] , [12] have been favored and applied because of their robustness and ease of use. However, it is known that direct solvers, e.g., LU decomposition, have super-linear computational complexities and very expensive to simulate large-scale and strongly coupled circuit systems (the cost of LU decomposition can approach the worst case O(n 3 ) [4] , [5] ). It is crucial to devise efficient algorithms to reduce the numbers of expensive LU operations and NR iterations in the circuit simulation. For example, leveraging the explicit formulations is a research direction [9] , [10] .
Above formulations are all categorized into the low order approximation of DAE, which is with the order typically lower than ten [3] . The error of those formulations are proportional to the step size due to local truncation error (LTE). Beyond those conventional low order schemes, a high order matrix exponential based time integration kernel [3] has been recently triggering academic researchers' interests because of the progress of efficient matrix function computation using Krylov subspace methods [13] [14] [15] [16] . In VLSI CAD and EDA research, this exponential integration kernel has been applied in time domain electromagnetic and technology semiconductor device simulation [17] , power distribution network analysis [18] , [19] as well as general circuit simulation [20] , [21] . Those frameworks provide analytical solution for transient simulation of linear system, and better properties in numerical accuracy, stability and time step controlling, etc [3] , [16] , [18] , [19] . However, one drawback of previous works for matrix exponential based circuit simulation [20] is the slow convergence rate for matrix exponential and vector product (MEVP) by using standard Krylov subspace [17] , [20] . In addition, the nonlinear formulation still requires NR iterations [17] , [20] , which does not fully utilize the explicit nature from matrix exponential formulation [14] .
In this work, two major processes are devised to address above challenges. First, we propose a new matrix exponential based framework using exponential Rosenbrock-Euler integration formulation [15] , [16] , which well suits nonlinear dynamical differential equation problems and preserves explicit features. Then, we also design error control scheme for adaptive time stepping. Second, we deploy invert Krylov subspace method [19] to compute MEVP in an efficient manner. The major contributions are listed as follows. matrix factorization processes. Building invert Krylov subspace bases only needs to factorize G, which is much sparser and simpler than C in strongly coupled post-layout circuits. In contrast, conventional implicit methods require LU decomposition of the combinations of G and C. Therefore, our method is better suited to handle those strongly coupled post-layout circuits in the pursuit for full-chip simulation.
• We devise exponential Rosenbrock-Euler formulation for stable explicit DAE solver, as well as correction term on top of that to further improve the accuracy. The stability of the explicit formulation is preserved by the high-order approximation of exponential operator [13] , [15] , [16] . Thus, it takes only one LU decomposition per time step, while conventional methods, e.g., BE with NR (BENR), require at least two times of LU to verify the convergence.
• Moreover, our method does not need to repeat LU when we adjust the length of time steps for error controls. It is because our matrix exponential based explicit formulation and (time-step) scaling-invariant property of Krylov subspace [13] , [20] . On the contrary, the low order approximation schemes force time step embedded in the linear matrix and conduct matrix factorization.
Once the time step is adjusted, LU is unavoidable in order to solve the new linear system.
• We test our framework with baseline BENR against our largescale circuits. For some challenging test cases, our framework can achieve speedups by magnitude, and even manage to finish them when conventional counterpart fails. The remainder of this paper is organized as follows. Sec. II introduces the background of nonlinear circuit simulation. Sec. III presents our new circuit simulation framework that utilizes exponential Rosenbrock-Euler method and related formulations. Sec. IV illustrates the MEVP computation uses invert Krylov subspace method. Sec. V shows experimental results and validates our framework. Sec. VI concludes this paper.
II. BACKGROUND
Given a circuit netlist and device models, time-domain simulation of general nonlinear electronic circuit is formulated as
where x(t) ∈ R n×1 denotes nodal voltages and branch currents and n is the length of vector. q ∈ R n×1 and f ∈ R n×1 represent the charge/flux and current/voltage terms, respectively. u(t) is a vector representing all the external excitations at time t; B is an incident matrix that inserts those signals to the system.
A. Circuit Simulation Using Low Order Integration Schemes
The conventional numerical approaches discretized Eq. (1) with time step size h k = t k+1 −t k . To compute the solution x k+1 at t k+1 , we start from an initial solution x k . The well-known BE, which is a stable implicit integration scheme, leads to
Usually NR is a widely adopted method of solving this implicit formulation Eq. (2). Each NR iteration, direct solver (e.g., LU decomposition) is applied to solve
until it is convergent, which means the difference of the solution from i-th iteration x i and x i+1 is "small enough".
T , where C( x i ) ∈ R n×n is a matrix of capacitances and inductances linearized at x i ; G( x i ) ∈ R n×n represents the linearized resistances and conductances as well as the incidence between voltages and currents. This approach usually serves a de facto method in industrial tools and academic research, which is called as BENR in this paper. To solve Eq. (1), we can also deploy other low order implicit discretized forms, such as TR and Gear's, with NR.
Due to the low order implicit integration scheme, Eq. (3) embeds time step h k in T and
(following the practice of SPICE). If the estimated LTE violates numerical error budget, h k will be reduced. Then new NR iterations for x k+1 are re-launched with the updated h k . Besides, C always exists in Jacobian matrix
. Large volume of non-zeros of C are introduced from the postlayout parasitics extraction [23] [24] [25] , resulting to huge computational challenges for the capability of numerical integration algorithms [6] and model order reductions [26] . In addition, the non-zero fill-ins of off-diagonal terms in C and G are usually mutually exclusive in VLSI circuits. For example, the matrices in Fig. 1 are from post-extraction of a design FreeCPU [23] . We notice that the distribution of nonzeros in C ( Fig. 1(a) ) is much more complicated than G ( Fig. 1(b) ). After LU decompositions of those matrices, the factorized matrices To address above challenges, we use exponential integrators [14] and architect a framework with stable explicit formulation. The next sub-section briefs previous progresses of circuit simulation using related exponential integration methods and the major problems, which keep them from the large-scale circuit simulation.
B. Circuit Simulation Using Exponential Integrators
Recently, many works are proposed to solve time-domain simulation problem using exponential integrators [17] [18] [19] [20] [21] . Chen et al. [17] and Weng et al. [20] adopted a technique developed in [27] , which decouples the linear and nonlinear terms by approximating the integrand in Eq. (4) with Lagrange polynomial.
, and F (x) collects the nonlinear dynamics at x from transistor models, such as BSIM3. The secondorder approximation yields x k+1 as the solution of a nonlinear equation, which again can be solved by NR using Eq. (3). The Jacobian matrix in Eq. (3) 
. More details can be found in [17] , [21] . Therefore, it is similar to conventional method in Sec. II-A. During NR iterations, the matrix factorization involves C in the Jacobian matrix. Then it will suffer from the runtime degradation when C is relatively large and complicated.
Another problem comes from the key operation of e J x, the matrix exponential-and-vector product (MEVP), where J is a n × n matrix. MEVP is computed by standard Krylov subspace [13] , [17] , [20] , [21] . Such Krylov subspace is constructed as
An n × m orthonormal basis and an (m + 1) × m upper Hessenberg matrix H for the Krylov subspace Km are first constructed by Arnoldi process [17] , [20] , [21] . Then MEVP is computed via
where Hm = H(1 : m, 1 : m), e1 ∈ R m×1 and e1 is the first unit vector. When A is mildly stiff, Hm is usually much smaller 
than original A [20] . However, when the values in C vary in magnitudes caused by stiff circuit system, standard Krylov subspace demands large dimension of subspace to approximate MEVP, then degrades performance of matrix exponential-based circuit simulation. Such phenomenon has been observed in power distribution network simulation using matrix exponential framework [18] , [19] . Besides, C should not be singular during the process of standard Krylov subspace strategy. Otherwise, related regularization process [20] , [22] is required, which is time-consuming for large-scale designs.
III. CIRCUIT SIMULATION USING EXPONENTIAL INTEGRATORS BASED ON ROSENBROCK-EULER FORMULATION
In contrast to standard low order and previous matrix exponential integration schemes, our framework adopts exponential RosenbrockEuler method, and leads to a stable explicit method, which does not require NR iterations [15] . To simplify the derivations from exponential Rosenbrock-Euler to SPICE-like formulation, we consider a non-autonomous ordinary differential equation (ODE) system,
Exponential Rosenbrock-Euler method is derived to compute x k+1 with step size h k as follows,
where J k denotes the Jacobian matrix of g, and [16] . φi(z) describes ODE's time evolution, where φi(z) = 
n×n is the identity matrix, then
More details can be found in the works of Hochbruck and Ostermann [14] , [15] . The advantage of this formulation is the explicit nature and the superior stable region (in the entire complex plane) than the class of low order integrations schemes. Therefore φi functions permit a large value for the step size h k with guaranteed stability [14] [15] [16] .
To utilize above formulation in SPICE-like circuit simulation, we apply the chain rule to Eq. (1),
Eq. (10) is linearized at the state x k , when
We obtain the equivalent format for the non-autonomous dynamical system
where
The circuit system response is computed by high order function h k φ1(h k J k ) g k , where
and
. Furthermore, we assume u(t) is a piecewise-linear function for t ∈ [t k , t k+1 ] in VLSI designs, so the contribution from external excitations can also be captured by h
- [16], where
The next time step solution x k+1 with h k is
Note the denominator of φi in Eq. (9) always cancels out C −1 k in Eq. (12) and Eq. (13) . Hence, we avoid the matrix factorization of C k .
A. Local Nonlinear Error Estimator
The error estimator is an important ingredient of adaptive timestepping for the nonlinear systems. Based on the notion proposed by Caliari and Ostermann [16] , we devise a formula to estimate local nonlinear truncation error to control the step size,
where ∆ F k = F k+1 − F k . Intuitively, Eq. (15) represents the response changes inside the nonlinear system along time evolutions. When the strong nonlinear phenomenon is detected, the absolute value of err becomes large and shrinks the step size to obtain an accurate enough solution.
B. Modified Correction Term for Exponential Rosenbrock-Euler Method
Furthermore, we can reuse the intermediate results ∆ F k (after device evaluations at x k+1 ) of Eq. (15) to improve the accuracy of solution [16] The correction strategy is designed by utilizing φ2 [16] ,
where γ is constant and has several choices [14] , [16] . Within the time step, by adding this correction term, the corrected solution x k+1,c is
In order to obtain such more accurate x k+1,c by reusing ∆ F k , we need extra computations, including Krylov subspace generations.
IV. MEVP USING INVERT KRYLOV SUBSPACE
Krylov subspace method enables the time-domain circuit simulation algorithm to use exponential integrators. The key of efficient Krylov subspace based matrix function evaluation is keeping the size of the Krylov basis m small so that calculation of Eq. (6) is cheap. However, using standard Krylov subspace for MEVP e J v with stiff J = −C −1 G is not efficient enough due to the demand of large dimensional subspace. Our interpretation of the inefficiency is standard Krylov subspace method tends to oversample eigenvalues with large magnitudes in the spectrum of J, which are not crucial players to define dynamical behaviors [18] , [19] .
Another drawback of standard Krylov subspace comes from matrix factorization of C for subspace generations. To generate basis vi+1, we need to solve C vi = b, where b = −G vi−1, 1 ≤ i ≤ m, in Eq. (5). C may be singular, relatively complicated or denser than G (Fig.  1) [23]- [25] . If C is singular, we should apply regularization process to make a non-singular C. It is time-consuming and impractical for large designs [20] , [22] .
A. Invert Krylov Subspace Method
We adopt a technique called as invert Krylov subspace [19] to avoid the matrix factorization of C. It also helps capture important eigenvalues for exponential matrix function and accelerate MEVP evaluation. In [19] , invert Krylov subspace method is the second on convergent rate and the length of step-size h after rational Krylov subspace method. However, its basis generation can be cheaper for general nonlinear circuit simulation problems. Besides, the properties fit well with nonlinear dynamical systems where the step-size is limited by the nonlinearity of the devices. Invert Krylov subspace is constructed as follows,
During the process, only G is required for matrix factorization. Therefore, we are able to gain computational advantages when C contains large number of non-zeros to describe many parasitics and strongly coupling effects.
MEVP of x(t) = e tJ v = e −tC
Error estimation of MEVP is derived to determine dimension size of Krylov subspace for MEVP. We have the derivative of xm(t)
The residual is the difference between
and J xm(t). Then, we use the following residual to check Kirchhoff's current and voltage laws (KCL/KVL). 
We design MEVP IKS in Algorithm 1 and embed error estimator above as a convergence criteria to terminate Arnoldi process. Then, given the initial state vector v, time step size h k and error budget , φ1 v and φ2 v are computed, respectively, as φ 2 , where Solve −G w = C v j and obtain w; 
where mvep is obtained from MEVP IKS(C k , G k , v, , h k ). Note the denominator J k always helps cancel out C −1 when forming the vector v during the whole simulation, and leaves only G −1 , which is factorized by LU decomposition once per step and reused. The estimated nonlinear truncation error is
This requires another MEVP via Krylov subspace. When e
rr is larger than given error budget, we reduce the time step h k = αh k (α < 1). In Sec III-B, we add a correction term using φ (e) 2 and computed ∆ F k , then the corrected solution is
B. Overall Circuit Simulation Framework Algorithm 26 shows our circuit simulation framework. The proposed method takes only one LU decomposition per time step h (line 5), while BENR makes at least two LU decompositions (one for integration and NR's convergence check). When the solutions are not converged, BENR updates the time step and repeats iterations including LU operations. Moreover, based on the explicit formulation, our method does not need to repeat LU decomposition when we adjust the length h of time steps for error controls. When the error is beyond the error budget Err, our framework easily adjusts time step without extra LU operations to reduce the nonlinear error (from line 8 to 20). Note that there is no need to perform LU decomposition of G. There is no matrix factorization with C. In addition, invert Krylov subspace method deals with a much simpler matrix G, while BENR method uses the format of G + C h . If C contains more detailed parasitics, BENR is affected more than our method (more non-zeros in C indeed increase the operations of sparse matrix and vector multiplication when we construct Krylov subspace in our framework). If the option of correction term Opt c is enabled, line 12 to 14 will perform extra computations to provide more accurate solution, leading to ER-C, the method of exponential Rosenbrock-Euler with correction term (i.e., Eq. (17)). Without this option, the method is plain ER (i.e., Eq. (14)). Line 23 shows that fast convergence rate triggers step-size increase.
V. RESULTS
The algorithms are implemented in MATLAB and C/C++, where all devices are evaluated by BSIM3. The interactions between C/C++ Set
using MEVP IKS (Algorithm 1) with above matrices and factorized ones. Obtain ( mevp, Hm, Vm, v, m) for φ Obtain x k+1 by Eq. (14) with the given/updated step size h k and previous sets of ( mevp, Hm, Vm, v, m) from line 6;
10
Derive F k+1 from device models at x k+1 and obtain and MATLAB2013a are through MATLAB Executable (MEX) external interface with GCC 4.7.3. We test our algorithms in a Linux workstation (Intel CPU i7 3.4GHZ and 32GB memory). All of algorithms and procedures utilize single thread mode. All of test cases in our experiment are stiff designs with singular C matrices. Hence, we do not use previous matrix exponential method in [20] , [21] , since it requires extra time-consuming regularization processes at each time step just to make C to be non-singular. We compare our framework to the baseline circuit simulator using the standard backward Euler with Newton-Raphson method (BENR).
A stiff nonlinear circuit containing a inverter chain is used to demonstrate the favorable characteristics of our proposed method. We extract waveforms from one observed node of that circuit to compare the accuracy of BENR, exponential Rosenbrock-Euler method (ER) and ER with correction term (ER-C). In Fig. 2 , compared to the reference solution (REF) obtained by BENR with smaller step size (10 −14 s), our ER and ER-C are more accurate than BENR.
In Table I , we list the relatively large test cases with their specifications. We compare the runtime performance and capability of those large test cases among BENR and our proposed algorithms ER, ER-C. We set = 10 −7 in Algorithm 1 for MEVP convergence condition. We try test cases ckt1 to ckt3, which have capacitances matrices C with extremely sparse non-zeros distributions. Our framework achieves speedups, that is 1.8× by ER and 1.4× by ER-C. The small speedups are because of their relatively simple C matrices. LU decomposition of C h + G is dominated by the part of G. ER and ER-C need LU decomposition of G for invert Krylov subspace constructions as well. Therefore, our framework is not fully beneficial here. However, we notice the reduced numbers of LU operations and time steps improve the runtime performance and compensate the portion of inverted Krylov subspace generations.
Cases from ckt4 to ckt8 are more challenging since they have more complicated capacitance matrices. For the available speedup numbers, we achieve speedups by magnitude, 23.2× by ER and 20.7× by ER-C. For the speedup numbers not available, BENR cannot complete the simulation tasks under our limited computing resources in a reasonable time range. However, our framework processes those tasks efficiently. Note the design of ckt5 contains interconnect structure of FreeCPU (Fig. 1) , and there are corresponding 40 drivers similar to ckt3. In such simulation with stronger parasitic couplings, BENR needs 54K seconds to finish the whole simulation. In contrast, the runtime performance of our methods is more stable and achieves over 35× speedups. The increasing runtime compared to our methods on ckt3 is partially due to the matrix and vector multiplication, e.g., C v (line 3 in Algorithm 1) during the invert Krylov subspace generations since C has larger nnz. ckt4 has more nonlinear MOSFET devices but simpler coupling capacitances than ckt5. The over 4× speedups are still observed from our methods. Cases ckt6 to ckt8 are even more challenging ones with many parasitics in certain parts of C. LU decompositions in BENR exhaust our machine's memory (32GB). Thus, we mark them as "Out of Memory" in Table I . NA (no available) denotes the speedups number are not available, which represents our methods' capability of handling those challenging cases while standard BENR cannot. Our methods only need to factorize G. In these three cases, the peak memory consumption of our framework is smaller than 12GB reported by Linux top command for corresponding process of MATLAB instance (BENR costs at least 32GB). Our methods can run through all the simulation tasks and the solutions converge to the version of ER-C with smaller step-sizes. The benchmark performance shows our algorithmic framework's advantages on memory usage and runtime performance.
VI. CONCLUSION
We propose a new and efficient algorithmic framework for timedomain large-scale circuit simulation using exponential integrators. We also devise a correction term on top of this formulation and further improve the accuracy. By virtue of the stable explicit nature of our formulation, we remove Newton-Raphson iterations and reduce the number of LU decomposition operations. In this framework, matrix exponential and vector product is computed by efficient invert Parallel processing this kind of operation by multi-core/many-core architectures could be beneficial to enhance the runtime performance.
