Abstract-The transmission line, or wire, is always troublesome to integrated circuit designers, but it can be helpful to parallel computing researchers. This paper proposes the Virtual Transmission Method (VTM), a new distributed and stationary iterative algorithm for solving the linear system extracted from a circuit. It tears the circuit by virtual transmission lines to achieve distributed computing. For the symmetric positive definite (SPD) linear system, VTM is proved to be convergent. For the unsymmetrical linear system, numerical experiments show that VTM can achieve better convergence than the traditional stationary algorithms. VTM can be accelerated by preconditioning techniques, and it converges quickly when its preconditioner is properly chosen.
I. INTRODUCTION
TRANSMISSION line, or interconnect, or wire, is one of the main determinants of chip performance, because of the long interconnect latency and large power dissipation [37]. However, a circuit cannot work without transmission lines, because they play an irreplaceable role in making the distributed circuit stable and scalable [13]. In this paper, transmission lines are used to assist the distributed computation of the sparse linear system of a circuit.
The basic problem of distributed circuit simulation is to solve, in a distributed manner, the sparse linear system extracted from the circuit [34, 41, 46]. Generally speaking, the linear system of a circuit is unsymmetrical because of the existence of controlled sources. When restricted to the resistor-capacitor (RC) network, the linear system is symmetrical [39].
Many numerical algorithms are employed to solve the linear system of a circuit. KLU is an optimized LU factorization algorithm for circuits, and it is broadly used in commercial and non-commercial circuit simulators [49]. KLU is sequential. SuperLU is an outstanding distributed algorithm for solving the general unsymmetrical sparse linear system on a large number of processors, and it has been widely adopted for circuit simulation [15]. Xyce is a parallel circuit simulator designed for supercomputers, and it uses the Block Jacobi preconditioned GMRES method [50, 4, 3]. The Schur complement method is a non-overlapping domain decomposition method that makes use of the master-slave model. It is simple to understand and implement, and has been used frequently [29, 65, 20]. [7] presents a parallel direct/iterative mixed approach, based on the Schur complement method and the preconditioned GMRES method. [41] adopts the preconditioned overlapping domain decomposition method, which is supported by PETSc, a distributed scientific computing package [1]. The highlight of [41] is that it improves the matrix properties by exploiting the characteristics of transistor device models.
The symmetric linear system of a circuit is mainly extracted from RC networks. Solving the RC network is the key to power grid analysis and thermal analysis [30, 26]. Meanwhile, the sparse linear system generated by the finite element method from elliptic partial differential equations (PDE) is equivalent to a resistor network.
The simulation of the power grid has been intensively researched in the past 10 years, and a variety of numerical algorithms have been employed to solve the RC network, such as stationary iterative methods [67], Krylov-subspace iterative methods [12], algebraic multigrid (AMG) [31, 69], the coupled iterative/direct method [32] and the FFT method [48]. [42] proposed a stochastic method, and this method was further extended as a preconditioning method for the diagonally dominant matrix [43].
To compute the RC network on a parallel computing platform, [68] makes use of the domain decomposition method (DDM). [11] adopts a distributed matrix inversion algorithm. The potential of GPUs for solving the RC network was illustrated in [6, 16, 17, 48].
VTM is a new distributed and stationary iterative algorithm for solving sparse linear systems [59]. VTM is similar to the Block Jacobi or Overlapped Block Jacobi method [56, 66].
VTM is inspired by the behavior of the transmission line [40, 13]. It inserts virtual transmission lines into the circuit to achieve distributed computing [58, 59]. The idea of VTM is derived from the pseudo transient method [57]. The pseudo transient method inserts pseudo capacitors or inductors into the circuit to improve the convergence of DC analysis. Similarly, VTM inserts pseudo transmission lines into the circuit to improve the convergence of distributed simulation. Inserting virtual or pseudo electrical elements to assist circuit simulation is a common approach, and its advantage is that the physical properties of the circuit can be utilized during numerical computation.
VTM draws a parallel between the distributed computing of a linear system and the distributed simulation of a linear circuit, and it interprets the relationship between numerical analysis and circuit analysis. Solving a sparse linear system in parallel is equivalent to simulating a linear circuit in parallel, and vice versa: we can comprehend the distributed numerical algorithm from the viewpoint of circuit theory and microwave networks.
Transmission Line Inspires A New Distributed Algorithm to Solve the Linear System of Circuit
Fei Wei, Huazhong Yang, Senior Member, IEEE
To partition a circuit, node tearing and branch tearing are two important methods [64, 47]. This paper proposes a third way, called wire tearing. Tearing the circuit by wires is not a new idea, since all sub-circuits are connected by wires, but we are the first to apply this concept to distributed numerical algorithms [58, 59]. Wire tearing can be considered as the insertion of transmission lines to connect the torn nodes [61]. The advantage of wire tearing over branch tearing and node tearing is that it does not bring in any extra energy, because all the wires are passive [27].
As a distributed numerical algorithm, VTM is similar to the overlapped Block Jacobi or Block SOR method [2, 45]. We have proved that VTM converges for arbitrarily large SPD linear systems on an arbitrary number of processors [59, 62].
VTM could also be considered as a new type of algebraic domain decomposition method [55]. VTM is similar to the additive Schwarz method with the Robin condition, i.e. the Schwarz-Robin method [35, 21, 22, 36]. The additive Schwarz method is mainly used to solve the sparse linear system from elliptic partial differential equations (PDE), whereas VTM is used to solve the general sparse linear system. The partitioning method for VTM, i.e. wire tearing, is different from the traditional partitioning methods for the algebraic additive Schwarz method [22, 36]. Furthermore, the preconditioning technique for VTM is more flexible than that of the Optimized Schwarz Method [21].
This paper is organized as follows. Section 2 presents a brief introduction to the linear system of a circuit. Section 3 introduces the mathematical description of the transmission line. Section 4 presents the physical background of VTM. Section 5 describes how to partition the circuit by wire tearing. Section 6 details the algorithm of VTM. Section 7 presents the convergence theory. Section 8 discusses the preconditioning techniques for VTM. Section 9 provides a simple example. Numerical experiments are shown in Section 10. We conclude this work in Section 11. In the appendix, we present a proof of the convergence theory.
II. LINEAR SYSTEM OF CIRCUIT
By virtue of nodal analysis, a circuit can be described by a sparse linear system [39]:

$$Ax = b \tag{1}$$

Here x represents the node voltages, and b is the vector of current sources flowing into the nodes. A is the coefficient matrix, which may be symmetrical or unsymmetrical, depending on the properties of the circuit.
If the circuit contains no voltage sources but only current sources, we call it a current-driven circuit. To construct the linear system of a current-driven circuit, we only need the nodal analysis technique, not the modified nodal analysis technique [39]. If there are voltage sources in the circuit, they can be converted into current sources according to the Norton equivalent theorem [18].
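As a brief illustration of this conversion, consider a voltage source in series with a resistance connected between node k and the ground. The symbols $V_s$, $R$ and $I_s$ are illustrative and do not appear in the original. The branch is equivalent to a current source in parallel with the same resistance:

$$I_s = \frac{V_s}{R}$$

In the nodal system (1), this branch then contributes only to row k: the conductance is added to the diagonal entry and the equivalent current is added to the right-hand side,

$$A_{kk} \leftarrow A_{kk} + \frac{1}{R}, \qquad b_k \leftarrow b_k + \frac{V_s}{R}$$

so no extra unknown for the source current is needed.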
Lemma 1: For a current-driven linear resistor network, the coefficient matrix A is symmetric and weakly diagonally dominant, and thus symmetric non-negative definite (SNND). Further, if the network is connected and at least one resistor connects to the ground, A is diagonally dominant with strict dominance in at least one row, and thus symmetric positive definite (SPD).
The physical insight of this lemma is that the energy of a resistor network is always non-negative.
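The lemma can be checked numerically on a tiny network. The following is a minimal sketch, assuming a three-node resistor chain with hypothetical resistor values; the helper names `nodal_matrix`, `is_weakly_diag_dominant` and `is_spd` are illustrative and not from the original. It builds the nodal matrix with and without a resistor to ground, and uses a Cholesky attempt as the positive-definiteness test.

```python
def nodal_matrix(n, resistors):
    """Build the nodal-analysis matrix of a resistor network.

    n: number of non-ground nodes, numbered 0..n-1.
    resistors: list of (node_a, node_b, ohms); node -1 denotes the ground.
    """
    A = [[0.0] * n for _ in range(n)]
    for a, b, ohms in resistors:
        g = 1.0 / ohms
        for p in (a, b):
            if p >= 0:
                A[p][p] += g          # conductance adds to the diagonal
        if a >= 0 and b >= 0:
            A[a][b] -= g              # symmetric off-diagonal coupling
            A[b][a] -= g
    return A


def is_weakly_diag_dominant(A):
    """True if every diagonal entry is at least the sum of |off-diagonals|."""
    return all(
        row[i] >= sum(abs(v) for j, v in enumerate(row) if j != i) - 1e-12
        for i, row in enumerate(A)
    )


def is_spd(A):
    """Cholesky attempt on a symmetric matrix; fails on a non-positive pivot."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


# Chain 0 - 1 - 2 with no path to ground: singular, only non-negative definite.
floating = nodal_matrix(3, [(0, 1, 1.0), (1, 2, 2.0)])
# Same chain with one resistor from node 2 to the ground: positive definite.
grounded = nodal_matrix(3, [(0, 1, 1.0), (1, 2, 2.0), (2, -1, 4.0)])

print(is_weakly_diag_dominant(floating), is_spd(floating))
print(is_weakly_diag_dominant(grounded), is_spd(grounded))
```

In the grounded case only the row of the grounded node is strictly dominant; positive definiteness follows because the chain is connected.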
III. TRANSMISSION LINE
Transmission line is an important element in electrical and microwave engineering. It is also called cable, wire or interconnect in different contexts.
The circuit diagram of the transmission line is illustrated in Fig. 1. The analytical description of the lossless, or ideal, transmission line is the wave equation [40, 13].
Here L is the inductance per unit length, and C is the capacitance per unit length. The time-domain description of the lossless transmission line is called the Transmission Delay Equations, as shown in (3) [40, 8, 59].
Here $u_1(t)$ and $u_2(t)$ represent the port voltages, and $i_1(t)$ and $i_2(t)$ represent the port inflow currents. t is the time variable, τ is the propagation delay, and Z is the characteristic impedance, which is positive. The iterative formula of the Transmission Delay Equations is similar to the Robin boundary condition in partial differential equations and to the Schwarz-Robin method [35, 21]. There are two main differences: first, the normal derivative in the Robin condition is replaced by a current; second, (3) expresses the time delay explicitly.
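For reference, the standard forms of these two descriptions can be written out as follows. This is a sketch under stated assumptions, not necessarily the exact notation of the original equations: x denotes the position along the line and l its length (both symbols are introduced here), and the port currents are taken as flowing out of the line into the attached circuits, which is the orientation consistent with the steady-state condition $u_1 = u_2$, $i_1 = -i_2$. The wave equation of the lossless line is

$$\frac{\partial^2 u}{\partial x^2} = LC \, \frac{\partial^2 u}{\partial t^2}, \qquad Z = \sqrt{L/C}, \qquad \tau = l\sqrt{LC}$$

and its solution, expressed at the two ports, gives the delay form

$$u_1(t) + Z\, i_1(t) = u_2(t-\tau) - Z\, i_2(t-\tau)$$
$$u_2(t) + Z\, i_2(t) = u_1(t-\tau) - Z\, i_1(t-\tau)$$

Each left-hand side combines a voltage with a current, just as a Robin condition combines a function value with its normal derivative, and each right-hand side depends only on data that left the opposite port one delay earlier.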
For a number of coupled transmission lines, (3) could be re-expressed in the matrix-vector form:
Here Z is the characteristic impedance matrix of the transmission lines. If there is no coupling among transmission lines, Z is diagonal; if there is coupling, Z is a sparse matrix. (4) could be re-expressed as (5):
Here W is the characteristic admittance matrix of the wires:

$$W = Z^{-1}$$
Comparing this with the principle of the parallel computer, we observe a conceptual similarity between the distributed computer and the distributed circuit, as shown in Fig. 2 [61].
First, different circuits are connected by wires, just as different processors are connected by digital data links.
Second, the wire has a transmission delay, and there is also a communication delay between processors.
As a result, if we liken each circuit to a processor and consider the wire as the data link between processors, then the distributed circuit is a kind of distributed computer.
IV. PHYSICAL BACKGROUND
In this section, we present an observation from circuits and microwave networks. Lemma 2 states the stability property of the resistor network with wires, and Fig. 3 illustrates it. Lemma 2 could be validated in SPICE [38, 44].
Lemma 2: A linear resistor network with wires inside never oscillates unendingly; it finally settles down to the steady state, which is the steady state of the resistor network with all the wires eliminated.
The physical insight of this lemma is that the energy of a resistor network is limited. Divergence means infinite energy, which is impossible for the resistor network.
Lemma 2 is also valid for the resistor-capacitor network, but it is not always valid for a general circuit with wires: such a distributed circuit might fail to converge. The stability depends on the properties of the circuit and on the characteristic impedance of the wires.
V. WIRE TEARING
The circuit whose sparse linear system we want to solve originally contains no transmission lines, but we can insert transmission lines into it by splitting nodes. This partitioning technique is called wire tearing.
By means of wire tearing, the original circuit is turned into a distributed circuit: it is partitioned into a number of separate sub-circuits by virtual transmission lines. We then place each sub-circuit on one processor, and use the digital data link to imitate the behavior of the transmission line. As a result, the distributed circuit is simulated in a distributed way.
According to Lemma 2, the resistor network with wires finally reaches the steady state, which is exactly the steady state of the resistor network without wires. This lemma explains why VTM always converges when solving the resistor network. For the sparse linear system of a general circuit, VTM might not converge.
There are 4 steps to perform wire tearing for the circuit G. Γ denotes the set of all the nodes in G.
Step 1. Set the splitting interface $\Gamma_{\text{interface}} \subset \Gamma$. If a node $V \in \Gamma_{\text{interface}}$, V is called an interfacial node; otherwise, V is called an inner node.
Step 2. Split each interfacial node into a pair of nodes, which are called twin nodes.
Step 3. Split the resistors and current sources connected to each interfacial node.
Step 4. Connect each pair of twin nodes by a length of lossless wire. Add inflow currents to the twin nodes.
Example 1. Fig. 4 illustrates an example, in which a simple resistor network is split into two sub-circuits.
First we define the set of inner nodes in Sub-circuit 1 as $\Gamma_{1,\text{inner}}$, and the set of inner nodes in Sub-circuit 2 as $\Gamma_{2,\text{inner}}$. By row reordering, the sparse linear system of the circuit (1) can be re-formatted as below:
Here u is the voltage vector of the interface.
After wire tearing, the system of Sub-circuit 2 is (8). The twin nodes are coupled by the Transmission Delay Equations:

$$i_1(t) + W u_1(t) = -i_2(t-\tau) + W u_2(t-\tau)$$
$$i_2(t) + W u_2(t) = -i_1(t-\tau) + W u_1(t-\tau)$$

If we set the boundary condition as below:

$$u_1(t) = u_2(t) = u(t), \qquad i_1(t) = -i_2(t) = i(t)$$

then (6) can be recovered from (7) and (8).
The above splitting technique is called level-one wire tearing, since each interfacial node is split once. If an interfacial node is split more than once, the splitting technique is called multilevel wire tearing [59].
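Level-one wire tearing can be sketched on a small chain. This is a minimal sketch, assuming a hypothetical network gnd - n0 - n1 - n2 - gnd with n1 as the only interfacial node; the node names, resistor values and the half-and-half split of the current source are illustrative choices, not taken from Fig. 4. It checks that, when the twin nodes share one voltage, the two inflow currents add up to the residual of the original interface equation, so they cancel exactly when the original system is satisfied.

```python
def conductance_matrix(nodes, resistors):
    """Nodal matrix over the given node labels; 'gnd' is the reference node."""
    idx = {name: k for k, name in enumerate(nodes)}
    A = [[0.0] * len(nodes) for _ in nodes]
    for a, b, ohms in resistors:
        g = 1.0 / ohms
        for p, q in ((a, b), (b, a)):
            if p != "gnd":
                A[idx[p]][idx[p]] += g
                if q != "gnd":
                    A[idx[p]][idx[q]] -= g
    return A


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


# Original circuit; n1 is the interfacial node (Step 1).
A = conductance_matrix(
    ["n0", "n1", "n2"],
    [("gnd", "n0", 1.0), ("n0", "n1", 2.0), ("n1", "n2", 4.0), ("n2", "gnd", 1.0)],
)
b = [1.0, 3.0, 0.0]

# Steps 2-3: split n1 into twin nodes n1a / n1b and share its resistors and source.
A1 = conductance_matrix(["n0", "n1a"], [("gnd", "n0", 1.0), ("n0", "n1a", 2.0)])
A2 = conductance_matrix(["n1b", "n2"], [("n1b", "n2", 4.0), ("n2", "gnd", 1.0)])
b1 = [1.0, 1.5]   # half of the 3 A source stays with n1a
b2 = [1.5, 0.0]   # the other half goes to n1b


def inflow_currents(x):
    """Step 4: currents the wire must inject at the twin nodes
    for voltages x = [x0, u, x2] with u1 = u2 = u."""
    i1 = matvec(A1, [x[0], x[1]])[1] - b1[1]
    i2 = matvec(A2, [x[1], x[2]])[0] - b2[0]
    return i1, i2


x = [0.3, -1.2, 2.5]                       # arbitrary trial voltages
i1, i2 = inflow_currents(x)
residual_n1 = matvec(A, x)[1] - b[1]       # residual of the original n1 equation
print(i1 + i2, residual_n1)
```

The rows of the inner nodes are untouched by the tearing, so only the interface row needs this check.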
VI. VIRTUAL TRANSMISSION METHOD
After partitioning the circuit, there are 3 steps to perform parallel computing by the Virtual Transmission Method.
Step 1. Replace the lossless transmission line with its equivalent circuit.
Step 2. Load each sub-circuit into a processor.
Step 3. Use the data link to transfer the previous interfacial variables from one processor to another by message passing, and perform the distributed computing.
Example 2. Continuing with Example 1, the analytic expression of Sub-circuit 1 is obtained by merging (5) and (7):
Eliminating $i_1(t)$, we obtain:
Setting all the delays of the transmission lines to 1, we have:
(11) is an SPD sparse linear system, and it can be solved by any sequential algorithm, such as Cholesky factorization, the algebraic multigrid method (AMG) or the conjugate gradient method (CG). Similarly, the analytic expression of Sub-circuit 2 is obtained by merging (5) and (8).
If all the delays of the lines are 1, we obtain:
We then place (11) on Processor 1 and (14) on Processor 2. After that, we set the initial values for the boundary variables:
VII. CONVERGENCE THEORY
For the SPD linear system, VTM is convergent [59, 62]. This conclusion is supported by Lemma 2, and the mathematical proof is given in the appendix.
According to the convergence proof, the convergence speed of VTM depends on the choice of the characteristic admittance matrix W. For this reason, W is also called the preconditioner of VTM.
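The iteration and its dependence on W can be sketched on a small torn circuit. This is a minimal sketch, assuming a hypothetical chain gnd - n0 - n1 - n2 - gnd torn at n1, a scalar characteristic admittance W, unit delays and zero initial boundary values; the matrices and the names `solve` and `vtm` are illustrative and not from the original. Each sub-circuit solves its own small SPD system using only the boundary data that the other sub-circuit sent in the previous step.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


# Original system: unknowns [x0, u, x2], interface node in the middle.
A = [[1.5, -0.5, 0.0], [-0.5, 0.75, -0.25], [0.0, -0.25, 1.25]]
b = [1.0, 3.0, 0.0]


def vtm(W, tol=1e-10, max_iter=10000):
    """VTM with one virtual line of characteristic admittance W and delay 1."""
    u1 = u2 = i1 = i2 = 0.0                 # initial boundary values
    for k in range(1, max_iter + 1):
        # Data received over the link: what the other side sent one step ago.
        a1 = W * u2 - i2
        a2 = W * u1 - i1
        # Sub-circuit 1, unknowns [x0, u1]: W is added to the interface diagonal.
        x0, n1 = solve([[1.5, -0.5], [-0.5, 0.5 + W]], [1.0, 1.5 + a1])
        # Sub-circuit 2, unknowns [u2, x2].
        n2, x2 = solve([[0.25 + W, -0.25], [-0.25, 1.25]], [1.5 + a2, 0.0])
        # Inflow currents follow from i + W u = received data.
        new_i1 = a1 - W * n1
        new_i2 = a2 - W * n2
        change = max(abs(n1 - u1), abs(n2 - u2), abs(new_i1 - i1), abs(new_i2 - i2))
        u1, u2, i1, i2 = n1, n2, new_i1, new_i2
        if change < tol:
            return [x0, u1, u2, x2], k
    raise RuntimeError("VTM did not converge within max_iter")


direct = solve(A, b)
solution, iterations = vtm(0.25)
_, slow_iterations = vtm(5.0)
print(direct, solution, iterations, slow_iterations)
```

With W close to the input admittances of the two sub-circuits the iteration settles in far fewer steps than with a badly mismatched W, which is the sense in which W acts as a preconditioner.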
VIII. PRECONDITIONING
In this section, we discuss the preconditioning techniques for VTM. We first define the input admittance matrix of a circuit or microwave network [18, 13]. From the viewpoint of numerical analysis, the input admittance matrix is an alias of the Schur complement matrix associated with the interface variables [45, 68]. We denote by $S_1$ the Schur complement matrix of Sub-circuit 1 associated with its interface variables $u_1$, and by $S_2$ the Schur complement matrix of Sub-circuit 2 associated with $u_2$.
The voltage reflection matrix on the interface of Sub-circuit 1 is:

$$T_1 = (W - S_1)(W + S_1)^{-1}$$

The voltage reflection matrix on the interface of Sub-circuit 2 is:

$$T_2 = (W - S_2)(W + S_2)^{-1}$$

According to the convergence proof in the appendix, the convergence factor of VTM is:

$$\rho(T_1 T_2)$$

Here $\rho(A)$ is the spectral radius of the square matrix A.
Consequently, the convergence speed of VTM depends on the choice of the characteristic admittance matrix W. This viewpoint corresponds to the impedance matching technique in microwave engineering [13].
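The matching argument is easiest to see in the scalar case. The following is a minimal sketch, assuming a single interface node so that the input admittances and W reduce to positive numbers; the values of `S1` and `S2` and the function names are hypothetical. It evaluates the product of the two reflection coefficients, i.e. the error reduction over one round trip between the sub-circuits.

```python
def reflection(W, S):
    """Scalar reflection coefficient of a port with input admittance S,
    seen from a line of characteristic admittance W (both positive)."""
    return (W - S) / (W + S)


def round_trip_factor(W, S1, S2):
    """Error reduction over one round trip between the two sub-circuits."""
    return abs(reflection(W, S1) * reflection(W, S2))


S1, S2 = 1.0 / 3.0, 0.2          # hypothetical input admittances
for W in (0.02, 0.2, 0.26, 1.0 / 3.0, 5.0):
    print(f"W = {W:.3f}  round-trip factor = {round_trip_factor(W, S1, S2):.4f}")
```

For any positive W the factor stays below 1, it vanishes when W matches either input admittance, and it approaches 1 as W becomes much smaller or much larger than both.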
In the ideal case, if we set the preconditioner,
As a result, only 1 iteration is needed to achieve convergence. Based on the knowledge of microwave networks, we know that if one port is precisely matched, there is no energy reflection on this port, and the network settles down after 1 reflection.
In practice, it is impossible to obtain the accurate input admittance matrix of a large circuit, because the dimension of $D_1$ or $D_2$ might be very large.
If the circuit is a resistor network, we first need to ensure that W is SPD. Continuing with Example 2, there are several ways to choose W.
The Schur complement matrix $S_1$ or $S_2$ is dense, so we need to drop the small elements inside the matrix. After that, the fill-in pattern of W is similar to that of $C_1$ or $C_2$.
If the circuit is a general circuit with controlled sources, i.e. the linear system of the circuit is unsymmetrical, the above 5 preconditioning techniques could also be used, but VTM might not converge.
Choosing the preconditioner for the unsymmetrical linear system is still an open problem. In the next section, we give a simple example to show the potential of VTM for solving the unsymmetrical system.
IX. EXAMPLE
For the symmetrical linear system, a simple example can be found in [59]. Here we illustrate another example, which solves the unsymmetrical linear system of a circuit. Fig. 5 shows a tightly-coupled circuit with 4 operational amplifiers connected end-to-end. The linear system of this circuit is:
Here we set:
As a result, we obtain 2 sub-systems:
Merging (16), (18) and (19), we obtain the linear system for Sub-circuit 1:
Finally, we solve this example on 2 processors by VTM, and compare the result with a number of well-known stationary iterative algorithms, e.g. Jacobi, Gauss-Seidel (GS), Successive Overrelaxation (SOR), Symmetric SOR (SSOR) and Block Jacobi (BJ) [45]. Here the global iteration index k increases to k+1 once all the variables have been updated.
As shown in Fig. 6, the traditional stationary algorithms do not converge for this circuit, but VTM does. The key is to choose proper characteristic admittances for the virtual transmission lines. From this example, we conclude that there is a parallel between the stability of VTM and the stability of the distributed linear circuit (or microwave network). The traditional techniques for stabilizing a distributed circuit might be transplanted to stabilize VTM.
However, for the large unsymmetrical linear system, we have not yet figured out an effective method to choose W. The convergence of VTM depends on the properties of the linear circuit, as well as on the preconditioner W.
X. NUMERICAL EXPERIMENTS
In this section, we compare VTM with the traditional stationary algorithms for both symmetrical and unsymmetrical matrices. We use MATLAB and SIMULINK as our test environment [53, 54], and use METIS and MESHPART as our partitioning tools [28, 51, 19]. The test matrices are described in Table 1. Grid2d and Grid3d are structured resistor networks [51, 43]. Fv2 is a 2D mesh constructed by the finite element method (FEM) [14]. Pchip is extracted from a MOS integrated circuit of the MCNC'90 testbench [52]. Wang is a sparse matrix from semiconductor device simulation [14]. Table 2 lists the computing results of the stationary iterative algorithms. BJ, OBJ and VTM are tested on 2 processors. Jacobi, GS, Symmetric GS (SGS), SOR and SSOR are tested on 1 processor.
VTM_Diag means VTM with the diagonal preconditioner. VTM_OB represents VTM with the overlapped block preconditioner. VTM_WOB is VTM with the weighted overlapped block preconditioner. VTM_SCA means VTM with the Schur complement approximation preconditioner. All these preconditioning methods were described in Section 8.
According to Table 2, we conclude that VTM converges when solving the SPD system, and that it converges quickly when using the SCA or WOB preconditioner. The convergence speeds of OBJ and VTM_OB are the same. For the unsymmetrical linear systems, VTM might not converge.
Finally, we test VTM on various numbers of processors, as shown in Fig. 7, using Grid3d as the test case. The result indicates that the convergence speed of VTM depends only slightly on the number of processors.
XI. CONCLUSION AND FUTURE WORK
In this paper, we introduce VTM to solve the sparse linear system of a circuit in parallel. VTM is a distributed and stationary iterative numerical algorithm, and it is efficient for solving resistor networks.
VTM offers insights that link numerical analysis and circuit theory. The physical background of VTM is the electric circuit, which makes this algorithm different from the traditional numerical algorithms. As presented in this paper, we have borrowed many concepts from circuit theory to describe this algorithm.
According to the simple example presented in Section 9, we conclude that there is a similarity between the stability of VTM and the stability of the distributed linear circuit (or microwave network).
The convergence speed of VTM depends strongly on the characteristic admittance matrix of the virtual transmission lines, i.e. its preconditioner W. A number of preconditioning techniques are implemented to accelerate VTM. Experiments indicate that, if W is properly chosen, the performance of VTM is good; if not, the performance of VTM is mediocre. Nevertheless, the computational cost of constructing the preconditioner W should be taken into account [5].
For the SPD linear system, we have proved that VTM is convergent; for the large unsymmetrical linear system, we have not yet figured out an effective way to choose the characteristic admittance matrix so as to guarantee convergence. This is an open problem.
VTM could also be used to solve the nonlinear system, as we pointed out in [59]. However, the convergence analysis of the nonlinear system is more complicated than that of the linear system, and this problem should be studied further.
If the delays of the virtual transmission lines are different and stochastic, VTM turns into a fully asynchronous and chaotic numerical algorithm [58]. If we replace each variable by a piece of waveform, VTM becomes a waveform-relaxation algorithm [60].
To simulate the post-layout integrated circuit in parallel, we have figured out another method, which solves in a distributed manner the nonlinear delay differential equations extracted from the distributed circuits [61].
APPENDIX
Here we prove that VTM converges if the graph of the resistor network is 2-partite.
Lemma 3: Assume a linear resistor network is partitioned into 2 sub-circuits by level-one wire tearing. If the characteristic impedance matrix Z of the virtual transmission lines is SPD, VTM converges to the solution of the resistor network.
Proof:
First, we consider the resistor network with inner nodes, and eliminate these inner nodes by the Schur complement technique. Then, we prove Lemma 3 for the resistor network without inner nodes.
The sparse linear system of the resistor network is:
If the circuit is split into two sub-circuits by level-one wire tearing, the system of Sub-circuit 1 is (7):
Consequently, we have:
(22) is partitioned into two sub-systems, (23) and (24), by wire tearing.
The iterative formula is given by the Transmission Delay Equations (5).
Here W is the characteristic admittance matrix of the transmission lines, and W is SPD. Merging (23), (24) and (5), we find:
Similarly, we obtain:
Merging (25) and (26), we obtain the iterative formula (27). Set:
Reformat (28) as (29):
In the following text, we prove that (30) is convergent. Here we use $Z^{-1}$ instead of W.
End of proof [25].
So we first calculate:
So we conclude that VTM converges for the 2-partite resistor network [45]. According to (27), when this algorithm is convergent,
So far, we have proved that VTM is convergent for the 2-partite resistor network when using level-one wire tearing.
For the more general cases, the resistor network is k-partite, and the wire tearing might be level-one or multilevel. A general proof was presented in [62] .
