Reduce-Factor-Solve for Fast Thevenin Impedance Computation and Network by Sommer, Stefan et al.
 
 
General rights 
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright 
owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. 
 
 Users may download and print one copy of any publication from the public portal for the purpose of private study or research. 
 You may not further distribute the material or use it for any profit-making activity or commercial gain 
 You may freely distribute the URL identifying the publication in the public portal 
 
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately 
and investigate your claim. 
  
 
   
 
 
Downloaded from orbit.dtu.dk on: Mar 29, 2019
Reduce-Factor-Solve for Fast Thevenin Impedance Computation and Network
Sommer, Stefan; Aabrandt, Andreas; Jóhannsson, Hjörtur
Published in:
I E T Generation, Transmission and Distribution
Link to article, DOI:
10.1049/iet-gtd.2018.5330
Publication date:
2018
Document Version
Peer reviewed version
Link back to DTU Orbit
Citation (APA):
Sommer, S., Aabrandt, A., & Jóhannsson, H. (2018). Reduce-Factor-Solve for Fast Thevenin Impedance
Computation and Network. I E T Generation, Transmission and Distribution, 13(2), 288-295. DOI: 10.1049/iet-
gtd.2018.5330
IET Research Journals
Reduce-Factor-Solve for Fast Thevenin
Impedance Computation and Network
Reduction
ISSN 1751-8644
doi: 0000000000
www.ietdl.org
Stefan Sommer1,∗, Andreas Aabrandt1, Hjörtur Jóhannsson2
1Department of Computer Science, University of Copenhagen
2Department of Electrical Engineering, Technical University of Denmark
* E-mail: sommer@di.ku.dk
Abstract: The complexity and volatility of power system operation increases when larger parts of the power production is based
on distributed and non-controllable renewable energy sources. Ensuring stable and secure operation becomes more difficult in
these modern power systems. For security assessment, the results of traditional offline simulations may become obsolete prior
to the completion of the assessment. In contrast, real-time stability and security assessment aims at online computation, and it is
therefore dependent on very fast computation of properties of the grid operating state. The paper develops the reduce-factor-solve
approach to real-time computation of two key components in real-time assessment methods, network reduction and calculation
of Thevenin impedances. The aim is to allow online stability assessment for very complex networks. The theoretical foundation
behind the reduce-factor-solve approach is described together with the ability to handle both algorithms in a common framework.
By exploiting parallelization of the reduce and solve steps in combination with fast matrix factorization, Thevenin impedances and
reduced networks are computed much faster than previous approaches. The reduce-factor-solve algorithm is evaluated on power
grids of varying complexity to show that Thevenin impedance computation and network reduction for complex power systems can
be performed on a milliseconds time scale.
1 Introduction
Whereas traditional power systems are characterized by centralized
controllable energy production, efforts on decarbonizing the power
system often lead to distributed power production based on non-
controllable renewable energy sources. This general shift from low
numbers of high-power units to high numbers of low-power units
with fluctuating production creates new challenges for stable and
secure operation.
In traditional power systems, stability and security could be
assessed offline and sensitivities to various contingencies estab-
lished by running time-consuming simulations. Multiple factors of
the future power system can challenge this approach: the complex-
ity of a power grid is greater when the level of decentralization
is high. This results in higher computational requirements and and
thus longer run-time for simulations. The power system should still
be able to operate under rapidly changing conditions, e.g., when
including weather dependent energy sources. Large fluctuations of
the system operating point can be common, and in combination these
factors may make the results of conventional offline stability assess-
ment obsolete prior to the time domain simulation completing. From
a stability assessment point of view, the answer to these complica-
tions lies in faster assessment methods [1]. Speed-up of assessment
algorithms can be obtained by developing new assessment methods
or by reducing the execution time of critical elements of already
existing methods. However, no matter which strategy is chosen, fast
algorithms for extracting properties of the grid operating state are
needed.
This paper presents a general approach to fast computation of two
critical parts of important assessment methods: calculating the sys-
tem Thevenin impedances and reducing the network to the generator
or voltage controlled nodes only. The reduce-factor-solve approach
allows online computation of both operations in a common frame-
work. Because several approaches to real-time stability assessment
require knowledge of Thevenin impedances and properties of the
reduced network, the developed algorithms make the use of these
approaches feasible for complex power systems at smaller time
resolutions. As we validate experimentally, the presented approach
allows fast Thevenin impedance computation on networks with close
to 8000 buses on a millisecond scale.
The paper shows how the Thevenin impedance computation con-
stitute a particular case of network reduction, and this property is
used to cover both operations with the reduce-factor-solve approach.
The approach as illustrated in Figure 1 is identified as a mix of left-
looking and frontal matrix factorization algorithms, and the division
between serial and highly parallel parts is used to reduce the runtime
of both Thevenin impedance computations and network reduction.
The work extends the conference paper [2] by including full net-
work reduction and node-elimination as a core part of the algorithm
resulting in the reduce-factor-solve framework. While node elimi-
nation is generally non-parallelizable, we describe here how fine-
grained parallelization in the elimination procedure can be exploited,
and how graphical processing units, GPUs, can be employed to
speed up part of the network reduction task.
The paper starts with background information on real-time sta-
bility assessment and computational issues before progressing to
discussing Thevenin impedances and previous algorithms. It is
shown how the voltage controlled part of the network gives rise to a
dense submatrix, and this is used to develop the reduce-factor-solve
approach. The role of node elimination is discussed before covering
full network reduction. In Section 8, parallelization and implemen-
tation details are discussed, and the performance and validity of the
reduce-factor-solve approach is evaluated in Section 9. The paper
ends with description of future work and concluding remarks.
2 Background
Thevenin impedance calculations constitutes a major component
of important stability assessment methods. For example, in [3], a
method is proposed for real-time assessment of rotor angle sta-
bility which exploits analytically derived expressions for critical
stability boundaries [4]. The Thevenin impedance seen from each
node of constant voltage is used to determine the distance to a
IET Research Journals, pp. 1–8
c© The Institution of Engineering and Technology 2015 1
Ync Reduction /
node elimination
Ync Factorization
Ync, Ylink
Forward Solve
Inner products
/ diagonal
Thevenin impedances
Inner products
/ full matrix
Reduced network
Y =
(
Ync Ylink
Y Tlink Yvc
)
Fig. 1: The reduce-factor-solve approach, see also Algorithm 1.
The full system admittance matrix Y is partitioned into the volt-
age controlled Yvc, non-voltage controlled Ync, and link Ylink parts.
The algorithm performs a fill-in limiting reduction/node elimina-
tion followed by a factorization of Ync. Computation of Thevenin
impedances (left) or fully reduced network (right) follows after a
forward solve and inner product computations. The grayed boxes
represent fully parallelizable parts of the algorithm, while the white
boxes are primarily serial allowing only fine-grained parallelization.
stability boundary of each synchronous machine, and the voltage
stability monitoring is based on local measurements and Thevenin
impedances in [5–8]. Thevenin impedance based assessment were
proposed in [9–11], and approaches for real-time static security
assessment exploiting coupled Thevenin equivalents are reported
in [12–14]. Online stability assessment with the above methods
is dependent on fast Thevenin impedance computation, and the
algorithm presented here therefore addresses a real issue: Without
fast algorithms, online stability assessment will be limited in the time
resolution and in the size of the network concerned. To the best of
our knowledge, no method that focuses on computational aspects
of Thevenin impedance computation is described in the literature
besides [2, 3].
Direct methods for transient stability using Lyaponov functions
[15] compare the energy of the grid operating point against the value
of the system in critical states. Each generator contributes to the
energy and the evaluation of the energy function is therefore depen-
dent on the transfer conductances between all pairs of generators
[16]. The transfer conductances constitute the elements of the admit-
tance matrix of the network reduced to the generator nodes only. The
fast network reduction method developed here will therefore allow
fast evaluation of energy functions with the Lyaponov method. Simi-
larly, assessment methods based on the extended equal area criterion
(EEAC, [17]) rely on the admittance matrix of the reduced network
for calculation the acceleration criterion and identification of critical
machines. See [18] for mathematical aspects of network reduction
and [19] for step-by-step matrix reduction closely related to node
elimination as discussed here.
To minimize the execution time of computations involving power
grids, sparsity of the network matrices is often exploited. Sparse
algorithms are, however, often hard to parallelize, and a trade-off
between exploiting sparsity and parallelization is often necessary.
Here, the high level of parallelism of the solve-step in the devel-
oped network reduction algorithm allows for fast evaluation of the
energy in Lyaponovs direct method. In contrast, when solutions to
linear systems are needed e.g., for implicit time simulation with the
Lyaponov direct method, the larger but more sparse non-reduced
system should be used [20]. Additional typical network computa-
tions involve sparse matrix factorization [21] and relaxation methods
[22]. See also [23] for comparison of factorization and relaxation
methods.
3 Power System and Representation
An arbitrary power grid consisting of N nodes (buses) is considered
throughout the paper. In the grid, the steady state voltage magnitude
atM ≤ N nodes is kept constant by means of voltage control. These
M nodes include the nodes representing synchronous generators’
internal voltage (behind Xd) if they are manually excited. Letting
Y denote the system admittance matrix, the system node voltage
equation is I = Y V . The M voltage controlled nodes (vcs) and the
N −M nodes of non-controlled voltage (ncs) can be ordered so
that the ncs and vcs are numbered by indices 1, . . . , N −M and
N −M + 1, . . . , N , respectively. The system admittance matrix
then takes the form
Y =
(
Ync Ylink
Y Tlink Yvc
)
(1)
where Ync denotes the admittance matrix of only the non-controlled
nc part of the system, Yvc denotes the admittance matrix of the volt-
age controlled vc part, and Ylink encodes the links between the nc
and vc parts of the network.
4 Transfer admittances and Thevenin
Impedances
This section describes two mathematical formulations of the sys-
tem Thevenin impedances as seen from the voltage controlled vc
nodes. For each node vc node k, the aim is to compute the Thevenin
impedance for the node, i.e. the impedance seen from node k when
all vc nodes besides node k are shorted. This situation can be mod-
eled by removing all vc nodes besides k from the system, and the
Thevenin impedance Zth,k can then be obtained as the last diagonal
element of inverse of the resulting admittance matrix.
4.1 Thevenin Impedances from LU-factorization
It is shown in [24] that the Thevenin impedances Zth,k can be
obtained from an LU-factorization of the system admittance matrix.
The LU-factorization [25] splits a matrix into a product of a lower
diagonal and an upper diagonal matrix, e.g. the admittance matrix Y
is factorized into the product Y = LU . Using the factorization, the
impedance Zth,k can be found by the formula
Z−1th,k = Y(k,k) − Lˆk·Uˆ·k (2)
with the last term being the inner product between the entries
1, . . . , N −M of the kth row of the matrix L and of the kth column
of the matrix U .
4.2 Schur Complement Formulation
The Thevenin impedances can equivalently be expressed in terms of
the Schur complement of the voltage controlled part of the admit-
tance matrix. With system loads represented by their admittance
values, no current enters the nodes of non-controlled voltage (ncs),
IET Research Journals, pp. 1–8
2 c© The Institution of Engineering and Technology 2015
Fig. 2: Sparsity patterns of the LU-factorization of the full admit-
tance matrix Y . The excessive fill-in visible in the lower right vc-part
limits the performance of the algorithm. In contrast, factorization of
the nc-submatrix only with the reduce-factor-solve approach can be
done with very limited fill-in and consequently very fast factoriza-
tion. The reduce-factor-solve algorithm reduces both the dimension
of the matrix to be factored (in this case from 7917× 7917 to 960×
960) and the number of non-zeros in the factors (from 1549093 to
10174).
and the network equation can be stated as(
0
Ivc
)
=
(
Ync Ylink
Y Tlink Yvc
)(
Vnc
Vvc
)
. (3)
Using the Schur complement [18, 26]
S = Yvc − Y TlinkY −1nc Ylink (4)
of Yvc, the vc-part of the solution to (3) can be obtained from
the reduced system Ivc = SVvc. Looking only at the diagonal ele-
ments S(d,d), d = 1, . . . ,M , the right-hand side of (2) equals S(d,d)
with k = d+N −M . The Thevenin impedances are therefore the
element-wise inverses of the diagonal of the Schur complement S.
5 Approaches to Computing Thevenin
Impedances
In [24], LU -factorization of the full system admittance matrix Y
and equation (2) is used for computing Thevenin impedances. The
benefit of using this approach is that only one factorization of the
full admittance matrix is needed in order to compute Zth,k for all vc
nodes. This method is analyzed below in order to show how a faster
approach can be developed, and the method is used as basis for the
comparisons in the experiment section.
5.1 Complexity and Fill-In
Due to the very high sparsity of network matrices, LU -factorization
is in general a very efficient procedure. Though the worst case per-
formance scales cubicly in the number of nodes, i.e. performance is
O(N3) in worst case, the complexity will be linear in most practical
Fig. 3: Elimination of one of two interior nodes in a six node net-
work. The node to be eliminated has degree three and with three new
branches added to the reduced network, the total number of branches
is kept constant. This preserves the sparsity. Elimination of nodes
with higher degree will result in an increased number of branches
reducing sparsity.
cases, see [23]. A key factor in achieving this complexity is mini-
mizing the number of fill-ins, non-zero elements of the factors L and
U that are not present in Y . The number of fill-ins is highly depen-
dent on the ordering of the matrix Y . For network matrices, ordering
algorithms like Approximated Minimum Degree (AMD, [27]) and
variants ensure a very low degree of fill-in. The number of both non-
zeros in Y and additional fill-ins are in practice close to linearly
correlated with N , implying that the factorization will be close to
linear in complexity.
In (1), an ordering with the ncs occurring with lower indices than
the vcs is used. This ordering is required for equation (2) that allows
us to extract the Thevenin impedances. To adhere to this indexing
convention, [24] applies AMD ordering to the submatrices Ync and
Yvc individually before combining them to obtain the full matrix Y
as in (1). The result of this partial ordering strategy is that the upper
left part of the factors L,U becomes adequately sparse but the lower
right part of the factors contains a very large number of fill-ins. This
problem that slows down the algorithm considerably is illustrated
in Figure 2, and a theoretical explanation of the excessive fill-in is
given below.
5.2 Sparse Factorization and Dense Schur Complement
The Schur complement (4) can also be obtained by successively
eliminating nodes from the system and creating reduced admittance
matrices. If the kth node is to be eliminated from anN node network
as illustrated in Figure 3, the new (N − 1)× (N − 1) admittance
matrix is given by the formula
Y new(i,j) = Y(i,j) −
Y(i,k)Y(k,j)
Y(k,k)
, (5)
for i, j 6= k. The Schur complement S of the voltage controlled part
is the matrix resulting from eliminating all ncs, see e.g. [18].
Successive node elimination using the update formula (5) pro-
duces an equivalent network matrix that has fewer nodes; however,
branches are added to the network, and the resulting network is
therefore potentially less sparse. In the completely reduced network
consisting of all vc nodes, the majority of nodes will be connected
by branches. The Schur complement S is therefore a dense matrix.
In [28, 29], it is observed that if a matrix with the block structure
in (1) is LU -factored, the product of the lower right blocks Lvc, Uvc
of the factors L,U corresponding to the vc part of the network
gives the Schur complement of the voltage controlled part of the
network directly, i.e. S = LvcUvc. This provides a way to compute
and factor S but it also tells us why the large number of fill-ins are
observed when computing Thevenin impedances with the method
of [24] where the full matrix Y is factorized: Because S is dense,
the factors Lvc and Uvc will in general not be sparse. The product
of sparse matrices can be dense, however, if, for example, the node
degree of the networks represented by the factors is limited, the prod-
uct will be sparse.Lvc, Uvc are precisely the lower right blocks of the
factors L,U where the excessive number of fill-ins occur. Indeed,
any fixed bound on the maximum node degree in both Lvc and Uvc
would imply that the number of non-zeros in S would grow linearly
IET Research Journals, pp. 1–8
c© The Institution of Engineering and Technology 2015 3
with the number of vcs, i.e. M . Since S is dense, the number of non-
zeros grow quadratically, nnz(S) ≈M2, and no such bound can
therefore exist.
6 Reduce-Factor-Solve Thevenin Impedances
The factorization of the dense Schur complement completely dom-
inates the runtime when computing Thevenin impedances using the
above outlined method. Therefore, a great speed-up of the compu-
tation can be achieved if the factorization of the full admittance
matrix Y and the excessive fill-in in the nc-part of the factors can
be avoided. The reduce-factor-solve approach achieves exactly that.
The name of the method relates to its composition into three indi-
vidual steps. The derivation below couples a variant of equation
(2) with the structure of left-looking LU-factorization algorithms.
For each vc node k, let Yk denote the (N −M + 1)× (N −M +
1) admittance matrix of the system with all vc nodes but node
k removed. A close variant of (2) for computing Z−1th,k uses the
matrices Yk instead of Y . A factorization Yk = LkUk gives the
relation
Z−1th,k = Y(k,k) − Lˆk,(N−M+1)·Uˆk,·(N−M+1) (6)
where the notation in the rightmost term denotes the inner product
between entries 1, . . . , N −M of the last row ofLk and of the right-
most column of Uk. The advantage of using this formula is that the
row Lˆk,(N−M+1)· and the column Uˆk,·(N−M+1) can be obtained
from a factorization Ync = LncUnc of the nc-part of Y only.
Consider now iterations of left-looking LU-factorization algo-
rithms [25]. With this class of algorithms, the N −M + 1 columns
in a factorization Yk = LkUk are computed iteratively from left to
right, i.e. starting with column 1 and ending with column N −M +
1. At each step j, the upper left (j − 1)-block of Lk is used to com-
pute the first j − 1 entries of the jth column of Uk. In particular,
computation of N −M entries of the rightmost column uses only
the upper left (N −M)-block of Lk, i.e. the block representing the
ncs. Writing this last step of the algorithm explicitly, the firstN −M
entries of column N −M + 1 of Uk satisfy the equation
LncUˆk,·(N−M+1) = Yˆlink,·k (7)
where Yˆlink,·k denotes the first N −M entries of the column
Ylink,·,k. The column vector Uˆk,·(N−M+1) is therefore computed
with a triangular forward solve using the factorization of Ync only.
Similarly, the first N −M entries of row N −M + 1 of Lk can be
obtained from the equation
UTncLˆ
T
k,(N−M+1)· = Yˆ
T
link,k· (8)
again using only the factorization of Ync. Thus, using (6), we get
Zth,k from two forward solutions using the factorization of Ync.
With the above computations, all matrices and operations
involved are sparse and the fill-in producing factorization of the full
admittance matrix Y is avoided. In addition, the triangular matrices
used for the forward solves are not dependent on k, and the factor-
ization of Ync must therefore be done only once. Due to the sparsity,
the forward solves are each computationally lightweight, and they
can in addition be computed completely in parallel. In the sequel,
the factorization of Ync is denoted the factorization step and the for-
ward solutions (7),(8) combined for the forward solve step. Though
the experiments section will show that the forward solve step can
dominate the runtime, the completely parallel nature of the loop over
all vcs makes speeding up this step straight forward by splitting the
computation of several compute cores. In contrast, the factorization
step is hard to parallelize and therefore in reality the limiting factor
of the algorithm. This step is analysed below.
6.1 Node Elimination and Factorization Speed
The factorization step of Algorithm 1 consist of the LU-factorization
of Ync. We use the KLU solver [21] that is particularly optimized
Algorithm 1 Reduce-Factor-Solve Algorithm.
Ync ← Fast node elimination Ync . (reduction)
Lnc, Unc ← factorization of Ync . (factorization)
for k = N −M + 1→ N do . for each vc in parallel
Uˆk,·(N−M+1) ← solve(Lnc,Yˆlink,·k) . (forward solve)
LˆTk,(N−M+1)· ← solve(UTnc,Yˆ Tlink,k·)
end for
for k = N −M + 1→ N do . For Thevenin impedances
S(k−N+M,k−N+M) ←
Y(k,k) − Lˆk,(N−M+1)·Uˆk,·(N−M+1)
Zth,k ← S−1(k−N+M,k−N+M) . Thevenin imp. node k
end for
for k, l = N −M + 1→ N do . For network reduction
S(k−N+M,l−N+M) ← (GPU)
Y(k,l) − Lˆk,(N−M+1)·Uˆl,·(N−M+1)
end for
for matrices with sparsity structure equivalent to power network
matrices, and it is therefore inherently difficult to improve the factor-
ization speed. Nevertheless, it turns out that the execution time of the
factorization step can be reduced by using that factorization of the
entire submatrix Ync is not required in order to evaluate (6). Instead,
node elimination prior to factorization can be performed to produce
an equivalent but smaller matrix. This part of the reduce-factor-solve
algorithm is denoted the reduction step.
Node elimination in the reduction-step can reduce the execution
time of the factorization step but careful consideration must be made
with respect to the amount of fill-in generated by the elimination.
Due to the efficiency of KLU, the reduction algorithm can be quite
relaxed in removing only a relatively limited number of nodes. This
is done with a simple fill-in reducing strategy: The algorithm scans
through the nc nodes removing a node only if it is connected to less
than 4 other nc nodes and if the fill introduced in the link matrix
Ylink is limited. Since removing nodes of degree 3 or less does not
introduce fill-ins, this strategy ensures that the number of non-zeros
in Ync does not increase during the process, confer Figure 3. The
number of non-zeros in Ylink will in general increase but the number
of added fill-ins is controlled by a fixed limit. Additional methods
for fill-in limiting eliminating can be found in the literature on large
resistor networks, e.g. [30].
It will be shown in the experiments section that node elimina-
tion reduces the computational effort of the factorization step by a
factor of 2-3. Please note that this is a reduction of serial part of the
algorithm that could not be obtained simply by adding more compute
cores running in parallel.
The reduce-factor-solve algorithm is listed in Algorithm 1. Imple-
mentation details and pivoting issues are discussed in Section 8.
7 Fast Network Reduction
Because the Thevenin impedances comprise the element-wise
inverses of the diagonal of the Schur complement, the reduce-factor-
solve approach can be regarded a fast algorithm for computing the
diagonal of the Schur complement. Since the entire Schur comple-
ment is needed in transient stability applications, e.g. for computing
the energy in the Lyaponov direct method and for identifying criti-
cal machines with the equal area criterion, the reduce-factor-solve
approach will here be generalized to computing the entire Schur
complement of the voltage controlled part of the network. As the
Schur complement corresponds to node elimination, this operation
is equivalent to eliminating all ncs from the network. The focus here
is on reducing the execution time needed to perform this operation.
The Schur complement (4) can be obtained by computing the
entire matrix instead of just the diagonal elements, i.e. by extending
IET Research Journals, pp. 1–8
4 c© The Institution of Engineering and Technology 2015
(6) to compute
Y(k,l) − Lˆk,(N−M+1)·Uˆl,·(N−M+1)
for all pairs l, k = N −M + 1, . . . , N . This requires evaluation of
M2 inner products between sparse vectors instead of the M inner
products needed for the Thevenin impedance computation. However,
no additional evaluations in the forward solve step are needed. The
performance of the algorithm is therefore dependent on the speed of
evaluating the inner products.
Network reduction is often viewed as an iterative process of node
elimination, and successive node elimination can be considered the
most straight forward algorithm for network reduction. In order to
eliminate all ncs, the update formula (5) must be applied N −M
times. Since each update step potentially updates the entire remain-
ing matrix, this approach has complexity O((N −M)N2). For the
first elimination steps, the matrix is very sparse and each update is
in practice much faster than than the worst case O(N2) bound. As
more nodes are eliminated, the number of added branches may grow
quickly and reduce the sparsity of the matrix. For the last nodes
to be eliminated, the matrix becomes dense. This fact limits the
performance.
In contrast to the iterative process of node elimination is evalu-
ating the Schur complement directly from equation (4) using matrix
algebra. The inversion Y −1nc will then in practice be carried out using
an LU -factorization of the nc-part of the network. This approach is
therefore equivalent to running Algorithm 1 with computations of all
M2 sparse inner products but without the reduction step.
With the terminology used in the LU -factorization literature, suc-
cessive node elimination corresponds to a partial factorization of the
matrix using a frontal factorization algorithm with M −N rank-
1 update steps. Similarly, direct evaluation of (4) corresponds to a
frontal factorization with one step and a sparse factorization of the
frontal matrix. The reduce-factor-solve approach can therefore be
viewed as a hybrid between these algorithms that performs rank-
1 updates until the sparsity starts decreasing followed by sparse
LU -factorization for the remaining vc nodes.
8 Implementation and Parallelization
The execution time of Algorithm 1 is highly dependent on the
parallelization over multiple CPU cores or on highly parallel graph-
ics processing units (GPUs). This section provides details on the
possibilities for parallelization and the choices made in the actual
implementation. A prototype implementation of the algorithm for
computing both Thevenin impedances and for performing network
reduction is available online.∗
8.1 Reduction Step
In the reduction step, nc-nodes that are connected to three or
fewer additional nc-nodes are successively removed. The elimina-
tion of one node must be finished before its neighboring nodes
can be removed. Performing node elimination in parallel will there-
fore require a dependency graph between the parallel parts and
synchronization between parallel threads. As each individual node
elimination can be performed very fast, the overhead of such an
approach may remove any gain from the parallel execution.
For each single node elimination, the algorithm updates the
entries of the matrix corresponding to the branches and nodes of
the connecting neighbors. The total number of connected nodes, i.e.
in both the nc- and vc-part of the network, is in practice sufficiently
large that some fine-grained parallelism can be exploited in this oper-
ation. In order to avoid overhead in synchronizing threads, SIMD
(single instruction, multiple data) vector instructions are used to per-
form the updates for 4 branches at a time. This produces a 2-3 times
speed-up of the node-elimination process.
∗See https://bitbucket.org/stefansommer/networkred
In each successive node elimination, the update formula (5)
requires division by the diagonal element of the node to be elimi-
nated, i.e. Y(k,k) when eliminating the kth node. For factorization
algorithms in general, attention must be payed to the numerical sta-
bility when the diagonal value is small compared to the off-diagonal
element. The division can be avoided by pivoting rows or columns
in the matrix [25].
Admittance matrices representing power grids may not be
diagonally-dominant, and diagonal entries with small absolute value
can occur e.g. due to the addition of shunts. Fortunately, small diag-
onal entries can be handled elegantly in the reduction step without
introducing complicated pivot schemes: If a node is encountered
with low absolute value, the elimination can just skip the node.
The subsequent factorization performed by the LU-factorization
algorithm then takes care of the pivoting. Thus, numerical stability
can be ensured by merely adding a runtime-check for small diagonal
entries.
8.2 Forward Solve Step
Both the forward solutions (7),(8) and the following computation
of inner products are completely parallel operations. Without paral-
lelization, these steps will dominate execution time. In particular,
computing the M2 inner products for network reduction is time
consuming.
The 2M solutions involved in the forward solve steps can be run
in parallel. Each individual solution is a serial operation and can in
general be computed very fast using a sparse forward solution. In
addition, if no pivoting occurs, the evaluation tree can be precom-
puted for each row and column reducing the execution time for the
numerical solution. The whole process can be efficiently executed
on e.g. a multi-core CPU using several threads. For the Thevenin
impedance, the subsequent M inner products can in addition effi-
ciently be calculated using the same threads. The algorithms used
for each forward solution and sparse inner product is described in
[25].
The quadratic scaling of the M2 inner products for network
reduction forces a different approach for the network reduction
algorithm. The limited number of CPU cores on present machines
makes the effect of parallelization limited: In the experiments
section, the algorithm will be evaluated on power systems that allow
evaluation of more than 106 inner products in parallel but a standard
machine usually have less than 12 cores. This difference suggests
using massively parallel execution units such as graphics processing
units (GPUs).
A typical GPU allows a much larger number of execution units to
work in parallel than a CPU, and the large number of cores can sig-
nificantly speed up the evaluation of theM2 inner products. Usually,
the work flow when employing GPUs consist of a transfer from the
main computer memory to the GPU memory, execution of the GPU
program, and transfer of the result from GPU memory to the main
computer memory. Because the inner product computation can be
expressed as a multiplication of two sparse matrices, a general sparse
matrix multiplication kernel can be used to do the actual computation
on the GPU card. For this task, the cuSPARSE library is used.∗
The actual GPU computation is relatively fast compared to the
memory transfer operations. This is in particular amplified by the
fact that cuSPARSE requires one of the two matrices to be multi-
plied to be structured as a dense matrix. Since matrices of dimension
M ×N −M and N −M ×M are multiplied to obtain a matrix
of dimension M ×M , the transfer memory time is dependent on
the size of N . Conveniently, the reduction step of the reduce-factor-
solve algorithm removes a large part of the nc nodes before the inner
product calculation. This produces a sufficient reduction in memory
transfer time. See e.g. [31, 32] for additional ongoing research on the
data structures used for sparse GPU computations
∗https://developer.nvidia.com/cusparse
IET Research Journals, pp. 1–8
c© The Institution of Engineering and Technology 2015 5
9 Experiments
In the first set of experiments, the speed-up provided by the reduce-
factor-solve Thevenin impedance algorithm, its absolute runtime,
and its validity is evaluated. In particular, the experiments will show
that the Thevenin impedance of all generators for power systems of
considerable sizes can be established in less than 3 milliseconds. In
addition, the runtime of the serial and parallel parts of the algorithm
will be explored in order to evaluate the achieved overall efficiency,
and a great reduction in size and number of non-zeros for the matrix
to be factored will be observed.
In the second set of experiments, network reduction with the
reduce-factor-solve approach is considered and the effect of paral-
lelization and GPU computation is investigated. In particular, it will
be shown how the reduction step reduces the time spent on host GPU
memory transfer.
The algorithms will be compared to the existing method described
in [24]. In addition, multiple methods for computing the Schur com-
plement will be explored. The validity of the algorithms is ensured
by measuring the differences in the output of the methods. For all
experiments, it is observed that the results are equal up to numerical
precision showing that the new algorithms produce correct results.
The experiments are performed on admittance matrices generated
from test systems included in the PSS R©E-30.0 and MATPOWER
[33] network simulation packages. ∗ The test systems include the
US west-coast (1648 buses, 2602 branches) and US east-cost (7917
buses, 13014 branches) power grids along with 6 additional systems
ranging from 2383 to 3120 buses.
The runtime is tested on a 3.2GHz Intel Core i7 hexa-core CPU.
In accordance with [24], UMFPACK [21] is used for factoring the
full admittance matrix with the reference method, and KLU [21] is
used for the factorization step of Algorithm 1. Pivoting is disabled
for all factorization methods. The main loop of the algorithm is par-
allelized over the CPU cores using threads, and the node elimination
step uses SIMD instructions to exploit fine-grained parallelism. The
GPU computations are performed on NVIDIA Titan Xp GPUs with
12 GB of memory and 3840 CUDA cores per card.
9.1 Thevenin Impedance Computation
Figure 4 shows for each test system the runtime of the Thevenin
impedance algorithm employing LU-factorization of the full admit-
tance matrix, the runtime of the reduce-factor-solve algorithm with-
out node elimination, and the reduce-factor-solve algorithm with
node elimination prior to the factorization. For all three approaches,
the runtime of the initial preparation step is left out of the measure-
ments because this step only need to be done once for each network.
The timings are performed just on the computational parts leav-
ing out the time used for initial copying of data, and the obtained
timings are averaged over a large number of runs. Note the loga-
rithmic scale on the vertical axis and the achieved approximately 80
times speed-up on the largest system with the reduce-factor-solve
algorithm compared to the previous method.
In Figure 5, the runtime of the three different steps in the reduce-
factor-solve algorithm is plotted: Reduction, factorization, and for-
ward solve. It can be seen that a relatively large portion of the
computational effort is spend on the forward solve. It is important
to relate this to the fact that the forward solve step is completely par-
allelizable. For the results here, all 6 cores of the test machine are
used. If a reduction in runtime is needed, a machine with more cores
will allow the runtime of the forward solve step to be driven down
below the runtime used for the reduction and factorization.
∗See http://www.pserc.cornell.edu/matpower/docs/
ref/matpower5.0/case*.html and https://bitbucket.
org/stefansommer/networkred/src/master/data/. Only
test systems with sufficient size to ensure non-negligible runtime are
included in the evaluation.
10−4
10−3
10−2
10−1
100
nr. busses / nr. vc nodes (8 grids)
tim
e 
(s.
, lo
g−
sc
ale
)
Algorithm Comparison, Thevenin Impedances
 
 
1648/
313
2383/
327
2736/
280
2737/
255
2746/
379
2746/
388
3120/
349
7917/
1325
Full LU−factorization (UMFPACK)
Reduce−Factor−Solve (without node elim.)
Reduce−Factor−Solve (with node elim.)
Fig. 4: Computation time for determining Thevenin impedances
when factorizing the full admittance matrix ([24], red), the reduce-
factor-solve algorithm without node elimination (black), and the
reduce-factor-solve algorithm with node elimination (blue). Evalu-
ation performed on 8 power grids ranging from 1648 buses to 7917
buses with between 313 and 1325 voltage controlled nodes. Note the
logarithmic scale on the time axis. For the largest system, the new
method is roughly 80 times faster than the previous approach.
0
0.5
1
1.5
2
2.5
3
x 10−3
nr. busses / nr. vc nodes (8 grids)
tim
e 
(s.
)
Time Usages, Thevenin Impedances
 
 
1648/
313
2383/
327
2736/
280
2737/
255
2746/
379
2746/
388
3120/
349
7917/
1325
LU−factorization
Node elimination
Forward solve
Fig. 5: Timings for the three different parts of the reduce-factor-
solve algorithm: Factorization (blue), node elimination (green),
and forward solve (red). The forward solve step parallelizes com-
pletely and the runtime can thus be reduced by employing more
computational cores.
Because the forward solve step can be parallelized, the serial parts
of the algorithm are in reality the true bottlenecks. In Figure 6, the
runtime of the serial parts are plotted in order to evaluate the benefits
of the node elimination step. Employing node elimination results in
a 2-3 times speed-up for this part of the algorithm: For the largest
test system, the factorization time without node elimination is 2.4
milliseconds. With node elimination, elimination and factorization
takes 0.8 milliseconds combined. In terms of sparsity, for the largest
test system, the reduce-factor-solve algorithm reduces the dimension
IET Research Journals, pp. 1–8
6 c© The Institution of Engineering and Technology 2015
00.5
1
1.5
2
2.5
x 10−3
nr. busses / nr. vc nodes (8 grids)
tim
e 
(s.
)
Algorithm Comparison, Factorization and Node Elimination
 
 
1648/
313
2383/
327
2736/
280
2737/
255
2746/
379
2746/
388
3120/
349
7917/
1325
Factor−and−solve (KLU)
Factor−and−solve (KLU + node elimination)
Fig. 6: Runtime for the reduce-factor-solve algorithm excluding the
forward solve step without node-elimination (black) and with node-
elimination (blue). Node elimination results in a speedup of a factor
2-3 for this part of the algorithm.
of the matrix to be factorized from 7917× 7917 (the full admittance
matrix) to 960× 960 (the node eliminated non-controlled part of the
admittance matrix). At the same time, the number of non-zeros in the
factors is reduced from 1549093 to 10174.
9.2 Network Reduction
Progressing to full network reduction, Figure 7 shows the execution
time when reducing the entire nc part of the network. As previ-
ously discussed, the reduce-factor-solve algorithm can be seen as a
hybrid between node elimination and direct evaluation of the matrix
equation (4). Therefore, these two approaches and the reduce-factor-
solve Algorithm 1 are compared. The tests are performed with and
without GPU acceleration using one GPU.
The differences between the methods are the potential for paral-
lelization, and, for the GPU code, the time spent on memory transfer.
Node elimination uses parallelization exclusively inside each elim-
ination step, and therefore consistently performs worse than the
remaining methods that allows more parallelization. For the largest
system, the reduce-factor-solve algorithm with GPU acceleration
reduces the execution time an order of magnitude compared to node
elimination.
Comparing the GPU accelerated direct evaluation of (4) with the
reduce-factor-solve approach, the speed-up is mainly a result of a
reduction in memory transfer times. The node reduction step of the
reduce-factor-solve approach decreases the number of nc nodes by
roughly a factor 6 for the largest system resulting in an equivalent
reduction in host-to-GPU memory transfer time. The GPU-to-host
transfer of the computed result is not affected. Both approaches
uses approximately 6 milliseconds for the actual computation but
the memory transfer time is reduced from 44 milliseconds to 8
milliseconds with the reduce-factor-solve approach.
For all but the largest system, the total execution time for the
reduction of the nc part of the network is below 10 milliseconds For
the largest network, the execution time is 14 milliseconds. However,
due to the parallelization ability of the inner product calculation,
using multiple GPU cards can further reduce the memory trans-
fer and computation time. Using two GPUs, the execution time
is approximately halved. Addition of more GPUs beyond two has
little effect on the execution time as not all of the very large num-
ber of compute cores can be effectively utilized for systems of the
size considered here. Larger systems may however benefit from
computations using more than two GPUs.
10−4
10−3
10−2
10−1
100
nr. busses / nr. vc nodes (8 grids)
tim
e 
(s.
)
Algorithm Comparison, Network Reduction
 
 
1648/
313
2383/
327
2736/
280
2737/
255
2746/
379
2746/
388
3120/
349
7917/
1325
Direct evalution of (4) (CPU only)
Reduce−Factor−Solve (CPU only)
Node Elimination
Direct evalution of (4) (GPU accel.)
Reduce−Factor−Solve (GPU accel.)
Fig. 7: Computation time for network reduction of the nc nodes
with direct evaluation of (4) using KLU for the factorization
and with/without GPU acceleration (red), the reduce-factor-solve
algorithm with/without GPU acceleration (black), and node elimi-
nation only (blue). The CPU execution times are roughly equivalent
for all but the largest system. The reduce-factor-solve approach
with GPU acceleration consistently improves the execution time by
approximately a factor of 5. For the largest system, the GPU acceler-
ated reduce-factor-solve approach is 6 times faster than evaluating
(4) directly.
10 Future Work
The algorithms discussed in this paper concerns some of the steps
involved in common transient stability methods. Online application
of these methods will require fast computation of all the involved
operations. In particular, a number of algorithms including the
Lyaponov method and EAC solves linear systems involving the
reduced admittance matrix. In this paper, it is shown how to com-
pute the reduced matrix fast e.g. for evaluating the Lyaponov energy.
However, because the reduced matrix is dense, the network spar-
sity cannot be exploited when solving the mentioned linear system.
The speedup of the reduce-factor-solve approach lies precisely in the
ability to avoid dense sub-matrices, and we are therefore working on
applying similar approaches to reducing the execution time of the
remaining operations needed for the transient stability methods.
11 Conclusion
Real-time calculation of different properties of the grid operation
state is necessary for online stability and security assessment. In
particular, calculating Thevenin impedances and performing net-
work reduction is important for several suggested approaches to
stability and security assessment. The paper introduces the reduce-
factor-solve approach that allows computation of both Thevenin
impedances and fast network reduction, and it describes the theo-
retical foundation and implementation details for the algorithm.
The performance and validity of the approach is tested on sev-
eral power systems. Comparison with previous approaches shows
approximately 80 times speed-up for the largest power system for
calculation of Thevenin impedances and 5 times speed-up when per-
forming network reduction using GPU acceleration. Consequently,
neither Thevenin impedance computation or network reduction con-
stitute bottlenecks for real-time stability and security assessment.
IET Research Journals, pp. 1–8
c© The Institution of Engineering and Technology 2015 7
12 Acknowledgments
This work is part of the Security Assesment of Renewable Power
Systems (SARP) project with support from ForskEL - Energinet.dk.
13 References
1 F. Li, W. Qiao, H. Sun, H. Wan, J. Wang, Y. Xia, Z. Xu, and P. Zhang, “Smart trans-
mission grid: Vision and framework,” IEEE Transactions on Smart Grid, vol. 1,
no. 2, pp. 168 –177, Sep. 2010.
2 S. Sommer and H. Johannsson, “Real-time thevenin impedance computation,” in
ISGT, 2013 IEEE PES, 2013, pp. 1–6.
3 H. Jóhannsson, A. H. Nielsen, and J. Østergaard, “Wide-Area Assessment of Ape-
riodic Small Signal Rotor Angle Stability in Real-Time,” IEEE Trans. Power Syst.,
vol. 28, no. 4, pp. 4545–4557, 2013.
4 H. Jóhannsson, J. Østergaard, and A. H. Nielsen, “Identification of critical trans-
mission limits in injection impedance plane,” International Journal of Electrical
Power & Energy Systems, vol. 43, no. 1, 2012.
5 S. Corsi and G. Taranto, “A real-time voltage instability identification algorithm
based on local phasor measurements,” IEEE Transactions on Power Systems,
vol. 23, no. 3, pp. 1271 –1279, Aug. 2008.
6 I. Smon, G. Verbic, and F. Gubina, “Local voltage-stability index using tellegen’s
theorem,” in IEEE PES General Meeting, 2007.
7 K. Vu, M. Begovic, D. Novosel, and M. Saha, “Use of local measurements to
estimate voltage-stability margin,” IEEE Transactions on Power Systems, vol. 14,
no. 3, pp. 1029 –1035, Aug. 1999.
8 L. Warland and A. Holen, “Estimation of distance to voltage collapse: Testing an
algorithm based on local measurements,” in PSCC, 2002.
9 A. Perez, H. Jóhannsson, and J. Østergaard, “Wind farms generation limits and
its impact in real-time voltage stability assessment,” in IEEE PowerTech 2015,
Eindovhen, June 2015.
10 ——, “Improved method for considering PMU’s uncertainty and its effect on
real-time stability assessment methods based on Thevenin equivalent,” in IEEE
PowerTech 2015, Eindovhen, June 2015.
11 A. Perez, H. Jóhannsson, T. V. Cutsem, M. Glavic, and J. Østergaard, “ Improved
Thevenin equivalent methods for real-time voltage stability assessment,” in IEEE
International Energy Conference 2016, 2016.
12 J. Møller, H. Jóhannsson, and J. Østergaard, “Fast Computation of Steady State
Nodal Voltages in Power System Contingency Studies,” in 2014 IEEE Power
Quality and Reliability Conference, Tallin, Estonia, June 2014.
13 J. G. Møller, H. Jóhannsson, and J. Østergaard, “Super-Positioning of Voltage
Sources for Fast Assessment of Wide-Area Thevenin Equivalents,” IEEE Trans.
on Smart Grid, vol. 8, no. 3, pp. 1488–1493, 2017.
14 J. Møller, H. Jóhannsson, and J. Østergaard, “Contingency Assessment with Detec-
tion of Aperiodic Small-Signal Instability,” in 2015 IEEE PES General Meeting,
Denver, Colarado, July 2015.
15 G. Gless, “Direct method of liapunov applied to transient power system stability,”
Power Apparatus and Systems, IEEE Transactions on, vol. PAS-85, no. 2, pp. 159
–168, Feb. 1966.
16 T. Weckesser, H. Jóhannsson, S. Sommer, and J. Østergaard, “Investigation of the
adaptability of transient stability assessment methods to real-time operation,” in
IEEE PES ISGT Europe, Berlin, 2012.
17 M. Pavella, D. Ernst, and D. Ruiz-Vega, Transient Stability of Power Systems: A
Unified Approach to Assessment and Control, 1st ed. Springer, Oct. 2000.
18 F. Dorfler and F. Bullo, “Kron reduction of graphs with applications to electrical
networks,” IEEE Transactions on Circuits and Systems, 2011.
19 M. Belkacemi and N. Harid, “Fast reduction and modification of power system
sparse matrices,” Electric Power Components and Systems, vol. 32, no. 4, pp. 367–
373, 2004.
20 M. Abu-Elnaga, M. El-Kady, and R. Findlay, “Sparse formulation of the tran-
sient energy function method for applications to large-scale power systems,” IEEE
Transactions on Power Systems, vol. 3, no. 4, 1988.
21 T. A. Davis and E. Palamadai Natarajan, “Algorithm 907: KLU, a direct sparse
solver for circuit simulation problems,” ACM Trans. Math. Softw., vol. 37, no. 3, p.
36:1–36:17, Sep. 2010.
22 M. Ilic’-Spong, M. L. Crow, and M. A. Pai, “Transient stability simulation by
waveform relaxation methods,” Power Systems, IEEE Transactions on, vol. 2,
no. 4, pp. 943 –949, Nov. 1987.
23 F. Pruvost, T. Cadeau, P. Laurent-Gengoux, F. Magoules, and F.-X. Bouchez,
“Numerical accelerations for power systems transient stability simulations,” in
17th Power System Computation Conference, 2011.
24 H. Jóhannsson, “Development of early warning methods for electric power
systems,” Ph.D. dissertation, Technical Univ. of Denmark, 2011.
25 T. A. Davis, Direct Methods for Sparse Linear Systems. SIAM, 2006.
26 F. Zhang, Ed., The Schur Complement and Its Applications, ser. Numerical
Methods and Algorithms, 2005, vol. 4.
27 P. R. Amestoy, T. A. Davis, and I. S. Duff, “Algorithm 837: AMD, an approximate
minimum degree ordering algorithm,” ACM Trans. Math. Softw., vol. 30, no. 3, p.
381–388, Sep. 2004.
28 Y. Saad, Iterative Methods for Sparse Linear Systems. SIAM, 2003.
29 Y. Saad and M. Sosonkina, “Distributed schur complement techniques for general
sparse linear systems,” SIAM J. Sci. Comp., vol. 21, 1997.
30 Z. Ye, D. Vasilyev, Z. Zhu, and J. Phillips, “Sparse implicit projection (SIP)
for reduction of general many-terminal networks,” in IEEE/ACM International
Conference on Computer-Aided Design, 2008.
31 K. Matam and K. Kothapalli, “Accelerating sparse matrix vector multiplication in
iterative methods using GPU,” in ICPP, 2011.
32 A. Buluç and J. R. Gilbert, “Parallel sparse matrix-matrix multiplication and index-
ing: Implementation and experiments,” SIAM Journal on Scientific Computing,
vol. 34, no. 4, 2012.
33 R. Zimmerman, C. Murillo-Sanchez, and R. Thomas, “MATPOWER: steady-state
operations, planning, and analysis tools for power systems research and education,”
Transactions on Power Systems, vol. 26, 2011.
IET Research Journals, pp. 1–8
8 c© The Institution of Engineering and Technology 2015
