Order reduction of large thermal and electrical models with system-theoretic techniques and matrix equation algorithms by Φλώρος, Γεώργιος Σ.
U n i v e r s i t y  o f  Th e s s a l y
D o c t o r a l  T h e s is
Order reduction of large thermal and 
electrical models with system-theoretic 
techniques and matrix equation algorithms
Author: Supervisors:
George FLOROS Prof. Nestor Ev m o r f o p o u l o s
Prof. George St a m o u l is  
Prof. Gerasimos Po t a m ia n o s
A thesis submitted in fulfillment of the requirements 
for the degree of Doctor of Philosophy
in the
Electronics Lab
Department of Electrical and computer Engineering
November 30, 2019
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
11
U N I V E R S I T Y  OF
T Η E S S ALY
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
iii
Declaration of Authorship
I, George Floros, declare that this thesis titled, “Order reduction of large thermal and elec­
trical models with system-theoretic techniques and matrix equation algorithms ” and the work 
presented in it are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research degree at this 
University.
• Where any part of this thesis has previously been submitted for a degree or any other 
qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the 
exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made 
clear exactly what was done by others and what I have contributed myself.
signed:
Date:
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
v“Madness has no purpose. Or reason. But it may have a goal...”
Spock (Leonard Nimoy)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
vii
UNIVERSITY OF THESSALY
Abstract
Department of Electrical and Computer Engineering 
Doctor of Philosophy
Order reduction of large thermal and electrical models with system-theoretic 
techniques and matrix equation algorithms
by George Floros
Computer simulation of modern IC subsystems, like power grids, interconnect structures and 
substrate regions, possesses a fundamental role in the EDA industry. The electric models of 
the aforementioned systems are enormous, and their functional simulation requires solving 
systems of equations with dimension that can reach several millions or even billions. Model 
Order Reduction (MOR) techniques provide attractive ways to reduce these highly complex 
models, replacing them by models with much smaller internal dimensions whose behavior at 
the input/output ports approximates that of the original models.
Balanced Truncation (BT) type MOR methods of internal states that are at the same time 
insufficiently controllable at the inputs and insufficiently observable at the outputs have the 
advantage of very reliable estimates for the accuracy of the reduced model. However, they 
have significant computational and storage costs for producing the reduced-order models, as 
they require the solution of Lyapunov matrix equations which is an intensive computational 
procedure, and also involve the storage of dense matrices, making them applicable only to 
models of few thousand states. To eliminate computational costs and storage needs, low-rank 
factorization methods have been developed that allow the use of these methods in real-world 
applications.
The emphasis in the present work is given in circuit and thermal models derived from 
integrated circuits and the calculation of the reduced-order models on specific frequency or 
time windows, as well as on the efficient implementation of Balanced Truncation type meth­
ods. More specifically, in most applications the circuit is intended to operate only in specific 
frequency windows, which means that the reduced-order model can become unnecessarily 
large to achieve approximation over all frequencies. Correspondingly, for the thermal models 
there is usually a final time in which all thermal effects can be assumed to have reached steady 
state. The focus is on model order reduction methods in specific frequency or time windows, 
combined with the efficient implementation of sparse methods for calculating Lyapunov-type 
matrix equations in order to manipulate large-scale input models.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
IX
Greek Abstract
Μείωση τάξης μεγάλων θερμικών και ηλεκτρικών μοντέλων με τεχνικές θεωρίας 
συστημάτων και αλγορίθμους εξισώσεων πινάκων
Η αριθμητική προσομοίωση των σύγχρονων υποσυστημάτων των ολοκληρωμένων 
κυκλωμάτων, όπως τα ηλεκτρικά δίκτυα, οι αγωγοί διασύνδεσης και οι περιοχές υπ­
οστρώματος, έχουν θεμελιώδη σημασία για την βιομηχανία ολοκληρωμένων κυκλωμάτων. 
Τα ηλεκτρικά μοντέλα των προαναφερθέντων συστημάτων είναι τεράστια και η λει­
τουργική τους προσομοίωση απαιτεί την επίλυση συστημάτων εξισώσεων με διαστάσεις 
που μπορούν να φτάσουν αρκετά εκατομμύρια. Οι τεχνικές υποβιβασμού τάξης μοντέλου 
(model order reduction - MOR) παρέχουν ελκυστικούς τρόπους για τη μείωση αυτών των 
πολύπλοκων μοντέλων, αντικαθιστώντας τα με μοντέλα με πολύ μικρότερες εσωτερικές 
διαστάσεις των οποίων η συμπεριφορά στις θύρες εισόδου /  εξόδου (ports) προσεγγίζει 
αυτή των αρχικών μοντέλων.
Οι μέθοδοι MOR τύπου “εξισορρόπησης και αποκοπής” (Balanced Truncation - BT) 
των εσωτερικών καταστάσεων που είναι ταυτόχρονα μη επαρκώς ελέγξιμες από τις εισό­
δους και μη επαρκώς παρατηρήσιμες στις εξόδους, διαθέτουν το πλεονέκτημα των πολύ 
αξιόπιστων εκτιμήσεων για την ακρίβεια προσέγγισης του μειωμένου μοντέλου. Εμ­
περιέχουν, όμως, σημαντικό υπολογιστικό και αποθηκευτικό κόστος για την εξαγωγή 
των μειωμένων μοντέλων, καθώς απαιτούν την επίλυση δαπανηρών εξισώσεων πυκνών 
πινάκων τύπου Lyapunov, γεγονός που τις καθιστά άμεσα εφαρμόσιμες μόνο σε μοντέλα 
της τάξης λίγων χιλιάδων καταστάσεων. Για να απαλειφθεί το υπολογιστικό κόστος και 
οι ανάγκες αποθήκευσης, έχουν αναπτυχθεί μέθοδοι παραγοντοποίησης χαμηλού βαθμού 
(low-rank) που επιτρέπουν την χρήση αυτών των μεθόδων σε πραγματικές εφαρμογές.
Η έμφαση στην παρούσα εργασία δίνεται σε κυκλωματικά και θερμικά μοντέλα που 
προκύπτουν από ολοκληρωμένα κυκλώματα και στον υπολογισμό του ελαττωμένου μον­
τέλου σε συγκεκριμένα παράθυρα συχνοτήτων ή χρόνου, καθώς και στην αποδοτική 
υλοποίηση μεθόδων τύπου Balanced Truncation. Στις περισσότερες εφαρμογές το κύκλ­
ωμα προορίζεται να λειτουργεί μόνο σε συγκεκριμένα παράθυρα συχνοτήτων, πράγμα που 
σημαίνει ότι το το μοντέλο μειωμένης τάξης μπορεί να γίνει άσκοπα μεγάλο για να επι­
τευχθεί προσέγγιση σε όλες τις συχνότητες. Αντίστοιχα στα θερμικά μοντέλα υπάρχει 
συνήθως ένας τελικός χρόνος κατά τον οποίο όλες οι θερμικές επιδράσεις μπορούν να 
θεωρηθούν ότι έχουν φθάσει σε σταθερή κατάσταση. Η εστίαση δίνεται σε μεθόδους 
υποβιβασμού τάξης μοντέλου σε συγκεκριμένα παράθυρα συχνοτήτων ή χρόνου, σε συνδ­
υασμό με την αποδοτική υλοποίησης αραιών μεθόδων για τον υπολογισμό των εξισώσεων 
πινάκων τύπου Lyapunov με σκοπό τον χειρισμό μεγάλων μοντέλων εισόδου.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
xi
Acknowledgements
Firstly, I would like to express my sincere gratitude to my advisor Prof. Nestor Evmor- 
fopoulos for the continuous support of my Ph.D study and related research, for his patience, 
motivation, and immense knowledge. His guidance helped me in all the time of research and 
writing of this thesis. I could not have imagined having a better advisor and mentor for my 
Ph.D study.
Besides my advisor, I would like to thank the rest of my thesis committee: Prof. George 
Stamoulis, and Prof. Gerasimos Potamianos, for their insightful comments and encourage­
ment, but also for the hard question which incented me to widen my research from various 
perspectives. My sincere thanks also goes to the other committee members for accepting the 
invitation.
I thank my fellow labmates in for the stimulating discussions, for the sleepless nights we 
were working together before deadlines, and for all the fun we have had in the last four years. 
Last but not the least, I would like to thank anyone who supported me spiritually throughout 
writing this thesis and my life in general.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
xiii
Contents
Declaration of Authorship iii
Abstract vii
Greek Abstract ix
Acknowledgements xi
1 Introduction 1
1.1 Introduction...........................................................................................................  1
1.2 Contributions........................................................................................................  2
1.3 Outline .................................................................................................................  3
2 Model Order Reduction 5
2.1 MOR Overview for LTI System s........................................................................  5
2.1.1 M om en ts .............................................................................................  6
2.1.2 Stability................................................................................................  6
2.1.3 Passivity .................................................................................................. 6
2.2 Moment Matching Methods ............................................................................... 7
2.3 Balanced Truncation Methods ............................................................................ 8
3 Frequency-Limited Reduction of Regular and Singular Circuit Models via Low-
Rank Model Order Reduction 11
3.1 Introduction...........................................................................................................  11
3.2 Related W o rk ........................................................................................................  12
3.3 Background ...........................................................................................................  13
3.3.1 MOR by Balanced Truncation for Circuit Simulation Problems . . . . 13
3.3.2 Handling of Singular Descriptor M odels........................................... 16
3.3.3 Balanced Truncation in Limited Frequency Intervals .........................  17
3.4 Low-rank EKS method for frequency-limited Lyapunov eq uations ................ 18
3.4.1 Extended Krylov Subspace Method for Solving Lyapunov Equations . 18
Sparse matrix in p u t s ............................................................................... 19
Orthogonalization in steps 2 and 9 ......................................................... 19
Solution of small-scale Lyapunov equation in step 5   19
Orthogonalization in step 8 .....................................................................  20
Convergence criterion ............................................................................ 20
Lower rank solution ............................................................................... 20
3.4.2 Application of EKS Method to Frequency-Limited Lyapunov Equa­
tions ...........................................................................................................  20
3.4.3 Sparse Implementation for Singular Descriptor M odels......................  21
Construction of RHS ............................................................................... 22
Sparse linear system solutions ...............................................................  22
Sparse matrix-vector products ...............................................................  23
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
xiv
Construction of system m a tr ix ...............................................................  23
3.5 Modified ADI method for solving frequency-limited Lyapunov equations . . 23
3.6 Complete procedure with the EKS m ethod......................................................... 25
3.7 Complete procedure with the ADI m ethod ......................................................... 26
3.8 Experimental R esu lts ...........................................................................................  26
4 Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction 31
4.1 Introduction...........................................................................................................  31
4.2 Related W o rk ........................................................................................................  32
4.3 On-Chip Thermal Modeling ............................................................................... 33
4.4 Low-Rank Model Order Reduction for Thermal M odels.................................... 36
4.4.1 Balanced Truncation for Thermal M o d e ls ............................................  36
4.4.2 Low-Rank Solution of Lyapunov Equations.........................................  38
4.4.3 Extended Krylov Subspace Method ...................................................... 38
4.4.4 EKS Method Implementation D e ta i ls ..................................................  39
4.5 Proposed Methodology for Hotspot Thermal Simulation ................................ 40
4.6 Extension in Limited-Time In tervals..................................................................  42
4.7 Computation of the RHS of the Time-Limited Gramians ................................. 42
4.8 Proposed Methodology for Time-Limited Hotspot Thermal Simulation . . .  43
4.9 Experimental R esu lts ...........................................................................................  43
5 Conclusions and Future Directions 49
5.1 Conclusions...........................................................................................................  49
5.2 Future Directions .................................................................................................. 49
A Relevant Publications 51
Relevant Publications ..................................................................................................... 51
Bibliography 53
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
xv
List of Figures
2.1 Model Order Reduction of LTI systems................................................................ 6
3.1 Comparison of transfer functions of ROMs from standard BT and frequency-
limited BT in MNA_4 benchmark at ports (2,2) and (3,2)..................................  27
3.2 Comparison of transfer functions of ROMs from standard BT and frequency-
limited BT in PG2 benchmark at ports (1,3), (3,2) and (4,2).............................. 29
4.1 Spatial discretization of chip for thermal analysis, and formulation of electri­
cal equivalent problem............................................................................................ 35
4.2 Schematic depiction of input heat sources and output hotspots in the 3D dis­
cretization of a chip................................................................................................. 36
4.3 Magnitude of Hankel Singular Values for two benchmark circuits........................ 45
4.4 Results of transient analysis for original and reduced-order model at a hotspot
point in two benchmark circuits.................................................................................46
4.5 Transient simulation of a hotspot in benchmark ckt3 with time-limited BT. . . 47
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
List of Tables
3.1 Reduction results of frequency-limited BT vs standard BT for various circuit
benchmarks.............................................................................................................. 26
4.1 Analogy between electrical and thermal circuits.................................................. 34
4.2 Statistics of benchmark circuits. Material layers include both metal and in­
sulator layers, and heat sources represent sources of power dissipation from
chip logic blocks......................................................................................................... 44
4.3 Model Order Reduction results.............................................................................. 44
4.4 Runtime results for transient thermal simulation of the original and the reduced-
order model with direct methods............................................................................... 45
4.5 Runtime results for transient thermal simulation of the original and the reduced-
order model with iterative methods........................................................................... 46
4.6 Reduction results of time-limited BT vs standard BT for various circuit bench­
marks........................................................................................................................ 46
xvii
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
xix
List of Abbreviations
IC Interconnect Circuit
VLSI Very Large Scale Integration
MOR Model Order Reduction
CAD Computer Aided Design
BT Balanced Truncation
EDA Electronic Design Automation
EKS Extended Krylov Subspace
ADI Alternating Direction Implicit
LTI Linear Time Invariant
SVD Singular Value Decomposition
GPU Graphic Processor Unit
IC Integrated Circuit
SOI Silicon on Insulator
FDM Finite Difference Method
PCG Preconditioned Conjugate Gradient
FEM Finite Element Method
ADI Alternating Direction Implicit
RHS Right Hand Side
ROM Reduced Order Reduction
NN Neural Net
LUT Look Up Table
PDE Partial Differential Equation
MNA Modified Nodal Analysis
ODE Ordinary Differential Equations
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
xxi
Dedicated to ... wish I knew
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
1Chapter 1
Introduction
1.1 Introduction
The ongoing miniaturization, below the 45-nm process technology, of modern IC devices like 
transistors, has continued unabated for the last 50 years, in strict accordance with the provi­
sions of Moore’s Law. This has led to extremely complex circuits (modern processors contain 
many billions of transistors and are easily the most complex human structures) and to a corre­
sponding increase of the problems associated with the analysis and simulation of the physical 
operation of these circuits. In particular, the performance and reliable operation of integrated 
circuits are largely determined by several critical subsystems such as the power distribution 
network, multi-conductor interconnections, and the semiconductor substrate (through which 
undesired digital signals and noise propagation are transmitted by digital signals into ana­
log sections). The electrical models of the above subsystems are very large, consisting of 
hundreds of millions or billions of electrical elements (mostly resistors R, capacitors C, and 
inductors L), and their simulation depends on solving systems with dimensions up to 1011 
which is becoming a huge mathematical problem. Even if their individual solution is feasi­
ble, it is completely impossible to combine them with the rest of the integrated circuit and 
with many consecutive time-steps or frequencies. However, for the above subsystems it is 
often not necessary to fully simulate all internal state variables (node voltages and branch 
currents), but only to calculate the responses in the time or frequency domain for a small 
subset of output terminals (ports) for given excitations at some input ports. In such cases, 
the very large electrical model can be replaced by a much smaller model whose behavior 
at the input/output ports is close to the original model. This process is called Model Order 
Reduction (MOR).
Downgrading large-scale electrical models is of paramount importance for companies 
that are active in integrated circuits and development of CAD tools, as well as in applications 
in the field of electricity. This field has attracted a great deal of research interest in the last 
ten years, during which there has been a dramatic increase in the dimension of the aforemen­
tioned electrical models. As a result, compact modeling of passive RLC interconnect net­
works is a major research area nowadays due to increasing complexity of high-performance 
VLSI designs [1]. The dimension reduction at the model level, is mostly done by math­
ematical algorithms, which produce a much smaller model, usually by a sufficiently large 
factor, that reproduces the original model response at a prescribed level of accuracy. MOR 
techniques are generally classified into two classes, namely moment-matching (or Krylov- 
subspace) techniques and balancing-type (or Gramian-based) techniques.
The idea behind Krylov-subspace techniques is to project the original models into an or­
thonormal subspace, which is called Krylov subspace. Projection procedures try to preserve 
the moment information of the original transfer function. Moreover, in order to ensure the 
passivity of the reduced-order model they usually adopt a congruence transformation that 
can preserve the reduced-order model passivity if the original model matrices are also in a 
passive form. A successful moment-matching algorithm that is widely used and developed in
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
2 Chapter 1. Introduction
the field of circuit simulation is the PRIMA algorithm [2], which uses the Krylov subspace 
vectors to form the projector and build a congruence transformation, which leads to passive 
models with the matched moments. Projection-based methods, however, have several draw­
backs, with the most prominent one, is that they are not efficient for circuits with many input 
and output terminals since reduction cost depends on the number of ports. Moreover, it can 
be shown that the number of poles of reduced-models has a direct relationship to the number 
of terminals. Secondly, moment-matching methods do not generally preserve structure prop­
erties like reciprocity of circuit equations. Finally, it is difficult to apply moment-matching 
methods to high frequency models where the model parameters are usually frequency depen­
dent.
The other major class of model order reduction for circuit models is by means of system- 
theoretic Balanced Truncation (BT) methods [3], where the weak unobservable and uncon­
trollable state variables are eliminated in order to produce the reduced-order models. More­
over, in system-theoretic techniques, like BT the reduced-order model is not depended from 
the number of ports, and has very satisfactory and reliable bounds for the approximation 
error. The BT methods generally produce nearly optimal models but they are more compu­
tationally expensive than projection-based methods. This happens due to the solution of the 
two Lyapunov matrix equations that govern BT-type methods with O (n3) time complexity 
and O (n2) storage needs, which is an important drawback.
In this dissertation, we focus on the compact modeling of on-chip interconnects and ther­
mal models that arise in VLSI designs by applying system-theoretic techniques along with 
the efficient computation of Lyapunov matrix equation in order to make this type of meth­
ods applicable to real-world circuit simulation problems. This class of problems generally, 
produces linear time invariant (LTI) systems because interconnect parasitics are usually mod­
eled as linear RLC circuits, while thermal models have a direct representation in RC circuits 
which we will describe later in this thesis.
1.2 Contributions
In order to address the limitations of the existing system-theoretic techniques for Model Order 
Reduction in VLSI interconnect problems, the purpose of this dissertation is to introduce 
an efficient BT framework for the reduction of large-scale circuit and thermal models by 
adopting state-of-the-art matrix equation solution algorithms that can handle large-scale input 
models.
More specifically, the contributions are described below:
• Develop procedures for applying large-scale matrix equations solvers to the solution 
of standard, time-limited and frequency-limited Lyapunov equations (which are differ­
ent than the standard Lyapunov equations). This was made possible by implementing 
efficient computational choices by keeping sparse system matrices into their original 
forms (both for circuit and thermal models). By the proposed approaches the distill 
mathematical techniques are transformed into a complete algorithmic procedure, and 
the potential of the system-theoretic BT approach is evaluated in actual VLSI prob­
lems.
• For circuit simulation problems, MOR techniques based on BT offer very good error 
estimates and can provide compact models with any desired accuracy over the whole 
range of frequencies (from DC to infinity). However, in most applications the cir­
cuit is only intended to operate at specific frequency windows, which means that the 
reduced-order model can become unnecessarily large to achieve approximation over 
all frequencies. In this dissertation, we present a frequency-limited approach which,
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
1.3. Outline 3
combined with efficient low-rank sparse implementations of the Extended Krylov Sub­
space (EKS) and the Alternating Direction Implicit (ADI) methods, can handle large 
input models and provably leads to reduced-order models that are either smaller or 
exhibit better accuracy than full-frequency BT.
• For efficient thermal simulation temperature is not required to be computed at every 
point of the IC but only at certain hotspots, in order to assess the circuit's compliance 
with thermal specifications. This makes the thermal analysis problem amenable to 
Model Order Reduction techniques. System-theoretic techniques like Balanced Trun­
cation offer very reliable bounds for the approximation error, which can be used to 
control the order and accuracy of the reduced models during creation, at the expense 
of greater computational complexity to create them. To this end, we propose a compu­
tationally efficient low-rank Balanced Truncation algorithm based on the EKS method, 
which retains all the system-theoretic advantages in the reduction of model order for 
fast hotspot thermal simulation.
1.3 Outline
The next chapters of the dissertation are organized as follows. Firstly, the relevant back­
ground information regarding Model Order Reduction for dynamical systems is presented in 
Chapter 2. Then, in Chapter 3 we describe a methodology for reducing large-scale regular 
and singular electrical models in limited-frequency windows along with efficient computa­
tional choices, making the method applicable to large-scale interconnect problems. Chapter 
4 describes the theory behind thermal analysis for VLSI circuits and an efficient method 
for thermal analysis of the hotpots that arise in modern VLSI designs. Finally, Chapter 5 
concludes the dissertation.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
5Chapter 2
Model Order Reduction
2.1 MOR Overview for LTI Systems
In this dissertation, we focus on LTI systems which described by the following set of equa­
tions:
dx(t) 
dt
A x (t) +  B u (t),
(2.1)
y (t) =  Lx(t)
Model Order Reduction (MOR) aims at generating a reduced-order model:
dx (t ) 
dt
Ax (t) +  B u(t), 
y (t) =  Lx (t)
(2.2)
with A  E R rxr, B E R rxp, L E R qxr as shown in Fig. 2.1, which both exhibits r < <  n 
and constitutes a good approximation of (2.1), in that the output error is bounded as ||y  (t) — 
y(t) ||2 <  ε11u(t ) 112 for given input u (t) and given small ε. The bound in the output error 
can be equivalently written in the frequency domain as ||y(s) — y(s) ||2 <  ε| |u(s) 112 via 
Plancherel’s theorem [4]. If
H (s) =  L(sI — A ) - 1 B
(2.3)
H (s) =  L (sI — A )—1B
are the transfer functions of the original and the reduced-order model, then the output error 
in the frequency domain is:
| |y (s) — y (s)||2 =  ||H (s)u (s) — H (s)u (s ) |h  (24)
< | | H (s) — H ( s ) | |« | |u (s)||2 ( . )
where | |. | |TO is the induced L2 matrix norm, or norm of a rational transfer function. 
Therefore, the output error can be bounded by bounding the distance between the transfer 
functions as | |H (s) — H (s) | |TO <  ε.
Due to the large dynamical systems that arise in nowadays VLSI interconnect problems 
the MOR algorithms should satisfy the following three requirements.
• Must have a reasonable accuracy in the reduced-order model.
• Must have a satisfactory reduction rate in the reduced-order model.
• Must have an efficient computational complexity for producing the reduced-order model.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
6 Chapter 2. Model Order Reduction
Figure 2.1: Model Order Reduction of LTI systems.
2.1.1 Moments
The transfer function of (2.3) is a function of s, and can be expanded into a moment expansion 
around s =  0 as follows:
H (s) =  Mo +  M is +  M2s2 +  M3S3 . . .  (2.5)
where the M0, M i, M2, M3, . . .  are the moments of the transfer function. Specifically, in 
circuit simulation problems the M0 moment is the DC solution of the linear system. This 
means that the inductors of the circuit are considered as short circuits, and the capacitors as 
open circuits. Moreover, the M i moment is the Elmore delay of the linear model, which 
calculates the time for a signal at the input port to reach the output port. The Elmore formula 
can be defined as follows:
ted = th(t)d t  (2.6)
0
where h (t) is the impulse response function. By applying the Laplace transformation in the 
transfer function of (2.3) and expanding the exponential factor with a Taylor series, it can be 
shown that the Elmore delay corresponds to the M i moment of the transfer function.
2.1.2 Stability
The eigenvalues and the poles of a system are related to the stability of the system. The basic 
property of a stable system is that ensures that its output signal is limited in the time domain. 
Recalling the linear model of (2.1), the system is stable if and only if, it holds that for all the 
eigenvalues Aj, Re(Aj) <  0 and for all eigenvalues that Re(Aj) =  0, Aj is simple. In this 
case the system matrix A is proved to be stable.
Stability of the system is strongly associated with several properties. The most important 
property is that if A is stable the inverse and the transpose matrix, A - 1 and A T respectively, 
are also stable.
2.1.3 Passivity
The stability of physical structures is usually a common property. However, stability is not 
always strong enough for VLSI applications. A stable structure can become unstable if non­
linear components are connected to it. Therefore passivity is a stronger property than stability
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
2.2. Moment Matching Methods 7
and it shows, that a passive model is incapable of generating energy. Generally for linear 
systems that are stable and passive, it would be preferable to apply a reduction process in 
order to preserve this properties after the reduction.
In order to define the passivity property intuitively, we first consider a linear circuit model 
with n-ports, then the passivity is defined with the following way. If the power absorbed by 
this n-port circuit is greater or equal to zero for all frequencies s such that Re(s) >  0, then 
this model is passive.
Following this example, we can now define the passivity based on this argument. As a 
result, the transfer function H (s) of a linear system is passive if and only if
H(s) +  H*(s) >  0 (2.7)
with Re(s) >  0 for all s.
2.2 Moment Matching Methods
One of the most important and successful MOR methods for linear systems is based on 
Krylov subspaces. They are usually called moment-matching methods and they are very 
efficient in circuit simulation problems. Methods based on moment matching are formulated 
in order to have for a direct application to the linear model of (2.1).
By applying the Laplace transformation to (2.1), we obtain s domain equations as:
sX(s) -  X(0) =  AX(s) +  BU(s)
Y(s) =  LX(s)
Assuming that X(0) =  0 and an impulse response in applied in U(s) (i.e. U(s) 
the above system of equations is:
(2.8) 
1) then
(si -  A)X(s) =  B 
Y(s) =  LX(s)
and by expanding the Taylor series at s =  0:
(si — A) (x0 +  xis +  X2s2 +  . . . )  =  B
Finally, we can obtain a moment computation formula as follows:
X0 =  A —1B, m 0 =  Lx0 
xi =  A —1X0, m i =  Lxi 
X2 =  A —1 xi, m 2 =  Lx2
(2.9)
(2.10)
(2.11)
while generally the i-th moment can be computed as
m; =  L(A —1 );A —1B (2.12)
For moment-matching reduction techniques the goal is the derivation of a reduced-order 
model where some moments m ; of the reduced-order transfer function H  (s) match some 
moments of the original transfer function H(s).
Let us now denote the two projection matrices onto a lower dimensional subspace, as 
W £ R nxr and V £ R rxn respectively. This matrices can be computed from the associated 
moment vectors using one or more expansion points. For the ease of representation we
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
8 Chapter 2. Model Order Reduction
assume that s =  0, then the matrices W and V are defined as follows:
range(W) = span{A -1B, (A-1 )2B ,. . . ,  (A-1 )rB} 
range(V) = span{L, A -T L, (A-T )2B ,. . . ,  (A-T )rB}
The computed reduced-order model matches the first 2r moments and is obtained by the 
following matrices
A  = W T AV, B =  W T B, L =  LV (2.14)
and provides a good approximation around 0.
The first approach on moment-matching MOR for circuit simulation problems was the 
Asymptotic Waveform Evaluation method (AWE) presented in [5]. It is an efficient frequency- 
domain analysis approach, however, the method was suffered due to the explicit moments 
computation, causing numerical instability. In order to overcome the problem associated 
with the AWE method, a more recent research work led to a numerically robust method of 
Pade via Lanczos (PVL) [6] where the Lanczos process, which has better numerical stability 
for computing the eigenvalues of a matrix, was deployed to find the Krylov subspaces of 
the corresponding matrices. This method computes orthogonal bases of V and W implic­
itly, and also the moments are implicitly matched by the reduced model. This method has 
been succeeded from the most successful moment-matching reduction method for passive 
reduced-order interconnect macromodeling (PRIMA) [2] and the block structure preserva­
tion alternates known as SPRIM [7] and BSMOR [8].
Because the field of moment matching is mature, recent methods try to address the MOR 
problem by attacking in three domains, sparsity, terminal-merging, and efficient frequency 
selection points. Firstly, since the reduced-order models are usually dense, a recent approach 
tried to exploit sparsity preservation techniques [9]. Moreover, since the dimension of the 
reduced model has a direct relationship with the input and output number of ports, authors 
in [10, 11] compute a reduced-order model for a subset of terminals, while the authors in [12] 
firstly partition the circuit in several blocks, and reduce each one separately. Finally, for 
matching the transfer function over a larger frequency range, rational methods including 
multiple expansion points such as [13] were developed, as well as efficient frequency hopping 
algorithms for choosing the expansion points [14]. In general, moment-matching techniques 
suffer from several drawbacks, the most prominent of which is that they do not offer any 
a-priori estimation for the approximation error. This can result in reduced models that are 
not very accurate or of sufficient small order. Error-bounds are very important in VLSI 
interconnect modeling problems and we need to employ system-theoretic techniques in order 
to address this problem.
2.3 Balanced Truncation Methods
The other important class of MOR methods is the Balanced Truncation type methods which 
generally deliver better reduced models than Krylov subspace techniques, by making an extra 
effort in choosing the projection subspaces based on the Controllability and Observability 
of the Linear Time Invariant (LTI) system. The controllability and observability Gramian 
matrices are:
P exp(A t)B B Texp (A t)Tdt
exp (A t)T LT Lexp(At)dt
(2.15)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
2.3. Balanced Truncation Methods 9
which are equivalently derived by the solution of the Lyapunov matrix equations [15]:
AP +  PAT =  -B B T 
A T Q +  QA =  - L T L
(2.16)
The controllability Gramian P characterizes the input-to-state behavior, i.e. the degree to 
which the states are controllable (reachable) by the inputs, while the observability Gramian 
Q characterizes the state-to-output behavior, i.e. the degree to which the states are observ­
able at the outputs. A reduced-order model can, in principle, be obtained by eliminating 
(truncating) the states that are difficult to reach or observe. However, in the original state- 
space coordinates there might be states that are difficult to reach but easy to observe, and vice 
versa. The process of “balancing” is to transform the state vector into a new coordinate sys­
tem where for every state the degree of difficulty is the same for both reaching and observing 
it. There exists such a transformation Tx(t), which leads to a new model:
d(Tx(t))
dt
TAT-1 (Tx(t)) +  TBu(t) 
y(t) =  l t -1 (Tx(t))
(thus preserving the transfer function H (s)) and makes [16]:
(2.17)
P =  Q =  diag(a \, σ^.... ,  ση) (2.18)
where σ;, i =  1, . . . , n are known as the Hankel singular values (HSVs) of the model and 
are equal to the square roots of the eigenvalues of the product PQ (in any coordinate system 
of state-space), i.e. σ; = λ ;(P Q ),i =  1 , . . . , n. In the balanced model (2.17) the states
that are easier to reach and observe correspond to the largest HSVs, and if r of them are kept 
(truncating the n — r states corresponding to the smallest HSVs) it can be shown that the 
distance between the original and the reduced-order transfer functions is bounded as [15]:
1 |H (s) — H  (s) 11~ — 2(Or+1 +  0"r+2 +  ... +  ση ) (2.19)
The latter is an “a-priori” criterion for selecting the order of the reduced model for a desired 
output error tolerance ε, and is a significant advantage of BT-type methods for MOR. The 
main steps of BT are summarized in the following Algorithm 1 [17]:
Algorithm 1 MOR by Balanced Truncation
1: Solve the Lyapunov equations (2.16) to obtain the Gramian matrices P and Q 
2: Compute the eigenvalue decomposition of PQ, or equivalently the singular value de­
composition (SVD) of the product of the Cholesky factors P =  Z p Zp and Q =  Z qZ Q ,
i.e. Z PZ q =  ϋ Σ Υ
3: Compute the truncated part of the balancing transformations T(rxn)
and T ( n x r ) ZP U (nxr) Σ
—1/2 T
■^rxr) V(rxn)ZQ
(r1x/r2) , and the corresponding reduced-order model matrices as
T (rx n) AT(nx rL B =  T (rx n) B, LT
1
( n x r )
1
1
BT-type algorithms provide generally better approximations, since they have explicit er­
ror bounds. While this class of methods has not the same maturity like moment-matching 
algorithms, in the last decade have been adopted in circuit simulation problems. Recent 
works in [18-20] try to address the passivity preserving problem for reduced-order inter­
connect problems, by computing passive models through balancing and truncation methods. 
Moreover, similar to moment matching several approaches try to find a better approximation
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
10 Chapter 2. Model Order Reduction
in specific frequency ranges [21-23], where circuits usually work. However, recent trends 
try to address the computation of the system Gramians [24] which is the first step to perform 
the BT algorithm. For large scale problems of order 103 and larger, the Gramians can not be 
determined exactly, but they are only approximated. In the next chapter of this dissertation, 
we provide an extensive overview of computing reduced-order models with system-theoretic 
techniques and we describe a comprehensive methodology for efficient matrix solution algo­
rithms.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
11
Chapter 3
Frequency-Limited Reduction of 
Regular and Singular Circuit Models 
via Low-Rank Model Order 
Reduction
3.1 Introduction
Computer simulation of modern IC subsystems, like power grids, interconnect structures and 
substrate regions, possesses a fundamental role in the EDA industry. The electric models of 
the aforementioned systems are enormous, and their functional simulation requires solving 
systems of equations with dimension that can reach several millions or even billions. Model 
Order Reduction (MOR) techniques provide attractive ways to reduce these highly complex 
models, replacing them by models with much smaller internal dimensions whose behavior at 
the input/output ports approximates that of the original models.
As we stated previously, MOR methods are divided in two main categories. Moment 
matching techniques [2] are well established due to their computational efficiency to produce 
the reduced-order models for circuit simulation problems. Their drawback is that the size 
of the reduced models is based on an ad-hoc choice of the number of matching moments, 
since they do not provide an “a priori” metric for the accuracy of the reduced models. On 
the other hand, system theoretic techniques like Balanced Truncation (BT) [3] have very 
satisfactory and reliable bounds for the approximation error. However, BT techniques require 
the solution of Lyapunov matrix equations which are very expensive computationally, and 
also involve the storage of dense matrices even if the system matrices are sparse. To alleviate 
the computational cost and storage needs, low-rank solution methods such as the Extended 
Krylov Subspace (EKS) and Alternating Direction Implicit have been developed [25,26].
The majority of BT-type methods attempt to approximate the original model over the 
whole frequency range (from DC to infinity). In most practical applications, however, we 
are only interested in a specific finite frequency range. Frequency-weighted BT methods 
have been proposed in the past [27,28], where a user-specified frequency weighting function 
is introduced so as to obtain solutions of Lyapunov matrix equations that improve accuracy 
according to this particular function. The problem is that the specification of the weighting- 
function is not straightforward [29], and the user would rather give only the intended fre­
quency range as input to the BT method.
A different frequency-limited BT method has been proposed in the past in [30] (and 
recently rediscovered and analyzed in detail in [31]), in which only a frequency range needs to 
be specified instead of a vague frequency-weighted function. The purpose of this dissertation 
is to introduce the frequency-limited BT framework for the reduction of large circuit models, 
with its specific goals and contributions being to: (i) develop procedures for applying both the
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
12 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
EKS and the ADI method to the solution of frequency-limited Lyapunov equations (which 
are different than the standard Lyapunov equations), (ii) implement efficient computational 
choices by keeping sparse system matrices into their original forms (both for regular and 
singular circuit models), (iii) distill mathematical techniques into a complete algorithmic 
procedure, (iv) evaluate the potential of the frequency-limited BT approach in actual circuit 
benchmarks. In particular, experimental results demonstrate that frequency-limited BT can 
produce reduced-order models with either smaller size or superior accuracy compared to 
standard BT in a specific frequency range.
The rest of the chapter, is organized as follows. Section 3.2 describes previous work on 
existing MOR techniques developed for circuit simulation problems. Section 3.3 presents 
the theoretical background of BT for the reduction of regular and singular circuit models, 
and the low-rank approximate solution of Lyapunov equations along with the frequency- 
limited BT methodology. Section 3.4 presents our main contributions on the application of 
low-rank EKS to frequency-limited Lyapunov equations, as well as its efficient execution 
by sparse matrix manipulations (both for regular and singular circuit models), while Section 
3.5 presents the application of the ADI method to frequency-limited Lyapunov equations. 
Sections 3.6 and 3.7 describe the proposed complete procedures for the EKS and the ADI 
method respectively. Finally, Section 3.8 presents our experimental results.
3.2 Related Work
In this section, we briefly describe some previous works in the area of MOR techniques de­
veloped for circuit simulation problems. As mentioned before, mainly model order reduction 
methods have so far relied on moment matching and system theoretic techniques.
The moment matching techniques like [2] and the structure preserving alternate [7] per­
form moment matching by projecting the original system onto a Krylov subspace. Further­
more, the frequency weighting approach has been incorporated in these methods by multi­
point moment matching approaches [32], in order to achieve better accuracy in a specific 
frequency range. For this kind of methods, the cost associated with model derivation is pro­
portional to the number of ports, and the projection matrix can become dense and very large 
with the increasing number of ports. To address this challenge, research works introduce 
decentralized techniques like [33] or terminal reduction techniques [34, 35] before applying 
a standard moment matching method to a multi-port system. These algorithms exploit corre­
lations between different ports in order to merge them together. However, this is not always 
applicable to practical circuits and is usually an error-prone process. Despite the successful 
application of these methods in circuit simulation problems they do not yet provide any error 
control for the accuracy of the reduced models.
In contrast to moment matching techniques, system theoretic approaches like Balanced 
Truncation type methods [3,36] are capable of providing a global error bound, without having 
to compute the reduced-order models first. The adaptive choice of the order of the approxi­
mate models based on a predefined error bound has led research works to seek reduce models 
in frequency windows that circuits are usually work in order to produce smaller models or 
models that exhibit better accuracy. Based on this fact, several frequency weighted BT meth­
ods are developed in order to exploit this attribute [27,28, 37, 38]. While these methods 
provide better approximation the need of input and output weights which are usually not 
explicitly specified, hinders the applicability of these methods.
The approach in [39] bears a resemblance to the proposed method since a frequency- 
limited Gramian approach is used, where only the end frequencies are required. However, 
this method is inspired by the modified frequency-limited Gramians [62] which they usually
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.3. Background 13
do not exhibit fast eigenvalue decay [31] and they induce a significant computational cost 
making this method difficult to apply in large-scale circuit models.
Clearly, the potential of the frequency-limited Gramians that introduced in [30] has not 
yet been explored in the context of circuit simulation. The proposed methodology, alleviate 
the computational costs by applying state-of-the-art sparse numerical techniques in order 
to compute the frequency-limited Gramians making this method amenable to large circuit 
models. Moreover, the singular descriptor models are treated with sparse matrix operations, 
without introducing significant computational cost to the proposed methodology.
3.3 Background
3.3.1 MOR by Balanced Truncation for Circuit Simulation Problems
Consider the Modified Nodal Analysis (MNA) description of an n-node, m-branch (induc­
tive), p-input, and q-output RLC circuit in the time domain:
(  G T W - W T 0
v(t)
i (0 +
C
0
y (t)
M
Φ ι
v (t)
,i (t)
+  D u(t)
(3.1)
where G E R nxn (node conductance matrix), C E R nxn (node capacitance matrix), M E 
R mxm (branch inductance matrix), W E R nxm (node-to-branch incidence matrix), v E R n 
(vector of node voltages), i E R m (vector of inductive branch currents), u  E (vector 
of input excitations from current sources), Bi E R nx p (input-to-node connectivity matrix), 
y E (vector of output measurements), Li E Kqxn (node-to-output connectivity matrix), 
D E R qxp (input-to-output connectivity matrix). Without loss of generality, we assume 
in the above that any voltage sources have been transformed to Norton-equivalent current 
sources, and that all outputs are obtained at the nodes as node voltages. Further, we are 
denoting V(t) ξ  ^vl!) and i(t) ξ  .
If we now denote the model order as N  ξ  n +  m and the state vector as x(t) ξ  
and also:
A Ξ - ( W t W ) , E ξ  (C  M )  ■
B Ξ l01 , L Ξ  (L1 0)
then the expression (3.1) can be written in the following generalized state-space form, or 
so-called descriptor form:
E~ 1 T  = Ax(t) +  Bu(i) (3.2)
y(t) =  Lx(t) +  D u(t)
The objective of MOR is to produce a reduced-order model:
E ^ =  Ax Bu(*) (3.3)
y (t) =  Lx (t) +  D u(t)
where A,  E E R rxr, B E R r x p E R qxr , and in which both the order r < <  N  and the 
output error is bounded as ||y  (t) — y(t) ||2 <  e ||u (t) ||2 for given input u (t) and given small
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
14 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
ε. The bound in the output error can be equivalently written in the frequency domain, since 
in circuit simulation problems the interest is usually in the approximation of the frequency 
response, as | |y(s) — y(s) 1|2 <  ε| |u(s) 1|2 via Plancherel’s theorem [4]. If
H (s) =  L(sE — A )- 1 B +  D 
H  (s) =  L (sE — A )—1B +  D
are the transfer functions of the original and the reduced-order model, then the output error 
in the frequency domain is:
lly (s) — y (s ) ||2 =  ||H  (s)u(s) — H (s)u (s ) ||2 f_ 4)
<  l|H(s) — H (s ) ||w||u (s ) ||2 ( . )
where | | . || is the induced L2 matrix norm, or norm of a rational transfer function. 
Therefore, in order to bound the output error, we need to bound the distance between the 
transfer functions ||H (s) — H(s) ||TO <  ε.
Balanced Truncation (BT) and related methods for MOR make use of the controllability 
and observability Gramian matrices:
P exp( E—1 A i)E —1BBT E—Texp(E—1A t )Tdt
Q = exp(E 1A i)TLTLexp(E 1At)d t  
J 0
which are equivalently derived by the solution of the Lyapunov matrix equations [62]:
(E—1 A )P +  P (E —1 A )T =  —(E—1 B )(E—'1B)T 
(E—1A )T Q +  Q (E —1 A) =  —LT L
(3.5)
(3.6)
0
where we have assumed that the matrix E is nonsingular (we will present a treatment for the 
case of singular E in the next sub-section).
The controllability Gramian P characterizes the input-to-state behavior, i.e. the degree to 
which the states are controllable (reachable) by the inputs, while the observability Gramian Q 
characterizes the state-to-output behavior, i.e. the degree to which the states are observable 
at the outputs. A reduced-order model can, in principle, be obtained by eliminating (trun­
cating) the states that are difficult to reach or observe. However, in the original state-space 
coordinates there are states that are difficult to reach but easy to observe, and vice versa. The 
process of “balancing” is to transform the state vector into a new coordinate system where 
for every state the degree of difficulty is the same for both reaching and observing it. There 
exists such a transformation Tx(t), which leads to a new model:
TET—1 d(TdXt(t)) =  TAT—1 (Tx(t)) +  TBu(t) 
y(t) =  LT—1 (Tx(t)) +  D u(t)
(thus preserving the transfer function H(s)) and makes [62]:
P =  Q =  diag(a·1, σ2,...., σΝ) (3.8)
where σ , i =  1 , . . . ,  N  are known as the Hankel singular values (HSVs) of the model and 
are equal to the square roots of the eigenvalues of product PQ (in any coordinate system of 
state-space), i.e. σ  =  y /λ; (PQ), i =  1 , . . . ,  N. In the balanced model (8) the states that 
are easier to reach and observe correspond to the largest HSVs, and if r of them are kept 
(truncating the N  — r states corresponding to the smallest HSVs) it can be shown that the
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.3. Background 15
distance between the original and the reduced-order transfer functions is bounded as:
| |H (s) — H (s) 1 |TO <  2(or+l +  Hf+2 +  ··· +  σΝ) (3.9)
The latter is an “a-priori” criterion for selecting the order of the reduced model for a desired 
output error tolerance ε, and is a significant advantage of BT-type methods for MOR. The 
main steps of BT are summarized in Algorithm 2.
Algorithm 2 MOR by Balanced Truncation for descriptor systems 
1: Solve the Lyapunov equations (3.6) to obtain the Gramian matrices P and Q in low-rank 
format as described in Sections 3.4 and 3.5
2: Compute the eigenvalue decomposition of PQ, or equivalently the singular value de­
composition (SVD) of the product of the Cholesky factors P =  Z pZp and Q =  Z qZQ,
i.e. ZPZQ =  ϋ Σ ν
3: Compute the truncated part of the balancing transformations T(rxN) =  Σ —^) V(rxN) Z Q 
and T (Nxr) = ZpU(Nxr)Σ - ΧΓ), and the corresponding reduced-order model matrices
as E =  T (rxN) E T(NXr)' A = T (rxN) A T (NXr), B =
T(rxN) B  L =  LT ( rx N)
The main drawback of BT is the significant computational and memory cost for deriv­
ing the reduced model, which seriously hinders the applicability of BT for the reduction of 
large-scale models (with N  more than a few thousand states or so). That is because the solu­
tion of Lyapunov equations, the Cholesky factorization and the SVD are all computationally 
expensive tasks of complexity O( N 3), and also involve dense matrices since the Gramians 
P, Q are dense even if the system matrices E, A, B, L are sparse.
However, it is almost always the practical case that the number of inputs and outputs is much 
smaller than the number of states, i.e. p,q << N. This means that the products BBP and 
LpL will have low numerical rank compared to N, and this will also hold for the correspond­
ing Gramians [40], allowing their own approximation by low-rank products instead of full 
Cholesky factorizations, i.e. P ~  Zp Zp and Q ~  Z q ZQ with Zp, Z q £ R Nxk (k << N). 
This greatly reduces the memory requirements, as well as the complexity of the factoriza­
tion and SVD which are now of size k instead of full N, leaving the solution of Lyapunov 
equations as the main computational task of low-rank BT.
Two recent classes of algorithms that have been developed for directly solving the Lya­
punov equations in low-rank factorized form are the Alternating Direction Implicit (ADI) 
[26] and the projection-type or Krylov-subspace methods [41]. The ADI method exhibits 
fast convergence but requires the input of a number of shift parameters, whose choice greatly 
affects convergence but until recently relied on unclear heuristics and was very problem- 
dependent. On the other hand, projection-type methods do not depend on the selection of 
specific parameters and their algorithmic implementation is more straightforward and well- 
studied, having been successfully used for several years for the solution of conventional linear 
systems of equations. However, they generally have not been competitive with ADI meth­
ods, until the recent development of the extended Krylov subspace (EKS) method [25] which 
employs two complementary subspaces to radically speed up convergence [42]. In the next 
Sections we adopt both the EKS and ADI methods for the low-rank solution of Lyapunov 
equations arising in the reduction by BT of large-scale circuit models and provide details for 
efficient application in both cases.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
16 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
3.3.2 Handling of Singular Descriptor Models
In certain circuit simulation problems the matrix E might be singular. A method for dealing 
with such models is to compute spectral projections onto the left and right deflating sub­
spaces corresponding to the finite eigenvalues of the model [43], which is computationally 
prohibitive for large-scale systems. However, in circuit problems singular descriptor mod­
els typically result when there are some nodes, say n2, where no capacitance is connected, 
leading to corresponding all-zero rows and columns in the submatrix C (the submatrix M of 
inductive branches is always nonsingular if the circuit contains no voltage sources). If the n2 
nodes with no capacitance connection are enumerated last, and the remaining ni =  n — n2 
nodes first, then (3.1) can pe partitioned as follows:
f  G ii G 12 W A  f v i ( t) \
I G j  G22 W2 1 I V2 (t) I +
—w [  —WT 0 f  i(t)
fC i 0 0 \  f v i ( t ) \  f B i \
I 0 0 0 I I  v2(t) I =  I B2 1 u (t) (3.10)
0 0 M i ( t )  \  0
f v i (t)\
y(t) =  (Li L2 0 ) I v2(i) I +  D u(t)
i(t)
where G ii E R n  xni , G i2 E R ni xn2, G22 E R n2xn2, W i E R ni xm, W2 E R n2xm, 
Ci E R nixni, v i E R ni, v2 E R ni, Bi E R nixp, B2 E R n2xp, Li E R qxni, L2 E R qxn2.
Assuming now that the submatrix G22 is nonsingular (a sufficient condition for this is at least 
one resistive connection to ground at the n2 non-capacitive nodes), the second row of (3.10) 
can be solved for v2 (t) as follows:
V2(t) =  G- i B2u(t) — G — GT2 vi (t) — G22i W2 i(t) (3.11)
The above can be substituted to the first and third row of (3.10), as well as the output part of 
(3.10), to give:
(G ii — G i2 G—i G T2 )vi (t) +  (Wi — G i2 G—  W2)i(t) 
+ C i Vi(t) =  (Bi — G i2G22i B2 )u(t)
(WT G—2i G [2 — W [ )vi (t) +  W j G—2i W2 i(t) +  M i (t)
=  W j G22i B2u(t)
y(t) =  (Li — L2 G22i Gj2 )vi (t) — L2 G—2i W2i(t)
+  (L2 G22i B2 +  D )u (t)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.3. Background 17
This can be put together in the following descriptor form:
G ii — G i2G 221G T2
W j G—1G j  — w j
'C l 0 ν ι ( ή
0 M V i(t)
Wi — G i2G—1 W2λ /v i  (t) 
W j G —1W2 i(t)
+ Bl 2  G 12G221B2 W j G -21 B2 u(t)
y(t) =  (L1 — L2G2—1 G j  L2G2-21W2) ( vi1((ti)))
+  (L2 G221B2 +  D )u (t)
(3.12)
The above is a nonsingular (or regular) state-space model which can be reduced normally by 
the BT method of Algorithm 2.
3.3.3 Balanced Truncation in Limited Frequency Intervals
By employing Parseval’s theorem in (3.5) we can obtain expressions for the Gramians in the 
frequency domain:
P 1
2π
(ίω ΐ — E—1A )—1E—1BBT E—T (ίω ΐ
1 to
Q =  (ίω ΐ — E—1 A )—H LT L(z^ I
2π  to
E—1A )—Ηάω 
E—1A )—1 άω
(3.13)
where (ίωΐ — E—1A )—1 is the Fourier transform of exp(E—1At).
If now the frequency interval is not taken as the entire (—to, to) but is restricted to a certain 
[—ω2, — ω 1 ] U [ω1, ω2], we obtain the frequency-limited Gramians:
Pω
1 — ω1
—  ( (ίω ΐ — E—1 A )— 1E—1BBT E—T (ίω ΐ — E—1 A )— Ηάω
2 π  ω2
+  2 (ίω ΐ — E—1A )—1E—1BBT E—T (ίω ΐ — E—1A )—Ηάω)
ω1
Q ω =  ^  (ίω ΐ  —E—1A )—ΗLTL(ίωI — E—1A )—1άω
2π  ω2
ω2
+  (ίω ΐ — E—1 A )— HLT L ^  — E—1 A )—1 άω)
(3.14)
Equivalently, Ρω and Q ω can be derived by the solution of the following modified Lyapunov 
equations [30,31]:
(E—^ ) Ρ ω +  Ρω (E—1A )T 
=  —(FE—1BBT E—T +  (FE—1BBT E—T )T)
(3.15)
(E—1A )T Q ω +  Q ω (E—1 A)
=  —((LT LF)T +  LT LF)
where 1 — ω1 ω2
F =  -  ( (ίω ΐ —E—1 A )—1άω +  (ίω ΐ — E—1 A )— 1άω) (3.16)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
18 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
The above matrix integral can be evaluated as:
1 f  ω2 1 1
F =  Re( (ζ'ω ΐ  -  E-1 Α ) -1άω)
π  ωι
= R e ( - ln ( (  E-1 A +  ίω1ΐ ) -1 (E-1 A +  ιω2Ϊ ))) (3.17)
π
= Re( ~ ln (( A  +  ω  E) -1 (A +  ίω2Ε)))
where the matrix logarithm ln(Y) is the inverse of the matrix exponential, i.e. a matrix X 
such that exp(X ) = Y.
The frequency-limited Gramians characterize the controllability and observability of the 
model in the selected frequency range, and the process of balancing and truncation will elim­
inate states that are difficult to reach and observe inside this frequency range. This means 
that more states can be eliminated for a given tolerance in (3.9) (e.g. states that are easy to 
reach/observe in other frequencies and which would not have been eliminated otherwise), 
leading to lower order r in the reduced model, or alternatively to lower error in the frequency 
range for a given order r .
3.4 Low-rank EKS method for frequency-limited Lyapunov equa­
tions
3.4.1 Extended Krylov Subspace Method for Solving Lyapunov Equations
The essence of low-rank projection-type methods for solving the large-scale Lyapunov equa­
tions (3.6) is to iteratively project them onto a lower-dimensional subspace, and then solve 
the resulting small-scale equations to obtain the low-rank approximate solutions of (3.6). 
The dimension of the projection subspace is increased in every iteration until convergence is 
achieved. More specifically, if K £ K Nxk (k < <  N ) is a projection matrix whose columns 
span the k-dimensional Krylov subspace:
Kk(A e, Be ) =  span{Be, A eBe, A| Be, . . . ,  a | - 1Be}
where
AE =  E- 1A, Be =  E- 1 B
then the projected Lyapunov equation (for the controllability Gramian P) onto Kk (Ae, Be) 
is:
(KT AE K)X +  X(KT AE K )t =  - K T Be b T K (3.18)
(the same things hold for the observability Gramian Q with Ag, LT in place of A e, Be). 
The solution X £ R kxk of (3.18) can be back-projected to the N-dimensional space to give 
an approximate solution P =  KXK for the original large-scale equation (3.6), and a low- 
rank factor Z £ R Nxk of P can be obtained as Z =  KS where X =  SST is the Cholesky 
factorization of X.
The projection process is independent of the subspace selection, but its effectiveness is 
critically dependent on the chosen subspace and can sometimes take many iterations of sub­
space updating before converging to the final solution. The convergence can be accelerated 
by enriching the standard Krylov subspace Kk (Ae, Be ) with information from the subspace 
Kk (A- 1, Be ) corresponding to the inverse matrix A - 1, leading to the extended Krylov sub­
space:
K  (Ae , Be ) =  Kk (Ae, Be ) +  Kk (A- 1, Be ) =
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.4. Low-rank EKS method for frequency-limited Lyapunov equations 19
span{BE, AE1 Be, AeBe, A e2Be, A eBe, . . . ,  (3.19)
A e (k-1) Be, a E-1Be }
The extended Krylov subspace (EKS) method starts by the pair {Be, A- 1 Be} and generates 
a sequence of extended subspaces (Ae, Be) of increasing dimensions, solving the pro­
jected Lyapunov equation (3.18) in each iteration, until a sufficiently accurate approximation 
of the solution of (3.6) is obtained. The complete EKS method is given in Algorithm 3.
Algorithm 3 Extended Krylov Subspace method for low-rank solution of Lyapunov equa­
tions that arise in descriptor systems
Input: A e ξ  E- 1A, Be ξ  E- 1 B (or AE, LE)
Output: Z such that P ~  ZZE ,
1 j =  1; p =  size_col(BE);
2 K (j) =  Orth ([Be , AE1 Be ])
3 repeat
4 M =  K (j)E A e K (j); R =  K (j)E b e
5 Solve MX +  XM e =  - RR e for X e  R 2pjx2pj
6 k1 =  2p(j -  1); k2 =  k1 +  p; k3 =  2pj
7 K1 =  [Ae K (j) (:, h  +  1 : k2), A - 1K (j) (:, k2 +  1 : k3)]
8 K2 =  Orth (K1) w.r.t K(j)
9 K3 =  Orth (K2)
10 K (j+1) =  [K(j), K3 ]
11 j = j + 1
12 until convergence
13 S =  Chol(X)
14 Z =  K (j) S
Some details on the efficient implementation of the EKS method of Algorithm 3 are as 
follows:
Sparse matrix inputs
The inputs to Algorithm 3 are not actually A e ξ  E- 1A or A E ξ  (E- 1 A )e but the sparse 
system matrices A, E or A T, ET, since the (generally dense) inverse matrices are only needed 
in products with p vectors (initially and in step 2) and 2pj vectors (in steps 4 and 7 at every 
iteration, where the iteration count j  is typically very small and thus 2pj < <  N). These can 
be implemented as sparse linear solves EY =  R and AY =  R (or EeY =  R, A EY =  R) by 
any sparse direct or iterative algorithm like [44] or [45].
Orthogonalization in steps 2 and 9
A modified Gram-Schmidt procedure [46] is employed to implement the corresponding (Orth ()) 
procedures.
Solution of small-scale Lyapunov equation in step 5
A direct Schur decomposition method [47] can be employed for the solution of the small- 
scale (2pj x 2pj) Lyapunov equation in each iteration of Algorithm 3.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
20 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
Orthogonalization in step 8
In order to perform orthogonalization with respect to matrix K (j) we employ the following 
Gram-Schmidt procedure [46]: 
for ki =  1 , . . j  do
k2 = (ki -  1) * 2p; k3 = ki * 2p;
T =  K (j)T((:,k2 +  1: ka))Ki 
K2 =  K1 -  K (j) (:, k2 +  1 : ks)T 
end for
Convergence criterion
An appropriate stopping criterion is the residual of (3.6) with the approximate solution P =T
KXKT to reach a certain threshold in magnitude, i.e.
A tolerance of tol 
of the solution.
| |A e K (j ) XK(j )T +  K (j) XK(j)TAE +  Be BT || ^
I|Beb e || -  to
(3.20)
10 10 is typically sufficient in practice to acquire a good approximation
Lower rank solution
The solution Z obtained after the termination of Algorithm 3 has rank 2pj, where j  is the final 
iteration count. This can be reduced even further by employing in step 13 the eigendecom- 
position X =  WAW instead of the Cholesky factorization X =  SS , for the solution X 
of the final projected Lyapunov equation. By keeping only the k eigenvalues above a certain
threshold (a fair choice of threshold is 10- 12), along with the corresponding eigenvectors, a1
factor Z of P with lower rank k <  2pj can be obtained as Z =  KW(2pjXk)Λ (\χ^ .
3.4.2 Application of EKS Method to Frequency-Limited Lyapunov Equations
The frequency-limited Lyapunov equations (3.15) differ from the standard Lyapunov equa­
tions (3.6) in the right-hand-sides (RHS) which do not have the forms - B eBE (with Be ξ  
E-1 B) or - LTL. The EKS algorithm presented in [31] modified the step 5 of Algorithm 3 
to solve a small-scale frequency-limited Lyapunov equation, which is a projected version of 
the large-scale equations (3.15) onto a lower-dimensional subspace. However, the projection 
subspace is still the extended Krylov subspace (3.19) related to the RHS of the standard Lya­
punov equations, which can render the projection procedure ineffective. Here, we show that 
it is possible to write the RHS of (3.15) in a factorized form similar to - B eBe or - L TL, so 
that Algorithm 3 can be invoked with the corresponding factor and create the proper Krylov 
subspace. Specifically, observe that the RHS of (3.15) can be written as:
- ( F E -1 BBt E-T +  (FE-1 BBt E-T )t )
(E- 1B)t 
(FE- 1B )T
-  ((LT LF)T +  LT LF)
E-1 B FE-1B1 0 p I pIp 0 p
-  (LT (LF)T V  (  LIq 0q LF
0q
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.4. Low-rank EKS method for frequency-limited Lyapunov equations 21
(where Ip, Iq,0p, 0q are the p x  p and q x q identity and zero matrices respectively), i.e. in 
factorized —BωJB^ and — JLω forms with
Βω =  (E—1B FE—XB)
L l  =  (LT (LF)T) (3.21)
J = ( °  p 1 p
J Ip 0p
By entering Βω G R nx2p or L)^  G Knx2q instead of Be G R nx p or LT G R nxq in Algorithm 
3, the proper Krylov subspaces related to the actual RHS of (3.15) are created, and the only 
modification required is the RHS of the small-scale equation in step 5 to become — R JR T. 
Note that the Krylov subspace spanned by the columns of projection matrix K is not affected 
by the presence of matrix J, since J is symmetric and thus diagonal up to an orthogonal sim-
ό  o n = i  ο  —u  - v n i  i  i n
ilarity transformation [48,49] - specifically
To construct the Bω and L ,^ inputs of Algorithm 3 we need to compute the matrix loga­
rithm (3.17), which is generally an intensive computational procedure. However, the matrix 
logarithms are not explicitly needed in (3.21), but only their effects in pre-multiplying p vec­
tors (1n(Y)E—1B) or post-multiplying q vectors (Lln(Y) =  (1n(Y)HLT)H) are required. 
These can be efficiently computed by a similar EKS algorithm for evaluating matrix func­
tions [51], which iteratively creates an extended Krylov subspace related to the matrix input 
and the multiplied vectors:
KE (Y, B) =  Kk (Y, B) +  Kk (Y—1, B) =
span{B, Y—1B, YB, Y—2B, Y2B ,. . . ,  Y —(k—1)B, Yk—1B} =  
span{B, (A +  ίω2Ε )—1 (A +  ίω1 E )B ,. . . ,
(A +  ω  E )—k+1 (A +  ίω2 E)k—1B} 
and whose implementation is shown in Algorithm 4.
Note that in steps 4 and 6 the operations YK(j) and Y—1 K (j) can be written as:
YK(j) =  (A +  ίω 1 E )—1 (A +  ίω2 E)K (j) =
(A +  ίω1 E )—1(A +  ίω1 E — ίω 1 E +  ίω2 E)K (j) =
K (j) +  ί(ω 2 — ω 1 )(A  +  ίω 1 E )—1 EK(j
Y—1 K (j) =  (A +  ίω2 E )—1 (A +  ίω1 E)K (j) =
K (j) +  ί(ω 1 — W2)(A  +  ίω2 E )—1 EK(j
which involve a sparse product of 2pj vectors with E followed by a complex sparse linear 
solve with A +  ί ω ^  or A +  ίω2E. Also, a standard inverse scaling and squaring algorithm 
[47] is used for the small-scale computation of matrix logarithm times p vectors in step 5.
3.4.3 Sparse Implementation for Singular Descriptor Models
For the reduction in limited frequency intervals of the model (3.12) that results from the 
regularization of a singular descriptor model, the execution of Algorithms 3 and 4 is com­
putationally inefficient because the inversion of G22 renders the matrices dense and hinders
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
22 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
Algorithm 4 Extended Krylov subspace method for matrix logarithm multiplying p vectors
B___________________________________________________________________________
Input: Y, B, Convergence tolerance e
Output: v =  ln(Y)B
1: j  =  1;
2: K (j) =  Orth([B,Y- 1 B]);
3: repeat
4: X =  K (j)T YK(j); R =  K (j)T B
5: Compute y =  ln(X)R
6: ki =  2p(j -  1); k2 = ki +  p; k3 =  2pj
7: Ki =  [YK(j) (:,ki +  1 : k2); Y- 1 K (j)(:,k2 +  1 : k3)]
8: K2 =  Orth (K1) w.r.t K (j)
9: K3 =  Orth (k  2)
10: K (j+1) =  [K(j), K3]
11: j  =  j  +  1
12: until ||y - j , ^  <  elly(j) II
13: v =  K (j)y
the solution procedure. In this section we present efficient ways to implement the EKS algo­
rithms by keeping the system matrices in their original sparse forms.
Construction of RHS
The input-to-state and state-to-output connectivity matrices
/ B1 — G i 2G 221b 2\  l t _  — G 12G - 1 Lj
W j G —21 B2 , _  W j G—21 L j
(3.22)
are constructed explicitly to compute the Βω and Lj  inputs of Algorithm 3 from (3.21), 
1 1where the products G-  B2 and G—2 L2 are computed by p and q sparse linear solves respec­
tively.
Sparse linear system solutions
The system matrix
. G 11 — G12G-21GT2 W 1 — G 12G —21 W2
W j G- 1 G jT2 — W j W j G - 1 W2
(3.23)
of model (3.12) is rendered dense due to the inversion of G22. The linear system solutions 
with A (or A T), E, and A +  ω Ε  in steps 2, 4, 7 of Algorithms 3 and 4 can be handled by
partitioning the RHS of these systems conformally to A, i.e. R =  ^ R^  with R1 £ R n1x2pj,
R2 £ R mx2pj (or R1 £ R n1x2qj, R2 £ R mx2qj), and implementing their solution efficiently
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.5. Modified ADI method for solving frequency-limited Lyapunov equations 23
by keeping sub-blocks in their original sparse form as follows:
AX =  R = ^
/ - G n  W i - C M  [ Xi \  f R i \
I W [ 0 w 2T I I X21 =  I R21
- G T  W2 G22 Y 0
Ci 0 
- 0  M
EX
Xi
X2
R
Ri
R2
(A +  z'^E)X =  R = ^
[ - G i i  +  iwCi - W i  - G 12\  [ X i \  [R A
I w [  ω Μ  W j I I X2 I =  I R2 I
G 12 W2 G22 Y 0
(3.24)
where Y G R n2x2pj (or Y G R n2 x2qj) is a temporary sub-matrix.
A comparable approach for the reduction of power systems via the low-rank ADI method 
was presented in [52], but is more complicated computationally and is applicable to RC-only 
circuits.
Sparse matrix-vector products
The matrix-vector products with K (j) in steps 4 and 7 of Algorithm 3 can be implemented 
efficiently by observing that:
A -G ii - W i  G i2 G- i  G j
W j 0 +  -  W j G 22i G 1T2
W i
0 + G22
i
G i2 G22i W2 
_w J  G - 1 W2
G T2 W2)
(3.25)
Thus the product A K (j) with the 2pj vectors K (j) can be carried out by a sparse solve
G22X =
- G Ti2 )  
W2T
( G
X.
-  W 2 K ( j ) , followed by a sum of products - G i i  - W i
W iT
K (j) +0
Construction of system matrix
The dense system matrix (3.23) to be reduced needs sparse solves in the submatrix G22 
for its construction. Since it is usually n2 <<  ni, it is better to compute first the left- 
solves G ^ G - ,1 and Wj G—i , followed by products with G j2 and W2. The left-solves can be 
performed as G22X =  G i2 and G 22 X =  Wj  where X contains the rows of each left-solve.
3.5 Modified ADI method for solving frequency-limited Lyapunov 
equations
Besides the Lyapunov equation of (3.6) one can work on the following generalized Lyapunov 
matrix equations [62]:
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
24 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
APET +  EPAT =  -B B T,
T T T (3.26)
A T Q ; E +  ET Q ; A =  —LT L
where a post-processing step is needed to obtain the observability Gramian as
Q =  ET Q ; E
Equivalently to the generalized Lyapunov equations, Ρω and Q ω can be derived by the solu­
tion of the following modified generalized Lyapunov equations [30,31]:
APω ET +  EPω A T =  —(EFBBT +  (EFBBT )T)
(3.27)
AT Qω E +  ET Qω A =  — ((LT LFE)T +  LT LFE)
where a similar post-processing step is needed to obtain the observability Gramian as
Qω =  ET Qω E
For the modified generalized Lyapunov equations (3.27) we have to deal again with the differ­
ent RHS, which is in the forms — φ ωBT +  BBi;) and — (C j ,C +  LTLω) (where Bω ξ  EFB 
and Lω ξ  LFE) instead of the standard forms —BBT and —LTL of (3.6). This requires the 
modification of the standard ADI method presented in [26], which we describe in this section.
Observe that the RHS of the frequency-limited generalized Lyapunov equations can be 
similarly written as:
— ^ ω  BT +  BBj0)
— ^ τω L +  LT Lω)
B Bω;
LT LT
0p Ip
Ip 0p
0q Iq
Iq 0q
BT
Bω
L
Lω
(3.28)
(where Ip and Iq are the p x p and q x q identity matrices and 0p, 0q are the corresponding 
zero matrices).
This permits the use of an LD LT variant of ADI proposed in [48], which expects the RHS to 
be in the form of — SRST and gives the solution of Lyapunov equations in the form LDLT, 
where L is the low rank factor and D is a block diagonal matrix. This variant of ADI is given 
in Algorithm 3 and we can use it to solve (3.15) by entering respectively:
S Ξ  (B Bω) =  (B E FB ), R =  0^ 0p
S ξ  (Lt Lω) =  (LT (LFE)T) ,  R =  ^  0^)
Since the output of Algorithm 5 is obtained in the form of LDLT , we must transform it in the 
form of ZZ to be able to use it in the BT Algorithm 2. To do that, observe that with I ξ  I p 
or Iq the matrix R of (3.29) can be written as:
2R 0 II 02 I —II 0I 0I I I I
and thus the block diagonal matrix D =  —2 W kdzag(Re(pi)R,. . . ,  Re(py)R)) can be writ-
Tten in factorized form as D =  M M T where:
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.6. Complete procedure with the EKS method 25
M
Λ
V 
/
V
-Re ( ^ ) I  V  0 S
y / - R e ( p i) I 0 I I0 iI - I  I 
0
0
I I
~Re(hj) I I
0
) I 0
j) 0 iI
\
I 0 )
0 i I ) \  
\
-  I ) /
(3.30)
0
(with i being the imaginary unit). In this way, we can write the solution of the frequency- 
limited Lyapunov equations (3.15) in the form LDLT =  LM (LM )T, and employ them 
normally in Algorithm 2.
The ADI method depends crucially on the choice of shift parameters μj (complex num­
bers in general), for which an efficient adaptive strategy has been proposed in [50]. The ADI 
algorithm also involves the solution of p (or q) sparse linear systems in the matrix (A +  μjE) 
in each iteration j , which can be solved by any sparse direct or iterative method.
Algorithm 5 ADI method for solving Lyapunov equations of descriptor systems in low-rank
LDLT form__________________________________________________________________
Input: E, A (or ET, A T), S, R, ADI-shifts μι, μ2, . . ., Convergence tolerance e 
Output: L, D such that P ~  LDLT (or Q «  LDLT)
1: L =  []; K (0) =  S; j  = I
2: while ||K (j-l)R K (j-l)T || >  e ||S R S T|| do
3: if j  =  I then
4: Solve (A +  μjE)X =  K (j-l) for X
5: else
6: Solve (A +  μjE)X =  EK (j-l) for X
7: end if
8: if μj E K then
9: K (j) =  K (j-l) -  2μjX
10: L =  [L; X]
11: else
12: η =  λ/ 2 ; 3j =  R e ^ j ) / I m ^ j );
13: K (j+l) =  K (j-l) -  4Re(μj )(Re(X) +  3jIm(X))
14: L =  [L; η (Re(X) +  3jIm(X)); η +  1 Im(X)]
15: j  =  j  +  1
16: end if
17: j  =  j  +  1
18: end while
19: D =  - 2  blkdiag (Re ^ i ) R , . . . ,  R e ^ j )R))
3.6 Complete procedure with the EKS method
For given model matrices E, A, B, L and given frequencies ωι, ω2, the complete procedure 
for frequency-limited BT of large-scale models is as follows:
• Evaluate the quantities FE-1B and LF of (3.21), with F being the matrix logarithm of 
(3.17), through the EKS method of Algorithm 4.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
26 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
• Compute the low-rank factors Zp and Z q of the frequency-limited Gramians through 
the EKS method of Algorithm 3, by inserting the matrices Βω, of (3.21) instead of
Be , LT.
• Execute steps 2 and 3 of the BT Algorithm 2.
It must be noted that there is no guarantee that the reduced-order models by frequency- 
limited BT preserve passivity or stability. However, the focus of MOR in recent years has 
been shifted from provably passive models to passivity enforcement after efficient reduc­
tion. A wealth of passivity enforcement techniques such as [53] have been developed, which 
can be used to assure passivity (and also stability) of the reduced-order models obtained by 
frequency-limited BT.
3.7 Complete procedure with the ADI method
For given model matrices E, A, Β, L and given frequencies ω ι, ω2, the complete procedure 
for frequency-limited BT of large-scale models is as follows:
• Evaluate the Βω and 0!ω parts of the RHS of frequency-limited Lyapunov equations 
from (3.21), through the extended Krylov subspace method of Algorithm 4.
• Solve the frequency-limited Lyapunov equations (3.15) in low-rank LDL form, through 
the modified ADI method of Algorithm 5 by inserting the matrices S and R of (3.29).
• Compute the low-rank factors of the frequency-limited Gramians Ρω and Q ω as Zp =  
LM and Z Q =  ETLM, where L is the output of Algorithm 3 and M is the matrix of 
(3.30), and then execute steps 2 and 3 of the BT Algorithm 2.
3.8 Experimental Results
Table 3.1: Reduction results of frequency-limited BT vs standard BT for 
various circuit benchmarks.
Ckt #nodes #ports
Standard BT Frequency-limited BT
ROM order Max error Times (s) ROM order for same error
Reduction
percentage
Max error 
for same order Times(s)
MNA_1 578 9 102 6.4142e-04 0.21 85 16.66% 2.7648e-09 0.57
MNA_2 9223 18 480 3.4431e-04 404.89 411 14.37% 7.5592e-06 437.99
MNA_3 4863 22 415 1.3421e-04 136.49 368 11.32% 1.8127e-07 213.64
MNA_4 980 4 122 5.3913e-04 1.24 100 18.03% 7.5183e-11 1.44
MNA_5 10913 9 135 8.5454e-06 947.36 102 24.44% 1.5469e-11 1047.26
TL 3253 22 140 9.5354e-05 10.50 103 26.42% 2.4554e-11 9.34
LNA 29885 79 432 6.4324e-05 2041.84 401 7.17% 1.5314e-10 2171.70
MX3 867 110 133 1.1610e-05 1.77 123 7.51% 8.4520e-10 1.85
MX7 133 66 70 2.5481e-07 0.13 66 5.71% 1.2960e-11 0.15
IS 16862 646 1268 7.9140e-04 1295.44 1136 10.41% 4.6970e-08 1842.14
PG1 100000 370 2554 1.9454e-06 6878.21 2117 17.11% 5.5470e-09 7146.87
PG2 110000 410 2832 1.3240e-06 7612.87 2431 14.15% 4.4791e-09 7853.56
For the experimental evaluation of the proposed methodology we have used the available 
MNA benchmarks in SLICOT [54] and SparseRC [55], as well as two custom large-scale 
power grids. Their characteristics are shown in the first three columns of Table 3.1, where 
the SLICOT benchmarks are MNA_1 to MNA_5, the two power grids are PG1 and PG2, and 
the SparseRC benchmarks are a transmission line (TL), a low noise amplifier (LNA), two 
mixers (MX3, MX7) and an interconnect structure (IS).
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.8. Experimental Results 27
Fia}myRtBp(niatpcii(22l(fMNA_4
ftq[Hz]
RejmyltepiK a ja.t(3,2) (fMNA_4
fcqp]
Figure 3.1: Comparison of transfer functions of ROMs from standard BT 
and frequency-limited BT in MNA_4 benchmark at ports (2,2) and (3,2).
The frequency-limited BT was implemented with the procedure of Section 3.6 for a fre­
quency range of [ωι, ω2] =  [103, 1010] (purely for testing purposes - one can choose any 
other range depending on the application), and was compared with the standard (infinite fre­
quency) BT. The reduced-order models (ROMs) of standard and frequency-limited BT were 
compared both with respect to their order for given tolerance ε, and w.r.t. their accuracy for 
given ROM order. In the first case the error tolerance was chosen as ε = 10-4 , while in the 
second case the order r was determined by the execution of standard BT and was reused for 
the truncation of the HSVs of frequency-limited Gramians. All experiments were executed 
with MATLAB R2015a on a Linux workstation, having a 3.6GHz Intel Core i7 processor 
with 16GB memory.
The results are reported in the remaining columns of Table 3.1. In the table, Max error 
refers to the maximum error between the transfer functions of the original model and the 
ROM in the selected frequency range, Time refers to the computational time (in seconds) 
needed to generate the ROMs, while Reduction percentage refer to the reduction percentage 
of the ROMs between the standard BT and the frequency-limited BT. From the table it can be 
clearly verified that frequency-limited BT can produce ROMs that exhibit either smaller size 
for given error, or smaller error for given order in comparison to standard BT when restricted
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
28 Chapter 3. Frequency-Limited Reduction o f Regular and Singular Circuit Models viaLow-Rank Model Order Reduction
in a finite frequency range. The time needed to generate the frequency-limited ROMs is 
slightly larger than standard BT because the computation of the matrix logarithm incurs an 
additional overhead, but the EKS method of Algorithm 4 can effectively mask this overhead 
to a substantial extent and also makes the procedure applicable to very large circuit models.
Especially for the experiments comparing ROM accuracy for given order, we have plotted 
in Fig. 3.1 and Fig 3.2 the transfer functions of the original model and the ROMs generated 
by frequency-limited BT and standard BT for two benchmarks and two and three ports per 
benchmark. In both figures, the response of the frequency-limited ROM is indistinguishable 
from the original model in the selected frequency range, while the response of the ROM from 
standard BT can be seen to exhibit a clear deviation.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
3.8. Experimental Results 29
F re q u e n c y  R e s p o n s e  at p o r t(1 ,3 )  o f  P G 2
f re q  [H z]
F re q u e n c y  R e sp o n se  a t  p o r t(4 ,2 )  o f  P G 2
f re q  [H z]
F re q u e n c y  R e sp o n se  a t  p o r t(4 ,2 )  o f  P G 2
f re q  [H z]
Figure 3.2: Comparison of transfer functions of ROMs from standard BT 
and frequency-limited BT in PG2 benchmark at ports (1,3), (3,2) and (4,2).
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
31
Chapter 4
Efficient IC Hotspot Thermal 
Analysis via Low-Rank Model Order 
Reduction
4.1 Introduction
Temperature hotspots are an inevitable consequence of the rising power consumption due 
to the technology downscaling and its nonuniform distribution across the chip. Local tem­
perature gradients have significant impact on chip performance and functionality, leading to 
slower transistor speed, more leakage power, higher interconnect resistance and reduced reli­
ability. Stacking multiple layers in a 3D chip requires extensive thermal analysis as the power 
density and temperature of these architectures can be quite high. Moreover, the problem be­
comes more pronounced due to the use of new device technologies, like FinFETs and Silicon 
on Insulator (SOI), that are more sensitive to the self-heating effect [56]. Management of the 
above remains a key issue for future microprocessors and ICs [57].
Therefore, thermal analysis and especially hotspot analysis is one of the most critical 
challenges arising from the technological evolution. The continuous effort for smaller sizes, 
in sub 45-nm era, and greater performance as well as the new 3D structures have begun to 
outspace the ability of heat sinks to dissipate the on-chip power. Due to this fact, ICs ther­
mal analysis problems have drawn considerable attention over the past two decades. To 
deal with the thermal analysis challenge, most previous methodologies have focused on 
solving linear systems of equations, that result from modeling approaches such as the Fi­
nite Element Method (FEM), the Finite Difference Method (FDM) and the Green’s Function 
method [58-60]. The huge linear systems resulting from thermal modeling approaches re­
quire unreasonably long computational times. While the formulation problem, by applying 
a thermal equivalent circuit, is prevalent and can be easily constructed, the corresponding 
3D equations network has an undesirably time consuming numerical simulation over many 
time-steps.
Previous methodologies, propose efficient solution algorithms for the entire systems of 
equations that results from any thermal modeling approach but those techniques do not scale 
well with the equations dimension and can only be applied to a narrow range of problems. 
However, in many cases, thermal analysis does not need to be performed across the whole 
chip, but only at some pre-specified hotspots, in order to validate the thermal reliability of the 
chip or can be useful in addressing the performance of individual devices. In those cases, the 
very large thermal model can be substituted by a much smaller model with similar behavior 
at the hotspot points, through a Model Order Reduction (MOR) process.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
32Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
Moment-matching methods have been reasonably successful in circuit simulation prob­
lems to produce reduced order models in a computationally efficient way [2], but they exhibit 
some drawbacks, the most prominent of which is that they do not offer any a-priori estimation 
for the approximation error. This can result in reduced models that are not very accurate or 
of sufficient small order. On the other hand, balancing-type methods (in particular Balanced 
Truncation [61]) rely on rigorous system theoretic concepts and are capable of providing 
a global error bound, without having to compute the reduced model first. Thus, models 
obtained by balancing are generally superior to those obtained by moment-matching [62]. 
However, they have greater computational complexity than moment-matching methods due 
to the need for solution of Lyapunov matrix equations which take up O(n3) time.
In this Section we propose an approach for hotspot thermal analysis that is based on 
RC equivalent circuit in conjunction with Balanced Truncation for MOR [63, 64], which 
overcomes the high computational demands by using an Extended Krylov Subspace (EKS) 
method along with low-rank factorized storage, for solving the Lyapunov matrix equations. 
The EKS method is a state-of-the-art projection type method which exhibits fast convergence 
and straightforward implementation.
In particular, the contributions to the problem of thermal analysis are:
• Compact modeling of hotspots. The complete equation network is transformed into 
a reduced model that captures heat flow only in the hotspots, which is sufficient for 
validating the thermal compliance of the circuit.
• Accurate results within a predefined error bound. System theoretic techniques have 
exact error bound formula allowing to trade off the accuracy and the order of the re­
duced model.
• Efficient derivation of the reduced model. The use of low-rank EKS method method 
for solving the Lyapunov matrix equations require small computational times and re­
duced memory storage.
• Reusable models for different dissipation scenaria. The reduced model can be com­
puted once but it can be reused with different input sequences that define the heat flow 
in the IC, and can be solved many times preserving the same error bound.
Experimental results demonstrate around 97% model size reduction as compared to the 
full 3D network model, with approximation in the order of 10-4 .
The rest of the Chapter, is organized as follows. Section 4.2 describes previous work 
on thermal simulation problem. Section 4.3 introduces the thermal model that was used in 
the present work. Section 4.4 provides a detailed description of MOR, and more specifically 
on the theoretical background of low-rank Balanced Truncation and EKS methods. Section 
4.5 describes the proposed approach, combining the methods presented in the two previous 
sections. Moreover, in Sections 4.6, 4.7, and 4.8 we provide an extension of the proposed 
methodology for limited-time intervals along with efficient computational of the RHS that 
arise in the particular problem. Finally, Section 4.9 presents the results and a discussion 
about the advantages of the method.
4.2 Related Work
In this section, we briefly describe some previous works in the area of thermal analysis. As 
mentioned before most transient thermal analysis methodologies have so far relied on solving 
of the entire system, using different modeling techniques, based mainly in FEM, the FDM 
and the Green's Functions.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.3. On-Chip Thermal Modeling 33
Research work in [65] adopts the FDM method, with a multigrid approach in order to 
speed up the simulation process and the FDM method with temporal and spatial adaptation 
to further accelerate thermal analysis is proposed in [66]. Similarly, in [67], the full-chip 
thermal transient equations are solved in a similar manner using an Alternating Direction Im­
plicit (ADI) method for enhanced computational efficiency. Also, in [69] the FDM approach 
and the RC equivalent is used along with modeling of the fluids for microcooling 3D struc­
tures. Moreover, parallel approaches with general [70] or dedicated [71,72] precoditioners 
was proposed to map the thermal analysis problem in GPUs. In [59] the FEM method is 
adopted for 2D and 3D geometries along with a multigrid preconditioning method and auto­
matic mesh generation for chip geometries. Finally Green's functions, are used in [60] with 
discrete cosine transform and its inversion in order to accelerate the numerical computation 
of the homogeneous and inhomogeneous solution. However, these methods are efficient for 
limited range of problems, and with the escalation of manufacturing technology can lead to 
huge systems of equations.
Besides the previous conventional approaches different methods like a Neural Net (NN) 
approach is used in [73], but since it is based in predictions it does not always provide accu­
rate solution to the crucial problem of thermal analysis. Moreover, a Look Up Table (LUT) 
method based on the power thermal relation, which develops a double-mesh scheme to cap­
ture thermal characteristics and store the results in library files is presented in [74]. However 
nowadays chips can lead to huge library files due to the highly complex combined heat maps.
The approach in [75], bears a resemblance to the proposed method since a MOR method 
with moment-matching method is used along with FDM modeling. However, this technique 
considers only input currents as ports and, as mentioned before, moment-matching tech­
niques do not always provide compact and accurate models. Another moment matching 
MOR method is described in [76], but it is applicable on the architectural level.
Clearly, the concept of a low-rank system theoretic technique has not yet been introduced 
in the context of transient thermal analysis. A balancing-type approach, becomes more at­
tractive even for large scale systems with the recent results on low rank approximations that 
boost the solution of Lyapunov equations, which are the bottleneck of the method.
4.3 On-Chip Thermal Modeling
The primary mechanism of heat transfer in solids is by conduction. The starting point for 
thermal analysis is Fourier’s law of heat conduction [77]:
q(r, t) =  - k t V T(r, t) (4.1)
which states that the vector of heat flux density q (heat flow per unit area and unit time) is 
proportional to the negative gradient of temperature T at every spacial point r =  [x, y, z]T 
and time t, where kt is the thermal conductivity of the material.
The conservation of energy also states that the divergence of the heat flux q equals the dif­
ference between the power generated by external heat sources and the rate of change of 
temperature, i.e.
V · q (r , t) =  g (r t) -  pcp dT^  ^  (4·2)
where g(r, t) is the power density of the heat sources, Cp is the specific heat capacity of the 
material, and ρ is the density of the material. By combining (4.1) and (4.2) we have:
-k t  V 2T(r, t) =  g(r, t) -  pcp
dT(r, t)
dt
(4.3)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
34Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
which may be rewritten as the following parabolic Partial Differential Equation (PDE):
=  kt (
d2T(r, t) + d2T(r, t) 
dx2 dy2
kt V 2T(r, t) +  g(r, t)
+
d2 T( r, t) 
dz2 ) +  g (r, t)
(4.4)
(normally accompanied by appropriate boundary conditions [78]).
A common procedure for the numerical solution of (4.4) is by discretization along the 
3 spatial coordinates with steps Δχ, Δy  and Δζ, and substitution of the spatial second-order 
derivatives by finite difference approximations, leading to the following expression for tem­
perature Tj,j,k at each discrete point (i, j, k) in relation to its neighboring points:
dTi,j,k , Ti+l,j,k — 2Ti,j,k + Ti-1,j,k
pCr ^ T  = kt------------ΔΧ" --------
+ k r
Tii,j+1,k -  2 T -k  + Tii,j,k  1 ,j-1,k
+ k r
i,j,k+1
Δy2
— 2Ti,j,k +  Ti,j,k-1
Δζ2 +  gi,j,k
or by multiplying by ΔxΔyΔz:
pcv (ΔxΔyΔz) dTi,j,k
dt
- k t
- k t
- k t
ΔyΔz
Δχ
ΔχΔζ
Δ xΔ y
Δζ
(Ti+1,j,k
(Ti,j+1,k
(Ti,j,k+1
2Ti,j,k +  Ti-1,j,k) 
2Ti,j,k +  Ti,j-1,k) 
2Ti,j,k +  Ti,j,k-1)
= gi,j,k (ΔxΔyΔz)
(4.5)
(4.6)
There is a well known analogy between thermal and electrical conduction, where tem­
perature corresponds to voltage and heat flow corresponds to current (see Table 4.1).
Table 4.1: Analogy between electrical and thermal circuits.
Electrical Circuit Thermal Circuit
Voltage
Current
Electrical Conductance 
Electrical Resistance 
Electrical Capacitance 
Current Source
Temperature 
Heat Flow
Thermal Conductance 
Thermal Resistance 
Thermal Capacitance 
Heat Source
In light of this analogy, eq. (4.6) has a direct correspondence to an electrical circuit where 
there is a node at every discrete point or cell in the thermal grid (see Fig. 4.1). Every circuit 
node is connected to spatially neighboring nodes via conductances in the directions x, y, z 
with values:
GΧ
kt ΔyΔz G _  ktΔχΔζ G _  ktΔ xΔ y  
, Gy = ~ A f ~ , G  = (4.7)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.3. On-Chip Thermal Modeling 35
and there is a capacitance to ground at every node or thermal cell with value:
C =  pcp (Ax AyAz) (4.8)
The heat sources constitute input excitations and are modeled in the equivalent circuit as 
current sources with values:
I  ,j,k = gt,j,k (Δχ AyAz) (4.9)
The above current sources are connected at the specific points (i, j, k) or circuit nodes where 
there is heat flow (i.e. power dissipation from the underlying chip logic blocks).
Figure 4.1: Spatial discretization of chip for thermal analysis, and 
formulation of electrical equivalent problem.
It must be noted that in practical IC thermal modeling the thermal conductivity kt is 
different for silicon, insulator, and metal materials. However, kt is typically assumed to be 
layer-wise uniform, where the substrate has the conductivity of bulk silicon and the intercon­
nect layers, consisting of a mix of insulator and metal materials, take on an average value 
of kt depending on the metal density of each layer. If the thickness of the slimmest bottom 
layer is an integer multiple of the discretization step Az, then kt may be taken as constant 
in (4.6) and (4.7) for points (i, j, k) within the same layer (with proper modifications of the 
conductances (4.7) for points lying at the interface of two layers - see [79]).
The resulting electrical equivalent circuit is described in the time domain, using the Mod­
ified Nodal Analysis (MNA) framework, by a system of Ordinary Differential Equations 
(ODE):
G x(t) +  C A ! r  =  Eu(t) (4.10)
where G E R nxn is a symmetric and positive definite matrix of the conductances (4.7), 
C E R nxn is a diagonal matrix of cell capacitances (4.8), x E R n is the vector of unknown 
temperatures Ti,j,k at all discretization points (constituting internal states of the system), u  E 
R p is the vector of input excitations from the current sources Ii,j,k of (4.9), and E E R nx p is 
the input-to-state connectivity matrix.
In many practical scenarios, there is no need for full temperature evaluation at every point 
(i, j, k) of the 3D discretization, but only at some specific vulnerable or problematic points 
(“hotspots”) that critically affect the operation and reliability of the chip (see Fig. 4.2). In 
those cases, the hotspot temperatures constitute the output portion of the state vector x(t) of
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
36Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
temperatures at all possible points, and can be formulated as:
y(t) =  Lx(t) (4.11)
where y G R q is the vector of output hotspot temperatures, and L G R qxn is the state-to- 
output connectivity matrix.
FIG U R E  4.2: Schematic depiction of input heat sources and output hotspots 
in the 3D discretization of a chip.
4.4 Low-Rank Model Order Reduction for Thermal Models
4.4.1 Balanced Truncation for Thermal Models
Eq. (4.10) and (4.11) can be written in standard state space form as:
^Xdp =  A x(t) +  Bu (t), 
y (t) =  Lx(t)
with A =  — C -1 G, and B =  C -1 E.
Model Order Reduction (MOR) aims at generating a reduced-order model:
(4.12)
d X (t) 
dt
= A x  (t) +  Bu (t),
y (t) = Lx (t)
(4.13)
with A G R rxr, B G R rxp, L G R qxr, which both exhibits r < <  n and constitutes 
a good approximation in the time domain of (4.12), in that the output error is bounded as 
||y (t) — y (t ) | |2 <  ε ||u ( t ) | |2 for the given vector of input thermal excitations u ( t ) and 
given small ε. The bound in the output error can be equivalently written in the frequency 
domain as | |y (s) — y(s) | |2 <  ε| |u(s) 112 via Plancherel’s theorem [4]. If
H (s) =  L(sI — A )- 1 B 
f t  (s) =  L (si — A )—1B
(4.14)
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.4. Low-Rank Model Order Reduction for Thermal Models 37
are the transfer functions of the original and the reduced-order model, then the output error 
in the frequency domain is:
lly (s) -  y (s)||2 =  ||H  (s)u(s) -  h (s)u (s) | |2 (415)
— l|H(s) -  H (s ) ||« ||u ( s )||2 ( . )
where | |. | |TO is the induced L2 matrix norm, or norm of a rational transfer function. 
Therefore, the output error can be bounded by bounding the distance between the transfer 
functions as | |H (s) — H(s) | |TO <  ε.
Balanced Truncation (BT) and related methods for MOR make use of the controllability 
and observability Gramian matrices:
P
Q
exp(A t)BBTexp(A t)Tdt
exp (A t)T LT Lexp(At)dt
(4.16)
which are equivalently derived by the solution of the Lyapunov matrix equations [15]:
0
0
AP +  PAT =  —BBT 
A T Q +  QA =  —LT L
(4.17)
The controllability Gramian P characterizes the input-to-state behavior, i.e. the degree to 
which the states are controllable (reachable) by the inputs, while the observability Gramian 
Q characterizes the state-to-output behavior, i.e. the degree to which the states are observ­
able at the outputs. A reduced-order model can, in principle, be obtained by eliminating 
(truncating) the states that are difficult to reach or observe. However, in the original state- 
space coordinates there might be states that are difficult to reach but easy to observe, and vice 
versa. The process of “balancing” is to transform the state vector into a new coordinate sys­
tem where for every state the degree of difficulty is the same for both reaching and observing 
it. There exists such a transformation Tx(t), which leads to a new model:
d(Tx(t))
dt
TAT-1 (Tx(t)) +  TBu(t) 
y(t) =  l t -1 (Tx(t))
(4.18)
(thus preserving the transfer function H (s)) and makes [16]:
P =  Q =  diag(a \, σ2,...., σ ) (4.19)
where σ , i =  1 , . . . ,  n are known as the Hankel singular values (HSVs) of the model and 
are equal to the square roots of the eigenvalues of the product PQ (in any coordinate system 
of state-space), i.e. σ  = \J  λ  (PQ ), i =  1 , . . . ,  n. In the balanced model (4.18) the states 
that are easier to reach and observe correspond to the largest HSVs, and if r of them are kept 
(truncating the n -  r states corresponding to the smallest HSVs) it can be shown that the 
distance between the original and the reduced-order transfer functions is bounded as [15]:
| |H (s) -  H (s) | !<» — 2(σΤ+1 +  σ '+2 +  ... +  ση) (4.20)
The latter is an “a-priori” criterion for selecting the order of the reduced model for a desired 
output error tolerance ε, and is a significant advantage of BT-type methods for MOR. The 
main steps of BT are summarized in the following Algorithm 6 [17]:
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
38Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
Algorithm 6 MOR by Balanced Truncation
1: Solve the Lyapunov equations (4.17) to obtain the Gramian matrices P and Q in low-rank 
format as described in Section 4.4.3
2: Compute the eigenvalue decomposition of PQ, or equivalently the singular value de­
composition (SVD) of the product of the Cholesky factors P =  ZpZp  and Q =  Z qZQ,
i.e. ZPZQ =  ϋ Σ Υ
3: Compute the truncated part of the balancing transformations T(rxn)
and T (nxr) Zp U(nxr) Σ —/.2, and the corresponding reduced-order model matrices as
-1/2 T
Σ (;χ;) V (rxn)Z Q
T / x AT(rxn) (nxr)' 18 =  T (rxn)B, LT (nxr)
1
1
4.4.2 Low-Rank Solution of Lyapunov Equations
The main drawback of BT is the significant computational and memory cost for deriving 
the reduced-ordered model, which is a serious obstacle in the applicability of BT for the 
reduction of large-scale models (with n  more than a few thousand states or so). That is 
because the solution of Lyapunov equations, the Cholesky factorization and the SVD are all 
computationally expensive tasks of complexity O(n3), and also involve dense matrices since 
the Gramians P, Q are dense even if the system matrices A, B, L are sparse.
However, it is almost always the practical case that the number of inputs and outputs is 
much smaller than the number of states, i.e. p, q << n. This means that the products BBT 
and LTL will have low numerical rank compared to n , and this will also hold for the cor­
responding Gramians [40], allowing their own approximation by low-rank products instead 
of full Cholesky factorizations, i.e. P ~  Zp Zp and Q ~  Z q ZQ with Zp, Z q £ R nxk 
(k << n). This greatly reduces the memory requirements, as well as the complexity of 
the factorization and SVD which are now of size k instead of full n, leaving the solution of 
Lyapunov equations as the main computational task of low-rank BT.
As we mentioned before, the two recent classes of algorithms that have been developed 
for directly solving the Lyapunov equations in low-rank factorized form are the Alternating 
Direction Implicit (ADI) [26] and the projection-type or Krylov-subspace methods [41]. The 
ADI method exhibits fast convergence but requires the input of a number of shift parameters, 
whose choice greatly affects convergence but relies on unclear heuristics and is very problem- 
dependent. Projection-type methods do not depend on the selection of specific parameters 
and their algorithmic implementation is more straightforward and well-studied, having been 
successfully used for several years for the solution of conventional linear systems of equa­
tions. However, they generally have not been competitive with ADI methods, until the recent 
development of the extended Krylov subspace method (EKS) [25] which employs two com­
plementary subspaces to radically speed up convergence [42]. In this dissertation we propose 
the use of the EKS method for the low-rank solution of Lyapunov equations arising in the 
application of BT for the reduction of large-scale thermal models.
4.4.3 Extended Krylov Subspace Method
The essence of low-rank projection type methods is to project the large-scale Lyapunov equa­
tions (4.17) onto a lower-dimensional subspace, and then solve the resulting small-scale equa­
tions to obtain the low-rank approximate solutions of (4.17).
More specifically, if K £ R nxk (k < <  n) is a projection matrix whose columns span 
the k-dimensional Krylov subspace Kk(A, B) =  span{B, AB, A2B ,. . . ,  Ak-1B}, then the
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.4. Low-Rank Model Order Reduction for Thermal Models 39
projected Lyapunov equation (for the controllability Gramian P) onto Kk(A, B) is:
(KT AK)X +  X(KT A K )T =  - K T BBT K (4.21)
The solution of X £ R kxk of (4.21) can be back-projected to the n-dimensional space to give 
an approximate solution for the original large-scale equation (4.17) as P =  KXK , and a 
low-rank factor Z £ R nxk of P can be obtained as Z =  KS where X =  SST is the Cholesky 
factorization of X.
The projection process is independent of the subspace selection, but its effectiveness 
is critically dependent on the chosen subspace and can sometimes take many iterations of 
subspace updating before converging to the final solution. The convergence problem can 
be alleviated by enriching the standard Krylov subspace Kk (A, B) with information from 
the subspace Kk (A- 1, B) corresponding to the inverse matrix A - 1, leading to the extended 
Krylov subspace:
KE (a , B) =  Kk (A, B) +  Kk (A - 1, B) =
span{B, A - 1 B, AB, A - 2B, A2B ,. . . ,  A -(k -1)B, Ak-1 B}
The extended Krylov subspace method starts by the pair {B, A - 1 B} and generates a se­
quence of extended subspaces KE (A, B) of increasing dimensions, solving the projected 
Lyapunov equation (4.21) in each iteration, until a sufficiently accurate approximation of the 
solution of (4.17) is obtained. The complete EKS procedure for the thermal grids is given in 
Algorithm 7.
Algorithm 7 Extended Krylov Subspace (EKS) Method for low-rank solution of Lyapunov 
equations arise in thermal models 
Input: A, B (or A T, LT)
Output: Z such that P ~  ZZT ,
1: p =  size_col(B); j  =  1;
2: K (j) =  Orth([B ,A - 1B])
3: while j  <  maxiter do
4: M =  K (j)T A K (j); R =  K (j)T B
5: Solve MX +  XMT =  - R R T for X £ R 2pjx2pj
6: if converged then
7: S =  Chol (X)
8: Z =  K (j) S
9: break;
10: end if
11: s =  2p(j -  1); e =  s +  p; f  =  2pj
12: K1 =  [AK(j) (:,s +  1 : e), A - 1 K (j)(:,e +  1 : f )]
13: K2 =  Orth (K1) w.r.t K (j)
14: K3 =  Orth (K2 )
15: K (j+1) =  [K(j), K3]
16: j  =  j  +  1
17: end while
4.4.4 EKS Method Implementation Details
In this section we elaborate on some points regarding the efficient implementation of the EKS 
method Algorithm 7.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
40Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
• M atrix products with inverse of sparse matrix. Algorithm 7 involves the inverse 
A - 1 of the sparse system matrix A. Unfortunately, the inverse of a sparse matrix is a 
dense matrix, and also inversion is a very expensive computational task which should 
be avoided whenever the inverse is not needed explicitly. Since, however, the inverse 
A - 1 is only applied in products with the n x  p matrix B (initially) and the n x pj 
matrix K (j)(:, e +  1 : f ) in each iteration j  (where p, pj < <  n, and the iteration 
count is typically very small), we can compute these products by solving the p and pj 
sparse systems AY =  B and AY =  K (j)(:,e +  1 : f ). The normal structure of the 
thermal grids allows the use a highly parallel iterative Preconditioned Conjugate Gra­
dient (PCG) method, which overcomes the computational demands for the very large 
systems arising from the thermal modeling algorithm like [71,72].
• Orthogonalization. An Arnoldi iteration which uses a modified Gram-Schmidt pro­
cedure [46], which exploit sparse-matrix vector products, is employed in Algorithm 7 
to handle the orthogonalization (Orth()) steps.
• Solution of small-scale Lyapunov equation. A direct Schur decomposition method 
[81] similar to the previous chapter can be employed for the solution of the small-scale 
(2pj x 2pj) Lyapunov equation in each iteration of Algortihm 7.
• Convergence criterion. An appropriate stopping criterion as described in the previous 
Chapter, for Algorithm 7, is the residual of eq. (4.17) with the approximate solutionT
P =  KXKT to reach a certain threshold in magnitude, i.e.
11AKXKT +  KXKTAT +  BBT || <  tol (4.22)
In fact for this type of problems, it can be shown [25] that the above residual norm 
is equal to | |RTMX| | which can be computed more efficiently, and thus the stopping 
criterion becomes:
||R TMX|| <  tol (4.23)
A tolerance of tol =  10-10 is typically employed in practice to acquire a good approx­
imation of the solution.
• Lower rank solution. The solution Z obtained after the termination of Algorithm 7 
has rank 2pj, where j  is the final iteration count. This can be reduced even further by 
employing in step 7 the eigendecomposition X =  W AW T, instead of the Cholesky 
factorization X =  SS for the solution X of the final projected Lyapunov equation. 
By keeping only the k eigenvalues above a certain threshold (a fair choice of threshold
is 10- 12), along with the corresponding eigenvectors, a factor Z of P with lower rank
1
k <  2pj can be obtained as Z =  KW(2pj.k)Λ (\χ^ .
4.5 Proposed Methodology for Hotspot Thermal Simulation
The proposed methodology for hotspot thermal simulation is summarized in the following 
steps:
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.5. Proposed Methodology for Hotspot Thermal Simulation 41
• 3D discretization of the chip. The spatial steps Δχ, Δ;  ^in the x- and y-direction are 
user defined, but the step Δζ  along the z-direction is typically chosen to coincide with 
the interface between successive layers (metal and insulator). The discretization pro­
cedure naturally covers multiple layers in the z-direction, and can be easily extended 
to model heterogeneous structures that can be found in modern chips (e.g. heat sinks).
• Construction of equivalent electrical circuit. The RC elements of the electrical 
equivalent are calculated by (4.7) and (4.8).
• Formulation of equivalent circuit description. Using Modified Nodal Analysis, the 
equivalent circuit is described by the ODE system (4.10).
• Estimation of power consumption profile of chip logic blocks. This determines the 
location and the time behavior of heat sources, which in turn specify the structure of 
the input-to-state connectivity matrix E, and the value of current sources (4.9) that con­
stitute the vector u (t) in (4.10).
• Selection of hotspots. The hotspots are usually the same points where heat sources 
are applied, but can be any other user-defined points along the layer stack of the chip (a 
specially inter-layer vias and points on the upper metal layer where power distribution 
pins are connected). Since the hotspots constitute the outputs y (t) of the model, their 
location specifies the structure of the state-to-output connectivity matrix L in (4.11) 
(with 1s at the appropriate matrix positions).
• Formulation of state-space model. This results from (4.10) and (4.11) as:
dx
dt
C -1G x(t) +  C -1 Eu(t),
y (t) =  Lx(t)
(4.24)
where the inversion of C is trivial since it is a diagonal matrix.
• Construction of reduced-order model. This is performed by the Balanced Trunca­
tion Algorithm 6, with the EKS method Algorithm 7 employed to compute low-rank 
solutions of the Lyapunov equations (4.17) and analogous approximations of the sys­
tem Gramians P and Q. Note that there is no need for passivity preservation in thermal 
analysis problems, since the reduced-order model is not interconnected with other ther­
mal models but is used individually in multiple transient thermal simulations, and thus 
it is only needed to be accurate enough to capture all thermal effects at the hotspots.
Simulation of the reduced-order model. This can be performed by the Backward- 
Euler (BE) differential approximation, where a direct or iterative linear solver can be 
employed for the solution of the resulting linear systems at each discrete point in time.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
42Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
4.6 Extension in Limited-Time Intervals
In most practical applications there is typically a final time at which all thermal effects can 
be considered to have reached steady-state. This means that the reduced-order model can 
become unnecessarily large to achieve approximation over an infinite time period.
Recalling the controllability and observability Gramians of (4.16) and restricting the in­
tegration limits to a time interval [0, ti], with 0 <  ti < to, the time-limited Gramians are 
obtained by
f‘ ti
P t =  exp (A t)B B Texp (A t)T dt
0
Q t =  exp(A t)TLTLexp(At)dt
0
(4.25)
Equivalently, Pt and Qt can be derived by the solution of the following modified Lyapunov 
equation [30,31]:
APt +  Pt A T =  -B B T +  FB(FB)T 
A T Qt +  Qt A =  - L T L +  (LF)T LF
where
F =  eAti (4.27)
The time-limited Gramians characterize the controllability and observability of the model 
in the selected time window, and the process of balancing and truncation will eliminate states 
that are difficult to reach and observe inside this time range. This means that more states can 
be eliminated for a given tolerance in (4.20) leading to lower order r in the reduced model, 
or alternatively to lower error in the time range for a given order r. However, in order to 
compute P t and Q t by solving (4.26) we have to deal with the different RHS of time-limited 
Lyapunov equations which require the computation of a matrix exponential. In order to deal 
with the computation of the matrix exponential, we adopt the EKS method as described in 
the next section.
4.7 Computation of the RHS of the Time-Limited Gramians
To construct the RHS of the time-limited Lyapunov equations of (4.26) we need to compute 
the matrix exponential F of (4.27). Computing the matrix exponential is generally an inten­
sive procedure. Fortunately, the matrix exponential is not explicitly needed in but only the ef­
fects in pre-multiplying p, (eYB) or post-multiplying q vectors (LeY) are required. Similarly, 
these can be effectively computed by projection algorithms for evaluating matrix functions, 
which iteratively project the large-scale input matrices onto lower dimensional subspaces and 
compute the analogous small-scale problem of matrix functions times a vector. The dimen­
sion of the projection subspace is increased in every iteration until convergence is achieved. 
The most effective modern algorithm for evaluating large-scale matrix functions is a similar 
procedure based on the EKS method [51], where the two complementary subspaces are em­
ployed in the same way as we described them in the previous section. The implementation is 
shown in Algorithm 8.
Note that in steps 4 and 7 the operations YK(j) and Y-1K (j) we use the same parallel 
iterative Preconditioned Conjugate Gradient (PCG) method as previous.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.8. Proposed Methodology for Time-Limited Hotspot Thermal Simulation 43
Algorithm 8 Extended Krylov subspace method for matrix exponential multiplying p vectors
B
Input: Y, B, Convergence tolerance e
Output: v =  exp(Y)B
1 j  =  i;
2 K (j) =  Orth([B, Y- i B]);
3 repeat
4 X =  K (j)T YK(j); R = K (j)T B
5 Compute y =  exp(X)R
6 ki =  2p(j -  1); k2 =  ki +  p; k3 =  2pj
7 Ki =  [YK(j) (:, ki +  1 : k2); Y- i K (j) (:,k2 +  i  : k3)]
8 K2 =  Orth (K i) w.r.t K(j)
9 K3 =  Orth (K2)
10 K (j+i) =  [K(j), K3 ]
11 j =  j  +  i
12 until ||y(j)-y(j-i)|1 <  e ||y(j)|| <  e
13 v =  K (j) y
4.8 Proposed Methodology for Time-Limited Hotspot Thermal 
Simulation
In addition to the previously described steps, only a small modification is needed in the 
construction o f the reduced-order model, where the following steps need to be employed:
• The quantity of (4.27) are evaluated through the extended Krylov subspace method of 
Algorithm 8.
• The low-rank factors Zp and Zq of the time-limited Gramians are computed through 
the EKS method of Algorithm 7.
• The states that are difficult to reach or observe in the selected time interval are elimi­
nated (truncated) by executing steps 2 and 3 of BT algorithm 6.
4.9 Experimental Results
In order to evaluate the efficiency of the proposed methodology for thermal analysis, we have 
created a set of artificial benchmark circuits that represent simplified microprocessor designs 
with random control logic and datapath. The characteristics of the constructed benchmarks 
are shown in Table 4.2.
The 3D discretization of each benchmark was performed with 10000 points along each 
material layer, and the RC equivalent electrical circuit was constructed. All hotspots were 
taken at the same points as the heat sources (with no loss of generality), i.e. every input port 
was also an output port in the state space model (4.12). Finally, the dissipation excitation of 
the heat sources where random piece-wise-linear functions.
The Reduced-Order Models (ROMs) were obtained by the BT Algorithm 6, with the 
EKSM Algorithm 7 for the computation of Gramians in low-rank form. All experiments 
were run using Matlab R2015a on a Linux workstation having an Intel Core i7 processor 
with 8 cores at 3.6GHz and 16GB memory. The tolerance error for BT was selected as 
ε =  i0 -4 , which is very strict but still leads to compact ROMs. The reduction results are 
shown in Table 4.3.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
44Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
Table 4.2: Statistics of benchmark circuits. Material layers include both 
metal and insulator layers, and heat sources represent sources of power 
dissipation from chip logic blocks.
Benchmark Metal Layers Material Layers Heat Sources
ckt1 3 5 200
ckt2 3 6 220
ckt3 4 7 260
ckt4 4 8 295
ckt5 5 9 330
ckt6 5 10 370
ckt7 6 11 410
Table 4.3: Model Order Reduction results.
Benchmark Original Size Size of ROM Reduction Percentage
ckt1 50000 1543 96.91%
ckt2 60000 1753 97.07%
ckt3 70000 1976 97.17%
ckt4 80000 2177 97.27%
ckt5 90000 2321 97.42%
ckt6 100000 2540 97.46%
ckt7 110000 2732 97.51%
The above results demonstrate that system theoretic techniques like BT can achieve very 
high reduction percentages, of about 97%, and thus can lead to very compact ROMs for the 
efficient capture of thermal effects at the hotspots. The resulting ROMs exhibit low memory 
requirements for storing the system matrices and significantly faster transient simulation.
Fig. 4.3 also displays graphically the magnitude of the Hankel Singular Values (HSVs) 
in decreasing order for two benchmark circuits, where it can be clearly seen that more than 
90% have negligible contribution to the system dynamics.
Furthermore, to evaluate the accuracy and efficiency of the resulting ROMs, we compared 
their transient simulation against the original, with both state-of-the-art direct methods and 
iterative methods with zero-fill incomplete factorization preconditioners [44,45]. Both the 
original models and the ROMs were simulated over 200 time-steps, from 0 to 0.2 seconds, 
and the runtime results are reported in Tables 4.4 and 4.5 for direct and iterative methods 
respectively. In the tables, Simul. Time refers to the average time (in seconds) per time-step 
required for the transient solution. It can be observed, that the compact models provide a 
significant acceleration which ranges from 26.33X to 46.33X for direct methods, and from 
3.75X to 5.63X for iterative methods. The speedup is significantly more pronounced for 
direct methods since the size of the resulting ROMs is quite small.
Also, Fig. 4.4 depicts the simulated waveforms of temperature over time at certain 
hotspots of benchmarks ckt4 and ckt5, where the responses of the ROMs show a nearly 
perfect match with the responses of the original models.
Finally, in order to achieve enhanced accuracy the time-limited BT was implemented 
with the procedures that was described in the previous sections for a time range [0, 2] (purely 
for testing purposes - one can choose any other range depending on the application), and 
was compared with the standard (infinite time) BT that we calculated before. The ROMs 
of standard and time-limited BT were compared both with respect to their order for given
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.9. Experimental Results 45
Index
(A) HSVs of ckt3. (B) HSVs of ckt5.
Figure 4.3: Magnitude of Hankel Singular Values for two benchmark
circuits.
Table 4.4: Runtime results for transient thermal simulation of the original 
and the reduced-order model with direct methods.
Benchmark Original Model Simul. Time
ROM
Simul. Time Speedup
ckt1 1.58 0.06 26.33X
ckt2 1.78 0.06 29.66X
ckt3 2.34 0.07 33.42X
ckt4 3.04 0.09 33.77X
ckt5 3.87 0.10 28.77X
ckt6 4.89 0.11 44.45X
ckt7 5.56 0.12 46.33X
tolerance ε, and w.r.t. their accuracy for given ROM order. In the first case the error tolerance 
was chosen as ε = 10-4 , while in the second case the order r was determined by the execution 
of standard BT and was reused for the truncation of the HSVs of time-limited Gramians. The 
results are reported in the of Table 4.6 where Max error refers to the absolute maximum error 
between the temperature waveforms of the original model and the ROM in the selected time 
range, while Reduction Percentage refers to the reduction percentage of the original model 
and the time-limited BT. From the table it can be clearly verified that time-limited BT can 
produce ROMs that exhibit either smaller size for given error, or smaller error for given order 
in comparison to standard BT when restricted in a finite frequency range.
The above results demonstrate that the proposed enchaced time-limited methodology 
can achieve slightly higher reduction percentages, of more than 97%, but provides a clear 
improvemnt in the error metric.
Finally, Fig. 4.5 depicts the simulated waveforms with the time-limited methodology of 
temperature over time at a hotspot of ckt3 benchmark, where the response of the ROM show 
a perfect match with the responses of the original model in the selected time window (with 
error less than 10-8 ).
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
46Chapter 4. Efficient IC Hotspot Thermal Analysis via Low-Rank Model Order Reduction
Table 4.5: Runtime results for transient thermal simulation of the original 
and the reduced-order model with iterative methods.
Benchmark Original Model Simul. Time
ROM
Simul. Time Speedup
ckt1 1.05 0.28 3.75X
ckt2 1.11 0.31 3.58X
ckt3 1.62 0.37 4.37X
ckt4 2.05 0.40 5.12X
ckt5 2.22 0.41 5.41X
ckt6 2.46 0.43 5.72X
ckt7 2.76 0.49 5.63X
Time (s)
(A) Transient simulation of a hotspot in 
benchmark ckt4.
(B) Transient simulation of a hotspot in 
benchmark ckt5.
Figure 4.4: Results of transient analysis for original and reduced-order 
model at a hotspot point in two benchmark circuits.
Table 4.6: Reduction results of time-limited BT vs standard BT for 
various circuit benchmarks.
Bench.
Stadard BT Time-limited BT
ROM order Max error ROM order for same error
Percent
%
Max error 
for same order
ckt1 1543 5.58e-06 1265 97.47% 2.48e-09
ckt2 1753 1.64e-05 1442 97.59% 1.64e-08
ckt3 1976 7.71e-05 1542 97.79% 3.76e-10
ckt4 2177 7.14e-05 1775 97.78% 6.44e-08
ckt5 2321 4.66e-06 1898 97.89% 9.12e-09
ckt6 2540 7.89e-06 2114 97.86% 7.33e-09
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
4.9. Experimental Results 47
Figure 4.5: Transient simulation of a hotspot in benchmark ckt3 with
time-limited BT.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
49
Chapter 5
Conclusions and Future Directions
5.1 Conclusions
This dissertation focus was given in circuit and thermal models derived from integrated cir­
cuits and the calculation of the reduced model on specific frequency or time windows, as well 
as on the efficient implementation of Balanced Truncation and Lyapunov matrix equations 
solution methods. More specifically we have presented:
• An efficient method for reduction of large-scale circuit models in finite frequency win­
dows, which only requires the specification of the end frequencies and avoids the need 
for ambiguous frequency weighting functions. The method has been shown to provide 
clear improvements in model accuracy or size with respect to standard Balanced Trun­
cation, while retaining its benefits of specified error bounds. The implementation of 
the proposed method has been made with efficient computational choices, as well as 
adaptations and modifications of large-scale methods for matrix equations.
• A compact modeling for hotspot transient thermal simulation. Unlike traditional meth­
ods that focus on the solution of the full thermal problem in an IC, this work focused on 
the observation that the simulation of the thermal phenomena in hotspots is sufficient 
to define the proper functionality of the circuit and additional nodes contain thermal 
information with little value to reliability issues. Experimental results shown that this 
method can provide very compact models for large 3D structures with acceptable error 
in transient analysis. Summarizing, MOR techniques can provide attractive solution to 
the problem of thermal analysis.
5.2 Future Directions
In the future, we plan to extend the research presented in this dissertation towards the follow­
ing directions:
• The computational implementation of MOR methods can be improved by exploiting 
the massive parallelism of modern heterogeneous systems with simultaneous multi­
core processors and graphic processing units (GPUs). Existing implementations in the 
literature are scarce, and perhaps the greatest benefits arise in matrix equation solvers, 
which can handle really large models with the advantage of fast computation of the 
reduced model.
• Generating linear models that can be easily synthesized. Concerning the synthesis of 
the reduced models, the industry requirement is the passive composition without trans­
formers, but this usually results in a much larger number of RLC elements compared 
to the reduced-order model. This is directly related to the fact that the matrix loses 
its sparse structure, so tackling the problem with graph-theoretic techniques may also
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
50 Chapter 5. Conclusions and Future Directions
solve the problem of non-minimal composition. Another important problem is that 
the synthesis procedure produces circuits with negative RLC values, which can cause 
various problems when simulating the resulting models.
• Generate guaranteed passive models. As for the passivity of the reduced-order model, 
all existing methods try moving a small subset of poles that initially make the model 
non-passive. Usually, however, this adversely affects the accuracy of the approach, and 
the best solution is to move a wider set of poles (not just those that affect the passivity) 
so that the reduced model becomes passive without loss of accuracy.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
51
Appendix A
Relevant Publications
Conference publications:
• George Floros, Nestor Evmorfopoulos and George Stamoulis. "THANOS: Eliminating 
Redundant States in Transient Thermal Analysis". International Workshop Thermal 
Investigations Of ICs And Systems (THERMINIC), September 25-27, 2019, atLecco, 
Italy.
• George Floros, Nestor Evmorfopoulos and George Stamoulis. "Efficient Circuit Re­
duction in Limited Frequency Windows". International Conference on Synthesis, Mod­
eling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 
July 15-18, 2019, at Lausanne, Switzerland.
• George Floros, Nestor Evmorfopoulos and George Stamoulis. "Efficient Reduction of 
Large Circuit Models Over Limited Frequency Windows". Design Automation Con­
ference (DAC), June 2-6, 2019, Work-in-Progress (WIP) Poster Session, Las Vegas, 
NV, USA.
• George Floros, Nestor Evmorfopoulos and George Stamoulis. "Efficient Hotspot Ther­
mal Simulation via Low Rank Model Order Reduction". International Conference on 
Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit 
Design (SMACD), July 2-5, 2018, at Prague, Czech Republic.
Journal publications:
• George Floros, Nestor Evmorfopoulos and George Stamoulis. "Frequency-Limited 
Reduction of Large Circuit Models via Low-Rank Sparse ADI Method". IEEE Trans­
actions on VLSI. (submitted)
• George Floros, Nestor Evmorfopoulos and George Stamoulis. "Efficient Hotspot Ther­
mal Simulation via Low-Rank Model Order Reduction". Integration, the VLSI Journal, 
volume 66, May 2019, pp 1-8.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
53
Bibliography
[1] J. Lillis, C. Cheng, S. Lin, and N. Chang, “Interconnect analysis and synthesis”. J oh n  W iley , 1999.
[2] A. Odabasioglu, M. Celik and L. T. Pileggi, “PRIMA: passive reduced-order interconnect macromod­
eling algorithm,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C ircu its  a n d  S y s te m s , 
vol. 17, no. 8, pp. 645-654, 1998.
[3] J. Phillips, L. Daniel and L. Miguel Silveira, “Guaranteed passive balancing transformations for model 
order reduction,” in D e s ig n  A u to m a tio n  C o n fe re n c e  (D A C ), 2002.
[4] K. Grochenig, “Foundations of Time-Frequency Analysis,” in A p p l ie d  a n d  N u m e r ic a l H a r m o n ic  A n a ly ­
s is , 2001.
[5] Pillage, Lawrence T., and Ronald A. Rohrer. “Asymptotic waveform evaluation for timing analysis,” 
in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C ircu its  a n d  S y s te m s , vol 9, no 4, pp 
352-366, 1990.
[6] P. Feldmann, R. W. Freund, “Efficient linear circuit analysis by Padd approximation via the Lanczos 
process,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C ircu its  a n d  S y s te m s , vol 14, 
no 5, pp 639-649, 1994.
[7] R. W. Freund, “SPRIM: structure-preserving reduced-order interconnect macromodeling,” in I n te r n a ­
tio n a l C o n fe re n c e  o n  C o m p u te r  A id e d  D e s ig n  ( I C C A D ), pp. 80-87, 2004.
[8] Hao Yu, Lei He and S. X. D. Tar, “Block structure preserving model order reduction,” in P r o c e e d in g s  o f  
th e  2 0 0 5  IE E E  I n te r n a tio n a l B e h a v io r a l M o d e lin g  a n d  S im u la tio n  W orksh op , pp. 1-6, 2005.
[9] Z. Zhang, X. Hu, C. Cheng and N. Wong, “A block-diagonal structured model reduction scheme for 
power grid networks,” in D e s ig n , A u to m a tio n  &  T est in  E u r o p e , pp. 1-6, 2011.
[10] Pu Liu, Sheldon X.-D. Tan, Boyuan Yan, Bruce McGaughy, “An efficient terminal and model order 
reduction algorithm,” in I n te g r a tio n , vol 41 pp. 210-218, 2008.
[11] Pu Liu et al., “An efficient method for terminal reduction of interconnect circuits considering delay 
variations,” in I E E E /A C M I n te r n a tio n a l C o n fe re n c e  o n  C o m p u te r -A id e d  D e s ig n ,, pp. 821-826, 2005.
[12] P. Miettinen, M. Honkala, J. Roos and M. Valtonen, “PartMOR: Partitioning-Based Realizable Model- 
Order Reduction Method for RLC Circuits,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te ­
g r a te d  C ircu its  a n d  S y s te m s , vol. 30, no. 3, pp. 374-387, 2011.
[13] G. De Luca, G. Antonini and P. Benner, “A parallel, adaptive multi-point model order reduction algo­
rithm,” in IE E E  2 2 n d  C o n fe re n c e  o n  E le c tr ic a l  P e r fo rm a n c e  o f E le c tr o n ic  P a c k a g in g  a n d  S y s te m s , San 
Jose, CA, 2013, pp. 115-118.
[14] W. Zhao, G. K. H. Pang and N. Wong, “Automatic adaptive multi-point moment matching for descriptor 
system model order reduction,” in In te r n a tio n a l S y m p o s iu m  o n  V L S I D e s ig n , A u to m a tio n , a n d  T est, pp. 
1-4, 2013.
[15] A.C. Antoulas “Approximation of large-scale dynamical systems,” in S IA M , 2005.
[16] A.C. Antoulas “An overview of approximation methods for large-scale dynamical systems,” in A n n u a l  
R e v ie w s  in  C o n tro l,  , vol. 29, no. 2, pp. 181-190, 2005.
[17] A. Laub, M. Heath, C. Paige and R. Ward, “Computation of system balancing transformations and other 
applications of simultaneous diagonalization algorithms,” in IE E E  T ra n sa c tio n s  on  A u to m a tic  C o n tro l, 
vol. 32, no. 2, pp. 115-122, February 1987.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
54 BIBLIOGRAPHY
[18] Z. Zhang, Q. Wang, N. Wong and L. Daniel, “A moment-matching scheme for the passivity-preserving 
model order reduction of indefinite descriptor systems with possible polynomial parts,” in A s ia  a n d  S ou th  
P a c ific  D e s ig n  A u to m a tio n  C o n fe re n c e , pp. 49-54, 2011.
[19] T. Reis and T. Stykel, “PABTEC: Passivity-Preserving Balanced Truncation for Electrical Circuits,” in 
IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C ircu its  a n d  S y s te m s , vol. 29, no. 9, pp. 
1354-1367, Sept. 2010.
[20] B. Yan, S. X. -. Tan, P. Liu and B. McGaughy, “Passive Interconnect Macromodeling Via Balanced Trun­
cation of Linear Systems in Descriptor Form,” in A s ia  a n d  S o u th  P a c ific  D e s ig n  A u to m a tio n  C o n fe re n c e , 
pp. 355-360, 2007.
[21] M. Imran, A. Ghafoor and M. Imran, “Frequency Limited Model Reduction Techniques With Error 
Bounds,” in IE E E  T ra n sa c tio n s  o n  C irc u its  a n d  S y s te m s  II: E x p r e ss  B r ie fs ,  vol. 65, no. 1, pp. 86-90, Jan. 
2018.
[22] U. Zulfiqar, M. Imran, A. Ghafoor and M. Liaquat, “A New Frequency-Limited Interval Gramians-Based 
Model Order Reduction Technique,” in IE E E  T ra n sa c tio n s  on  C irc u its  a n d  S y s te m s  II: E x p r e s s  B r ie fs , 
vol. 64, no. 6, pp. 680-684, 2017.
[23] D. Li, S. X. -. Tan and B. McGaughy, “ETBR: Extended Truncated Balanced Realization Method for 
On-Chip Power Grid Network Analysis,” in D e s ig n , A u to m a tio n  a n d  T est in  E u ro p e ,  pp. 432-437, 2008.
[24] v. Simoncini, “Computational Methods for Linear Matrix Equations,” in S IA M  R ev iew ,  vol. 58, no. 3, 
pp. 337-441, 2016.
[25] V. Simoncini, “A New Iterative Method for Solving Large-Scale Lyapunov Matrix Equations, ” in S IA M  
J o u rn a l o n  S c ie n tif ic  C o m p u tin g , vol. 29, no. 3, pp. 1268-1288, 2007.
[26] J. R. Li and J. White, “Low Rank Solution of Lyapunov Equations,” in S IA M  J o u rn a l o n  M a tr ix  A n a ly s is  
a n d  A p p l ic a t io n s , vol. 24, no. 1, pp. 260-280, 2002.
[27] J. Phillips and L. M. Silveira, “Poor man’s TBR: a simple model reduction scheme,” in D e s ig n , A u to m a ­
tio n  a n d  T est in E u ro p e  C o n fe re n c e  a n d  E x h ib it io n  (D A T E ), 2004.
[28] V. Vasudevan and M. Ramakrishna, “An efficient algorithm for frequency-weighted balanced truncation 
of VLSI interconnects in descriptor form,” in D e s ig n  A u to m a tio n  C o n fe re n c e  (D A C ), 2015.
[29] S. Gugercin and A. C. Antoulas, “A Survey of Model Reduction by Balanced Truncation and Some New 
Results,” in In te r n a tio n a l J o u rn a l o f  C o n tr o l ,, vol. 77, no. 8, pp. 748-766, 2004.
[30] W. Gawronski and J. Juang, “Model reduction in limited time and frequency intervals,” in In te r n a tio n a l  
J o u rn a l o f  S y s te m s  S c ie n c e , vol. 21, no. 2, pp. 349-376, 1990.
[31] P. Benner, P. Kurschner and J. Saak, “Frequency-Limited Balanced Truncation with Low-Rank Approx­
imations,” in S IA M  J o u rn a l o n  S c ie n tific  C o m p u tin g , vol. 38, no. 1, pp. 471-499, 2016.
[32] S. Tan and L. He, A d v a n c e d  m o d e l  o r d e r  r e d u c tio n  te c h n iq u e s  in  V L S I d e s ig n , Cambridge University 
Press, 2007.
[33] B. Yan, S. X. -D. Tan, L. Zhou, J. Chen and R. Shen, “Decentralized and Passive Model Order Reduction 
of Linear Networks With Massive Ports,” in IE E E  T ra n sa c tio n s  o n  V ery L a rg e  S c a le  I n te g r a tio n  (V L S I)  
S y s te m s , vol. 20, no. 5, pp. 865-877, 2012.
[34] P. Feldmann, “Model order reduction techniques for linear systems with large numbers of terminals,” in 
P r o c e e d in g s  D e s ig n , A u to m a tio n  a n d  T est in  E u ro p e  C o n fe re n c e  a n d  E x h ib it io n  (D A T E ), 2004.
[35] P.Li and W. Shi, “Model order reduction of linear networks with massive ports via frequency-dependent 
port packing,” in A C M /IE E E  D e s ig n  A u to m a tio n  C o n fe re n c e  (D A C ), 2006.
[36] B. Yan, S. X. -D. Tan and B. McGaughy, “Second-Order Balanced Truncation for Passive-Order Reduc­
tion of RLCK Circuits,” in IE E E  T ra n sa c tio n s  on  C irc u its  a n d  S y s te m s  II: E x p r e ss  B r ie fs , vol. 55, no. 9, 
pp. 942-946, Sept. 2008.
[37] D. Li, S. X.-D. Tan, and B. McGaughy, “ETBR: extended truncated balanced realization method for 
on-chip power grid network analysis,” in P r o c e e d in g s  o f  th e  co n fe re n c e  o n  D e s ig n , a u to m a tio n  a n d  te s t  
in E u ro p e  (D A T E ), 2008.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
BIBLIOGRAPHY 55
[38] P. Heydari and M. Pedram, “Model-order reduction using variational balanced truncation with spectral 
shaping,” in IE E E  T ra n sa c tio n s  o n  C ircu its  a n d  S y s te m s  I: R e g u la r  P a p e r s , vol. 53, no. 4, pp. 879-891, 
April 2006.
[39] A. Ghafoor and V. Sreeram, “Model Reduction Via Limited Frequency Interval Gramians,” in IE E E  
T ra n sa c tio n s  o n  C irc u its  a n d  S y s te m s  I: R e g u la r  P a p e r s , vol. 55, no. 9, pp. 2806-2812, Oct. 2008.
[40] A.C. Antoulas, D.C. Sorensen, Y. Zhou, “On the decay rate of Hankel singular values and related issues,” 
in S y s te m s  a n d  C o n tro l L e t te r s , vol. 46, no. 5, pp. 323-342, 2002.
[41] K. Jbilou and A.J. Riquet, “Projection methods for large Lyapunov matrix equations, ” in L in e a r  A lg e b r a  
a n d  its  A p p l ic a t io n s , vol. 415, no. 2-3, pp. 344-358, 2006.
[42] L. Knizhnerman and V. Simoncini, “Convergence analysis of the extended Krylov subspace method for 
the Lyapunov equation, ” in N u m e r is c h e  M a th e m a tik , vol. 118, no. 3, pp. 567-586, 2011.
[43] Stykel, T., “Gramian-Based Model Reduction for Descriptor Systems” T. M a th . C o n tro l S ig n a ls  S y s te m s , 
vol 16, pp 297-319, 2004.
[44] UMFPACK, http://www.cise.ufl.edu/research/sparse/umfpack/
[45] Barrett, R. and Berry, M. and Chan, T. and Demmel, J. and Donato, J. and Dongarra, J. and Eijkhout, 
V. and Pozo, R. and Romine, C. and van der Vorst, H, T em p la te s  fo r  th e  S o lu tio n  o f  L in e a r  S ys te m s:  
B u ild in g  B lo c k s  f o r  I te r a tiv e  M e th o d s , SIAM, 1994.
[46] G. Golub and C. F. Van Loan, M a tr ix  C o m p u ta t io n s , Johns Hopkins University Press, 1996.
[47] N. Higham, F u n c tio n s  o f  M a tr ic e s :  T h e o r y  a n d  C o m p u ta t io n , SIAM, 2008.
[48] N. Lang, H. Mena and J. Saak, “On the benefits of the LDLT factorization for large-scale differential 
matrix equation solvers,” in L in e a r  A lg e b r a  a n d  its  A p p lic a t io n s ,  vol. 480, pp. 44-71, 2015.
[49] G. Floros, N. Evmorfonoulos and G. Stamoulis, “Efficient Circuit Reduction in Limited Frequency 
Windows,” in n te r n a tio n a l C o n fe re n c e  o n  S yn th e s is , M o d e lin g , A n a ly s is  a n d  S im u la tio n  M e th o d s  a n d  
A p p l ic a t io n s  to  C irc u it D e s ig n  (S M A C D ),  pp. 129-132, 2019.
[50] P. Benner et al, “Self-Generating and Efficient Shift Parameters in ADI Methods for Large Lyapunov 
and Sylvester Equations,” in E le c tr o n ic  T ra n sa c tio n  o n  N u m e r ic a l A n a ly s is ,  vol. 43, pp. 142-162, 2014.
[51] L. Knizhnerman and V. Simoncini, “A new investigation of the extended Krylov subspace method for 
matrix function evaluations ,” in N u m e r ic a l L in e a r  A lg e b r a  A p p l .,  vol. 17, pp. 615-638, 2010.
[52] F. D. Freitas, J. Rommes and N. Martins, “Gramian-Based Reduction Method Applied to Large Sparse 
Power System Descriptor Models,” in IE E E  T ra n sa c tio n s  o n  P o w e r  S y s te m s ,  vol. 23, no. 3, pp. 1258­
1270, 2008.
[53] S. Grivet-Talocia and A. Ubolli, “A Comparative Study of Passivity Enforcement Schemes for Linear 
Lumped Macromodels,” in IE E E  T ra n sa c tio n s  o n  A d v a n c e d  P a ck a g in g ,  vol. 31, no. 4, pp. 673-683, 
2008.
[54] http://slicot.org/20-site/126-benchmark-examples-for-model-reduction
[55] R. Ionutiu, J. Rommes and W. H. A. Schilders, “SparseRC: Sparsity Preserving Model Reduction for 
RC Circuits With Many Terminals, ” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  
C ircu its  a n d  S ys te m s ,  vol. 30, no. 12, pp. 1828-1841, 2011.
[56] C. Xu, S. K. Kolluri, K. Endo and K. Banerjee, “Analytical Thermal Model for Self-Heating in Advanced 
FinFET Devices With Implications for Design and Reliability,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  
D e s ig n  o f  I n te g r a te d  C irc u its  a n d  S y s te m s , vol. 32, no. 7, pp. 1045-1058, 2013.
[57] “International technology roadmap for semiconductors (ITRS) 2015 Edition -ERD,” 2015.
[58] Peng Li, L. T. Pileggi, M. Asheghi and R. Chandra, “IC thermal simulation and modeling via efficient 
multigrid-based approaches,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C ircu its  
a n d  S y s te m s , vol. 25, no. 9, pp. 1763-1776, 2006.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
56 BIBLIOGRAPHY
[59] S. Ladenheim , Y. C. Chen, M. Mihajlovic, V. Pavlidis, “IC thermal analyzer for versatile 3-D structures 
using multigrid preconditioned Krylov methods,” in I E E E /A C M  I n te r n a tio n a l C o n fe re n c e  o n  C o m p u te r -  
A id e d  D e s ig n , pp. 1-8, 2016.
[60] Y. Zhan and S. S. Sapatnekar, “High-Efficiency Green Function-Based Thermal Simulation Algorithms,” 
in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f I n te g r a te d  C ircu its  a n d  S y s te m s , vol. 26, no. 19, pp. 
1661-1675, 2007.
[61] L. Codecasa, D. D’Amore and P. Maffezzoni, “Compact modeling of electrical devices for electrothermal 
analysis,” in IE E E  T r a n sa c tio n s  o n  C irc u its  a n d  S y s te m s  I :  F u n d a m e n ta l T h e o r y  a n d  A p p l ic a t io n s ,, vol. 
50, no. 4, pp. 465-476, April 2003.
[62] A.C. Antoulas, D.C. Sorensen, and S. Gugercin, “A Survey of Model Reduction Methods for Large-Scale 
Systems,” in C o n te m p o r a r y  M a th e m a tic s ., vol. 280, pp. 193-201, 2001.
[63] G. Floros, N. Evmorfopoulos and G. Stamoulis, “Efficient Hotspot Thermal Simulation Via Low-Rank 
Model Order Reduction,” in 2 0 1 8  1 5 th  I n te r n a tio n a l C o n fe re n c e  o n  S yn th e s is , M o d e lin g , A n a ly s is  a n d  
S im u la tio n  M e th o d s  a n d  A p p l ic a t io n s  to  C ir c u it D e s ig n  (S M A C D ), Prague, 2018, pp. 205-208.
[64] G. Floros, N. Evmorfopoulos and G. Stamoulis, “Efficient IC hotspot thermal analysis via low-rank 
Model Order Reduction,” in In te g r a tio n ,, vol 66, pp. 1-8, 2019.
[65] Peng Li, L. T. Pileggi, M. Asheghi and R. Chandra, “Efficient full-chip thermal modeling and analysis,” 
in I E E E /A C M  I n te r n a tio n a l C o n fe re n c e  o n  C o m p u te r  A id e d  D e s ig n ., pp. 319-326, 2004.
[66] Y. Yang, Z. Gu, C. Zhu, R. P. Dick and L. Shang, “ISAC: Integrated Space-and-Time-Adaptive Chip- 
Package Thermal Analysis,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f I n te g r a te d  C ircu its  a n d  
S y s te m s , vol. 26, no. 1, pp. 86-99, 2007.
[67] T. Y. Wang and C. C. P. Chen, “3-D Thermal-ADI: a linear-time chip level transient thermal simulator,” 
in IIE E E  T ra n sa c tio n s  on  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C irc u its  a n d  S y s te m s , vol. 21, no. 12, 
pp. 1434-1445, 2002.
[68] Y. Yang, C. Zhu, Z. Gu and L. Shang and R. P. Dick, “Adaptive multi-domain thermal modeling and anal­
ysis for integrated circuit synthesis and design,” in IE E E /A C M  In te r n a tio n a l C o n fe re n c e  o n  C o m p u te r  
A id e d  D e s ig n ., pp. 575-582, 2006.
[69] A. Sridhar, A. Vincenzi, D. Atienza and T. Brunschwiler, “3D-ICE: A Compact Thermal Model for 
Early-Stage Design of Liquid-Cooled ICs,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r s , vol. 63, no. 10, pp. 
2576-2589, 2014.
[70] X. X. Liu, K. Zhai, Z. Liu, K. He, S. X. D. Tan and W. Yu, “Parallel Thermal Analysis of 3-D Integrated 
Circuits With Liquid Cooling on CPU-GPU Platforms,” in IE E E  T ra n sa c tio n s  on  V ery L a rg e  S ca le  
I n te g r a tio n  (V L S I) S y s te m s , vol. 23, no. 3, pp. 575-579, 2015.
[71] G. Floros, K. Daloukas, N. Evmorfopoulos and G. Stamoulis, “A parallel iterative approach for efficient 
full chip thermal analysis,” in 7 th  I n te r n a tio n a l C o n fe re n c e  o n  M o d e r n  C irc u its  a n d  S y s te m s  T ec h n o lo ­
g ie s  (M O C A S T ), Thessaloniki, pp. 1-4, 2018.
[72] G. Floros, K. Daloukas, N. Evmorfopoulos, G. Stamoulis, “A Preconditioned Iterative Approach for 
Efficient Full Chip Thermal Analysis on Massively Parallel Platforms,” in T ec h n o lo g ie s  2019, 7, 1.
[73] A. Vincenzi, A. Sridhar, M. Ruggiero and D. Atienza, “Fast thermal simulation of 2D/3D integrated 
circuits exploiting neural networks and GPUs,” in I E E E /A C M  I n te r n a tio n a l S y m p o s iu m  o n  L o w  P o w e r  
E le c tr o n ic s  a n d  D e s ig n ., pp. 151-156, 2011.
[74] Y. M. Lee, C. W. Pan, P. Y. Huang and C. P. Yang, “LUTSim: A Look-Up Table-Based Thermal Simulator 
for 3-D ICs,” in IE E E  T ra n sa c tio n s  o n  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C ircu its  a n d  S y s te m s ., vol. 
34, no. 8, pp. 1250-1263, 2015.
[75] T. Y. Wang and C. C. P. Chen, “SPICE-compatible thermal simulation with lumped circuit modeling for 
thermal reliability analysis based on modeling order reduction,” in I n te r n a tio n a l S y m p o s iu m  o n  S ig n a ls ,  
C ircu its  a n d  S y s te m s , pp. 357-362, 2004.
[76] P. Liu, Z. Qi, H. Li, L. Jin, W. Wu, S. X. D. Tan and J. Yang, “Fast thermal simulation for architec­
ture level dynamic thermal management,” in I E E E /A C M  I n te r n a tio n a l C o n fe re n c e  o n  C o m p u te r -A id e d  
D e s ig n ., pp. 639-644, 2005.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
BIBLIOGRAPHY 57
[77] N. Ouzisik, “Heat transfer - A basic approach,” in M c g r a w - H il l  C o lle g e  b o o k  c o m p a n y , 1985.
[78] T. Bergman, B. Lavine, P. Incropera and P. DeWitt, “Fundamentals of Heat and Mass Transfer. ,” in 
W iley  N e w  Y ork, 2017.
[79] Y.-K. Cheng, P. Raha, C.-C. Teng, E. Rosenbaum and S.-M. Kang, “ILLIADS-T: an electrothermal tim­
ing simulator for temperature-sensitive reliability diagnosis of CMOS VLSI chips,” in IE E E  T ra n sa c tio n s  
on  C o m p u te r -A id e d  D e s ig n  o f  I n te g r a te d  C irc u its  a n d  S y s te m s , vol. 17, no. 8, pp. 668-681, 1998.
[80] K. Jbilou and A.J. Riquet, “Projection methods for large Lyapunov matrix equations, ” in L in e a r  A lg e b r a  
a n d  its  A p p l ic a t io n s , vol. 415, no. 2-3, pp. 344-358, 2006.
[81] D. Sorensen and Y. Zhou, “Direct methods for matrix Sylvester and Lyapunov equations, ” in J. A p p l.  
M a th .,  2003, pp. 277-303.
Institutional Repository - Library & Information Centre - University of Thessaly
07/06/2020 16:00:36 EEST - 137.108.70.13
