A fast and retargetable framework for logic-IP-internal electromigration assessment comprehending advanced waveform effects by Jain, Palkesh et al.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1
A Fast and Retargetable Framework for
Logic-IP-Internal Electromigration Assessment
Comprehending Advanced Waveform Effects
Palkesh Jain, Member, IEEE, Jordi Cortadella, Fellow, IEEE, and Sachin S. Sapatnekar, Fellow, IEEE
Abstract— A new methodology for system-on-chip-level
logic-IP-internal electromigration verification is presented in this
paper, which significantly improves accuracy by comprehending
the impact of the parasitic RC loading and voltage-dependent
pin capacitance in the library model. It additionally provides
an on-the-fly retargeting capability for reliability constraints
by allowing arbitrary specifications of lifetimes, temperatures,
voltages, and failure rates, as well as interoperability of the IPs
across foundries. The characterization part of the methodology
is expedited through the intelligent IP-response modeling. The
ultimate benefit of the proposed approach is demonstrated on
a 28-nm design by providing an on-the-fly specification of retar-
geted reliability constraints. The results show a high correlation
with SPICE and were obtained with an order of magnitude
reduction in the verification runtime.
Index Terms— Electromigration (EM), pin capacitance,
reliability, retargeting, signal probability.
I. INTRODUCTION
ELECTROMIGRATION (EM) is a major product aging mechanism revolving around the containment of the average
and rms current densities in interconnects. This, in turn,
requires a cell-external analysis for signals and power nets
connecting to the cells and a cell-internal analysis for wires
within a logic-IP (standard cells) or mixed signal IP block.
Recently, a great deal of innovation and improvement has been
seen on the verification and design strategies for cell-external
signal and power grid EM [1]–[4]. However, there has not
been an adequate focus on the robust design and reuse of
the standard cells. Ensuring EM reliability for standard cells
and IPs in a design implies that the exact context at which
the IP is used must be bounded to guarantee its robustness
in the design. This context could be stated in terms of
design limits (loads, slews, frequencies, and supply voltage), or
reliability (temperature, lifetime, or a failure rate specification
tied to current density limits). Without rigorous assessments,
a set of IPs designed for a particular reliability condition
[e.g., 1.2 V, 105 °C, 100k power-on hours (POH), 0.1%
cumulative failure, and 10 °C Joule heating (JH) limit]
cannot be guaranteed to be EM-safe at another condition
(e.g., 1.0 V, 115 °C, 200k POH, 0.01% cumulative failure,
and 15 °C JH limit).
Manuscript received June 19, 2015; revised October 12, 2015; accepted
November 27, 2015. This work was supported in part under NSF award CCF-1162267.
P. Jain was with Texas Instruments India Pvt, Ltd., Bangalore 560093, India.
He is now with Qualcomm Technologies Inc., Bangalore 560066, India
(e-mail: palkesh@qti.qualcomm.com).
J. Cortadella is with the Computer Science Department, Universitat Politècnica
de Catalunya, Barcelona 08034, Spain (e-mail: jordi.cortadella@upc.edu).
S. S. Sapatnekar is with the University of Minnesota, Minneapolis, MN
55455 USA (e-mail: sachin@umn.edu).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2015.2505504
Fig. 1. Typical current density limits as a function of temperature and
lifetime, showing >20× differences between various environments.
Nevertheless, the tradeoffs on these constraints are
increasingly in demand in industry due to accelerated inroads
of semiconductor houses into newer businesses with differ-
ent reliability demands [5]. For example, industrial designs
demand more stringent operating conditions than traditional
computing applications [6], [7]. From an EM standpoint,
meeting these specifications is challenging, as shown in Fig. 1,
which highlights the representative current density per μm²
across various temperature and lifetime specifications. As can
be seen, among the various environments, the current carrying
capability becomes over 20× more stringent. Not only among
different application markets, even for the same system-on-
chip (SoC) itself, different complex IPs (e.g., CPU core or
a DSP) can have different reliability requirements based on
their ON times and temperature specifications. The challenges
increase when such reliability requirements could only be
made available on the fly: that is, either during the final SoC
verification or even after the SoC tape-out; in which cases,
the original reliability targets for the IP, characterized for
one application domain, may not match with the reliability
requirements in a different domain.
One way to meet such diverse specifications is to approach
the design in a bottom-up manner with a fresh logic-IP
portfolio that meets targeted domain-specific reliability spec-
ifications. However, this is very expensive, and economic
1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 2. (a) Traditional approach for EM verification using the safe operating
region concept. (b) Schematic highlighting the EM-critical cell, driving
an RC load network (vis-à-vis safe frequency obtained for pure C load).
and design effort considerations often dictate that the product
integration over all application domains be based on the
same IP portfolio. This implies that the logic-IPs require a
disciplined utilization procedure, making it important to assess
their exact usage boundaries at arbitrary conditions.
A starting point toward this is to ensure that the cell is
EM-safe at a specific load and frequency by selecting wire
widths, so that EM constraints are met. However, this only
implies that a lower load and lower frequency can be con-
sidered EM-safe. The cell may (or may not) be EM-safe at a
lower load and higher frequency, or a higher load and lower
frequency, or a higher load and higher frequency, and this can
only be uncovered through costly detailed analysis.
As an improvement, some industrial implementation
tools [8], [9] use a precharacterized table that models the
tradeoffs in various design/reliability parameters. Fig. 2(a)
shows a representation of one such table, where the x-axis
represents an operating constraint of the cell (load here, but
this could be slew, supply voltage, or any reliability constraint)
and the y-axis represents the alternative constraint (frequency
here), at a baseline reliability condition.
The intuition behind such a table (frequency versus
load; f –L) is simple: the current flow in the IP increases
with the operating load, and hence, the frequency should be
lowered to meet the reliability specification. This model can
be used at the chip level to determine the safe frequency ( fsafe)
of an instance for any design/reliability parameter, and then
make corresponding design fixes. Needless to say, most of
the EM-critical cells are the ones that operate at higher loads,
frequencies, or slews.
However, such a specification is simplistic: with
advancing technology and convoluted circuit effects, this
model is inadequate for accurately predicting EM safety, for
several reasons. First, the frequency in Fig. 2(a) refers to the
output switching frequency: for multi-input cells, the failure
rate depends on the switching frequency at each input. This
corresponds to a multidimensional space that is computationally
expensive to characterize. Second, EM constraints are
often specified in terms of average or rms current density
thresholds [10]. However, maintaining multiple relationships between
operating parameters (one per criterion) is infeasible. Finally, while the traditional
f–L model is characterized at purely capacitive loads,
in reality, cells drive RC loads [Fig. 2(b)], and a fast prediction
of cell-internal EM safety under RC loads is an open problem.
Our goal is to address four limitations (L1–L4) associated
with a chip-level cell-internal EM analysis.
L1: Inability to incorporate the impact of arbitrary switching
rates on input pins and effects such as clock
gating: we overcome this by discretely characterizing
the individual current density components (switching or
leakage). In addition, our frequency constraints are self-
consistent, which simultaneously address the average
and rms current density criteria, based on formulations
proposed in [10].
L2: Inability to comprehend RC loads [Fig. 2(b)] and
to model the voltage-dependent pin capacitance (Cin):
we apply intelligent moment-matching-based techniques
as in [3], and propose a novel formulation for
Cin estimation.
L3: Inability to retarget reliability specifications on the
fly for different reliability conditions: we develop the
concept of equivalent stress and present closed-form
formulas.
L4: Nonscalability of cell characterization data for an
entire library due to prohibitive simulation runtimes,
with ∼600 simulations per cell: we perform these sim-
ulations efficiently using intelligent response modeling.
The core methodology of our work naturally enables model
retargeting by separating the current density computation part
from the verification, as against the tight coupling in the model
of Fig. 2(a), where the f –L curve must be characterized
at each reliability condition. In our approach, the reliability
conditions need to be specified in situ: only at the design
verification stage. Moreover, our model can take the operating
frequency ( fop) of an instance as an input, or it can provide
the maximum safe operating frequency as an output.
II. EM MODELING—BASIC FRAMEWORK
UNDER PURELY CAPACITIVE LOADS
A. Electromigration Basics
In this section, we review the key parameters affecting EM.
In our terminology, we refer to metal segments of the IP as
resistors. These resistors are obtained by parasitic extraction,
which retain key information, such as the width, length, and
the metal level for every resistor in the netlist.
Since EM is a statistical process, the time to failure for
metal segments stressed in similar conditions also varies [11].
Industrial markets demand low failure rates (e.g., 100 defective
parts per million over the chip lifetime). Chip reliability
engineers translate this chip-level specification to specific
fail-fraction (FF) targets, in units of failures-in-time (FITs),
on individual resistors.
The classic Black’s equation [11] relates the mean time to
failure (t50, time to failure for half of the population) to the
average current density J across the interconnect cross section
and the wire temperature T as
$$ t_{50} = A J^{-n} e^{Q/(k_B T)}. \tag{1} $$
Here, Q is the activation energy, kB is Boltzmann’s constant,
n is the current exponent (typically between 1 and 2), and A is
a fitting parameter.
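As a rough illustration of how (1) is used, the following Python sketch evaluates t50 and the ratio of median lifetimes between two conditions. Only the functional form comes from (1); the values chosen for A, Q, and n are placeholders for illustration and are not taken from this paper.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def black_t50(j, temp_k, a=1.0, q_ev=0.9, n=1.0):
    """Median time to failure from Black's equation (1).

    j       : average current density (arbitrary but consistent units)
    temp_k  : wire temperature in kelvin
    a, q_ev, n : fitting constant, activation energy (eV), current exponent;
                 the defaults are illustrative assumptions, not paper values.
    """
    return a * j ** (-n) * math.exp(q_ev / (K_B_EV * temp_k))


def lifetime_ratio(j1, t1_k, j2, t2_k, q_ev=0.9, n=1.0):
    """t50 at condition 1 divided by t50 at condition 2 (A cancels)."""
    return black_t50(j1, t1_k, q_ev=q_ev, n=n) / black_t50(j2, t2_k, q_ev=q_ev, n=n)


# Same current density, 105 C versus 115 C (converted to kelvin).
ratio = lifetime_ratio(1.0, 105 + 273.15, 1.0, 115 + 273.15)
print(f"t50(105 C) / t50(115 C) = {ratio:.2f}")
```

Because A cancels in the ratio, a comparison between two conditions of this kind needs only Q and n, which is why the sketch exposes them as parameters.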
Black’s equation predicts the time to failure, and in practice,
it is predominantly used to determine the average current
density thresholds to meet a target FF. It has been demon-
strated that FF follows a lognormal dependence on the time
to failure (t f , also known as stress time) [11]. The lognormal-
transformation parameter (z) relates to the time to failure as
follows, where σ is the standard deviation of the distribution,
which is process dependent:
$$ z = \frac{\ln(t_f) - \ln(t_{50})}{\sigma}; \quad \mathrm{FF} = \int_{-\infty}^{z} \frac{e^{-x^{2}/2}}{\sqrt{2\pi}}\, dx. \tag{2} $$
The transformation variable z helps in directly represent-
ing the cumulative failure rate with a normal cumulative
distribution function [34]. For example, at stress time
($t_f$) = $t_{50}$, z and FF consistently evaluate to 0 and 0.5,
respectively.
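The mapping in (2) from stress time to fail fraction can be sketched in a few lines of Python. This is a minimal illustration that uses the standard-library error function for the normal cumulative distribution; the value of σ in the example is an arbitrary assumption.

```python
import math


def lognormal_z(t_f, t50, sigma):
    """Lognormal transformation parameter z from (2)."""
    return (math.log(t_f) - math.log(t50)) / sigma


def fail_fraction(t_f, t50, sigma):
    """Cumulative fail fraction: standard normal CDF evaluated at z."""
    z = lognormal_z(t_f, t50, sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# At t_f = t50 the definition gives z = 0 and FF = 0.5.
print(lognormal_z(100.0, 100.0, 0.3), fail_fraction(100.0, 100.0, 0.3))
```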
In signal wires, currents flow in both directions, leading to
a limited damage recovery, which can be incorporated by an
empirically estimated recovery factor, ξ , used for adjusting the
computed average current density in Black’s equation [12], as:
$$ J_{\mathrm{avg}} = J^{+}_{\mathrm{avg}} - \xi J^{-}_{\mathrm{avg}}. \tag{3} $$
Here, $J^{+}_{\mathrm{avg}}$ and $J^{-}_{\mathrm{avg}}$ indicate the average current density during
current conduction in the positive and negative directions,
respectively. In addition, the wire heating ($\Delta T$) has an inherent
dependence on the rms current density, $J_{\mathrm{rms}}$, as
$$ \Delta T = c J_{\mathrm{rms}}^{2}. \tag{4} $$
Equation (4), with c as a fitting parameter, follows directly
from heat conduction principles. Typically, the limit on the
maximum temperature rise due to JH is a design constraint,
and this automatically places limits on rms current densities.
Pioneering work in [10] combined the two effects of average
EM fails and rms-induced JH in a self-consistent manner
through the concept of duty cycles, making it possible to
simultaneously check both conditions. Thus, given the con-
straints of stress temperature, lifetime, and JH limit, we can
arrive at the EM thresholds that should be met by all metal
segments in the IP. Once we have the EM thresholds in place,
we can embark on the EM verification process across various
resistors in the IP.
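To make the two checks concrete, the sketch below applies the recovery adjustment of (3) and derives an rms limit from a Joule-heating budget using (4). It treats the two checks independently, which is a simplification: it is not the self-consistent duty-cycle formulation of [10]. All function and parameter names are hypothetical.

```python
import math


def effective_j_avg(j_pos_avg, j_neg_avg, xi):
    """Recovery-adjusted average current density, as in (3)."""
    return j_pos_avg - xi * j_neg_avg


def max_j_rms(delta_t_limit, c):
    """Largest rms current density allowed by a Joule-heating limit, from (4)."""
    return math.sqrt(delta_t_limit / c)


def resistor_passes(j_pos_avg, j_neg_avg, j_rms, xi, j_avg_th, delta_t_limit, c):
    """True when both the average check and the rms check are met."""
    avg_ok = effective_j_avg(j_pos_avg, j_neg_avg, xi) <= j_avg_th
    rms_ok = j_rms <= max_j_rms(delta_t_limit, c)
    return avg_ok and rms_ok


# Illustrative numbers only (arbitrary units).
print(effective_j_avg(1.0, 0.5, 0.8), max_j_rms(10.0, 2.5))
```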
It must, however, be noted that fundamentally, EM is
induced by divergence of atomic flux—which is typically
highest at sites, such as vias, contacts, or even points, where
the leads merge. Furthermore, it has been reported in the
literature that even if the incoming atomic flux (signified by
high current density) is high at such sites, the site itself may
not fail due to it maintaining a low divergence; while a simple,
individual-lead-based Black’s equation continues to predict
failure for such a structure. This inefficiency has recently been
revisited by various researchers, resulting in the evolution of
alternate paradigms in EM checking [31]–[33]. Fundamentally,
such alternative methods rely on computing some form of
atomic flux divergence at EM-probable sites and, subsequently,
comparing them against set thresholds. One such method, as
reported in [31], is the vector via-node-based method, wherein
the physical and directional interactions among various leads
are incorporated to perform the reliability verification. Notably,
however, the fundamental inputs required to perform these
calculations still remain the individual current density in every
single interconnect of the circuit, along with the additional
information like the circuit topology.
Consequently, we note that even for alternative EM check-
ing methodologies, the discrete current density in individual
interconnects is still the vital input—which is discussed in
more detail in Section III.
Fig. 3. fsafe plot for a two-input clock-multiplexor cell. Both input clocks
switch at 100%, while the select pin chooses one of them, with varying
likelihoods.
B. Traditional Approach for Modeling EM Reliability
We begin by revisiting the traditional approach, as shown in
Fig. 2(a). Given the physical design of the IP, EM verification
requires a model that provides a tradeoff among various
operating conditions such that within the bounds of those
tradeoffs, the IP remains EM-safe. The generation of this
model requires an iterative search: for example, in Fig. 2(a),
at a fixed loading and reliability condition (say, 50 fF, 1 V,
105 °C, 100k POH), an iterative search over the frequency
space is required to determine the maximum fsafe, where all
resistors within the IP are EM-safe. This is computationally
expensive, since each iteration involves a SPICE-simulation-
based verification. A typical optimized procedure requires
ten binary-search iterations at each loading condition. For a
single-input cell, whose operating load/slew space is covered
through an 8 × 8 matrix in the liberty file, the number of
required iterations is ∼64 × 10 = 640 for fixed values of
other parameters (supply voltage and reliability specifications).
To support operation at multiple supply voltages, as well
as IP reuse across application domains, this number must
be multiplied by the number of use cases, resulting in a
formidable characterization overhead.
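The cost of the traditional flow can be sketched as follows. The callback `is_em_safe` is a hypothetical stand-in for one SPICE-based verification run, assumed here to be monotone in frequency; the run-count helper simply reproduces the 8 × 8 × 10 arithmetic above.

```python
def find_f_safe(is_em_safe, f_low, f_high, iterations=10):
    """Binary search for the highest frequency at which is_em_safe(f) holds.

    is_em_safe is a hypothetical callback standing in for one SPICE-based
    verification run; it is assumed to be monotone (safe below some
    frequency, unsafe above it), and f_low is assumed to be safe.
    """
    for _ in range(iterations):
        f_mid = 0.5 * (f_low + f_high)
        if is_em_safe(f_mid):
            f_low = f_mid   # safe: the limit lies at or above f_mid
        else:
            f_high = f_mid  # unsafe: the limit lies below f_mid
    return f_low


def characterization_runs(n_loads=8, n_slews=8, iterations=10, use_cases=1):
    """Number of simulation-based checks for one single-input cell."""
    return n_loads * n_slews * iterations * use_cases


# Stand-in predicate: pretend the cell is safe up to 1.3 GHz.
f_safe = find_f_safe(lambda f: f <= 1.3e9, f_low=0.0, f_high=4.0e9)
print(f_safe, characterization_runs())
```

Each call to the predicate corresponds to one expensive simulation, so the run count grows linearly with the number of supply voltages and reliability use cases.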
While this may even be tractable for single-input cells, for
multiple-input cells, this characterization becomes challenging,
not only from a computational point of view, but also
from the fundamental modeling (L1) viewpoint. To illustrate
this idea, we consider the example of a two-input clock-
tree mux IP block that is used to alternate among clocks
for downstream propagation. The user may examine typical
workloads, use cases, and provide an EM analysis tool with
information about the switching rates of the input pins of a
block. In this experiment, both the input pins (Clk1, Clk2)
switch at 100%, but the select pin is toggled to allow passing
of the first and second clocks in varying amounts (going from
0% to 100% in steps of 25%).
The f –L plots for the five cases are shown in Fig. 3, and
show a variation of up to 45% in fsafe estimates, depending
on how often Clk1 or Clk2 is selected over the lifetime, but
the traditional model will choose the pessimistic fsafe over all
cases. We also see that while at very low loads, select = 100%
(meaning Clk2 being transmitted) results in least EM stress,
it switches over as the load increases. This switch results
from the interplay of various currents (short-circuit and the
output switching current) in the circuit. Such an asymmetric
response can only be captured by the traditional model by
individually generating and storing the f –L data for various
input excitations, which is computationally expensive. Further,
effects like clock gating are not straightforward to handle in
the traditional model.
Another significant drawback (L2) with the traditional
model is the fact that it has been generated using a
lumped C load, while real applications involve RC loading.
Due to resistive shielding effects, a direct application of the
traditional model to assess reliability of instances that drive
RC loads turns out to be severely pessimistic. Finally, we
also note that the traditional model is locked to a particular
reliability specification (supply voltage, temperature, lifetime,
and failure rate target), and is incapable of allowing a tradeoff
on these (L3) unless the f–L data are regenerated along these
vectors, which becomes computationally unaffordable for the
entire library (L4). With the above background and detailed
understanding of the traditional model (including generation,
usage, and associated limitations), we now look at building the
proposed model, which can address the various limitations.
III. ADDRESSING L1—INCORPORATING ARBITRARY
SWITCHING AND CLOCK GATING IN
FREQUENCY ESTIMATION
A. Library Level Current Density Characterization
In order to build the model that can help predict the
reliability of an IP for arbitrary switching scenarios, we begin
in an ab initio manner by trying to classify the current flow
in the IP as either leakage or switching current. We observe
that for a combinational IP with m inputs, $2^m$ distinct static
states (various combinations of input pins at logic 1 or 0)
are possible. Each of these states can have a different leakage
flow. In addition, based on the IP functionality, there could
be several paths (later referred to as arcs) from an input pin
resulting in an output transition. Every such output transition
causes a switching current flow in the IP-internal resistors
(belonging to the resistor-set ).
Thus, the first step in our approach is to discretely charac-
terize the current flow: average and rms, both through every
resistor R in the IP (resistor-set ), in every legal state
(for leakage current), or arc through the cell (for switching
current). Such a characterization will be used to compute the
eventual effective current density through any resistor of the
cell as a weighted summation of the current densities in unique
scenarios, coupled with the information of arc switching rates
and probabilities of legal state occurrences.
The salient feature of our characterization is that it remains
independent of the reliability condition, which is actually an
input during chip-level verification. As the leakage current
density in the cell only depends on the static states of inputs,
we can easily obtain the current density through every resistor in the set by cycling
through all possible input states in SPICE (note that average
and rms remain the same due to dc nature of the waveform).
On the other hand, switching current densities are tied to a
particular input-pin to output-pin combination (also referred
to as a timing arc), through a fixed cell-internal path, with
other inputs in noncontrolling states enabling the transition.
For example, for a three-input AOI gate (Y =!(A + BC)),
the output Y can fall because of a rise on A in three different
states of BC , namely, 00, 01, and 10. Hence, for this particular
A → Y arc, the current density must be computed through R
for these three states of BC . We can leverage the simulation
framework of industrial timing characterization systems [9],
to obtain information about all such arcs and states through
the cell. For a particular arc i and associated noncontrolling
state k, we denote the time duration over which this current
density is calculated as sik . A similar convention is followed
by Javg,Rik and Jrms,Rik to define the average and rms current
densities through R. As we leverage the timing characteriza-
tion framework, we do not recompute sik , but reuse it from
the timing analysis step [8], [13]. Moreover, sik is typically
greater than the delay itself and, therefore, accurately captures
the tail effects.
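The enumeration of noncontrolling side-input states for an arc can be illustrated with a small Python sketch. It is a logic-level illustration only and assumes nothing about the industrial characterization tools cited above; the helper names are hypothetical.

```python
from itertools import product


def aoi_output(a, b, c):
    """Three-input AOI gate from the text: Y = !(A + BC)."""
    return int(not (a or (b and c)))


def sensitizing_states(gate, n_inputs, pin):
    """Side-input states in which toggling `pin` toggles the gate output."""
    states = []
    for side in product((0, 1), repeat=n_inputs - 1):
        low = list(side[:pin]) + [0] + list(side[pin:])
        high = list(side[:pin]) + [1] + list(side[pin:])
        if gate(*low) != gate(*high):
            states.append(side)
    return states


# Arc A -> Y: pin index 0, side inputs are (B, C).
print(sensitizing_states(aoi_output, 3, 0))
```

For the A → Y arc this yields the three BC states named in the text (00, 01, and 10); each such arc and state pair needs its own current density characterization.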
B. Effective Current Density Estimation
for a Chip-Level Instance
After characterizing the leakage and switching current den-
sities for various arcs and states, we now present the calcula-
tions for the effective average and rms densities in the circuit.
1) Effective Leakage Current Density Through a Resistor
Across All States: For an m-input gate, let the leakage current
density through resistor R for a state k (of 2m states) in
the positive (negative) direction be denoted by L+Rk [L−Rk ].
Then, the average effective leakage current density (Lavg,R)
covering all the states and incorporating recovery (3) would be
Lavg,R =
l∑
k=1
P+k L+Rk − ξ
⎛
⎝2
m−l∑
i=1
P−k L−Ri
⎞
⎠. (5a)
Here, l is the number of states with a positive current density,
and P+q [P−q ] is the probability of occurrence of state q , in
which the current flows in the positive (negative) direction.
These probabilities are a function of the duty cycle at the
inputs of the gate.
The rms effective leakage current density is given by
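$$ L^{2}_{\mathrm{rms},R} = \sum_{k=1}^{l} P^{+}_{k} \left( L^{+}_{Rk} \right)^{2} + \sum_{i=1}^{2^{m}-l} P^{-}_{i} \left( L^{-}_{Ri} \right)^{2}. \tag{5b} $$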
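A minimal sketch of (5a) and (5b) for a single resistor follows. It assumes that each static state is supplied as a probability paired with a signed leakage density, where the sign encodes the current direction; this data layout and the numbers in the example are illustrative assumptions.

```python
import math


def effective_leakage(states, xi):
    """Effective average and rms leakage current density for one resistor.

    states : list of (probability, signed_density) pairs, one per static
             input state; the sign gives the current direction.
    xi     : recovery factor applied to the negative-direction part.
    """
    pos = sum(p * l for p, l in states if l > 0)
    neg = sum(p * -l for p, l in states if l < 0)
    l_avg = pos - xi * neg                        # average, as in (5a)
    l_rms_sq = sum(p * l * l for p, l in states)  # squared rms, as in (5b)
    return l_avg, math.sqrt(l_rms_sq)


# Hypothetical two-input gate: four equally likely states.
states = [(0.25, 2.0), (0.25, 1.0), (0.25, -1.0), (0.25, 0.0)]
l_avg, l_rms = effective_leakage(states, xi=0.5)
print(l_avg, l_rms)
```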
2) Effective Switching Current Density Through a Resistor
Across All Switching Arcs: In similar spirit, the effective-
average-switching current density through R (Javgsw,R) is
given by
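$$ J_{\mathrm{avgsw},R} = \sum_{i=1}^{\text{all arcs}} \left( \sum_{k=1}^{\text{all states}} P_{ik}\, J_{\mathrm{avg},Rik}\, \frac{s_{ik}}{T_{\mathrm{clk}}} \right). \tag{6} $$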
Here, Pik and Tclk are the design-level parameters—the switch-
ing probability of the particular arc and the switching period,
respectively. The scaling factor, sik/Tclk, translates the charac-
terized current density (Javg,Rik ), which was averaged during
the characterization over the switching duration sik , into the
entire clock period. This scaling factor accounts for the fact
that the current is inactive during the remainder of the clock
period. Similar calculations for rms current density (Jrmssw,R)
yield
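$$ J_{\mathrm{rmssw},R} = \sqrt{ \sum_{i=1}^{\text{all arcs}} \left( \sum_{k=1}^{\text{all states}} P_{ik}\, J^{2}_{\mathrm{rms},Rik}\, \frac{s_{ik}}{T_{\mathrm{clk}}} \right) }. \tag{7} $$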
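The weighted sums in (6) and (7) can be sketched as below for one resistor. Each arc and state pair is passed as a tuple of its switching probability, its characterized average and rms densities, and its characterization window; this tuple layout and the example values are assumptions made for illustration.

```python
import math


def effective_switching(arcs, t_clk):
    """Effective average and rms switching current density for one resistor.

    arcs  : list of (p_ik, j_avg_ik, j_rms_ik, s_ik) tuples, one per
            arc/state pair: switching probability, characterized average
            and rms densities, and the characterization window s_ik.
    t_clk : clock period, in the same time unit as s_ik.
    """
    # Average: probability-weighted densities scaled by s_ik / T_clk, as in (6).
    j_avg = sum(p * j_a * s / t_clk for p, j_a, _, s in arcs)
    # RMS: the same weighting applied to squared densities, as in (7).
    j_rms_sq = sum(p * j_r ** 2 * s / t_clk for p, _, j_r, s in arcs)
    return j_avg, math.sqrt(j_rms_sq)


# Two hypothetical arc/state pairs, clock period normalized to 1.
arcs = [(0.5, 4.0, 6.0, 0.1), (0.25, 2.0, 3.0, 0.2)]
j_avg_sw, j_rms_sw = effective_switching(arcs, t_clk=1.0)
print(j_avg_sw, j_rms_sw)
```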
3) Effective Average and RMS Current Densities: After
computing the effective switching and leakage current densi-
ties independently, we must now compute the effective average
and rms current densities. In a normal design flow, the chip-
level probabilistic activity propagation tools already provide
the effective switching rate [ fik = (Pik/Tclk)] for any given
arc i and associated noncontrolling state k, along with the state
probabilities [P+q in (5a)] for all gates of the design.
Since (5)–(7) discretely describe the leakage and switching
current densities, we can sum them to derive the effective
average current density (Javg,R) and add them in an rms
manner to derive the effective rms current density (Jrms,R)
for any resistor R in the cell.
We compute the average and rms current densities by
consolidating (5)–(7) as
$$ J_{\mathrm{avg},R} = J_{\mathrm{avgsw},R} + L_{\mathrm{avg},R} \tag{8} $$
$$ J^{2}_{\mathrm{rms},R} = J^{2}_{\mathrm{rmssw},R} + L^{2}_{\mathrm{rms},R}. \tag{9} $$
It must be mentioned here that rms formulations work
under the assumption that the different current densities
(leakage and switching) are nonoverlapping. This is not
strictly true; however, we find that this assumption leads to only
marginal errors. Next, we look at incorporating clock gating
in the formulations.
4) Incorporating Clock Gating: Clock gating is a widely
used technique for reducing the dynamic clock power by
disabling the clock signal to the idle parts of the circuit—
thereby also directly affecting the reliability of the signals in
the gated domain [28]. In order to assess the reliability impact
of clock gating, we notice that as a phenomenon, clock gating
can occur in an arbitrary way over the lifetime of the chip.
For instance, the clock could be gated for a fixed number of
cycles, after every specific period of activity, in a repeated
manner. Such uniform gating is akin to a direct reduction in
the operating frequency and can be readily approximated by
specifying the activity-rate-adjusted frequency in (8) and (9).
However, the cases when the clock gating is nonuniform,
or is uniform only in the intervals, are nontrivial and require
equivalent reliability-lifetime calculations. The key determi-
nant in such calculations is the thermal time constant of
JH in interconnect (typically in several microseconds for
copper [30]), which signifies the duration after which the
interconnect responds to the rms current in the form of a tem-
perature rise. Hence, if the time interval between successive
clock gating events is larger than the thermal time constant,
then the full current (without activity correction) should be
ideally used for rms and average density estimations, for the
appropriate durations.
We defer the treatment of nonuniform clock gating to
Section V-B (subsequent to incorporation of arbitrary reliability
specifications), and focus the formulations here on the
uniform case. This makes the solution similar to setting a
pin-specific activity rate on the cell. Hence, if a 1-GHz clock-tree
element remains gated-high for 25% of the lifetime, we would
note the corrected fik as 750 MHz in (8) and (9), and state
probability as 0.375 (assuming 50% duty cycle for clock). The
computation procedure can thus be captured as follows.
Algorithm 1 Current Density Computation Through Every
Resistor
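The sketch below is not a reproduction of Algorithm 1. It illustrates the uniform-gating adjustment from the 1-GHz example together with the combination step of (8) and (9). The reading of the state probability is an assumption: the clock is taken to toggle with a 50% duty cycle while active and to sit high while gated, so the 0.375 quoted above corresponds to the low state and the high state receives the remaining 0.625.

```python
import math


def uniform_gating_adjust(f_clk, gated_fraction, gated_high=True, duty=0.5):
    """Activity-adjusted frequency and state probabilities under uniform gating.

    Assumes the clock toggles for (1 - gated_fraction) of the lifetime with
    the given duty cycle and is held at one level for the rest.
    """
    active = 1.0 - gated_fraction
    f_eff = f_clk * active
    p_high = active * duty + (gated_fraction if gated_high else 0.0)
    return f_eff, p_high, 1.0 - p_high


def combine(j_avg_sw, j_rms_sw, l_avg, l_rms):
    """Effective densities per (8) and (9): plain sum and root-sum-square."""
    return j_avg_sw + l_avg, math.sqrt(j_rms_sw ** 2 + l_rms ** 2)


# 1-GHz clock gated high for 25% of the lifetime.
f_eff, p_high, p_low = uniform_gating_adjust(1.0e9, 0.25)
print(f_eff, p_high, p_low)
```

The adjusted frequency feeds the switching terms, while the state probabilities feed the leakage terms, after which `combine` gives the effective densities to compare against the thresholds.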
C. Instance Safe Frequency Estimation at Chip Level
Once we have estimated the current densities in the cell,
the EM checking procedure can subsequently be approached
in two manners, as noted in Section II earlier.
1) Predict the safety of the cell (pass or fail), given a full
set of operating conditions of the cell.
2) Calculate a set of safe operating parameters for the cell
under a partial set of operating conditions. For example,
if the frequency, slew, and supply voltage are given, the
safe load may be computed.
The first is rather trivially obtained from the above discussion,
since (8), (9), and Algorithm 1 lend themselves readily to
allow substitution of the exact operating conditions, and sub-
sequent verification of current densities (through all resistors)
against the foundry EM thresholds.
In real designs, however, the actual operating frequency of
the instance could be arbitrary, and we must work the problem
backwards by recommending a maximum fsafe based on other
parameters. In contrast to the f –L data of Fig. 2(a) obtained by
iterated binary-search SPICE simulations, our approach here
provides closed-form solutions for fsafe.
It must be noted that potentially, every resistor in the cell
could have a unique frequency dependence, and therefore,
the maximum fsafe procedure must find the minimum safe
frequency over all resistors in the instance.
Let Javg,th(T, t) and Jrms,th(T ) represent the current den-
sity limits for average and rms current densities, respectively,
as a function of stress temperature, stress time, and maxi-
mum heating constraint. Furthermore, note that in (5)–(7),
the dependence on the frequency f = 1/Tclk only appears
in the expressions for the average and rms switching current
densities. By setting the left-hand sides of (8) and (9) to be
no larger than the threshold densities and combining them
with (5)–(7), we can constrain the average- or rms-limited
frequencies (fmax,AVG,R and fmax,RMS,R, respectively) for each
Algorithm 2 Self-Consistent Safe Frequency Estimation of the
Instance
intracell resistor R in the following manner:
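$$ f_{\max,\mathrm{AVG},R} = \frac{ J_{\mathrm{avg,th}}(T, t) - L_{\mathrm{avg},R} }{ \sum_{i=1}^{\text{all arcs}} \left( \sum_{k=1}^{\text{all states}} P_{ik}\, J_{\mathrm{avg},Rik}\, s_{ik} \right) } \tag{10} $$
$$ f_{\max,\mathrm{RMS},R} = \frac{ J^{2}_{\mathrm{rms,th}}(T) - L^{2}_{\mathrm{rms},R} }{ \sum_{i=1}^{\text{all arcs}} \left( \sum_{k=1}^{\text{all states}} P_{ik}\, J^{2}_{\mathrm{rms},Rik}\, s_{ik} \right) }. \tag{11} $$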
Since all parameters on the right-hand sides of the above
equations are known for each resistor in each instance, we can
now apply the self-consistent formulations [10] to estimate the
safe parameter (frequency) of the resistor. The entire process
has to be approached iteratively, as shown in Algorithm 2, to
determine the safe operating frequency for an instance, which
can then be used as a design constraint. The safe frequency
for a resistor is the lower of the two values in (10) and (11),
and the safe frequency fsafe for a cell instance is the smallest
safe frequency over all resistors in the instance.
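The closed-form limits of (10) and (11) and the final minimization can be sketched as follows. This is a single-pass illustration with fixed thresholds: it omits the iterative, self-consistent update of Algorithm 2 and assumes positive denominators. The dictionary layout and all numbers are hypothetical.

```python
def resistor_f_limits(j_avg_th, j_rms_th, l_avg, l_rms, arcs):
    """Average- and rms-limited frequencies for one resistor, per (10), (11).

    arcs : list of (p_ik, j_avg_ik, j_rms_ik, s_ik) tuples.
    """
    avg_den = sum(p * j_a * s for p, j_a, _, s in arcs)
    rms_den = sum(p * j_r ** 2 * s for p, _, j_r, s in arcs)
    f_avg = (j_avg_th - l_avg) / avg_den
    f_rms = (j_rms_th ** 2 - l_rms ** 2) / rms_den
    return f_avg, f_rms


def instance_f_safe(resistors):
    """Smallest per-resistor limit over all resistors of the instance.

    resistors : list of dicts holding the keyword arguments of
                resistor_f_limits (a hypothetical container format).
    """
    return min(min(resistor_f_limits(**r)) for r in resistors)


# Two hypothetical resistors in normalized units.
r1 = dict(j_avg_th=1.0, j_rms_th=2.0, l_avg=0.0, l_rms=0.0,
          arcs=[(1.0, 2.0, 4.0, 0.25)])
r2 = dict(j_avg_th=1.0, j_rms_th=2.0, l_avg=0.5, l_rms=0.0,
          arcs=[(1.0, 1.0, 1.0, 1.0)])
print(resistor_f_limits(**r1), resistor_f_limits(**r2), instance_f_safe([r1, r2]))
```

In this example the first resistor is rms-limited and the second is average-limited, which shows why both limits must be evaluated for every resistor before taking the minimum.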
To evaluate this procedure, we revisit the two-input
clock-tree mux from the earlier discussion around Fig. 3.
Fig. 4 shows the fsafe plot for this case, for a fixed oper-
ating condition and output load, showing the results of the
binary-search-based SPICE simulation, our approach, and the
traditional method that chooses fsafe pessimistically over all
switching conditions. We see that the proposed model fits the
SPICE behavior very well and can model the arbitrary switch-
ing rates on different pins, as against the large pessimism in
the traditional approach.
While the results shared in Fig. 4 were from a single cell,
consolidated results from the entire 28-nm design library will
be shared later in Section VII. It must also be noted that
thus far, we have demonstrated Black’s equation (1)-based
EM verification. However, as our methodology aptly decouples
the current density computation and the verification part, it
easily lends itself to other EM verification schemes, such as the
via-node-based scheme (Section II-A).
Fig. 4. Evaluation of fsafe for the circuit in Fig. 3, at a selected load
point: 0.45×. The value of fsafe varies based on the extent of switching
coming from the first or second pin. The proposed model completely captures
the behavior, but the traditional method is excessively pessimistic.
IV. ADDRESSING L2: MODELING THE IMPACT OF
ARBITRARY RC LOADING
The model developed so far is capable of covering the following
parameters: lumped capacitive load (C load), slews,
multi-input gates, arbitrary switching rate, and clock gating.
This is directly relevant at the chip level, when the IPs are
used at arbitrary frequencies and under clock gating. Next, we
look at incorporating RC load into the assessment.
A. Overview of Prior Work
In Section III, we used the lumped load as one of the metrics
for EM reliability. To a great extent, the C load model in itself
can be used for accurate estimation of average current density,
and is largely independent of resistive effects [3], provided that
the output net swings rail to rail.
On the other hand, RMS current densities additionally
depend on the duration of transfer, and are thereby directly
impacted by the resistive effects of RC loads on the cell [14]–
[19]. The effect of RC load [Fig. 2(b)] for signal EM reliability
was addressed earlier in [3]. It was further established that
resistive shielding cannot be accounted for using the traditional
Ceff approach, derived from timing constraints. Hence, a
current-criterion-based moment matching was devised to come
up with a Ceff, by performing the RC tree traversal along with
the basic timing information.
B. Prior Work: Limitations
We notice that there are at least two limitations of the prior
work associated with both the traditional model [Fig. 2(a)] and
the model proposed in Section III.
First, RC loads affect the current flow in all segments:
cell-external and cell-internal. While the cell-external problem
was solved in [3], the cell-internal piece of the problem has
remained unsolved. In fact, it was proposed to simulate the
entire active network with the actual distributed load (at tran-
sistor level) through SPICE. As we will see later (Section VII),
the number of such simulations required for a block/SoC
could run in thousands, becoming a major computational and
logistical overhead.
Second, we notice that not just the effective capacitance
but also the lumped capacitance depends on the current waveform
shape, which makes the load capacitance itself voltage-dependent.
This can be explained by the fact that the pin
capacitance has an inherent voltage dependence [20]. Hence,
Fig. 5. Error in the rms estimates (versus SPICE) for various Cin modeling
approaches and waveform types (x-axis; going from fully ramp to fully
exponential).
even though there is no explicit dependence of the average
current on the network resistance [3], it implicitly exists
because of the dependence on Cin(V ). Therefore, assuming a
fixed value of Cin for performing current density calculations
on a net becomes very pessimistic. We now attempt to solve
both problems.
C. Proposed Solution: RC Loading and Cin Modeling
We begin by observing that the basic challenge for cell-
internal EM arises from the fact that the characterization of the
fundamental current densities (through Algorithm 1) must be
performed using the single C load values, but the data must be
applied to instances that drive RC loading. Hence, we require
a good proxy of the RC load, which can be used to query
the characterized data. Extending the concepts developed in
[3], if we use Ceff to query only the RMS component of
the current density from the precharacterized data, an accurate
match can be achieved. Indeed, we do see a reduction in error
(compared with SPICE) in this manner: as we will later see
in Fig. 6, the mean error of about 2× in RMS estimation
reduces to about 20% with the Ceff incorporation. However,
there still are outliers, and upon detailed investigation, a
majority of them are attributable to the Cin modeling of the load
pins.
Next, as an improvement, we also compared the current
densities obtained when the load cells were
modeled as a C1/C2 combination, where C1 represents the
pin capacitance from 0%–50% swing of the voltage, and C2
from 50%–70% [21]. However, we notice that even at the level of an
individual load pin, this approach does not yield high accuracy
because it ignores the tail effects [3], [22].
Hence, we propose calculating an effective Cin (Cin,eff) from
the multipiece Cin(V ) table [Fig. 5; typically eight points].
Since Cin is a function of the voltage waveform, which,
in turn, is a function of Cin, the entire computation must
be carried out in an iterative manner. Accordingly, in the
kth iteration, we make use of the starting current waveform
[as incident on the load cell Li of Fig. 2(b)]. Such a current
waveform (ILi,k (t)) is obtained through a single Cin,k and uses
a double exponential model with estimated parameters—A0,k,
Ta,k , and Tb,k . The estimation of these parameters is performed
by RC-tree traversal and moment matching technique with
assumption on the waveform shape at the driver (a mixture of
ramp/exponential) [3]. The current waveform is modeled as
$$ I_{L_i,k}(t) = A_{0,k}\left(e^{-t/T_{a,k}} - e^{-t/T_{b,k}}\right). \tag{12} $$
Subsequently, the voltage waveform VLi,k (t), as seen on the
load pin, can be generated as an area under the curve of this
current waveform, using a constant Cin,k
$$ V_{L_i,k}(t) = \frac{1}{C_{in,k}} \int_{0}^{t} I_{L_i,k}(t')\,dt'. \tag{13} $$
This voltage waveform can then be used along with the
voltage-dependent Cin(V) table to reconstruct a new current waveform
I′Li,k(t) as
$$ I'_{L_i,k}(t) = C_{in}(V)\,\frac{d}{dt}\left(V_{L_i,k}(t)\right). \tag{14} $$
Note that only an update in the current waveform at the
load pin is required, since we are interested in the current
specifically at this point. Assuming the duration of this current
waveform as d (approximated by the corresponding 0%–100%
slew at the load pin obtained through STA), its rms is given by
$$ \mathrm{RMS}\left[I'_{L_i,k}(t)\right] = \sqrt{\frac{1}{d}\int_{0}^{d} I'^{\,2}_{L_i,k}(t)\,dt}. \tag{15} $$
Note that for the next iteration, we require an updated value
for Cin. Hence, we make use of the rms current through I ′Li,k (t),
to derive a single effective Cin, assuming an equivalent triangular
current waveform (with d being the delay at the load pin).
For such a triangular waveform, the rms current expression
is standard: $\sqrt{4/3}\,(CV/d)$, where C is the equivalent load
and V is the voltage swing.
In order to obtain an equivalent pin capacitance that matches
the rms current of I′Li,k(t), we equate (15) to the rms current
of the triangular waveform to get the capacitance to be used
for the next iteration as
$$ C_{in,k+1} = \frac{d}{V}\sqrt{\frac{3}{4}\cdot\frac{1}{d}\int_{0}^{d} I'^{\,2}_{L_i,k}(t)\,dt}. \tag{16} $$
We expect convergence in 2–3 iterations, though, for our
work, we have made only a single update to the starting Cin.
The average current case can be approached in a similar way.
While this means that we must ideally compute two separate
capacitances, namely Cin,eff,rms and Cin,eff,AVG, our experiments
indicate acceptable errors for the average case, and
hence we perform the iterative computation only for RMS matching.
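The update described by (12)–(16) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the trapezoidal integration, the clamped piecewise-linear lookup of the Cin(V) table, and the argument `v_swing` (the swing V in the triangular-pulse expression) are all assumptions, and the double-exponential parameters are taken as given rather than derived from RC-tree moment matching.

```python
import math
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear table lookup, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])


def update_cin(a0, ta, tb, cin_k, v_tab, c_tab, d, v_swing, n=2000):
    """One pass of (12)-(16): returns the Cin to use in iteration k+1."""
    dt = d / n
    # (12): double-exponential current sampled on [0, d]
    i_k = [a0 * (math.exp(-j * dt / ta) - math.exp(-j * dt / tb))
           for j in range(n + 1)]
    # (13): V(t) = (1/Cin_k) * integral of I, by the trapezoidal rule
    v, charge = [0.0], 0.0
    for j in range(1, n + 1):
        charge += 0.5 * (i_k[j] + i_k[j - 1]) * dt
        v.append(charge / cin_k)
    # (14): I'(t) = Cin(V) * dV/dt, where dV/dt = I(t)/Cin_k follows from (13)
    i_new = [interp(v[j], v_tab, c_tab) * i_k[j] / cin_k for j in range(n + 1)]
    # (15): rms of I' over the duration d
    sq = [x * x for x in i_new]
    rms = math.sqrt(
        sum(0.5 * (sq[j] + sq[j - 1]) * dt for j in range(1, n + 1)) / d)
    # (16): capacitance of a triangular pulse with rms = sqrt(4/3) * C * V / d
    return (d / v_swing) * math.sqrt(0.75) * rms
```

Calling `update_cin` repeatedly, feeding each result back as `cin_k`, gives the iteration; the text above reports using a single update.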
In summary, we accurately incorporate the impact of the
voltage-dependent input-pin capacitance as well as the impact
of parasitic RC loading on the cell-internal current densities
by the following.
1) Making an initial estimate of the current at the driving
points and iterating with the load’s voltage-dependent
pin capacitance to arrive at the final current flow.
2) Estimating the effective capacitance (Ceff), which
matches the final current flow in the network.
3) Using this Ceff to query the precharacterized cell-internal
rms current density database and C load for querying
AVG cell-internal current densities.
Note that in the absence of this method, we would have used
C load to query the cell-internal AVG as well as rms current
densities, which is very pessimistic. Note also that the formulations
for incorporating voltage-dependent pin capacitance
automatically improve the accuracy of cell-external current
densities as well.
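The querying step in item 3) can be illustrated with a small lookup sketch. It assumes, purely for illustration, that the precharacterized data for one resistor at a fixed slew are stored as two one-dimensional tables over load and are interpolated linearly; the function names and table layout are hypothetical.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear table lookup, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])


def query_current_densities(ceff, cload, loads, j_rms_table, j_avg_table):
    """Return (J_rms, J_avg) for one resistor at a fixed slew.

    The rms density is looked up at Ceff, the average density at C load.
    """
    j_rms = interp(ceff, loads, j_rms_table)
    j_avg = interp(cload, loads, j_avg_table)
    return j_rms, j_avg
```

Because Ceff is never larger than the lumped load under resistive shielding, the rms value returned this way is lower than the one a pure C load query would give.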
D. RC Loading and Cin Model Validation: Results
We perform validation at multiple levels of the RC and Cin
modeling approaches. First, we validate the Cin approach at the
load-level circuit, followed by the validation of the combined
RC loading and Cin modeling in the driver-load pair case
[Fig. 2(b)] and followed finally by the results from several
driver instances (driving unique RC loads).
We begin by showing the load-level comparison first,
for effective Cin estimation (versus SPICE) for different
Cin models [Fig. 5]. This comparison is at the load circuit
level, where we apply a voltage waveform at the load pin
and the load cell is modeled as: 1) single Cin; 2) a two-piece
voltage-dependent capacitor in SPICE [23]; and 3) a single
Cin,eff [obtained from (12)–(16)].
Moreover, as discussed earlier, since Cin is a function of
the starting input voltage waveform, we have computed the
errors for different types of input voltage waveforms (the
x-axis represents waveforms going from fully ramp to fully
exponential), whose shape is controlled with the coefficient a
in the equation below, with Tr being the rise time
$$ V_{in}(t) = \begin{cases} a\left(1 - e^{-t/T_r}\right) + (1-a)\,\dfrac{t}{T_r}, & t \le T_r \\[4pt] 1 - a\,e^{-t/T_r}, & t > T_r. \end{cases} \tag{17} $$
Hence, setting a to zero in (17) results in a fully saturated
ramp input waveform, whereas setting a to unity makes it
completely exponential. Such a formulation is a good representation
of the various input waveforms that can be incident on
the load pin.
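For readers who want to reproduce the waveform family in (17), a direct transcription into Python is given below. The function name `vin` is hypothetical, and the waveform is normalized to a unit swing as in (17).

```python
import math


def vin(t, tr, a):
    """Normalized input waveform of (17): a=0 is a ramp, a=1 is exponential."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("shape coefficient a must lie in [0, 1]")
    if t <= tr:
        return a * (1.0 - math.exp(-t / tr)) + (1.0 - a) * t / tr
    return 1.0 - a * math.exp(-t / tr)
```

The two branches agree at t = Tr for every a, so the waveform is continuous across the breakpoint.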
As we can see from Fig. 5, the traditional approach of a single
Cin leads to almost 2× error as compared to SPICE. The error
reduces using the C1/C2 model, but it still remains unacceptable,
whereas the effective capacitance computed from
an eight-piece piecewise-linear table fits the SPICE results
more closely. We can also see that because of increased tail
effects in the exponential input voltage waveform, the errors
are higher for all models for a completely exponential case.
Fig. 6 shows the maximum error from several instances
(which drive different RC loads; plotted on the x-axis).
While the left y-axis shows the errors, the Ceff/C-load ratio,
an indicator of the extent of resistive load the instance
is driving, is plotted on the right y-axis. The exact set of
instances and their driving RC load information are obtained
from a 28-nm production design. We show the comparison
of the traditional case (using lumped load for current density
querying), Ceff model alone and the combined Ceff + Cin
model.
Overall, among all cases, we find about 2× mean error in
rms current density estimation with the usage of lumped load,
which drops to ∼21% mean with the usage of Ceff model,
and further down to ∼7% mean error with the combined
usage of Ceff and Cin model. We also see that for instances,
driving severely resistive loads (indicated by the ratio of Ceff
to C load), the original error with C load usage is very high,
with outliers that cross 50% error.
Algorithm 3 shows the final procedure for estimating the
accurate cell-internal currents for arbitrary loading.
Thus, we have examined the impact of RC loading on the
EM reliability, and demonstrated significant improvement in
accuracy with the proposed method.
Fig. 6. Maximum error in rms current density estimation across several
instances driving different kinds of RC loading (indicated by the
Ceff/C-load ratio) at the design level.
Algorithm 3 Accurate EM Verification Considering RC Loads
V. ADDRESSING L3—ON-THE-FLY RETARGETING OF
RELIABILITY FOR ARBITRARY SPECIFICATIONS
The formulations of Sections III and IV were dependent on
the library data characterized at one set of operating condi-
tions, and the foundry EM thresholds at a specified reliability
condition. However, as described in Section I, there is an
increasing need for on-the-fly reliability retargeting, at design
verification stage, as the IP library is used under different
reliability conditions. As noted earlier, meeting this goal is
impractical under the traditional methodology, as it requires
a new characterization of the entire IP library [Fig. 2(a)] at
each new condition.
The core methodology of this paper enables the ability to
perform this retargeting efficiently, since the current density
computation part is separated out from the verification part
(whereas these are tightly coupled in the traditional approach).
We begin with the fundamental relation between the
EM lifetime and the lognormal variable. From (1) and (2),
taking logarithm, we obtain
$$ \sigma z = \ln(t_f) - \ln(A) + n \ln(J) - \frac{Q}{k_B T}. \tag{18} $$
Now, if we have two different sets of stresses, denoted
by subscripts a and b, each is described by the same fitting
parameter A, but other terms in (18) may differ. Naturally,
their reliability is related as follows (by substituting the
parameter A):
$$ \sigma z_{cond,b} = \sigma z_{cond,a} + \ln\left(\frac{t_b J_b^{\,n}}{t_a J_a^{\,n}}\right) - \frac{Q}{k_B}\left(\frac{1}{T_b + \Delta T_b} - \frac{1}{T_a + \Delta T_a}\right). \tag{19} $$
Here, the variables t, J, T, and ΔT represent the stress
time, current density, stress temperature, and temperature rise
due to JH, respectively, while the subscripts a and b refer to
the two different conditions.
This equation is a powerful representation of the scaling
factors that can either be used to assess: 1) the required trade-
offs in new reliability conditions to meet the same FF levels or
2) the actual FFs at the new reliability conditions. For example,
we can directly use (19) to find the equivalent stress
time (tb) that causes the same reliability loss as the benchmark
condition, but with increased current densities. In order to do
so, we must set zcond,b = zcond,a , since the reliability loss
has to be equated and obtain the equivalent lifetime tb as a
function of (ta , Ja , and Jb). Obviously, if Jb > Ja , tb will be
estimated to be lower than ta .
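Setting the two z values equal in (19) gives a closed-form equivalent stress time, which the sketch below evaluates. The function name, the argument names, and the default values of the current exponent `n` and activation energy `q_ev` are placeholders chosen for illustration, not values from this work; temperatures are in kelvin and the JH terms default to zero.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def equivalent_stress_time(t_a, j_a, j_b, temp_a, temp_b,
                           n=1.0, q_ev=0.9, dtemp_a=0.0, dtemp_b=0.0):
    """Solve (19) for t_b under z_cond,b = z_cond,a.

    n and q_ev are placeholder fitting values (assumptions).
    dtemp_a and dtemp_b are the temperature rises due to JH.
    """
    inv_b = 1.0 / (temp_b + dtemp_b)
    inv_a = 1.0 / (temp_a + dtemp_a)
    thermal = math.exp((q_ev / K_B_EV) * (inv_b - inv_a))
    return t_a * (j_a / j_b) ** n * thermal
```

With equal temperatures and n = 1, doubling the current density halves the equivalent stress time, in line with the observation that tb falls below ta when Jb > Ja.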
We now look at the application of the retargeting concepts,
based on (19), to some of the case studies, followed by
application to nonuniform clock gating. Unlike uniform clock
gating, which was previously treated with generic activity
reductions in Section III-B, nonuniform clock gating requires
a more accurate sliding window-based analysis, wherein every
frame potentially becomes a new reliability condition.
A. Case Studies Incorporating Reliability Retargeting
1) Case I (Variations in Temperature): If the use temperature
and/or POH specification are different from the original
conditions, then it is straightforward to address this by
using (19) to determine new current density thresholds, and
then updating fsafe in (10). Such a modification only affects
the average, and not the rms reliability.
A second situation is the common industry scenario when
the stress profile is provided by the user as a temperature pro-
file, as the series {(J1, T1, t1), (J2, T2, t2), . . . , (Jm, Tm , tm)},
i.e., from time tk−1 to time tk , it experiences current stress Jk
at temperature Tk . If the baseline stress is characterized for J0
at temperature T0, then we can relate the kth stress vector to the
baseline stress at (J0, T0) with an equivalent stress time tk,0.
In other words, the stress at temperature Tk is transposed to
an equivalent stress time at temperature T0. Consequently,
our stress retargeting scheme will map the entire stress to
(J0, T0, teq,0), where $t_{eq,0} = \sum_{k=1}^{m} t_{k,0}$.
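The mapping of a stress profile onto a single baseline-equivalent time can be sketched as follows. The sketch assumes each profile entry carries the segment duration (that is, tk − tk−1) rather than the end time, folds any JH temperature rise into Tk, and uses placeholder values for the exponent `n` and activation energy `q_ev`; all names are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def equivalent_baseline_time(profile, j0, temp0, n=1.0, q_ev=0.9):
    """Collapse [(J_k, T_k, duration_k), ...] into t_eq,0 at (J0, T0).

    Each segment is converted with (19) at equal z, then the equivalent
    times t_k,0 are summed. Temperatures are in kelvin.
    """
    total = 0.0
    for j_k, temp_k, duration_k in profile:
        accel = (j_k / j0) ** n * math.exp(
            (q_ev / K_B_EV) * (1.0 / temp0 - 1.0 / temp_k))
        total += duration_k * accel
    return total
```

A segment hotter than the baseline contributes more than its own duration to teq,0, and a cooler one contributes less.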
2) Case II (Variation in Operating Voltage): If the eventual
use voltage of the library is different from the characteri-
zation voltage, current scaling must be performed. Such a
scaling is straightforward in our framework, since the leakage
and switching related components are separately stored, as
described in (6) and (7). Based on our experiments, we see
that a linear scaling with voltage works very well for the switching component, while
an exponential model is required for leakage. Note that this
scaling must be performed for every discrete component of
the current densities for every resistor in the circuit (6), (7).
Fig. 7. Demonstrating on-the-fly retargeting of the basic frequency-load
curve [Fig. 2(a)] with changes in the constraining criteria (at a fixed
slew point).
Fig. 8. Validation of retargeting methodology versus SPICE for two
conditions, curves c) and e), of Fig. 7.
A second situation (arising due to power management
scenarios like dynamic voltage frequency scaling [24]) is when
the voltage is represented as a series: {(V1, t1), . . . , (Vm, tm)}.
In such a case, we can follow the scaling procedure to obtain
a series of currents, which can then be dealt with in the same way
as the earlier case.
3) Case III (Variation in Failure Rate Specification):
Javg,th(T, t) in (10) is really a function of the FF, which, in
turn, is a function of z (2). Therefore, ztarget is the inverse
function of FFtarget. Hence, if the FF specified by the end user
changes from, say, 0.1% to 0.01% cumulative, it can be
readily translated into z, translated into a current density limit
using (19), and then used in (10) for verification.
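The FF-to-limit translation in Case III can be sketched as below. The sketch assumes that (2) is the standard normal cumulative distribution evaluated at z (so that z is obtained with an inverse normal CDF), and that stress time and temperature are unchanged, in which case (19) reduces to σ(zb − za) = n ln(Jb/Ja). The function names and the argument `sigma` are hypothetical.

```python
import math
from statistics import NormalDist


def z_from_ff(ff):
    """Inverse of FF = Phi(z); assumes (2) is the standard normal CDF."""
    return NormalDist().inv_cdf(ff)


def retarget_j_limit(j_limit_a, ff_a, ff_b, sigma, n=1.0):
    """Scale a current density limit from failure fraction ff_a to ff_b.

    Uses (19) with equal stress time and temperature at both conditions.
    """
    dz = z_from_ff(ff_b) - z_from_ff(ff_a)
    return j_limit_a * math.exp(sigma * dz / n)
```

Tightening the FF from 0.1% to 0.01% makes z more negative and therefore lowers the permitted current density.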
Fig. 7 shows a graphical representation of such a retargeting
using the proposed model from a representative cell. For ease
of exposition, we represent our model at a fixed slew, as shown
in Fig. 2(a).
In Fig. 7, curve a) represents the reliability at the baseline
condition. If the FF requirement of the design changes and
drops to 10% of the original, the curve slides down to curve b)
due to reduction in EM capability at tighter FF requirement.
The drop is not drastic as this specific IP is rms-current-
limited, rather than being limited by the average current
density. Similarly, if the use voltage has a 150-mV overdrive
over the characterized value, the reliability is represented by
curve c), which shows degraded reliability due to increased
Fig. 9. Representative clock activity profile for a large duration.
Different sampling windows show different activity rates (and corresponding
Javg and Jrms).
current flow. Similar behavior is shown in curve d) if the
JH (rms current density specification) is tightened by 5 °C.
Finally, if the temperature requirement becomes 20 °C higher,
design closure becomes more challenging with the reliability
being now represented by the curve e)—almost 3× tightening.
The case studies in Fig. 7 are handled very naturally in
our approach. We reiterate that handling them in the traditional
approach requires a complete recharacterization of the
fsafe model at various conditions.
Next, to validate the retargeting methodology, we directly
compare the curves of Fig. 7 to the curves of the traditional
methodology (obtained by the actual characterization at the
exact condition). As earlier, we present results from a single
representative cell.
We show the percentage error for two conditions,
curves c) and e) in Fig. 8. For curve e), where the temperature
specification is altered, the required retargeting only affects
the verification part (as the current density limits are scaled),
which incurs little error. For curve c), the retargeting is due to
150-mV overdrive, where we use a more approximate current-
scaling model. The error here, although high, is acceptable,
considering the fact that it is in a lower load regime (usually
a low-current, EM-safe zone).
B. Incorporating Nonuniform Clock Gating
The case studies of Section V-A were helpful in outlining
a general thought process on approaching the problem, when
the eventual use scenario is different from the baseline one.
We now consider the extension of those principles to the
problem of clock gating, which was previously (Section III-B3)
analyzed under the uniform assumption. In order to do so, a
key input required is an activity profile of the design over
several clock cycles, as shown in Fig. 9 [25]. We noted
previously that the thermal time constant is a key determinant
in addressing nonuniform clock gating [26], [27], and
any change in the current profile (of a larger duration) should
be handled individually and cannot be combined as a time-weighted
summation.
Hence, we follow a sliding window approach (with the
duration as the thermal time constant), wherein the complete
clock activity profile is scanned in a step-by-step manner.
For every single time window scanned, we compute the
effective activity rate, which can then be used to compute
the cell-internal current densities through every resistor, arc,
and state of the cell. Eventually, for a resistor R, in the
Fig. 10. Variation in reliability based on the extent of uniform clock gating
in the first half and the second half of the stress time.
Fig. 11. Comparison of the response modeling approach (20) with full
SPICE (red curve). fsafe obtained through response modeling (blue curve).
ith arc and kth state, we can represent the current densities
as $\{(J_{avg,R_{ik},0}, T_0), \ldots, (J_{avg,R_{ik},m}, T_m), \ldots\}$, where the index
m refers to the index of the sliding window. Clearly, every
window can have a unique activity rate. For instance, in Fig. 9,
the sampling windows Sa , Sb, Sc, and Sd correspond to a 75%,
50%, 100%, and 67% activity rate, respectively. This current
stress can then be collapsed into a single equivalent stress,
based on the concepts developed in the earlier discussions and
using (19).
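The window scan itself is simple to sketch. The fragment below assumes the activity profile is available as a per-cycle list of 0/1 flags and that the window length (standing in for the thermal time constant) is given in cycles; the function name and the `step` parameter are hypothetical.

```python
def window_activity_rates(activity, window, step=1):
    """Scan a per-cycle 0/1 activity profile and return the rate per window.

    window: number of cycles per window (proxy for the thermal time constant).
    step:   number of cycles the window advances between samples.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    rates = []
    for start in range(0, len(activity) - window + 1, step):
        frame = activity[start:start + window]
        rates.append(sum(frame) / window)
    return rates
```

Each returned rate would then scale the switching component of the current densities for that window before the windows are collapsed into one equivalent stress.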
Indeed, for a variety of examples considering clock gating,
we can notice a significant difference in the reliability. Fig. 10
shows the normalized reliability (of the clock-tree element)
based on the extent and the nature of the clock gating.
For this experiment, we considered a single clock-tree
element, which underwent different kinds of clock gating,
though all amounting to a net 50% duration of gating in
the chip lifetime. For example, the third column in Fig. 10
corresponds to a case where the clock remains 25% uniformly
gated (meaning gated every one in four cycles) in the first half of
the stress time, followed by 75% gated in the other half; and so
on, for the other cases. After reliability computations based
on (19), we plot the equivalent stress times for all cases,
considering the (50%, 50%) case as the baseline.
As we can see, for the same clock gating duration among
all cases (50%), the worst case reliability occurs for the
case in which the full-throttle events are clustered together,
which maximizes both the average current and the JH.
On the other hand, if the clock gating is completely
uniform, the JH is lowered, causing the least reliability loss.
Algorithm 4 Incorporating Nonuniform Clock Gating
For the sake of completeness, we also note that for the same
case, a free running clock corresponds to an equivalent lifetime
of ∼3× as compared with the (50%, 50%) case. Thus, we can
capture the algorithm to incorporate the exact clock gating
impact in Algorithm 4.
It must be mentioned that such profiling data is hard to
come by in real designs. Therefore, in the absence of such information,
it is recommended to either assume no clock gating or assume
clock gating in the nonuniform manner.
VI. ADDRESSING L4: ACCELERATED DATA GENERATION
USING CELL-RESPONSE MODELING
Having looked at the various determinants of cell-EM
reliability and ways to incorporate them in our model, we now
look at expediting the characterization process. As discussed
in Section III-C, for the traditional methodology, the safe
frequency estimation requires 640 SPICE simulations per cell.
Indeed, complete data generation for a production 28-nm
library, consisting of a few thousand cells, can run into days of
effort. Such high runtimes for just a single baseline reliability
condition make the process of EM characterization prohibitive
under the traditional model. Although the efficiencies sug-
gested earlier in this work can greatly reduce this overhead,
it is still essential to use the baseline operating condition
and characterize the current densities using (6) and (7): a
process that can be very compute-intensive when carried out
for all load/slew conditions. Hence, we must optimize the
characterization process, and in this work, we use a response
modeling approach.
In the retargeting discussion of the last section, we noticed
that the traditional methodology is inflexible, since it com-
mingles the processes of current density computation and
EM verification. For the same reason, it also does not lend
itself to the application of response modeling. The challenge here
is twofold.
1) From the circuit point of view, operating parameters
such as load and slew nonuniformly affect the individual
rms and average resistor current densities in various arcs
and states.
2) At the same time, the reliability specifications like
lifetime and FF requirements nonuniformly influence the
average and rms current-density thresholds.
Both of the above eventually cause the average and rms-limited
frequencies (discussed in the self-consistent estimation in
Algorithm 2) to be asymmetrically impacted, thereby making
the traditional frequency-level abstraction nonscalable
across load/slews (illustrated, for example, in Fig. 11) and
reliability specifications (discussed earlier in Fig. 7).
On the other hand, a key feature of our approach is to
keep characterization and verification disjoint, which presents
an opportunity to build models during the characterization
phase and accelerate the data generation process.
As noted in Section IV, average current flowing through the
resistor is purely a function of the total charge transferred
(lumped load), while the rms current density also has an
inverse relationship with slew [3]. Based on these observations,
we attempt to model the current density through any
given resistor in the IP as a polynomial function of the output
load and input slew
$$ J_{R_j} = a_0 + b_1 L + b_2 L^2 + c_1 s + c_2 s^2 + d_1 L s + d_2 L^2 s^2. \tag{20} $$
Here, a0, bi, ci, and di are fitted coefficients, and L and s
are the load and slew, respectively. We identify seven
critical points (in the 8 × 8 load/slew matrix) that help to
shape up the polynomial model: the four corners, (1, 1),
(1, 8), (8, 1), and (8, 8), and a few internal points (2, 4),
(4, 4), and (4, 2), where the indices represent the index of
the load and slew, respectively, in the table. The parameter
fitting is then performed, based on (20), providing a model
to predict the current densities at any arbitrary load/slew
point. Note that the response modeling must be performed
for every current density component (6)–(9) of the resistor R.
The number of models corresponds to the total number of
unique arcs and states of the cell. For example, for a single-
input clock-tree inverter, we would require a total of four
simulations: two to cover the arcs (input rise to output
fall, and vice versa), and two to cover the static states
(input high and input low).
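Since (20) has seven coefficients and seven sampled points, the fit reduces to solving a 7 × 7 linear system. The sketch below does this with plain Gaussian elimination; it is an illustration only. The function names are hypothetical, and the table indices 1 to 8 are used as stand-ins for the physical load and slew values, which a real characterization flow would substitute.

```python
# Seven (load index, slew index) sample points named in the text.
SAMPLE_POINTS = [(1, 1), (1, 8), (8, 1), (8, 8), (2, 4), (4, 4), (4, 2)]


def basis(L, s):
    """Terms of (20) in the order a0, b1, b2, c1, c2, d1, d2."""
    return [1.0, L, L * L, s, s * s, L * s, L * L * s * s]


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_response(samples):
    """samples: list of ((L, s), J) at the seven points; returns coefficients."""
    A = [basis(L, s) for (L, s), _ in samples]
    y = [j for _, j in samples]
    return solve(A, y)


def predict(coef, L, s):
    """Evaluate (20) at an arbitrary load/slew point."""
    return sum(c * b for c, b in zip(coef, basis(L, s)))
```

One such fit would be carried out per resistor and per current density component, and `predict` then fills in the remaining load/slew points without further simulation.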
We now examine the validation of the response model
for a representative IP cell in Fig. 11; the results from the
entire library will be presented in an end-to-end manner in
Section VII.
For various load/slew points on the x-axis, we first develop
the characterization data based on the full SPICE simulations
[using (6) and (7)]. Subsequently, the model from (20) is built
using the simulation data from seven sampled points, and later
evaluated at each of the 64 load/slew points. The normalized
fsafe is plotted on the left y-axis, and the error between model
and SPICE is plotted on the right y-axis.
The nonmonotonic behavior of fsafe with load/slew can be
readily observed from this plot. Such nonmonotonicity arises
from the fact that at different load/slew indices, the metal
segments that limit the EM performance of the cell vary. For
example, at a fixed load condition (say 100 fF), a lower input
slew (∼25 ps) would mean a large rms current in the output
signal resistors, while a smaller short-circuit current in the
power-ground resistors. On the other hand, a higher input slew
(∼200 ps) means vice versa. Thus, for sharp input slews, the
output signal resistors may often limit the cell reliability (due
to rms constraint), while at the sluggish slews, the cell-internal
power-ground resistors may be limiting (due to the average
constraint). Such an interplay finally leads to a nonmonotonic
TABLE I
RUNTIME COMPARISONS WITH PROPOSED AND TRADITIONAL
METHODS FOR A SINGLE CELL
fsafe behavior of the cell with load/slews.
Our methodology, however, only works at the current den-
sity level and, hence, remains unaffected by the reliability
constraints that bring in the nonmonotonicity. Using the rep-
resentation from (20) (for every resistor, per arc state), we can
readily obtain the current densities at the chosen load/slew
condition, and can subsequently use those to compute the safe
frequency of the cell by using Algorithm 2, which additionally
requires the reliability condition. Consequently, we can cover
all the load/slew points to get the safe frequency plot, and as
we can see, the response modeling approach works reasonably
well in predicting the current densities and fsafe, with an
acceptable marginal error. Next, the runtime impact for a single
cell is summarized in Table I.
As we see, the characterization runtime for our approach
drops down significantly as compared with the traditional
methodology, which takes ∼10 min to generate the safe
frequency data for all load/slew points, whereas the proposed
methodology (response modeling) is completed in about a
minute. It must be mentioned here that the number of simu-
lations required in the traditional methodology grows linearly
with the number of design/reliability conditions required (as
discussed in Section V). For example, every change in the
voltage, stress temperature, or lifetime requires a new char-
acterization. On the other hand, the proposed methodology
comprehends the design/reliability conditions on the fly using
the same database (Section V), further reducing
the number of simulations required. Thus, the methodology
in this section directly addresses the limitation L4 and
significantly speeds up the characterization.
VII. PRODUCTION DESIGN ANALYSIS
We now examine the final application of the pro-
posed methodology in an industrial scenario, discussing the
setup and the workflow. We consider a 28-nm high-performance block
(2 mm×2 mm, ∼600k instances, >10 M transistors) operating
at a 1-GHz clock frequency, which is part of a large
industrial SoC. The entire flow is outlined in Fig. 12 for the
proposed method.
The new method, in essence, is a three-step process:
1) IP characterization at a baseline reliability condition;
2) determining the reliability constraints for this design; and
3) integration into the timing/implementation tool. Note that
the true retargeting flexibility of the proposed approach comes
in the form of 2), which is a runtime-level input to verification that
is completely detached from 1). The flow of 3) uses a standard
industrial design methodology.
A. Library Characterization
The entire library of a few thousand cells was characterized
in two ways: 1) a full SPICE-based approach, where the
traditional fsafe table was generated at a baseline condition
and 2) the methodology proposed in this paper. Parallelized
and multithreaded SPICE simulations (using Cadence Spectre)
were used. The runtime for 1) was ∼800 CPU hours of raw
simulation, excluding extraction, whereas the methodology
in 2) completes in ∼80 CPU hours. For 2), the production
characterization framework for timing was used to arrive at
the various arcs and logical states for switching and leakage
current characterization.
Fig. 12. Overall methodology and data-flow diagram for the proposed
method.
B. Final Reliability Verification
The final application of the library-generated data was
performed in the timing tool (Encounter Timing System),
through custom-developed scriptware that reads in both
types of characterization data. The timing analysis of the
design was performed at the baseline condition, to arrive at
the slews and probabilistic switching rates through all the
input pins. In the traditional approach, the scriptware steps
through the timing information of every instance in the design
and compares the queried fsafe (from the traditional model) to
the operating frequency. Note that since this approach suffers
from the problems discussed earlier (specifically, L1 and L2),
a final full SPICE simulation (with the RC loading of the
driver instance) is required after the initial results from the
frequency comparisons. A total of ∼600k instances were
analyzed in this way, and finally, the instances with the
frequency ratio > 1 (around 4500) were simulated further.
The excitations for the SPICE simulations were a simple
1010 transition (at operating frequency), since all the instances
were single-input clock-tree inverters, buffers, and gater cells
(only eight unique cells). The final set of violations after the
full SPICE simulations came down to 426.
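The screening step of the traditional flow described above (compare the queried fsafe of each instance with its operating frequency and pass instances with a ratio above 1 on to SPICE) can be sketched as follows. This is only an illustration: the `Instance` record, its field names, and the example values are assumptions, and the sketch covers neither the table lookup that produces fsafe nor the follow-up SPICE simulation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Instance:
    """Hypothetical per-instance record produced by timing analysis."""
    name: str
    f_op_hz: float    # operating (switching) frequency of the instance
    f_safe_hz: float  # EM-safe frequency queried from the library model


def frequency_ratio(inst: Instance) -> float:
    """Return fop/fsafe; values above 1 mark an EM-critical instance."""
    return inst.f_op_hz / inst.f_safe_hz


def screen_instances(instances: Iterable[Instance],
                     threshold: float = 1.0) -> List[Instance]:
    """Keep only the instances whose frequency ratio exceeds the threshold."""
    return [inst for inst in instances if frequency_ratio(inst) > threshold]


if __name__ == "__main__":
    # Made-up example values, not data from the paper.
    design = [
        Instance("clk_buf_0", f_op_hz=1.0e9, f_safe_hz=2.0e9),
        Instance("clk_inv_7", f_op_hz=1.0e9, f_safe_hz=0.8e9),
    ]
    for inst in screen_instances(design):
        print(f"{inst.name}: ratio {frequency_ratio(inst):.2f}, "
              "candidate for full SPICE simulation")
```

In the block analyzed here, this kind of screen reduced ∼600k instances to around 4500 candidates before SPICE.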
On the other hand, in the new approach, the scriptware
additionally implements Algorithms 1–3 and, based on
the chip-level design and reliability condition specifications
(lifetime/temperature/voltage/FF), updates the equations
on the fly for the final frequency comparison of every instance.
Finally, we plot the population distribution of frequency
ratios in Fig. 13. We consider five cases: curves a) and b)
corresponding to analysis with the traditional and proposed
methods at baseline reliability conditions, respectively, and
curves c), d), and e) corresponding to the analysis at retargeted
reliability conditions of a tighter JH limit, an overdrive case,
and a high temperature requirement, respectively.
Fig. 13. Distribution plot for a 28-nm block (>600k instances), highlighting
the number of EM-critical instances and violations (with fop/ fsafe ratio > 1)
for curves a) and b) baseline reliability analysis with the traditional and
proposed methods and curves c), d), and e) retargeted reliability condition
analysis with the proposed methodology.
TABLE II
OVERALL COMPARISON OF TRADITIONAL VERSUS PROPOSED
METHODOLOGY. TRADITIONAL METHOD WAS RUN ONLY
AT BASELINE CONDITION DUE TO RUNTIME ISSUES,
WHEREAS THE PROPOSED METHOD COULD RUN
AT VARIOUS RELIABILITY CONDITIONS
For every method, we plot the ratio of fop to fsafe, which
signifies the EM criticality of each instance. Hence, an instance
with fop greater than fsafe (red region in the plot) is deemed an
EM failure and must be fixed (by either load reduction or
replacement). The y-axis shows the number of instances in
the design with a particular fop/ fsafe ratio. We document
the total number of violations from the various analyses in
Table II.
As we can see from Fig. 13 and Table II, the proposed
approach reports a total of 442 violations, 421 of which
overlap with the traditional methodology (+SPICE). The
remaining false (21) and escaped (5) violations from the
new approach were found to be relatively less critical, with
frequency ratios in the range of 0.9–1.14. Thus, the new
approach agrees well with SPICE.
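The false and escaped counts follow directly from the totals: 442 reported minus 421 overlapping gives 21 false violations, and 426 SPICE-confirmed minus 421 overlapping gives 5 escapes. The sketch below reproduces this bookkeeping with set operations. The instance identifiers are synthetic placeholders chosen only to match the reported counts, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def compare_violations(reference: Iterable[int],
                       reported: Iterable[int]) -> Dict[str, int]:
    """Compare a reported violation list against a reference list.

    overlap: flagged by both flows
    false:   flagged by the new flow only
    escaped: flagged by the reference flow only
    """
    ref, rep = set(reference), set(reported)
    return {
        "overlap": len(ref & rep),
        "false": len(rep - ref),
        "escaped": len(ref - rep),
    }


if __name__ == "__main__":
    # Synthetic IDs: 426 reference violations (traditional + SPICE) and
    # 442 reported violations, constructed to share 421 entries.
    reference_ids = range(0, 426)
    reported_ids = range(5, 447)
    print(compare_violations(reference_ids, reported_ids))
    # → {'overlap': 421, 'false': 21, 'escaped': 5}
```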
Next, the retargetability of the proposed approach is
evident from curves c), d), and e) in Fig. 13, analyzed
at retargeted reliability conditions. Curve c), corresponding
to a tighter constraint of 5 °C lower JH, results in almost
three times as many violations, due to tighter rms limits.
Curve d), which is at overdrive conditions, results in a
similar violation profile. However, curve e), which corresponds
to a 20 °C higher stress temperature, results in a plethora
of violations. Such a run is a close proxy for a direct
application of IPs meant for handheld businesses to harsher
environments.
Finally, depending on the stage of the chip-design execution,
the design community has multiple ways to act upon this EM
verification feedback. Although a detailed solution for
developing EM fixes is beyond the scope of this paper, we provide
some pointers in the rest of this paragraph. In many cases,
the reliability criterion softens due to a lower lifetime
requirement, for instance, in infotainment category
chips [29]. Alternatively, an avoidance strategy can be
followed upfront, wherein, based on the logic, high drive-strength
cells are used to drive large fanout points. However, this
requires careful consideration, since an unwarranted increase
in drive-strength is associated with a sharp output-slew reduction,
resulting in increased rms currents. On the other hand, a
forceful lowering of drive-strength for instances with timing slack
causes slew degradation, resulting in increased short-circuit
currents. A better approach may be fanout load or
activity reduction, which predictably reduces the current flow.
VIII. CONCLUSION
In summary, an accurate and retargetable methodology
for IP-internal EM verification was presented in this paper.
Generic switching rates for various pins of the IP are
comprehended, including aspects of clock gating. High
accuracy with respect to SPICE was achieved by
incorporating the impact of arbitrary parasitic loading and by
intelligently deriving the effective pin capacitance
of load cells. The methodology was shown to be highly
flexible, in terms of allowing on-the-fly retargeting of the
reliability constraints. Finally, the complete data generation
process at library level is expedited by the application of
cell-response modeling. Results on a 28-nm production setup
demonstrated a significant relaxation in terms of violations,
along with close correlation to SPICE. We shared various cases
of runtime-level reliability retargeting by specifying varying
reliability conditions for the production block verification.
The methodology presented in this paper is most suitable in a
third-party-IP context. The need is only underlined further with
the increasing porting of designs from one business segment
to a different one, which requires on-the-fly assessment of the
reliability of all the components.
REFERENCES
[1] J. Lienig, “Electromigration and its impact on physical design in future
technologies,” in Proc. ACM Int. Symp. Phys. Design, 2013, pp. 33–40.
[2] X. Huang, T. Yu, V. Sukharev, and S. X.-D. Tan, “Physics-based
electromigration assessment for power grid networks,” in Proc.
ACM/EDAC/IEEE Design Autom. Conf., Jun. 2014, pp. 1–6.
[3] P. Jain and A. Jain, “Accurate current estimation for interconnect
reliability analysis,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 20, no. 9, pp. 1634–1644, Sep. 2012.
[4] ITRS Interconnect Summary. [Online]. Available: http://www.itrs.net,
accessed 2013.
[5] C. Yuan, D. Tipple, and J. Warner, “Optimizing standard cell design for
quality,” Proc. SPIE, vol. 9053, p. 90530O, Mar. 2014.
[6] JEDEC Reliability Standards, JC14. [Online]. Available: http://www.
jedec.org
[7] AEC Q100 Specification, AEC-Q003 Rev-A. [Online]. Available:
http://www.aecouncil.com/AECDocuments
[8] Encounter Design Implementation Tool User Manual. [Online]. Avail-
able: http://www.cadence.com, accessed 2014.
[9] Magma-Talus Implementation Tool User Manual. [Online]. Available:
http://www.synopsys.com, accessed 2014.
[10] W. R. Hunter, “Self-consistent solutions for allowed interconnect current
density—Part II. Application to design guidelines,” IEEE Trans. Electron
Devices, vol. 44, no. 2, pp. 310–316, Feb. 1997.
[11] J. R. Black, “Electromigration failure modes in aluminum metallization
for semiconductor devices,” Proc. IEEE, vol. 57, no. 9, pp. 1587–1594,
Sep. 1969.
[12] K.-D. Lee, “Electromigration recovery and short lead effect under
bipolar- and unipolar-pulse current,” in Proc. IEEE Int. Rel. Phys. Symp.,
Apr. 2012, pp. 6B.3.1–6B.3.4.
[13] Altos Tool User-Manual. [Online]. Available: http://www.cadence.com,
accessed 2012.
[14] S. S. Sapatnekar, Timing. New York, NY, USA: Springer, 2004.
[15] S. P. McCormick, “Modeling and simulation of VLSI interconnections
with moments,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., MIT,
Cambridge, MA, USA, 1989.
[16] J. Qian, S. Pullela, and L. Pillage, “Modeling the ‘effective capacitance’
for the RC interconnect of CMOS gates,” IEEE Trans. Comput.-Aided
Design Integr. Circuits Syst., vol. 13, no. 12, pp. 1526–1535, Dec. 1994.
[17] C. C. Wang and D. Markovic, “Delay estimation and sizing of CMOS
logic using logical effort with slope correction,” IEEE Trans. Circuits
Syst. II, Exp. Briefs, vol. 56, no. 8, pp. 634–638, Aug. 2009.
[18] J. F. Croix and D. F. Wong, “Blade and razor: Cell and interconnect
delay analysis using current-based models,” in Proc. ACM/IEEE Design
Autom. Conf., Jun. 2003, pp. 386–389.
[19] S. Gupta and S. S. Sapatnekar, “Compact current source models
for timing analysis under temperature and body bias variations,”
IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 11,
pp. 2104–2117, Nov. 2012.
[20] D. Sinha, S. Abbaspour, A. Bhanji, and J. M. Ritzinger, “Method of
employing slew dependent pin capacitances to capture interconnect par-
asitics during timing abstraction of VLSI circuits,” U.S. Patent 8 103 997,
Jan. 24, 2012.
[21] B. Mullen, “CCS-timing: Composite current source delay modeling,” in
Proc. DAC, 2005.
[22] D. D. Ling, C. Visweswariah, P. Feldmann, and S. Abbaspour,
“A moment-based effective characterization waveform for static timing
analysis,” in Proc. 46th ACM/IEEE Design Autom. Conf., Jul. 2009,
pp. 19–24.
[23] HSPICE User Manual. [Online]. Available: http://www.synopsys.com
[24] S. Herbert and D. Marculescu, “Analysis of dynamic voltage/frequency
scaling in chip-multiprocessors,” in Proc. ACM/IEEE Int. Symp. Low
Power Electron. Design, Aug. 2007, pp. 38–43.
[25] PTPX Power Estimation Tool. [Online]. Available: http://www.
synopsys.com
[26] J. Choi, C.-Y. Cher, H. Franke, H. Hamann, A. Weger, and P. Bose,
“Thermal-aware task scheduling at the system software level,” in Proc.
ACM Int. Symp. Low Power Electron. Design, 2007, pp. 213–218.
[27] Y. Zhan, S. V. Kumar, and S. S. Sapatnekar, “Thermally aware design,”
Found. Trends Electron. Design Autom., vol. 2, no. 3, pp. 255–370,
Mar. 2008.
[28] A. Todri and M.-S. Malgorzata, “A study of reliability issues in clock dis-
tribution networks,” in Proc. IEEE Int. Conf. Comput. Design, Oct. 2008,
pp. 101–106.
[29] Automotive Processors Overview. [Online]. Available: http://
www.ti.com/lsds/ti/processors/dsp/automotive_processors/overview.page,
accessed Aug. 2015.
[30] K. Banerjee and A. Mehrotra, “Coupled analysis of electromigration
reliability and performance in ULSI signal nets,” in Proc. IEEE/ACM
Int. Conf. Comput. Aided Design, 2001, pp. 158–164.
[31] Y.-J. Park, P. Jain, and S. Krishnan, “New electromigration validation:
Via node vector method,” in Proc. IEEE Int. Rel. Phys. Symp., May 2010,
pp. 698–704.
[32] S. M. Alam, G. C. Lip, C. V. Thompson, and D. E. Troxel, “Circuit
level reliability analysis of Cu interconnects,” in Proc. 5th Int. Symp.
Quality Electron. Design, 2004, pp. 238–243.
[33] Z. Guan, M. Marek-Sadowska, S. Nassif, and B. Li, “Atomic flux diver-
gence based current conversion scheme for signal line electromigration
reliability assessment,” in Proc. IEEE Int. Interconnect Technol. Conf.,
May 2014, pp. 245–248.
[34] K.-D. Lee, “Electromigration critical length effect and early failures in
Cu/oxide and Cu/low k interconnects,” Ph.D. dissertation, Dept. Mater.
Sci. Eng., Univ. Texas Austin, Austin, TX, USA, 2003.
Palkesh Jain (M’04) received the bachelor’s and
master’s degrees in electrical engineering from
IIT Bombay, Mumbai, India, in 2004. He is cur-
rently pursuing the Ph.D. degree with the Universitat
Politècnica de Catalunya, Barcelona, Spain.
He joined the ASIC Group, Texas Instruments
India, Bangalore, India, where he defined and
developed several gigahertz enabling reliability
methodologies. Subsequently, he joined the Yield
and Product Engineering team, Qualcomm Tech-
nologies Inc., Bangalore, in 2014, where he is
involved in system level power and thermal management methodologies.
He holds 15 U.S. patents (granted/pending).
Jordi Cortadella (S’87–M’89–F’15) is currently a
Professor with the Computer Science Department,
Universitat Politècnica de Catalunya, Barcelona,
Spain. His current research interests include for-
mal methods and computer-aided design of VLSI
systems with a special emphasis on asynchronous
circuits, concurrent systems, and logic synthesis.
Prof. Cortadella is a member of Academia
Europaea. He received best paper awards at the
International Symposium on Advanced Research in
Asynchronous Circuits and Systems in 2004, the
Design Automation Conference in 2004, and the International Conference
on Application of Concurrency to System Design in 2009. He has served
on the technical committees of several international conferences in the field
of design automation and concurrent systems, and is an Associate Editor of
the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED
CIRCUITS AND SYSTEMS.
Sachin S. Sapatnekar (S’86–M’93–F’03) received
the B.Tech. degree from IIT Bombay, Mumbai,
India, the M.S. degree from Syracuse University,
Syracuse, NY, USA, and the Ph.D. degree from
the University of Illinois at Urbana–Champaign,
Champaign, IL, USA.
He taught with Iowa State University, Ames,
IA, USA, from 1992 to 1997. He has been with
the University of Minnesota, Minneapolis, MN,
USA, since 1997, where he holds the Distinguished
McKnight University Professorship and the Robert
and Marjorie Henle Chair.
Dr. Sapatnekar has received six conference best paper awards, the Best
Poster Award, an ICCAD Ten-Year Retrospective Most Influential Paper
Award, the SRC Technical Excellence Award, and the SIA University
Researcher Award.
