Aging Effects of Leakage Optimizations for Caches by Calimera, Andrea et al.
Aging Effects of Leakage Optimizations for Caches
Andrea Calimera†, Mirko Loghi‡, Enrico Macii†, Massimo Poncino†
†Politecnico di Torino, 10129, Torino, ITALY
‡Università di Udine, 33100, Udine, ITALY
ABSTRACT
Besides static power consumption, sub-90nm devices have
to account for NBTI effects, which are one of the major
concerns for system reliability. Some of the factors that
regulate power consumption also impact NBTI-induced aging
effects; however, the extent to which traditional low-power
techniques can mitigate NBTI issues has not been investigated
thoroughly.
This is especially true for cache memories, which are the target
of this work. We show how leakage optimization techniques
can also be leveraged to extend the lifetime of a cache.
Experimental analysis shows that, while achieving a total
energy reduction of up to 80%, managing static power can
also extend the lifetime by a factor of up to 5x.
Categories and Subject Descriptors: B.3.2 [MEMORY
STRUCTURES] : Design Styles.
General Terms: Design, Experimentation, Performance.
Keywords: Memory Hierarchy, Leakage Reduction, Aging.
1. INTRODUCTION
The long-term stability of a conventional six-transistor
SRAM cell is strongly affected by the temporal degradation of
MOSFET parameters induced by Negative Bias Temperature
Instability (NBTI) [1]. In particular, the increase over
time of the threshold voltage of the PMOS transistors under
negative bias (i.e., a logic 0 at the input) results in a
reduction of the Static Noise Margin (SNM) of the cell, which
compromises the ability of the cell to reliably store
a value ([2]-[4]). Unlike in random logic, these NBTI-induced
effects cannot be tackled by forcing signal probabilities so as
to reduce the occurrence probability of a logic 0; as a
matter of fact, due to the symmetric structure of a cell, an
SRAM cell ages whatever value it stores.
Power (and in particular, static power) optimization techniques
do, however, offer some mitigation of NBTI effects in
memories. Power management is implemented by either
disconnecting a memory sub-block from the ground/supply
network (power gating) or by reducing the supply voltage
(dynamic voltage scaling, DVS). Both schemes have beneficial
effects on NBTI-induced aging. Power gating completely
nullifies aging while the block is gated [5, 6].
Similarly, but with a smaller impact, voltage scaling reduces
NBTI-induced aging because a reduced Vdd corresponds
to a smaller bias voltage [7]. While these effects have
been assessed on individual memory cells, the actual impact
of their possible architectural embodiments in a multi-objective
(performance, energy, aging) space has not been
assessed in the literature. The objective of this work is to provide
an exploratory study of the complex interaction of these
metrics as a function of the typical parameters of a memory
hierarchy (namely, cache size and miss penalty), and of
the power optimization paradigm adopted (power gating or
DVS).
The results show that both power management
schemes are extremely effective in lengthening the lifetime of
a memory, even in the presence of significant miss penalties.
2. BACKGROUND
NBTI has emerged as the most relevant source of permanent,
time-dependent variation of the transistor parameters
for sub-90nm technologies. In particular, NBTI causes an
increase over time of the threshold voltage of PMOS transistors,
which in turn reduces the robustness of an SRAM cell.
A conventionally accepted metric for the aging of an SRAM
cell is the Static Noise Margin (SNM), defined as the minimum
DC noise voltage necessary to change the state of an
SRAM cell. NBTI impacts SNM because, when the pull-up
PMOS transistors of the two cross-coupled inverters of the SRAM
cell are negatively biased, their Vth shifts over time, thus degrading
the static characteristics of the two inverters. Therefore, after
some time, the SNM falls below the (technology-dependent)
value that allows safe storage of values.
A detailed treatment of NBTI effects and models is beyond
the scope of this paper; we refer the reader to classical
tutorial papers on NBTI [1]. We summarize here the basic
factors that impact NBTI effects:
Operating Conditions. For a given set of technological
parameters and physical dimensions (e.g., doping concentration,
mobility, and oxide thickness), NBTI effects mainly
depend on (i) temperature (delay degradation increases
with increasing T), and (ii) supply voltage (delay degradation
increases with increasing Vdd).
Signal Statistics. NBTI-induced effects strongly depend
on the actual amount of stress time (the time during which the
gate is negatively biased). When stress is not applied (i.e., when
Vgs = 0V), however, a partial recovery of the delay occurs.
Previous works on NBTI in the EDA domain have dealt with
the issue of how NBTI impacts the SNM of an SRAM cell,
for various technologies and operating conditions ([3]-[4]).
The most relevant result was presented in [2]; based on the
observation that an equal probability of storing a 0 or a 1
guarantees the minimum aging, the authors provide hardware and
software schemes that periodically invert the entire content of a
memory so as to guarantee a perfectly balanced probability.
In [6], the authors assess the aging benefits provided by the
application of power gating to a memory cell, observing that
its impact can be much higher than that obtained by
controlling the value probability.
3. LEAKAGE OPTIMIZATIONS IN
CACHES AND IMPACT ON AGING
Existing techniques for reducing leakage power in caches are
essentially different embodiments of the power management
problem. Based on the definition of (i) some granularity
representing the unit that can be power-managed (e.g., one
cell, one line, one set, one way), (ii) some metric of idleness,
and (iii) some low-leakage state, these schemes simply put a
power-managed unit into a low-leakage state when idleness
exceeds some threshold.
The choice of granularity, idleness metric, and low-leakage
state characterizes the various approaches. From the functional
perspective, however, the most important dimension
is the third one, i.e., the implementation of the low-leakage
state. We can think of two main approaches, corresponding
to two extremes of the leakage-performance space:
• Non-state-preserving schemes, where, by means of
some form of power gating, the leakage of the
power-managed unit is zeroed, but its content is lost ([10]-[13]).
• State-preserving schemes, where, by means of some
form of dynamic voltage scaling, the leakage of the
power-managed unit is reduced (but not zeroed), while
its content is kept ([14]).
Our objective is to assess how such solutions interact with
the aging of memory cells; to this purpose, we must first
characterize how the two basic mechanisms (namely, power
gating and voltage scaling) impact the aging of a memory
cell. Furthermore, since NBTI aging
is value-dependent, we should also understand the role
played by the content of a memory cell.
3.1 Impact of Values on SRAM Aging
As pointed out in previous NBTI characterization works for
SRAM cells ([2]-[4]), an SRAM cell ages irrespective of the
value it stores. The best-case degradation occurs when the
value at the output of each inverter is 0 for 50% of the time,
i.e., when both PMOS transistors degrade by the same amount.
Otherwise, one of the PMOS transistors degrades faster than the
other and the memory cell will fail earlier.
When considering an entire memory block from the functional
standpoint, however, the situation is quite different.
Consider, for example, a cache line as the unit of power
management. In a data cache, values in a line will change
only if (i) there is a write to that line, or (ii) that line
is replaced when fetching a block on a cache miss. Both
events have a low occurrence probability (writes are far less
frequent than reads, and miss rates are typically quite low).
Furthermore, even when a line is overwritten, very likely only a
subset of the bits will toggle. Finally, the bit with the most
skewed probability determines the aging of the entire line
it belongs to.
Based on these considerations, we can safely assume that the
aging of a cache can be calculated based on the worst-case
probability (i.e., a fixed 0 or 1).
3.2 Impact of Power Gating on SRAM Aging
By power gating we denote the technique in which a footer
transistor (sleep transistor) is used to disconnect a logic
block from the ground. Power gating can be implemented
using cell- or cluster-based approaches, or a distributed
coarse-grained approach. The same holds when applying
power gating to memory structures. Although the basic
scheme [10, 11] uses one sleep transistor per cell, the transistor
can be shared among multiple cells (e.g., a row of the
memory [12, 13]). Whatever the granularity of the power
gating, when the sleep transistor is on (active state), the
cell will operate as usual, yet with a ground voltage equal
to the virtual ground voltage V_VGND, and therefore with slightly
worse performance (during read/write operations). Notice that the
value of V_VGND does not affect the SNM, which is by definition
a DC quantity.
When the sleep transistor is off, the cell is disconnected
from the ground, and both inverters' outputs quickly
reach the "1" value. Since this value corresponds to the
recovery state, logic blocks in a stand-by state are naturally
immune to NBTI-induced aging [5]. Notice that this is not
a "logic" state: it is due to electrical reasons and cannot be
forced by writing some value into the cell.
3.3 Impact of Vdd Scaling on SRAM Aging
The aging of a PMOS transistor is determined by the
amount of negative bias voltage, i.e., the gate-to-source voltage.
Reducing Vdd (the source voltage) therefore reduces
the amount of bias accordingly. While the quantification
of the Vth degradation as a function of bias voltage for a
single PMOS device can be found in the literature ([1, 8,
9]), an analysis of its effect on the SNM of a memory cell
was missing. We have thus run a SPICE characterization of
a custom-designed SRAM cell, mapped onto a commercial
45nm technology by STMicroelectronics, using an annotated
netlist after parasitic extraction.
Results are reported in Table 1, which shows the SNM degradation
in % with respect to the nominal, time-zero value, for
the nominal Vdd = 1.1V and for the "drowsy" Vdd = 0.4V.
Values refer to the worst case of a fixed value 0 stored in the
cell. The SNM degradation under the drowsy Vdd (36% of
the nominal Vdd) is about 60% of that at the full Vdd.
         SNM Degradation [%]
Years    Vdd = 1.1V    Vdd = 0.4V
3        20.41         12.77
6        24.08         15.06
Table 1: SNM Degradation as a Function of Vdd.
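As a quick arithmetic check of the two percentages quoted for Table 1, the following minimal sketch recomputes the ratio between the drowsy and nominal supply voltages and the ratio between the corresponding SNM degradations. The data are copied from Table 1; the function and variable names are illustrative and do not come from the paper.

```python
# Values copied from Table 1 (SNM degradation in %, worst case of a fixed 0).
TABLE1 = {
    3: {"nominal": 20.41, "drowsy": 12.77},
    6: {"nominal": 24.08, "drowsy": 15.06},
}
VDD_NOMINAL = 1.1  # volts
VDD_DROWSY = 0.4   # volts


def vdd_ratio():
    """Drowsy supply voltage as a fraction of the nominal one."""
    return VDD_DROWSY / VDD_NOMINAL


def degradation_ratio(years):
    """SNM degradation at the drowsy Vdd relative to the nominal Vdd."""
    row = TABLE1[years]
    return row["drowsy"] / row["nominal"]


if __name__ == "__main__":
    print(f"Vdd ratio: {vdd_ratio():.1%}")
    for years in sorted(TABLE1):
        print(f"{years} years: drowsy/nominal = {degradation_ratio(years):.1%}")
```

Both rows of the table give a degradation ratio between 62% and 63%, which is consistent with the "about 60%" figure quoted in the text.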
4. ARCHITECTURES AND MODELS
4.1 Low-Leakage Cache Architectures
We compare two generic cache leakage optimizations implementing
one state-preserving and one non-state-preserving
scheme, which are general implementations of approaches
presented in the literature ([10]-[14]).
Both schemes share two basic principles. First, the
granularity of the power management unit is a cache line.
Second, the decision about whether to turn off a cache line
is based on its usage: lines that have not been accessed for a
given number of cycles (the breakeven time) are put into a
low-leakage state. With power gating the line content is lost,
so the line must be invalidated and a cache miss will occur
on the next access to it.
Conversely, with DVS the content is preserved ("drowsy"
state), and only a small time interval is required to restore
the line to the active state. The two conceptual architectures
are shown in Figure 1. In both schemes, the
block "Control" implements the counting-based mechanism
that triggers the de-activation of a line.
[Figure 1 artwork omitted. Labels in both panels: line i with cells 0 to n-1, valid bit, wordline, Control block. Further labels: sleep transistor and (virtual) ground; drowsy bit (set), drowsy wordline, and supply selection between Vdd and Vdd,drowsy.]
Figure 1: Reference Architectures: Power-Gated (a)
and Drowsy Scheme (b).
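The counting-based mechanism attributed to the Control block can be sketched as a per-line idle counter that moves a line into the low-leakage state once it has been idle for the breakeven time. This is a minimal illustration under simplifying assumptions (one access per cycle at most, no wake-up latency modeled); the names `Line`, `simulate`, and the trace format are hypothetical and are not taken from the paper's simulator.

```python
from dataclasses import dataclass


@dataclass
class Line:
    idle_cycles: int = 0       # cycles since the last access to this line
    low_leakage: bool = False  # True while the line is gated or drowsy
    sleep_cycles: int = 0      # total cycles spent in the low-leakage state


def simulate(accesses, num_lines, total_cycles, breakeven):
    """Replay a trace and return (per-line sleep fraction, number of wake-ups).

    `accesses` maps a cycle number to the index of the line accessed in
    that cycle (hypothetical trace format, at most one access per cycle).
    """
    lines = [Line() for _ in range(num_lines)]
    wakeups = 0
    for cycle in range(total_cycles):
        accessed = accesses.get(cycle)
        for idx, line in enumerate(lines):
            if idx == accessed:
                if line.low_leakage:
                    # Gated line: the access is a miss; drowsy line: short wake-up.
                    wakeups += 1
                    line.low_leakage = False
                line.idle_cycles = 0
            else:
                line.idle_cycles += 1
                if line.idle_cycles >= breakeven:
                    line.low_leakage = True
            if line.low_leakage:
                line.sleep_cycles += 1
    sleep_fraction = [line.sleep_cycles / total_cycles for line in lines]
    return sleep_fraction, wakeups


if __name__ == "__main__":
    fractions, wakeups = simulate({0: 0, 5: 0}, num_lines=2,
                                  total_cycles=10, breakeven=3)
    print(fractions, wakeups)
```

The per-line sleep fraction returned here plays the role of the sleep probability used by the aging models, while the wake-up count is what translates into misses (power gating) or short reactivation delays (drowsy).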
The type of low-leakage state (gated or drowsy) affects the
choice of the breakeven time B. States with lower leakage
(gated) will have a longer breakeven time B. The latter is
calculated as $B = E_T / (P_A - P_S)$, where $E_T$ is the energy
spent to put a cache line into the low-leakage state and to
subsequently reactivate it, while $P_A$ and $P_S$ are the leakage
power consumed by a cache line in the standard (active) state and
in the low-leakage state, respectively.
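The breakeven time defined above is a simple ratio between the transition energy and the leakage power saved, as the following minimal sketch shows. The numerical values in the usage example and the clock frequency are hypothetical and only serve to illustrate the units; they are not characterization data from the paper.

```python
def breakeven_time(e_transition, p_active, p_sleep):
    """Return B = E_T / (P_A - P_S) in seconds (energy in joules, power in watts)."""
    saved_power = p_active - p_sleep
    if saved_power <= 0:
        raise ValueError("the low-leakage state must leak less than the active state")
    return e_transition / saved_power


def breakeven_cycles(e_transition, p_active, p_sleep, clock_hz):
    """Breakeven time expressed in clock cycles, rounded to the nearest cycle."""
    return round(breakeven_time(e_transition, p_active, p_sleep) * clock_hz)


if __name__ == "__main__":
    # Hypothetical numbers: 2 pJ transition energy, 1 uW active leakage,
    # 0.2 uW leakage in the low-leakage state, 1 GHz clock.
    print(breakeven_time(2e-12, 1e-6, 2e-7))
    print(breakeven_cycles(2e-12, 1e-6, 2e-7, 1e9))
```

A line must stay idle for longer than B for the transition to pay off, so a larger transition energy pushes B up even when the low-leakage state saves more power.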
4.2 Aging Models and Metrics
4.2.1 Aging Model for the Power-Gated Scheme
For the power-gated scheme we can leverage the results presented
in [5], where the probability $P_{sleep}$ of the sleep signal
(i.e., how often a cache line is put into sleep) is used
as a multiplicative factor of the stored value probability.
In caches, since the dependence on the stored value is immaterial
and we assume the worst case, only $P_{sleep}$ is relevant.
The threshold voltage degradation can thus be compactly
expressed as $\Delta V_{th}(t) = K \cdot ((1 - P_{sleep}) \cdot t)^{1/4}$,
where K is a constant that lumps all the technological parameters
(e.g., oxide electric field, thermal voltage, etc.) and t denotes time.
The term $(1 - P_{sleep}) \cdot t$ can be viewed as the effective stress
time.
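A direct consequence of the effective stress time is that the time needed to reach a given threshold voltage shift grows by a factor 1 / (1 - Psleep) with respect to a line that never sleeps. The sketch below illustrates this under the assumption that the end of life is set by a fixed Vth shift, which is a simplification, since the paper defines lifetime on the SNM; the function names and the unit value of K are illustrative.

```python
def effective_stress_time(t, p_sleep):
    """Stress time actually seen by the PMOS devices: (1 - P_sleep) * t."""
    return (1.0 - p_sleep) * t


def delta_vth(t, p_sleep, k=1.0):
    """Threshold voltage shift K * ((1 - P_sleep) * t) ** (1/4); k=1.0 is a placeholder."""
    return k * effective_stress_time(t, p_sleep) ** 0.25


def lifetime_extension(p_sleep):
    """Factor by which the time to reach a fixed Vth shift grows vs. P_sleep = 0."""
    if not 0.0 <= p_sleep < 1.0:
        raise ValueError("p_sleep must be in [0, 1)")
    return 1.0 / (1.0 - p_sleep)


if __name__ == "__main__":
    # A line asleep half of the time reaches the same shift in twice the time.
    print(delta_vth(16.0, 0.0), delta_vth(32.0, 0.5))
    print(lifetime_extension(0.5))
```

Because the extension factor depends only on the sleep probability, the constant K cancels out when comparing a power-gated line against the unmanaged baseline.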
4.2.2 Aging Model for the Drowsy Scheme
For the drowsy scheme the evaluation of the aging is more
elaborate. Since the aging curves are non-linear and since
Vdd appears directly in the threshold degradation equation
(through the factor K shown above), the actual alternation of sleep
and active intervals matters (and not just their occurrence
probability, as in the power-gated case). This is illustrated
in Figure 2.
[Figure 2 plot omitted. X axis: Time [yrs], 0 to 6. Y axis: SNM degradation [%], 5 to 25. Curves: Vdd=1.1V and Vdd=0.4V. Annotated end points: 20% for (Active, Sleep), 18% for (Sleep, Active).]
Figure 2: Effect of Vdd on Aging Depends on Temporal Sequence.
The plot shows the two SNM degradation curves over time
corresponding to the full Vdd value of 1.1V (solid gray curve)
and to the drowsy Vdd value of 0.4V (dotted gray curve).
It also shows the resulting SNM degradation for
a four-year usage (two years in active mode, two years in
sleep mode) for the two extreme cases of an active/sleep waveform
with a 50% duty cycle: the worst case (Active, Sleep), and
the best case (Sleep, Active). The former pattern yields a
degradation of about 20% after four years, while the latter
results in an 18% degradation. Therefore, patterns in which
sleep intervals occur earlier will exhibit less aging and longer
lifetimes. This is intuitive, since the curves grow quickly at
the beginning and flatten out for larger time values.
In our methodology, in order to avoid tracking the
detailed active/sleep sequence, we assume the worst-case pattern,
consisting of the execution of all the active intervals
first, followed by all the sleep intervals. Although conservative,
this analysis guarantees a safe lower bound on the lifetime
of the cache.
4.2.3 Aging Metrics
SNM is the metric used for assessing the aging of an SRAM
cell. As a compact metric of aging we define the lifetime
of a cell as the time at which the SNM has decreased by 15%
with respect to its nominal value.
Extending lifetime to the entire cache is straightforward;
since a line is the power management unit and we assume
the worst case (always 0 or 1) for value probabilities, the
lifetime of a line coincides with that of a single cell. By
extension, the lifetime of the whole cache is the shortest
lifetime among all the cache lines.
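The fact that the cache lifetime is the shortest line lifetime means that a single rarely-sleeping line bounds the benefit for the whole cache. The following minimal sketch illustrates this for the power-gated model of Section 4.2.1, assuming that a line's lifetime scales as 1 / (1 - Psleep) relative to an unmanaged line; the function names and the example sleep probabilities are hypothetical.

```python
def line_lifetime(baseline_years, p_sleep):
    """Lifetime of one line, assuming it scales as 1 / (1 - P_sleep)."""
    if not 0.0 <= p_sleep < 1.0:
        raise ValueError("p_sleep must be in [0, 1)")
    return baseline_years / (1.0 - p_sleep)


def cache_lifetime(baseline_years, p_sleep_per_line):
    """Lifetime of the whole cache: the shortest lifetime among its lines."""
    return min(line_lifetime(baseline_years, p) for p in p_sleep_per_line)


if __name__ == "__main__":
    # Hypothetical sleep probabilities: two mostly idle lines and one hot line.
    print(cache_lifetime(1.0, [0.8, 0.5, 0.0]))  # bounded by the hot line
    print(cache_lifetime(2.0, [0.5, 0.75]))
```

In the first call the line that never sleeps keeps the cache lifetime at the baseline value, regardless of how long the other lines stay in the low-leakage state.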
However, since absolute time has different performance interpretations
for the two architectures (a given amount of
time corresponds to different numbers of memory accesses,
due to different miss rates), we lump performance and aging
into a single metric by also expressing lifetime (as defined
above) in terms of memory accesses.
5. EXPERIMENTAL RESULTS
We have applied the above methodology to a set of traces
obtained from the MiBench suite [15]; these traces have
been fed to an in-house cache simulator that was instrumented
with aging models as described above and with energy models
that have been obtained using 45nm technology data
from STMicroelectronics.
              Power Gating               Drowsy
Trace         LT      LT      E          LT      LT      E
              [yrs]   [Pacc]  [%]        [yrs]   [Pacc]  [%]
adpcm.dec     1.25x   1.13x   75%        1.02x   1.02x   71%
adpcm.enc     1.19x   0.50x   41%        1.53x   1.53x   71%
cjpeg         3.85x   2.53x   65%        2.53x   2.53x   71%
CRC32         1.25x   0.50x   39%        1.41x   1.41x   73%
dijkstra      2.04x   1.49x   68%        1.71x   1.71x   71%
djpeg         3.37x   2.26x   65%        3.03x   3.03x   72%
fft 1         3.19x   1.77x   57%        3.33x   3.33x   72%
fft 2         3.88x   2.21x   58%        3.54x   3.55x   72%
gsmd          1.55x   0.73x   48%        1.53x   1.53x   72%
gsme          2.56x   1.30x   53%        1.86x   1.86x   72%
ispell        3.79x   2.53x   67%        3.33x   3.33x   73%
lame          5.20x   2.71x   54%        4.07x   4.07x   72%
mad           5.20x   3.19x   61%        4.60x   4.60x   71%
rijndael i    1.52x   1.05x   65%        2.53x   2.53x   71%
rijndael o    1.35x   1.00x   67%        2.29x   2.29x   70%
say           2.91x   1.55x   55%        2.55x   2.55x   71%
search        4.43x   3.02x   67%        2.29x   2.29x   73%
sha           4.05x   3.36x   74%        2.54x   2.54x   72%
tiff2bw       3.01x   2.89x   80%        2.18x   2.18x   73%
AVG           2.9x    1.9x    61%        2.5x    2.45x   72%
Table 2: Leakage, Performance and Aging Results
for an 8K Unified Cache.
The first set of results refers to the case of an 8K unified cache,
for which we collected the total (dynamic and static) energy of
the whole memory hierarchy and aging values. Results are
shown in Table 2, and refer to a miss penalty of 5 cycles. All
figures are relative to the baseline case of a cache without
any power management scheme; aging figures denote improvement
factors, whereas energy figures report percentage savings.
Values are obtained by assuming an infinite repetition
of the memory access pattern of the trace.
Notice that two values of lifetime (LT) are reported for each
trace: one in absolute time, the other in memory accesses
(specifically, in units of 10^15 accesses). These two units are
necessary because, due to different miss rates and different breakeven
time values, a given amount of time corresponds to different
numbers of executed instructions.
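The relation between the two lifetime units can be illustrated with a deliberately simplified model, in which the number of accesses completed per unit of time drops by a factor (1 + overhead) when a scheme lengthens execution by that overhead. This is an assumption made for illustration only: the paper's simulator tracks accesses directly, and the sketch is not meant to reproduce the entries of Table 2. The function name and the example numbers are hypothetical.

```python
def relative_lifetime_in_accesses(lifetime_gain_years, time_overhead):
    """Lifetime gain in accesses, given the gain in years and the time overhead.

    Assumes the access rate drops by a factor (1 + time_overhead), where
    time_overhead is the fractional increase in execution time (0.25 = 25%).
    """
    if time_overhead < 0:
        raise ValueError("time_overhead must be non-negative")
    return lifetime_gain_years / (1.0 + time_overhead)


if __name__ == "__main__":
    # Hypothetical example: a 2x gain in years with a 25% slowdown.
    print(relative_lifetime_in_accesses(2.0, 0.25))
    # A scheme with no gain in years and a 100% slowdown loses half its accesses.
    print(relative_lifetime_in_accesses(1.0, 1.0))
```

Under this model a scheme can show a gain in years and still a loss in accesses, which is the qualitative behavior that motivates reporting both units.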
The table offers a few interesting insights. As a first result,
we see that both the power gating and the drowsy
scheme provide significant improvements of lifetime (2.9x
and 2.5x on average in terms of absolute time). Notice, however,
that in the drowsy scheme a slightly shorter lifetime in years
corresponds to an effectively much higher number of useful
accesses. This expresses the fact that much of the extra
lifetime offered by power gating is spent in the transition
from the low-power state and therefore not executing useful
work. This is reflected by the performance penalty of the
two schemes (not reported in the table): power gating results,
as expected, in a large overhead (72.8%); conversely,
the high exploitability of the drowsy state limits this overhead
to 4.6% for the drowsy scheme.
Concerning energy, both schemes are very effective,
with the drowsy scheme being the best thanks to its smaller energy
overhead. Although a sleep state using power gating results
in zero aging, the longer breakeven times reduce the number
of useful intervals; the drowsy architecture, while providing
far less benefit in terms of aging reduction, can be triggered
more frequently.
6. CONCLUSIONS
Reducing the energy consumption of cache memories by exploiting
a low-leakage state is also effective in terms of lifetime
extension. Our analyses show that an aggressive approach (power
gating) provides good benefits, both in terms of energy savings and
in terms of aging relief, if the miss penalty is reasonably small.
On the contrary, as the miss penalty increases, more
conservative techniques (drowsy states) become more effective
because of their smaller reactivation overhead.
The key difference between the idleness metrics that rule
energy and aging (worst case for aging, average case for energy)
suggests, however, that NBTI mitigation can be pushed
further if specific architectural strategies are adopted.
7. REFERENCES
[1] M. A. Alam, "A critical examination of the mechanics of
dynamic NBTI for PMOSFETs," Proc. IEDM, 2003,
pp. 346-349.
[2] S. V. Kumar, K. H. Kim, S. S. Sapatnekar, "Impact of NBTI on
SRAM read stability and design for reliability," ISQED'06:
International Symposium on Quality Electronic Design,
March 2006, pp. 213-218.
[3] K. Kang, H. Kufluoglu, K. Roy, M. A. Alam, "Impact of
Negative-Bias Temperature Instability in Nanoscale SRAM
Array: Modeling and Analysis," IEEE Transactions on CAD,
Vol. 26, No. 10, pp. 1770-1781, Oct. 2008.
[4] V. Huard, et al., "NBTI Degradation: from Transistor to
SRAM Arrays," IEEE 46th Annual International Reliability
Physics Symposium, May 2008, pp. 289-300.
[5] A. Calimera, E. Macii, M. Poncino, "NBTI-Aware power gating
for concurrent leakage and aging optimization," ISLPED'09:
International Symposium on Low Power Electronics and
Design, August 2009, pp. 127-132.
[6] A. Calimera, E. Macii, M. Poncino, "Analysis of NBTI-Induced
SNM Degradation in Power-Gated SRAM Cells," ISCAS'10:
International Symposium on Circuits and Systems, May 2010,
to be published.
[7] L. Zhang, R. P. Dick, "Scheduled Voltage Scaling for Increasing
Lifetime in the Presence of NBTI," ASPDAC'09: Asia &
South Pacific Design Automation Conference, Jan. 2009,
pp. 492-497.
[8] R. Vattikonda, et al., "Modeling and minimization of PMOS
NBTI effect for robust nanometer design," DAC-44: Design
Automation Conference, 2006, pp. 1047-1052.
[9] A. Calimera, E. Macii, M. Poncino, "NBTI-aware sleep
transistor design for reliable power-gating," GLS-VLSI'09:
IEEE 19th Great Lakes Symposium on VLSI, May 2009,
pp. 333-338.
[10] K. Nii, et al., "A Low-Power SRAM using
Auto-Backgate-Controlled MT-CMOS," ISLPED'98:
International Symposium on Low Power Electronics and
Design, August 1998, pp. 293-298.
[11] M. Powell, et al., "Gated-Vdd: A Circuit Technique to Reduce
Leakage in Deep-Submicron Cache Memories," ISLPED'00:
International Symposium on Low Power Electronics and
Design, July 2000, pp. 90-95.
[12] S. Kaxiras, Z. Hu, M. Martonosi, "Cache Decay:
Exploiting Generational Behavior to Reduce Cache Leakage Power,"
ISCA'01: IEEE/ACM International Symposium on
Computer Architecture, June 2001, pp. 240-251.
[13] H. Zhou, M. C. Toburen, E. Rotenberg, T. M. Conte, "Adaptive
Mode Control: A Static-Power-Efficient Cache Design," ACM
Transactions on Embedded Computing Systems, Vol. 2, No. 3,
August 2003, pp. 347-372.
[14] K. Flautner, N. Kim, S. Martin, D. Blaauw, T. Mudge,
"Drowsy caches: Simple techniques for reducing leakage
power," ISCA'02: Int. Symp. on Computer Architecture, May
2002, pp. 148-157.
[15] M. R. Guthaus et al., "MiBench: A free, commercially
representative embedded benchmark suite," IEEE 4th Annual
Workshop on Workload Characterization, Dec. 2001,
pp. 3-14.
