Area efficient architectures for information integrity in cache memories by Kim, Seongwoo & Somani, Arun K.
1999 
Area efficient architectures for information integrity in cache 
memories 
Seongwoo Kim 
Iowa State University 
Arun K. Somani 
Iowa State University, arun@iastate.edu 
Recommended Citation 
Kim, Seongwoo and Somani, Arun K., "Area efficient architectures for information integrity in cache 
memories" (1999). Electrical and Computer Engineering Conference Papers, Posters and Presentations. 
164. 
https://lib.dr.iastate.edu/ece_conf/164 
Area efficient architectures for information integrity in cache memories 
Abstract 
Information integrity in cache memories is a fundamental requirement for dependable computing. 
Conventional architectures for enhancing cache reliability using check codes make it difficult to trade 
between the level of data integrity and the chip area requirement. We focus on transient fault tolerance in 
primary cache memories and develop new architectural solutions to maximize fault coverage when the 
budgeted silicon area is not sufficient for the conventional configuration of an error checking code. The 
underlying idea is to exploit the corollary of reference locality in the organization and management of the 
code. A higher protection priority is dynamically assigned to the portions of the cache that are more error-
prone and have a higher probability of access. The error-prone likelihood prediction is based on the 
access frequency. We evaluate the effectiveness of the proposed schemes using a trace-driven 
simulation combined with software error injection using four different fault manifestation models. From 
the simulation results, we show that for most benchmarks the proposed architectures are effective and 
area efficient for increasing the cache integrity under all four models. 
Keywords 
cache storage, data integrity, discrete event simulation, software fault tolerance 
Disciplines 
Computer and Systems Architecture 
Comments 
This is a manuscript of a proceeding published as Kim, Seongwoo, and Arun K. Somani. "Area efficient 
architectures for information integrity in cache memories." In Proceedings of the 26th International 
Symposium on Computer Architecture, pp. 246-255. IEEE, 1999. DOI: 10.1109/ISCA.1999.765955. Posted 
with permission. 
Area Efficient Architectures for Information Integrity in Cache Memories
Seongwoo Kim and Arun K. Somani
Department of Electrical and Computer Engineering
Iowa State University
Ames, IA 50011, USA
{skim, arun}@iastate.edu
1. Introduction
Memory hierarchy is one of the most important elements
in modern computer systems. The reliability of the mem-
ory significantly affects the overall system dependability.
The purposes of integrating an error checking scheme in the
memory system are to prevent any error that has occurred in
the memory from propagating to other components and to
overcome the effects of errors locally, contributing to the
overall goal of achieving failure-free computation.
Transient faults, which occur more often than permanent
faults [6], [14], can corrupt information in the memory, i.e.,
instruction and data errors. These soft errors may result in
erroneous computation. In particular, errors in cache mem-
ory, which is the closest data storage to the CPU, can eas-
ily propagate into the processor registers and other mem-
ory elements, and eventually cause computation failures.
Although the cache memory quality has improved tremen-
dously due to advances in VLSI technology, it is not pos-
sible to completely avoid transient fault occurrence. As a
result, data integrity checking, i.e., detecting and correcting
soft errors, is frequently used in cache memories.
The primary technique for ensuring data integrity is the
addition of information redundancy to the original data.
Whenever a data item is written into the cache, a check
(or protection) code such as parity or error-correcting code
(ECC) is also included. We denote a pair of data and check
code by a parity group. When an item is requested, the
corresponding parity group is read and an error syndrome
is generated to check and correct the error if there is one.
The capability of the protection code needs to be determined
properly depending on the degree of required data integrity,
expected error rate based on harshness of the operating en-
vironment, and design and test cost.
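As a rough illustration of a parity group, the sketch below pairs a data line with one parity bit per byte and recomputes the parity on a read to flag mismatching bytes. The function names and the choice of even parity are assumptions made for this example only. Byte parity detects an odd number of flipped bits in a byte but cannot correct them.

```python
def byte_parity(data):
    """Return one even-parity bit per byte (1 if the byte holds an odd number of 1s)."""
    return [bin(b).count("1") & 1 for b in data]


def make_parity_group(data):
    """Pair the data with its check code, as done when an item is written."""
    return bytes(data), byte_parity(data)


def check_parity_group(data, stored_code):
    """Return the indices of bytes whose stored parity disagrees with the data."""
    return [i for i, (p, q) in enumerate(zip(byte_parity(data), stored_code)) if p != q]


line = bytes(range(32))                  # a 32-byte cache line
data, code = make_parity_group(line)

corrupted = bytearray(line)
corrupted[5] ^= 0x10                     # single-bit soft error in byte 5
single_bit_result = check_parity_group(corrupted, code)

masked = bytearray(line)
masked[3] ^= 0x03                        # two flipped bits in the same byte
double_bit_result = check_parity_group(masked, code)
```

In this sketch the single-bit error is flagged in byte 5, while the double-bit error in one byte escapes detection, which is why the capability of the code has to match the expected error behavior.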
Despite the fact that predicting the exact rate and behav-
ior of transient faults in a system is not possible, current
data integrity checking schemes for caches are generally
selected on a single-bit failure model basis. Thus, the byte-parity
scheme (one parity bit per 8 data bits) [15] and the single error
correcting-double error detecting (SEC-DED) code [5] are
widespread. Many higher capability codes for byte or burst
error control have also been studied [3], [8].
Check codes employed for increased reliability in the
caches are constructed in a uniform structure, i.e., every unit
of data is protected by a check code of the chosen capability.
This conventional method is reasonable under the assump-
tion that each cache item has the same probability of error
occurrence. However, it has the following deficiencies.
• Check code in the uniform structure is an expensive
  way to enhance cache reliability. Therefore, it is
  overkill under extremely low error rates.
• It is not flexible in terms of chip area requirement, as
  the area occupied by the check code is directly
  proportional to the cache size. If the budgeted area is not
  sufficient for the uniform structure, no intermediate
  architectures are currently available. The high overhead
  may result in sacrificing the integrity checking.
The uniform structure enables every item to be checked.
However, error checking is necessary only for those items
that are likely to be corrupted. If it is possible to predict
such cache items, a higher data integrity can be achieved
with a smaller amount of chip area for the check code. In
practice, there are several reasons that soft error occurrence
tends to concentrate in a few locations. Information in the
cache can be altered during read/write operations due to low
noise margins, and thus cache lines that are frequently ac-
cessed may have a higher probability of corruption. Cross-
coupling effects may also induce errors in neighboring lo-
cations of a line being accessed. On the other hand, global
random disturbances commonly affect any location. More
importantly, errors in unused lines are no concern.
In this paper, we take these factors into account to de-
velop area efficient architectural solutions for improving
cache integrity. The underlying idea is that more error-
prone and more likely used cache lines must be protected
first. The random faults are not biased to a specific location
or time. However, if a fault occurs during the access of a
line, it is more likely to affect the data being accessed. As
a result, access frequency makes a difference in the
probability of error occurrence between active (more access) and
inactive (less access) lines. With large caches, the majority
of cache accesses are usually localized in a small portion of
the cache. This frequently accessed part is considered more
error-prone. The corrupted items in the most frequently
used (MFU) lines are likely to be used as instructions or
operands, quickly affecting the computation. On the other
hand, errors in inactive lines have a higher probability of
being replaced or overwritten with new, correct data [13].
Data errors are harmful only if they are used for operation,
suggesting that not providing check codes for inactive lines
may not affect the integrity of the computation.
We present three new architectures, parity caching,
shadow checking, and selective checking, to protect primary
caches. These schemes allow flexible trade-offs between
silicon area and level of data integrity so that both the relia-
bility and area requirements can be met. The new schemes
can achieve an acceptably high level of fault coverage with
much less area than the uniform structure, realizing area ef-
ficient enhancement of cache system reliability.
This paper is organized as follows. Section 2 explains
what affects the transient fault rate, how the cache data sta-
tus changes, and how soft errors affect cache operations.
Section 3 describes our new architectures and correspond-
ing management policies. We also provide an area com-
parison with the conventional organization. To evaluate the
effectiveness of the proposed schemes, we conducted error
injection experiments on a simulation model of the system.
The error models and evaluation methodology are presented
in Section 4. The performance of the proposed schemes is
discussed in Section 5. In Section 6, we make some con-
cluding remarks.
2. Errors in Cache and Their Effects
To reduce the CPU-memory bandwidth gap, up to 60% of
the area of recent microprocessors is dedicated to caches
and other memory latency-hiding hardware [9]. The cache
memory stores instructions or data in data RAM along with
address tags in tag RAM. The primary caches are required
to operate at the processor’s clock speed. Use of lower volt-
age levels, high speed data read/write operations, and ex-
tremely dense circuitry increase the probability of transient
fault occurrence, resulting in more bit errors in cache mem-
ories. Moreover, external disturbances such as noise, power
jitter, local heat densities, and ionization due to alpha-
[Figure 1 diagram not recovered. Surviving legend text: "from State X to State Y".]
Figure 1. Cache data state transition.
Figure 1 shows how the state of a cache data block is
affected by error occurrence and recovery. Initially, the block
is error-free (State N). Single- and multiple-bit errors due
to some transient faults lead to State S and State M,
respectively. The state of a corrupted block changes back to State
N if the error is overwritten by a new correct item or is
corrected by a recovery mechanism. The absorbing state, F,
represents the situation where an error propagates outside
the cache boundary through a normal access. If the cache
memory always operates in State N ($P_{NN} = 1$) or
erroneous items are never used, then no fault tolerance schemes
are necessary. However, in practice, this is not likely to be
the case. The check codes help keep $P_{SN}$ and $P_{MN}$ nearly
equal to 1 so that the cache block rarely reaches State F (i.e.,
$P_{SF} \simeq P_{MF} \simeq 0$).
One may suspect that extremely infrequent error propagation
($0 \neq P_{SF} \ll 1$, $0 \neq P_{MF} \ll 1$) may not have any
notable effects. However, even a single-bit error can bring a
complete system failure. Through program execution, cor-
rupted data items propagate to the processor’s registers and
produce an erroneous outcome that can eventually propa-
gate to the external world. A single error can also spread
to other registers, cache lines, and memory locations as the
processor continues to use the corrupted data recursively.
The erroneous contents of registers can also cause page or
segmentation faults and incorrect control flow changes. It
is shown in [13] that the probability of a single-bit error leading
to a failure is about 50%. However, the actual probability
heavily depends on the cache hit rate.
Bit changes in the tag RAM also cause the following
improper cache hit and miss decisions that make the
processor's memory references chaotic.
• Pseudo-hit: the tag portion of the incoming reference
  address matches with the wrong cache line's tag field.
• Pseudo-miss: the tag associated with the desired data
  item does not match with the reference address.
• Multi-hit: the tag portion of the reference address
  matches with the tags of multiple lines in a set.
In the case of a pseudo-hit, the processor gets wrong data on
a read and updates the data in the wrong location on a write.
A pseudo-miss generates an unnecessary main memory ac-
cess. The multi-hit may be detected by the cache controller
without check code support for the tags, but handling is not
simple. The controller cannot distinguish between the mul-
tiple hit lines to service the processor’s request. Moreover,
invalidating all of those lines and treating the access as a
miss is not a solution if any of the lines are dirty, i.e., valid
data may exist only in the cache. Writing the data back to
main memory should precede the invalidation of a dirty line.
However, this cannot be done without resolving the line se-
lection problem. Thus, a resolution may not be possible or
may lead to data consistency failure.
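The three improper decisions can be told apart only by an observer who knows which way really holds the referenced block, as a simulator does. The following sketch classifies a lookup in one set under that assumption. The function name, its arguments, and the `true_way` ground-truth parameter are hypothetical and not part of the original design.

```python
def classify_lookup(stored_tags, valid, ref_tag, true_way):
    """Classify a set lookup when stored tags may be corrupted.

    stored_tags: tag bits currently held in each way (possibly corrupted)
    valid:       valid bit per way
    ref_tag:     tag portion of the incoming reference address
    true_way:    way that really holds the referenced block, or None if the
                 block is not cached (ground truth known only to a simulator)
    """
    hits = [w for w, (t, v) in enumerate(zip(stored_tags, valid)) if v and t == ref_tag]
    if len(hits) > 1:
        return "multi-hit"
    if len(hits) == 1:
        return "hit" if hits[0] == true_way else "pseudo-hit"
    return "miss" if true_way is None else "pseudo-miss"


# Way 0 holds the requested block, but its stored tag was flipped from 0x12 to 0x13.
outcome = classify_lookup([0x13, 0x34], [True, True], ref_tag=0x12, true_way=0)
```

Here `outcome` is a pseudo-miss. A real controller sees only the comparator outputs, so without check codes on the tags it can recognize a multi-hit but cannot distinguish a pseudo-hit from a genuine hit.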
Due to an error in the cache status field, a valid line can
be unintentionally invalidated if its valid bit is changed er-
roneously. In the case of a dirty bit error, a dirty line is
considered to be clean and may be replaced without a write-
back. Therefore, the most recent data can be lost. Line re-
placement based on access history can also be performed
improperly if faults flip the corresponding history bits.
3. New Architectural Approaches
In conventional systems, data integrity checking
schemes are implemented in a uniform structure. In this
section, we describe three alternative architectures that have
flexible chip area requirements. The parity caching scheme
described in Section 3.1 is proposed as a substitute for uni-
formly organized check code under extremely low error
rates. Section 3.2 describes shadow checking, which is an
inexpensive variant of replication architecture under very
noisy environments. Selective checking is presented in Sec-
tion 3.3 as a simpler alternative to the first two when a cache
has multi-way set associativity. In Section 3.4, we also dis-
cuss integration of cache scrubbing [11] into the proposed
architectures to enhance their capabilities.
A data read/write involves accessing cache cell arrays in-
cluding data, tag, and status bits. Errors can appear in any
field. Therefore, integrity checking is required for all three
fields. For brevity in presentation, we do not always ad-
dress the proposed schemes separately for each field. How-
ever, the operation and management mechanisms apply to
all three fields in an identical manner.
3.1. Parity caching
One of the widely known program properties is that only
10% of program instructions are responsible for 90% of in-
structions executed [4]. For some programs, a similar ob-
servation can be made in the data segment of main memory.
Cache accesses are also often localized. Under considerably
low error environments, it can be expected that most soft er-
rors of any significance will occur in these most commonly
used portions of instructions and data.
In a low error rate environment, when the budgeted area
is not sufficient for the check codes of the uniform
structure, the number of check codes need not be continuously
increased with the primary cache size to maintain high data
integrity. Based on the assumption that the MFU lines are
most error-prone and errors in those lines easily propagate
unless checked, we organize a parity cache, whose entries
contain the check codes for the MFU lines. This scheme is
called parity caching: the caching of check codes.
The organization and operation of the parity cache are
similar to general cache memories, but it provides integrity
checking for the main cache. It covers the most error-prone
main cache locations using $\log_2 l$ index bits, where $l$ is the
number of lines in the main cache. The number of parity
cache entries, $n$, is smaller than $l$. The main cache lines for
which check codes are held in the parity cache are selected
dynamically such that the MFU portion of the cache can be
protected first. This is accomplished by employing least re-
cently used (LRU) replacement policy for the parity cache,
where the entry that has not been used for the longest time
is replaced with a new item.
Figure 2 shows the logical organization of a 16KB direct-
mapped data cache or D-cache (left half) protected by a par-
ity cache of 16 entries (right half) in conjunction with an
ECC unit. The 16 parity cache entries are organized in a
4-way set associative manner and store check codes for 16
lines selected from the main cache. The ECC unit performs
error checking. The parity cache type and the check code
capability can be flexibly determined. The main data line
consists of 32 bytes, and 32 parity bits (1 per byte) are used
for its protection. The tag is protected with a SEC-DED
[Figure 2 diagram not recovered. Surviving labels: "ECC for tag (6 bits)", "Cmp" (comparator).]
Figure 2. A 16KB D-cache and a parity cache.
For the mapping between the parity and main caches,
each entry in the parity cache is tagged. In the case of direct-
mapped main caches, the index field of a reference address
is used for the parity cache tag as it exactly corresponds to
one line in the main cache. In the example above, the first
seven bits of the index are stored as a tag and the last two
bits are used in selecting a set in the parity cache. Note
that the number of parity tag bits is small in comparison to
the main cache tag, resulting in simpler tag comparators.
If the parity cache uses direct-mapping, the tag is reduced
to five bits. If the main cache is $k$-way ($k > 1$) set
associative, multiple lines of the same index value can coexist
in the main cache and one parity entry may erroneously be
mapped to all of those lines. This problem is solved by
storing additional $\log_2 k$ bits in the parity tag to distinguish the
correct line from the $k$ ways. In this section, we apply
parity caching only to direct-mapped main caches. We present
an alternate method (selective checking) for set associative
main caches in Section 3.3.
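The bit counts in the example can be checked with a short calculation. The sketch below splits the main cache index of a direct-mapped cache into a parity cache tag (upper bits) and a parity cache set number (lower bits). The function name and the assumption that all sizes are powers of two are introduced here for illustration only.

```python
def split_index(addr, cache_bytes=16 * 1024, line_bytes=32, n_entries=16, ways=4):
    """Map an address to (main index, parity tag, parity set, parity tag width).

    Assumes a direct-mapped main cache and power-of-two sizes.
    """
    lines = cache_bytes // line_bytes                   # l = 512 lines
    offset_bits = line_bytes.bit_length() - 1           # 5 bits for a 32-byte line
    index_bits = lines.bit_length() - 1                 # log2(l) = 9
    set_bits = (n_entries // ways).bit_length() - 1     # log2(n / k)

    index = (addr >> offset_bits) & (lines - 1)
    parity_set = index & ((1 << set_bits) - 1)          # last bits of the index
    parity_tag = index >> set_bits                      # first bits of the index
    return index, parity_tag, parity_set, index_bits - set_bits


four_way = split_index(0x1234)             # 16 entries, 4-way: 7-bit tag, 2-bit set
direct = split_index(0x1234, ways=1)       # 16 entries, direct-mapped: 5-bit tag
```

For the configuration in the text this yields a 7-bit parity tag with the 4-way organization and a 5-bit tag with direct mapping, matching the numbers quoted above.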
While the main cache serves the processor’s request, the
parity cache synchronously monitors the integrity of the
main cache and updates the check codes. This is the same
as existing systems with the uniform array of check codes,
and so, no additional delay in cache access time is needed.
When a miss occurs in the main cache, new data items are
fetched from the lower level memory with check codes, and
they are written into the parity cache in a free entry, if any,
or replace the LRU entry of the mapped set. Whenever a
read hit occurs in both caches, error checking on all three
fields of the line is performed using the ECC unit. Handling
errors is the same as in the conventional systems. Whenever
a main cache line is replaced, the check codes in the cor-
responding parity entry are also updated for the new line.
Thereby, a hit in only the parity cache never occurs.
If the parity cache misses the entry for a data block being
requested by the processor, the integrity checking cannot be
processed. In this case, the check codes are computed from
the data in the main cache and stored in a selected parity en-
try for future protection. If the accessed line has a corrupted
item, error propagation is possible. Check codes generated
using erroneous data do not help error checking. This event
is denoted by misconstruction. Although the error does not
always propagate, we assume the worst case and treat it as
error propagation in the evaluation of the scheme.
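The operation described above can be sketched as a small set-associative table of check codes with LRU replacement. This is a simplified model under several assumptions: the class and method names are hypothetical, byte parity stands in for whatever check code is chosen, only the data field is covered, and write hits are omitted.

```python
from collections import OrderedDict


def check_code(data):
    """Stand-in check code: one even-parity bit per byte."""
    return tuple(bin(b).count("1") & 1 for b in data)


class ParityCache:
    """n entries in k ways with LRU per set, keyed by main cache line index."""

    def __init__(self, n_entries=16, ways=4):
        self.ways = ways
        self.num_sets = n_entries // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]  # tag -> code

    def _locate(self, index):
        return self.sets[index % self.num_sets], index // self.num_sets

    def _install(self, index, code):
        entries, tag = self._locate(index)
        entries.pop(tag, None)
        if len(entries) >= self.ways:
            entries.popitem(last=False)        # evict the LRU entry of the set
        entries[tag] = code                    # the newest entry becomes MRU

    def fill(self, index, data):
        """Main cache miss: the line arrives together with a fresh check code."""
        self._install(index, check_code(data))

    def read(self, index, data):
        """Main cache read hit: check if covered, otherwise build a new code."""
        entries, tag = self._locate(index)
        if tag in entries:
            entries.move_to_end(tag)           # refresh the LRU order
            return "ok" if entries[tag] == check_code(data) else "error detected"
        self._install(index, check_code(data))  # built from unverified data
        return "unchecked"
```

If the data were already corrupted when `read` falls through to the last branch, the stored code matches the corrupted data and later reads report "ok". That is the misconstruction event, which the evaluation counts as error propagation.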
The area estimate of the parity cache is obtained from
an on-chip cache area model presented by Mulder et al.
in [7]. They have used the technology independent notion
of a register-bit equivalent or rbe. One rbe equals the area
of a one-bit register storage cell that has the highest
bandwidth. The static storage cell of medium bandwidth that
we use here has been empirically determined to be 0.6 rbe.
Thereby, an area represented in rbe for register cells is con-
[Figure 3 diagram not recovered. Surviving panel label: (b) 4-way parity cache.]
Figure 3. Check code area model.
Figure 3 depicts the check code array of a conventional
cache in the uniform structure and a $k$-way set associative
parity cache ($k = 4$). We denote the total area of the two
models by $S_c$ and $S_p$, respectively. Let $W_E$ be the width of
an element $E$ and let CC represent the check code. The area
is the sum of areas of all memory elements and is given by

$$S_c = A(\text{CC for tag}) + A(\text{CC for status bits}) + A(\text{CC for data}) + A(\text{drivers}) + A(\text{bitline sense amplifiers}), \tag{1}$$

$$S_p = A(\text{CC} + \mathit{ovhd}_{CC}) + A(\text{tag} + \mathit{ovhd}_{tag} + \text{status} + \mathit{ovhd}_{status}) + A(\text{control}), \tag{2}$$

where $A(M)$ and $\mathit{ovhd}_E$ denote the area of module $M$ and
the overhead for element $E$, respectively. The $\mathit{ovhd}_E$
includes comparators if any, drivers, and sense amps. In
the implementation of MIPS-X [1], $W_{compare}$, $W_{driver}$, and
$W_{sense\text{-}amp}$ are approximately 6 rbe each. For a static cell
array of $l \times W_{CC}$ bits (Figure 3a), the total size (Eqn. (1)) is

$$S_c = 0.6\,(W_{CC} + W_{driver})(l + W_{sense\text{-}amp}) = 0.6\,(W_{CC} + 6)(l + 6),$$

where $W_{CC} = W_{\text{CC for tag}} + W_{\text{CC for status bits}} + W_{\text{CC for data}}$.
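To give a feel for the magnitude of the uniform-structure area, the following sketch evaluates the closed form for $S_c$ numerically. The function name is hypothetical. The value of 63 check bits per line is borrowed from the surviving Figure 4 panel label, and 512 lines corresponds to a 16KB cache with 32-byte lines.

```python
def uniform_check_area(lines, w_cc, w_driver=6, w_sense_amp=6):
    """S_c in rbe: 0.6 * (W_CC + W_driver) * (l + W_sense-amp)."""
    return 0.6 * (w_cc + w_driver) * (lines + w_sense_amp)


lines_16kb = 16 * 1024 // 32          # l = 512 lines of 32 bytes
lines_32kb = 32 * 1024 // 32          # l = 1024 lines of 32 bytes

s_c_16kb = uniform_check_area(lines_16kb, w_cc=63)   # 0.6 * 69 * 518
s_c_32kb = uniform_check_area(lines_32kb, w_cc=63)   # 0.6 * 69 * 1030
```

The driver and sense amplifier terms are constants, so $S_c$ grows almost linearly with the number of lines. Doubling the cache from 16KB to 32KB nearly doubles the check code area in this model.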
The control logic for the parity cache can be implemented
in a programmable logic array (PLA) as a part of
the main cache controller. A PLA of 130 rbe is presumed
according to [7]. From Eqn. (2), the area of a $k$-way parity
cache of $n$ entries for $k \neq n$ (Figure 3b) is obtained as

$$S_p = 0.6\,(W_{CC} \cdot k + 6)\Bigl(\frac{n}{k} + 6\Bigr) + 0.6\,\{(W_{status} + W_{tag}) \cdot k + 6\} \cdot \Bigl(\frac{n}{k} \;\cdots$$

[The remainder of this equation is missing in the source.]

where $W_{status}$ = LRU bits + valid bit + parity for the tag = $\log_2 k + 1 + 1$,
and $W_{tag} = \log_2 l - \log_2 n + \log_2 k = \log_2 \frac{lk}{n}$.

[Figure 4 plots not recovered. Surviving legend entries: Direct, 4-way, 8-way, and 16-way for 16KB, and Direct and 4-way for 32KB; axis ticks 32, 64, 128, 256; ECC / line, parity / byte, ECC / word, and ECC / byte, each for 16KB and 32KB; axis categories Direct, 4-way, 8-way, 16-way; n = 32, 64, 128, 256, each for 16KB and 32KB. Surviving panel label: (c) $W_{CC} = 63$.]
Figure 4. Relative area requirement.
To compare the areas of the two organizations, we compute
the relative area ratio (RAR). The RAR for parity caching
equals $S_p / S_c$. Figure 4 plots the RARs for various sets of
configuration parameters: check code type, data unit size, number
of parity entries, and associativity. Considering current
microprocessors, two main cache sizes, 16KB for instruction
cache (I-cache) and 32KB for D-cache, are compared.
However, some microprocessors employ larger caches. In
that case, the RARs become even better for parity caching.
The number of check bits per entry corresponds to $W_{CC}$.
Four values are compared, representing different protection
capabilities. Several conclusions can be drawn. An increase
in check code width results in a decrease in the RAR.
Obviously, more overhead is required for higher associativity
and the RAR grows with the number of entries. The
parity cache with an RAR of less than 1 is of interest to us.
The corresponding protection coverages for these organizations
are given in Section 5.
3.2. Shadow checking
For applications that require very high data integrity or
operate under highly noisy environments, parity- and ECC-based
protection may not be satisfactory. One general approach
in this case is to use a replicated architecture such as
N modular redundancy (NMR) with majority voting, but it
is very expensive. Instead of full replication, we present an
alternative approach, called shadow checking, where multiple
copies are partially supported to meet a budgeted area
which is not adequate for a complete NMR. The copies of
the MFU lines are stored in a shadow cache. The underlying
idea is the same as parity caching, but the shadow cache
performs error checking by means of comparison using the
copies of data rather than check codes. The goal is to obtain
a high reliability enhancement even in the presence of
multiple-bit errors with smaller chip area overhead.
Figure 5a shows the diagram of the shadow checking
architecture. Depending on space availability, $N$ identical
additional cache modules, called shadow$_i$ for $1 \le i \le N$, are
included in the shadow cache. We adopt the same address
mapping mechanism used in parity caching. Figure 5b
depicts a parity group, $j$, consisting of a shadow cache tag and
[Figure 5 diagram not recovered. Surviving panel label: (b) Components of parity group.]
Figure 5. Shadow checking architecture.
N copies of information, each of which contains tag, status,
and data bits. The shadow cache operates like a shadow of
the main cache. Data written into the main cache is also
written into the shadow cache along with the corresponding
tag and status bits in parallel. Error checking is performed
on read hits in both caches, and its effect depends
on the number of erroneous shadow modules and their error
patterns. This is equivalent to the known reliability gain in an
NMR system [12]. Thus, we do not discuss it here.
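For readers unfamiliar with comparison-based checking, the sketch below shows one way a read hit in both caches could be resolved: the main copy is compared with its shadow copies, and a bitwise majority vote is taken when an odd number of at least three copies exists. The function names and the exact voting policy are assumptions for illustration, and the comparator or voter is taken to be fault-free.

```python
def majority_vote(copies):
    """Bitwise majority over an odd number of equal-length byte strings."""
    threshold = len(copies) // 2 + 1
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if ones >= threshold:
                out[i] |= 1 << bit
    return bytes(out)


def shadow_read(main_copy, shadow_copies):
    """Compare the main line with its shadows and vote when enough copies exist."""
    copies = [main_copy, *shadow_copies]
    if all(c == main_copy for c in shadow_copies):
        return "ok", main_copy
    if len(copies) >= 3 and len(copies) % 2 == 1:
        return "corrected", majority_vote(copies)
    return "error detected", None          # e.g. one shadow: mismatch only


good = b"\x0f\xa5"
burst = b"\xf0\x5a"                        # every bit of the main copy flipped
result = shadow_read(burst, [good, good])  # two shadow modules plus the main cache
```

With two shadow modules the three copies behave like a TMR group for the covered lines, so even a multiple-bit error confined to one copy is outvoted. With a single shadow module a mismatch can only be detected.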
We show the advantages of shadow checking over full
replication with respect to chip area requirement and re-
sultant reliability enhancement. In the comparison, we ig-
nore the common factors of the two architectures such as
comparator/voter reliability and delay, and synchronization
cost. Figure 6 shows the RARs of a shadow cache of two
shadow modules in comparison with a triple modular re-
dundant (TMR) cache using the same area model presented
in Section 3.1. The RARs are smaller than those of a parity
cache for the same parameters due to the higher overhead
Figure 6. Relative area ratio.
3.3. Selective checking
Parity caching and shadow checking have been proposed
as alternative architectures to the uniform check code
structure and replication method, respectively. If the main cache
has $k$-way ($k \ge 2$) set associativity, we can also configure
redundancy in a simpler manner. Out of $k$ lines per set, only
$s$ lines ($1 \le s < k$) are selected to assign the check codes.
Similar to the previous two schemes, line selection is based
on the access frequency. We simply choose the most recently
used (MRU) lines of each set for error checking with
the expectation that those lines are MFU, and thus error-prone
and likely to be accessed in the near future. We call this
scheme selective checking. It is obvious that the RAR for
Figure 7. Uniform vs. selective organization.
Figure 7 depicts a comparison of redundant code organizations
between conventional approaches (left column) and
selective checking (right column) for $k = 4$ and $s = 1$.
Many commercial microprocessors use byte-parity or SEC-DED
codes in the uniform structure as shown in Figure 7a.
Alternatively, in selective checking with $s = 1$ (Figure 7b),
for Set $i$, only the MRU line is protected by a check code.
(If $s \ge 2$, the $s$ MRU lines are guarded.) Each check code entry
is independently assigned to any line of a set while keeping
track of the MRU. Similar concepts can be applied to the
NMR cache (Figure 7c) for a selective NMR (Figure 7d)
where $N$ copies are maintained for only the MRU portions.
In the case of a miss, a line is fetched from memory with
check code and it becomes the MRU line. Whenever the
MRU line is requested, the cache controller recognizes that
the current check code belongs to the MRU and performs
error correction. As a result, no tag and resultant overhead
are necessary to achieve dynamic mapping.
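A minimal sketch of selective checking with $s = 1$ follows. It models one set whose single check code entry follows the MRU line, so no tag is stored with the code. The class name is hypothetical, byte parity stands in for the actual check code, and rebuilding the code from the stored data on a hit to a non-MRU line is an assumption about behavior the text does not spell out.

```python
def check_code(data):
    """Stand-in check code: one even-parity bit per byte."""
    return tuple(bin(b).count("1") & 1 for b in data)


class SelectiveSet:
    """One k-way set whose single check code entry (s = 1) follows the MRU line."""

    def __init__(self, k=4):
        self.lines = [None] * k      # data held in each way
        self.mru = None              # way currently covered by the check code
        self.code = None

    def fill(self, way, data):
        """Miss: the fetched line arrives with its check code and becomes MRU."""
        self.lines[way] = data
        self.mru, self.code = way, check_code(data)

    def read(self, way):
        data = self.lines[way]
        if way == self.mru:
            return "ok" if check_code(data) == self.code else "error detected"
        # Non-MRU line: no code is held for it, so this access is unchecked and
        # a code is built from the unverified data for future accesses.
        self.mru, self.code = way, check_code(data)
        return "unchecked"
```

Because the code entry is implicitly bound to the MRU way, the only bookkeeping needed is the MRU pointer that a set associative cache typically already maintains for replacement.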
The reliability of a cache that is already equipped with
a check code can be further enhanced in many ways. One
approach is to expand the check code with more check bits
as shown in Figure 7e. The new wider check code has a
higher checking capability. In case the full expansion is not
affordable, we can adopt the selective structure here. Only $s$
additional code entries for each set are provided to enhance
the protection of the MRU lines. Figure 7f shows the selective
expansion for $s = 1$. The combination of primary and
expanded check codes is called enhanced check code. The
primary check code is separable from the enhanced check
code. The expanded code is built in such a way that the
chosen line is protected by more intensive checking in con-
junction with primary check code. Any non-MRU line of a
set turns into a new MRU line whenever it is accessed. In
this case, only the checking by the primary code is valid.
The expanded code is ignored and is replaced by the code
for a new MRU line. Other basic operations of enhanced
check codes are the same as for the previous cases.
In selective checking, the redundant code entry is constructed
for only $s$ ($< k$) lines per set. Thus, the redundant
codes are always evenly distributed over different sets
irrespective of their usage frequency. On the other hand, the
parity and shadow caches maintain the redundant code en-
tries for the MFU lines in the range of the entire main cache.
This results in differences in cost and protection coverage of
the selective checking compared to the other two schemes.
3.4. Cache scrubbing
For most programs, less than 30% of instructions are
memory references. The D-cache is occupied during the
executions of those instructions. Depending on the proces-
sor architecture, some D-cache cycles may be idle. To fur-
ther enhance the data integrity, we can scrub off the latent
errors in the D-cache whenever possible. Soft error scrub-
bing is accomplished by reading out the data and check bits,
verifying their correctness, and writing back the corrected
data [11]. Scrubbing is more advantageous to caches pro-
tected by a low capability check code.
Since in our proposed schemes we shrink the check code
array, we suggest the use of the cache scrubbing technique
to increase the protection coverage. On every idle cycle, the
cache controller executes a single scrubbing cycle using an
entry from the check code array and its corresponding line
in the cache. One question that arises here is how to pick a
line for scrubbing. Random selection is the easiest method
to implement but performance may be poor. Intuitively, it
could be beneficial to check the lines whose check codes are
expected to be discarded soon to make room for new codes.
These can be the LRU lines in consideration of temporal
locality. By also taking spatial locality into account, lines
away from the MRU line and their neighboring lines can
also be selected.
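One scrubbing cycle with LRU selection might look like the sketch below. It assumes the check code array is kept in LRU order, that byte parity is the check code (so an error can be detected but not corrected in place), and that the affected line is clean so a correct copy can be refetched from lower-level memory. All names are hypothetical.

```python
from collections import OrderedDict


def check_code(data):
    """Stand-in check code: one even-parity bit per byte."""
    return tuple(bin(b).count("1") & 1 for b in data)


def scrub_one(check_array, cache_lines, refetch):
    """Run one scrubbing cycle on the LRU entry of the check code array.

    check_array: OrderedDict of line index -> stored check code, LRU entry first
    cache_lines: dict of line index -> current (possibly corrupted) data
    refetch:     callable returning a correct copy of a line from lower memory
    Returns the scrubbed line index and whether an error was repaired.
    """
    if not check_array:
        return None, False
    index = next(iter(check_array))                  # code expected to be discarded soonest
    repaired = check_code(cache_lines[index]) != check_array[index]
    if repaired:
        cache_lines[index] = refetch(index)          # assumes the line is not dirty
        check_array[index] = check_code(cache_lines[index])
    return index, repaired


memory = {0: b"\x10\x20", 1: b"\x30\x40"}
lines = dict(memory)
codes = OrderedDict((i, check_code(d)) for i, d in memory.items())
lines[0] = b"\x11\x20"                               # latent single-bit error
first = scrub_one(codes, lines, memory.__getitem__)
second = scrub_one(codes, lines, memory.__getitem__)
```

With an error-correcting code the repair step would correct the data in place instead of refetching it. The scrub itself does not change the LRU order in this sketch, which is a design choice rather than something stated in the text.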
4. Error Model and Evaluation Methodology
For the evaluation of the schemes, we employed a trace-
driven simulation combined with software error injection.
An error injection process inverts a single or multiple bits
in any field of a selected line. To reflect diverse possible
fault manifestation patterns, we conducted error injection
based on the following four error models.
1. For a cache item access, the mapped line/set in the
RAM is activated and probed. Any fault during the
access can result in errors in the line being accessed.
Thus, a higher error probability is expected in more
frequently accessed lines. Under this error model, the
error injection is executed in the target line right after
the access. We call this direct injection.
2. Cross-coupling effects of faults can generate errors in
the adjoining lines of the currently accessed line. Dur-
ing a line access, an error is injected in a neighboring
line on either side. We call this adjacent injection.
3. Independent of line access, external disturbances and
single-event upsets can generate soft errors in any lo-
cation at any time. To simulate this occasion, an error
is injected in a random location at a random time. This
is called random injection.
4. Unlike previous models, faults can cause errors in a
group such as column, row, or cell cluster. Only col-
umn errors can induce a performance difference be-
tween the conventional and proposed architectures.
Thus, we include a model, called column injection,
where an error is injected in every row of a selected
column. This model is added to examine the schemes’
performance in the worst situation.
All error models apply to both the main and protection
cache or array. The accuracy of error models depends on the
nature of the physical faults. Fault behavior and the distribu-
tion of different types of faults are likely to vary depending
on the operational environment. Therefore, it is very diffi-
cult to judge which error model is dominant and realistic in
a general situation. For this reason, we carried out a set of
simulation experiments for each model separately.
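The four models differ mainly in which location receives the error. The Python sketch below shows one way to express that choice. It is an illustration only: the function name `injection_targets`, the parameters `num_lines` and `line_bits`, the single-bit flips, and the handling of the first and last line in the adjacent model are all assumptions, not details taken from the simulator.

```python
import random


def injection_targets(model, num_lines, line_bits, accessed_line, rng):
    """Return the (line, bit) positions to invert for one injection.

    model is one of "direct", "adjacent", "random", "column".
    """
    if model == "direct":
        # Error in the line that was just accessed.
        return [(accessed_line, rng.randrange(line_bits))]
    if model == "adjacent":
        # Error in a neighboring line on either side (clipped at the edges).
        neighbors = [
            line
            for line in (accessed_line - 1, accessed_line + 1)
            if 0 <= line < num_lines
        ]
        return [(rng.choice(neighbors), rng.randrange(line_bits))]
    if model == "random":
        # Error in any location, independent of the access stream.
        return [(rng.randrange(num_lines), rng.randrange(line_bits))]
    if model == "column":
        # Error in every row of one selected column.
        column = rng.randrange(line_bits)
        return [(line, column) for line in range(num_lines)]
    raise ValueError(f"unknown error model: {model}")


rng = random.Random(1)
for model in ("direct", "adjacent", "random", "column"):
    targets = injection_targets(model, num_lines=8, line_bits=256,
                                accessed_line=3, rng=rng)
    print(model, len(targets))
```

Running each model in a separate simulation, as done here, keeps the results from depending on an assumed mix of fault types.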
Parameters      Main cache                      Parity/Shadow cache
Size            I-cache: 16KB, D-cache: 32KB    256 check code entries
Associativity   Direct-mapped                   4-way set associative
Replacement     Least recently used (LRU) line first
Write policy    Write-through, write around
Line size       32 Bytes                        32 Bytes (shadow)
Table 1: Base cache parameters
The simulations were performed on-the-fly and every op-
eration was handled on a clock-by-clock basis, assuming
that a single instruction is issued and finished in every clock
cycle on a perfectly pipelined processor. The same protec-
tion scheme was applied to both the I-cache and D-cache on
each simulation run. Table 1 lists the base cache parameters
used for the simulations unless specified otherwise. All pro-
grams of SPEC95 suite [16] were instrumented on the Sun
Ultra1 model using Sun’s Shade tool V5.32C [2]. Table 2
shows input files and memory access rates of the programs
and hit rates on the base caches. These benchmarks provide
a range of computation and memory access patterns, and
their instruction and data sets are much larger than the sim-
ulated cache sizes. All benchmarks’ executable files were
built using Sun WorkShop Compilers, cc and f77, with
the optimization flags -fast, -xO4, and -xdepend.
The moment at which the decision of an error injection
is made is defined as an injection decision point (IDP).
For direct and adjacent injection models, the end of each
read/write access cycle was considered as an IDP, while the
trailing edge of every CPU clock cycle was used for random
and column injections. At each IDP, an error (multiple
errors for the column injection) is injected with a constant
probability, which is set to $10^{-6}$ for direct and adjacent,
$0.2 \times 10^{-6}$ for random, and $0.5 \times 10^{-8}$ for column injection.
These are accelerated rates for the rare events. If an
item selected for the error injection already has an error, no
additional error is injected. The number of error injections,
$I$, at $N$ IDPs is a binomial random variable and the error
injection rate is $I/N$. To ignore initial warm up routines, the
error injection was started after the first 10 million instruc-
tions while the caches operated under normal conditions.
Errors were injected independently for the next 500 million
instruction executions and no injection was performed after-
ward. In consideration of latent errors, the simulations were
terminated after the following 100 million instructions.
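To make the binomial description concrete, the following Python sketch treats each IDP as an independent trial with a fixed injection probability. The function names are hypothetical. The worked numbers assume one IDP per clock cycle and one instruction per cycle, so that the 500 million instruction injection window corresponds to 500 million IDPs for the random and column models; the rule that skips items already holding an error is not modeled.

```python
import random


def count_injections(num_idps, probability, rng):
    """Simulate num_idps independent injection decisions and count injections."""
    return sum(1 for _ in range(num_idps) if rng.random() < probability)


def expected_injections(num_idps, probability):
    """Mean of the binomial variable I over N IDPs, i.e. N * p."""
    return num_idps * probability


WINDOW_IDPS = 500_000_000  # assumed: one IDP per cycle over the injection window

# Expected number of injection events in the window for the two
# clock-driven models, using the probabilities given in the text.
print(expected_injections(WINDOW_IDPS, 0.2e-6))   # random model
print(expected_injections(WINDOW_IDPS, 0.5e-8))   # column model

# A small simulated run with an exaggerated probability, for illustration.
print(count_injections(100_000, 1e-3, random.Random(0)))
```

For the direct and adjacent models the number of IDPs equals the number of read/write accesses, so the expected count depends on each benchmark's access rate rather than on the cycle count alone.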
Benchmark Input file Load (%) Store (%) I-cache hit rate (%) D-cache hit rate (%)
compress bigtest.in 7.20 2.16 99.99 84.68
gcc cp-decl.i 19.34 5.52 96.40 93.83
go 9stone21.in 18.79 6.77 97.74 93.60
ijpeg vigo.ppm 17.03 6.60 99.91 89.74
li *.lsp 20.89 9.89 98.61 93.66
m88ksim ctl.in 17.17 9.95 97.47 98.33
perl primes.in 26.03 12.41 96.08 95.76
vortex vortex.in 18.74 8.47 95.10 91.23
SPECint95 18.15 7.72 97.66 92.60
applu applu.in 25.43 11.40 99.99 86.25
apsi apsi.in 28.81 12.83 99.76 88.57
fpppp natoms.in 38.21 11.04 93.66 96.69
hydro2d hydro2d.in 21.65 9.28 99.52 75.03
mgrid mgrid.in 38.29 19.84 99.99 95.68
su2cor su2cor.in 22.24 7.23 97.11 91.52
swim swim.in 24.34 10.35 99.99 79.19
tomcatv tomcatv.in 22.20 7.74 97.47 92.57
turb3d turb3d.in 17.16 12.24 99.90 93.25
wave5 wave5.in 19.33 10.26 97.95 84.01
SPECfp95 25.77 11.22 98.53 88.28
Table 2 : Summary of benchmarks
The performance comparison targets for the proposed
schemes are uniformly organized check code and replica-
tion. The number of error bits per injection is not an impor-
tant factor in the comparison because the same capability
of the unit protection code is assumed. Only the number of
code entries is different. The parity cache was implemented
with a SEC-DED code, and single-bit errors were injected.
D-cache scrubbing was tested along with parity caching. A
shadow cache of two shadow modules was compared with a
general TMR cache. To simulate a harsher environment for
shadow checking, multiple-bit errors were used. Although
we proposed three organizations for selective checking, the
main idea of the three is common. Therefore, we investigated
only a simple case where only s entries of SEC-DED
codes are maintained for each main cache set (Figure 7b)
with single-bit error injection.
Our main interest is how many injected errors propagate
to other components under the proposed schemes. For a
quantitative performance measurement, we use error prop-
agation rate (EPR), defined by
$$\mathrm{EPR} = \frac{\text{total number of errors propagated}}{\text{total number of errors injected}} \times 100.$$
5. Results and Analysis
This section reports the performance of the three pro-
posed architectures. Instead of presenting the simulation
results in an exhaustive manner for all parameters, we fo-
cus only on the salient features to gain insight into how the
parameters affect the protection capabilities of the schemes.
5.1. The performance of parity caching
Recall that single-bit errors in a cache protected by a SEC-DED
code in the conventional uniform manner cannot propagate
(i.e., EPR = 0%). Figure 8 shows the EPRs under the
protection of two independent parity caches whose RARs
are 0.58 and 0.30 for checking the I-cache and D-cache,
respectively. The results of the four error models are com-
pared. If we assume that error propagation is equally likely
to occur in all locations, the expected EPRs with the check
codes of these areas would be 42% and 70% (identified by
thick dashed-lines in the figures), respectively. However, in
most cases, the EPRs are much lower than these values be-
cause the distribution of error propagation is not uniform.
Organizing the check codes in a cache makes the most of
(d) Average EPR on D-cache
Figure 8. Error propagation rate (EPR).
Under the direct error injection model, for all benchmarks
except swim, the small parity cache allows less than
about 3% of injected errors to propagate. This is because
this error model and the applications match well with the
premise on which parity caching is developed. Due to spa-
tial locality, the parity cache also provides relatively high
protection coverage in the adjacent error model. However,
we observe larger EPRs on the D-cache for compress, ijpeg,
and swim. One common attribute of these benchmarks is
that their data access does not show good locality, as can
be ascertained from their low hit rates in the D-cache given
in Table 2. The hit rate in the parity cache is even lower.
Thus, more items in the D-cache, whose check codes are not
present in the parity cache, can be requested. In this case,
error propagation takes place unless the items are error-free.
Even if error occurrence is evenly distributed (i.e., ran-
dom error model), localized error propagation makes the
parity cache area efficient. In the column error model where
every line gets an error injection, the results are not very
promising. Once a column error injection is executed, any
read miss in the parity cache after that results in an error
propagation. The column error injection is the worst case
test model. Nevertheless, if we consider the area occupancy,
on average parity caching provides more protection cover-
age with a given area as shown in Figures 8c and 8d.
Cache type                 I-cache                         D-cache
No. of entries      32      64     128     256      32      64     128     256
RAR               0.13    0.20    0.33    0.58    0.07    0.10    0.17    0.30
compress          0.00    0.00    0.00    0.00    8.33    2.08    0.00    2.08
gcc               6.84    5.34    2.99    1.28    1.85    1.85    0.00    0.00
go                7.59    6.29    3.04    1.08    3.91    0.78    0.00    0.78
ijpeg             6.35    3.94    3.94    1.31    5.30    5.30    6.06    0.76
li                5.38    1.79    0.22    0.00    0.00    0.00    0.00    0.00
m88ksim           4.57    4.35    3.48    0.22    0.00    0.00    0.00    0.00
perl             13.51    8.50    3.27    0.00    3.28    1.09    0.00    0.00
vortex           12.33    7.71    2.86    1.32    2.36    0.00    0.00    0.00
SPECint95         7.07    4.74    2.47    0.65    3.13    1.39    0.76    0.45
applu            10.75    8.55    3.51    3.07    2.65    1.06    0.00    0.00
apsi              5.42    3.65    0.65    0.00   14.16   12.79    4.11    0.00
fpppp             6.64    5.72    4.35    1.83   13.71    5.65    1.61    0.00
hydro2d           1.36    1.36    1.36    0.91    1.94    1.29    0.00    0.00
mgrid             0.00    0.00    0.00    0.00    1.52    1.89    0.38    0.38
su2cor           10.11    9.67    8.35    1.76    0.67    1.34    0.00    0.00
swim              2.25    0.00    0.00    0.00   29.14   37.09   39.74   41.06
tomcatv           8.22    8.22    7.31    2.05    2.40    1.80    0.00    0.00
turb3d            4.55    1.73    1.08    0.00   17.53    5.19    0.65    0.00
wave5             9.75    4.08    0.00    0.00    3.25    1.30    0.65    0.65
Table 3: EPR (%) vs. the number of parity entries
Table 3 gives the EPRs on the direct error model for
an increasing number of check code entries along with
RARs. Only 32 entries, which occupy 13% and 7% of the
area needed for the uniform structure for the I-cache and
D-cache, respectively, yield significantly high coverage.
Again, this results from the fact that for many applications
the cache access is localized to a very small region. Interestingly,
swim exhibits a different attribute: as more entries
are added, the EPR increases. For swim, it turns out that
increasing the parity cache associativity is more beneficial
than increasing the size. The EPR is reduced to 0.66% on
the 16-way set associative parity cache of 256 entries.
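The comparison between measured EPRs and the uniform-propagation expectation can be checked with a few lines of Python. The sketch below encodes the EPR definition and the baseline implied by the 42% and 70% figures quoted for RARs of 0.58 and 0.30, namely (1 - RAR) x 100. The function names are hypothetical; the RAR and EPR values are the I-cache SPECint95 averages from Table 3 (direct error model).

```python
def error_propagation_rate(propagated, injected):
    """EPR (%) = errors propagated / errors injected * 100."""
    if injected <= 0:
        raise ValueError("at least one injected error is required")
    return 100.0 * propagated / injected


def uniform_baseline_epr(rar):
    """EPR (%) expected if propagation were equally likely in all locations."""
    return (1.0 - rar) * 100.0


# I-cache, SPECint95 average, direct error model (values from Table 3).
entries = [32, 64, 128, 256]
rars = [0.13, 0.20, 0.33, 0.58]
measured_eprs = [7.07, 4.74, 2.47, 0.65]

for n, rar, epr in zip(entries, rars, measured_eprs):
    baseline = uniform_baseline_epr(rar)
    print(f"{n:4d} entries: baseline {baseline:5.1f}%, measured {epr:5.2f}%")
```

For every size in this row the measured EPR is well below the uniform baseline, which is the sense in which the small check code area is used efficiently.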
Some errors can be removed by normal write operations.
Figure 9a depicts the portion of overwritten errors out of
total injected errors. In the I-cache, only an instruction miss
generates a write, while a store request also causes a write in
the D-cache. This is why the overwritten error rate is higher
for the D-cache than for the I-cache. The results presented
so far are collected from parity caching in combination with
error scrubbing (D-cache only). In the case of fewer entries,
errors eliminated by scrubbing account for a large portion
of total eliminated errors as shown in Figure 9b. This is
because the number of scrubbing cycles executed per check
code entry is larger in a small parity cache. As the parity
cache includes more entries, error removal at read access
Legend entries: I-cache, SPECint95; I-cache, SPECfp95; D-cache, SPECint95; D-cache, SPECfp95 (x-axis ticks: 32, 64, 128, 256)
Figure 9. Error removal.
We also present the average EPRs for other parameters.
From Figure 10a we note that higher parity cache associa-
tivity enhances the error checking capability. However, the
increase becomes insignificant with more than 8-way asso-
ciativity. On the other hand, area requirement grows rapidly
Legend entries: I-cache, SPECint95; I-cache, SPECfp95; D-cache, SPECint95; D-cache, SPECfp95; RAR, I-cache; RAR, D-cache; LRU entry selection for scrubbing; No scrubbing
(b) Base parameters
Figure 10. Effects of other parameters.
In Figure 10b, the base parameter set consists of LRU policy
for entry replacement, error scrubbing, and random entry
selection for scrubbing. For performance comparison,
three additional simulations were performed with only one
parameter variation at a time. Due to the small number of
check code entries, one may question if a simpler replace-
ment can affect the coverage. We tested a pseudo-random
policy. The LRU strategy performs slightly better on the
D-cache for SPECint95. It is, however, a little less efficient
than the pseudo-random policy in the other cases. From
the results, we conclude that the replacement policy does
not significantly affect the performance. In section 3.4, we
discussed the entry selection issue for error scrubbing. As
shown in the graph, the performance gain from LRU entry
selection for scrubbing is insignificant. The results without
scrubbing are also shown. Scrubbing mostly improves the
coverage at the cost of hardware complexity.
Thus far we have discussed the effect of other parame-
ters on the parity cache performance only in the case of the
direct error model. However, similar effects of parameters
were noted from the results of simulations under the other
three models. We omit them here due to space limitation.
5.2. The performance of shadow checking
We have also conducted a set of simulations for shadow
checking to investigate how replication architecture with
unequal sized modules performs under the presence of soft
errors following different error models. Errors were in-
jected in the shadows as well as the main cache. Data items
that are supposed to be identical under the normal condi-
tion were exposed to independent error injections and are
Error models: direct, adjacent, random, column
(b) 8KB shadow
Figure 11. EPR under shadow checking.
Figure 11 shows the average EPRs under shadow check-
ing with two shadows. The results for two shadow sizes
are compared. Clearly, larger shadows miss fewer errors.
Note that the performance variation among different error
models is similar to the case of parity caching (Figures 8c
and 8d). The RARs of the shadow cache with 4KB shad-
ows are 0.3 and 0.15 for the I-cache and D-cache, respec-
tively. In the case of 8KB shadows, the RARs are 0.55 and
0.38, respectively. Here, we also observe the EPRs in the
direct model are very low, but the same is not the case for
the other models. However, we still confirm that shadow
checking is very area efficient in all cases. If the designer
needs to enhance cache reliability against the types of er-
rors that require replication, but only a small area can be
budgeted, then the shadow cache is worth considering.
5.3. The performance of selective checking
Figures 12a and 12b show the relationship between av-
erage EPRs and the number of entries per set in the direct
error model. With only half of the check code required for
the uniform structure, EPRs of less than 4% are obtained.
However, this is lower coverage than a parity cache of the
same area can provide. The reason for this is that in selec-
tive checking, a line is selected to assign check codes within
the scope of a set rather than the entire cache. Recall that
an advantage of selective checking is that it only needs a
simple modification to the conventional architecture.
From Figure 12c, unlike the first two checking schemes,
we note that the protection coverage of two check codes
per 4-way set varies very little under the three error mod-
els. This is also due to the fact that the check codes are
managed independently for each set. Although EPRs are
relatively high in the three models, they are still lower than
29%, which is much higher coverage than an intuitive ex-
Legend entries: SPECint95, I-cache; SPECfp95, I-cache
Error models: direct, adjacent, random, column
(c) Effect of four error models
Figure 12. EPR under selective checking.
dicates that our locality-based checking scheme efficiently
uses the given check code area.
6. Conclusions
In conventional architectures, as the size of the primary
cache grows, the redundant code for data integrity check-
ing also needs to be increased proportionally. We have pro-
posed new architectural solutions for the situations where
enough area cannot be budgeted to support this uniform or-
ganization and further expansion of the protection code. In
our schemes, check code can be designed for a given area
in such a way that the most frequently accessed cache lines,
which are likely to be the most error-prone and are likely to
have the lowest error propagation latencies, take precedence
in integrity checking over less frequently used lines. Predic-
tion for line selection is performed by taking advantage of
locality in cache accesses.
We have considered four possible error models and ap-
plied them to our simulated systems. From the simulation,
we have found that with% check codes of the uniform
error checking architecture, the proposed schemes achieve
far more than% in error protection coverage. In partic-
ular, significantly low EPRs are obtained under the direct
error model with a small area. We have also shown that
adding the error scrubbing technique is more beneficial for
a system with a small number of protection code entries.
Parity caching and shadow checking schemes are more ef-
fective in the adjacent error model than selective checking.
However, selective checking requires the simplest organi-
zation and management, and is thus easy to implement for
multi-way set associative caches. Despite the unbiased er-
ror injection in time and location in the random and column
models, our schemes that are tuned for the protection of the
MFU lines are still area effective for such error models.
We have shown that our locality-based configuration
schemes for the check codes can be adapted to current sys-
tems with a small overhead. An important advantage of
the proposed architectures over the conventional uniform
structure is the flexibility given to the system designer in
planning cache systems of the desired capacity in terms of
size and reliability. We have given an area estimate for the
schemes based on an area model. In order to fully validate
the benefit of the schemes obtained from controllability of
area occupancy, the geometry of physical chip area in the
VLSI design needs to be investigated further.
Acknowledgments: The authors would like to thank Jo-
han Karlsson for his comments on our error injection model,
Matt Virgo for verifying our trace analyzers, and the anony-
mous reviewers for providing useful comments. This work
was funded in part by NSF Grants MIP-9630058 and MIP-
9896025.
References
[1] P. Chow. The MIPS-X RISC Microprocessor. Kluwer,
Boston, 1989.
[2] B. Cmelik and D. Keppel. Shade: a fast instruction-set simulator
for execution profiling. Performance Evaluation Review,
22:128–137, May 1994.
[3] M. Hamada and E. Fujiwara. A class of error control
codes for byte organized memory systems-SbEC-(Sb+S)ED
codes. IEEE Trans. on Computers, 46(1):105–110, January
1997.
[4] J. L. Hennessy and D. A. Patterson. Computer Architecture:
A Quantitative Approach. Morgan Kaufmann, San
Francisco, CA, 1996.
[5] H. Imai. Essentials of Error-Control Coding Techniques.
Academic Press, San Diego, CA, 1990.
[6] J. Karlsson, P. Ledan, P. Dahlgren, and R. Johansson. Using
heavy-ion radiation to validate fault handling mechanisms.
IEEE Micro, 14(1):8–23, February 1994.
[7] J. M. Mulder, N. T. Quach, and M. J. Flynn. An area model
for on-chip memories and its application. IEEE J. Solid-State
Circuits, 26:98–106, February 1991.
[8] S. Park and B. Bose. Burst asymmetric/unidirectional error
correcting/detecting codes. Proc. Int’l Symp. Fault-Tolerant
Computing, pages 273–280, June 1990.
[9] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton,
K. Kozyrakis, R. Thomas, and K. Yelick. Intelligent
RAM (IRAM): Chips that remember and compute. Proc.
Int’l Symp. Solid-State Circuits, pages 224–225, February
1997.
[10] J. C. Pickel and J. T. B. Jr. Cosmic ray induced error in MOS
memory cells. IEEE Trans. Nuclear Science, NS-25:1166–
1171, December 1978.
[11] A. M. Saleh. Reliability of scrubbing recovery-techniques
for memory systems. IEEE Trans. Reliability, 30(1):114–
122, April 1990.
[12] D. P. Siewiorek and R. S. Swarz. Reliable Computer Systems:
Design and Evaluation. Digital Press, Bedford, MA,
1992.
[13] A. K. Somani and K. Trivedi. A cache error propagation
model. Proc. Int’l Symp. Pacific Rim Fault Tolerant Computing,
pages 15–21, December 1997.
[14] J. Sosnowski. Transient fault tolerance in digital systems.
IEEE Micro, 14(1):24–35, February 1994.
[15] P. Sweazey. SRAM organization, control, and speed, and
their effect on cache memory design. Midcon/87, pages
434–437, September 1987.
[16] URL: http://www.specbench.org.
