A Hierarchical Approach for Dependability Analysis of a Commercial Cached RAID Storage Architecture by Kaâniche, M. et al.
February 1998 UILU-ENG-98-2205
CRHC-98-02
University of Illinois at Urbana-Champaign
A Hierarchical Approach for Dependability Analysis 
of a Commercial Cached RAID Storage Architecture
M. Kaaniche, L. Romano, Z. Kalbarczyk, 
R.K. Iyer, and R. Karcich
Coordinated Science Laboratory
1308 West Main Street, Urbana, IL 61801
REPORT DOCUMENTATION PAGE (Standard Form 298, Rev. 2-89; OMB No. 0704-0188)
Report date: 1/30/98
Title and subtitle: A Hierarchical Approach for Dependability Analysis of a Commercial Cached RAID Storage Architecture
Funding numbers: DABT63-94-0.045
Author(s): M. Kaaniche, L. Romano, Z. Kalbarczyk, R. Iyer, and R. Karcich
Performing organizations: Coordinated Science Laboratory, University of Illinois, 1308 W. Main St., Urbana, IL 61801; Storage Technology, 2270 S. 88th St., MS 2220, Louisville, CO 80028
Performing organization report number: UILU-ENG-98-2205 (CRHC-98-02)
Sponsoring/monitoring agency: DARPA/ITO, 3701 N. Fairfax Dr., Arlington, VA 22203-1714
Supplementary notes: The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy or decision, unless so designated by other documentation.
Distribution/availability statement: Approved for public release; distribution unlimited.
Subject terms: disk caching, RAID, error detection and recovery, dependability modeling, behavioral simulation
Number of pages: 21
Security classification (report, this page, abstract): UNCLASSIFIED
Limitation of abstract: UL
NSN 7540-01-280-5500
A Hierarchical Approach for Dependability Analysis 
of a Commercial Cached RAID Storage Architecture*
M. Kaâniche†, L. Romano‡, Z. Kalbarczyk, R. K. Iyer, and R. Karcich
Center for Reliable and High-Performance Computing
University of Illinois at Urbana-Champaign
1308 W. Main St., Urbana, IL 61801, USA
Storage Technology
2270 So 88th Street MS 2220
Louisville, CO 80028, USA
Contact Author: Mohamed Kaâniche, email: kaaniche@crhc.uiuc.edu; kaaniche@laas.fr
Abstract
This report presents a hierarchical methodology for the dependability analysis and evaluation of a highly available commercial cache-based RAID storage system. The architecture is complex and includes several layers of overlapping error detection and correction mechanisms. A detailed analysis of such a system requires an efficient modeling approach to cope with its complexity. In this paper, we suggest a hierarchical approach to model the cache architecture, cache operations, and error detection and recovery mechanisms, and to analyze, at each level of the hierarchy, the impact of faults and errors occurring in the cache and in the disks. Three levels have been developed for the present study; the flexibility of the suggested approach makes it possible to define more levels, should the need arise. To preserve this flexibility, the models have been developed using DEPEND, an object-oriented simulation-based environment for system-level dependability analysis, which provides facilities to inject faults into a functional behavior model, to simulate error detection and recovery mechanisms, and to evaluate quantitative measures. The simulator can be fed with several types of input distributions, as well as with a real trace collected in the field. A number of fault models are defined for each submodel to simulate cache component failures, disk failures, transmission errors, and data errors in the cache memory and in the disk array. Some of the parameters characterizing fault injection in a given submodel correspond to probabilities evaluated from the simulation of the lower-level submodel.
Based on the proposed methodology, we: 1) analyze the potential for errors to remain undetected by the cache subsystem, or to cross the boundaries of several detection mechanisms before being detected, 2) analyze the system behavior under a real workload and different error rates (focusing on burst errors), and 3) evaluate quantitative measures characterizing system error detection latency and coverage factors for each error detection mechanism.
Key words: Disk caching, RAID, error detection and recovery, dependability modeling, behavioral 
simulation.
* For proprietary reasons, we are not allowed to reveal the name of the system.
† LAAS-CNRS, Toulouse, France, currently a visiting Research Assistant Professor at CRHC (until March 1998).
‡ Dipartimento di Informatica e Sistemistica, Naples, Italy, temporarily at CRHC.
1. Introduction
A RAID (Redundant Array of Inexpensive Disks) is a set of disk drives (and associated controller) which can automatically recover data when one (or more) drives in the set fail. Since the definition of RAID concepts and architectures in [Patterson et al. 1988], several papers have been published to analyze their performance and dependability [Chen et al. 1994, Gibson 1992, Malhotra and Trivedi 1993], to propose new algorithms and concepts to improve their fault tolerance capabilities [Alvarez et al. 1997, Blaum et al. 1995, Burkhard & Menon 1993], to reduce the disk reconstruction times and the performance degradation when a single disk fails [Holland et al. 1993, Hou & Patt 1997, Muntz and Lui 1990], or to present and discuss real implementations and designs based on RAID concepts [Friedman 1995, Houtekamer 1995, Katz et al. 1993].
Storage architectures using a large cache and RAID disks are becoming a popular solution for providing high performance at low cost without sacrificing much data reliability. The analysis of these systems has focused on performance; see, e.g., [Hou & Patt 1997, Menon 1994, Mourad et al. 1993]. In these papers, the cache subsystem is assumed to be error free and only the impact of errors occurring in the disks is investigated. The impact of errors in the cache is addressed (to a limited extent) from a design point of view in [Menon & Cortney 1993], where the architecture of a fault-tolerant cached RAID controller is presented. Papers addressing the impact of errors in caches can be found in other applications not related to RAID systems; for example, [Chen & Somani 1994] and [Somani & Kim 1997] define protocols to support the detection and recovery of cache and processor transient faults in redundant and non-redundant processor systems employing caches.
In this paper, we address some unexplored issues on disk caches. Unlike previous work, which mainly explored the effects of caching on the performance of disk arrays when all disks are operational or when one or more disks fail, we focus on the dependability analysis and evaluation of: 1) the impact of errors occurring in the cache subsystem and in the disk array on the system behavior and the data requested by the users, and 2) the efficiency of the error detection and recovery mechanisms implemented in the cache and in the disk array in handling these errors. The architecture analyzed in the paper is based on a real commercial design and consists of a cache subsystem supporting a RAID disk array. The cache architecture is complex and consists of several layers of overlapping error detection and correction mechanisms. Instead of using analytical techniques based on Markov modeling, as in most previous work dealing with the dependability of RAID systems, we use a behavioral simulation approach, which offers more flexibility in addressing the impact of different types of faults on the behavior of the system under real workload distributions (derived from actual input traces). A detailed evaluation of the architecture presented in this paper, in a reasonable simulation time, requires the use of efficient modeling techniques. In this study, we present a hierarchical approach for the modeling and evaluation of the cache architecture, operations, and error detection and recovery mechanisms. The impact of faults occurring in the cache and the disk array is evaluated at each level of the hierarchy. Simulation experiments are conducted using the Depend tool [Goswami et al. 1997]. Depend is a simulation-based environment for system-level dependability analysis which provides facilities to inject faults into a functional behavior model, to simulate error detection and recovery mechanisms, and to evaluate dependability measures. The proposed model is decomposed into three submodels built according to a hierarchical approach. Several kinds of fault injectors have been incorporated in the model to simulate disk failures, cache component failures, transmission errors, and data errors in the cache memory and in the disk array. Some of the parameters characterizing fault injection in a given submodel correspond to probabilities evaluated from the simulation of the lower-level submodel.
Simulation experiments are conducted, based on the proposed methodology, in order to: 1) trace error scenarios and analyze the potential for errors to remain undetected by the cache or to cross the boundaries of several error detection and recovery mechanisms, and 2) analyze how the system responds to various error rates. A real trace file, collected in the field, is used instead of a synthetic generator to drive the simulation, to best reproduce the characteristics of the input load. Since highly reliable commercial systems can generally tolerate isolated errors easily, our study focuses on the impact of multiple near-coincident errors occurring during a short period of time (bursts). In particular, the phenomena which lead to bursts of errors in a system have seldom been explored. In the present study, we show how, due to the high frequency of system operation, a transient fault in a single component of the system can result in a burst of errors which propagates to other components. This means that what is seen at a given level of abstraction as a single error becomes, at a higher level of abstraction, a burst of errors. We also analyze the distribution of the error latencies, taking into account where and when these errors are generated. This provides useful insight into how long errors remain undetected in the system and into the probability that those errors accumulate in the cache and/or in the disks, thus affecting the behavior of the system and the efficiency of the error detection mechanisms [Chillarege & Iyer 1989].
This paper is structured into seven sections. Section 2 describes the architecture analyzed and the cache operations, focusing on the error detection and recovery mechanisms. Section 3 summarizes the dependability analysis objectives of the study and introduces the hierarchical modeling methodology. The hierarchical model developed for the system analyzed in this paper is described in detail in Section 4. Section 5 presents the results of the simulation experiments and Section 6 summarizes and discusses the main outputs of the study. Finally, Section 7 concludes the paper.
2. System Presentation
The storage architecture analyzed in this paper is summarized in Figure 1. This architecture is designed to support very large amounts of disk storage and to provide high performance and high availability. The storage system supports a RAID architecture composed of a collection of disk drives storing data, parity and Reed-Solomon coding information that are striped across the disks [Chen et al. 1994]. This architecture tolerates the failure of up to two disks. If a disk fails, the data lost from the failed disk is reconstructed on-the-fly using the valid disks, and the reconstructed data is stored on a hot spare disk without interrupting the service. The RAID system is controlled and managed by the array controller, which manages data transfer operations between the hosts and the disks. It is composed of a set of control units which process user requests received from the channel interfaces and direct these requests to the cache subsystem. All data transfer operations between the channels and the disk array are performed by the cache subsystem. The communication, control and data transfers inside the array controller are performed via reliable and high-speed control and data busses. The cache subsystem consists of: 1) a cache controller organized into cache controller interfaces to the channels and the disks and cache controller interfaces to cache memory, and 2) cache volatile and nonvolatile memory. The cache controller interfaces are composed of redundant components to ensure a high level of availability. The cache controller processes read and write requests sent by the array controller, controls the access to cache memory and transfers the data between the channels, the disks and the cache memory. The communication links between the cache controller interfaces, and with the cache memory, are provided by redundant and multi-directional busses (denoted as Bus 1 and Bus 2). The cache subsystem implements a large volatile memory cache used as a data staging area for read and write operations, and a battery-backed nonvolatile memory to protect critical data against failures (modified user data which has been updated in the cache and not yet on the disks, file system information that is necessary to map the data processed by the array controller to its physical location on the disks, etc.). The data transmitted from the hosts through the channel interfaces is received in blocks of variable size. These blocks are assembled into tracks in the cache. The number of tracks corresponding to a single request depends on the user applications.
Figure 1: Array controller architecture, external interfaces and data flow
2.1 Cache subsystem operations
Any data transfer request received by the array controller is processed by the cache subsystem. The cache subsystem is used for caching read as well as write requests. Entire tracks are staged into the cache memory; however, user write requests may lead to the modification of only a few blocks of the track in the cache memory. In the following, we describe the main cache operations assuming that the unit of data transfer is an entire track. General information about cache operations and strategies can be found in [Karedla et al. 1994, Menon & Cortney 1993].
Read operation. The cache controller first checks the contents of the cache memory. If the track is already in the cache (“cache hit”), the track is read from the cache memory and the corresponding data is sent back to the channel interfaces. If the track is not referenced in the cache (“cache miss”), a request is scheduled to read the track from the disk array and swap it into the cache memory. Then, the track is read from the cache memory and the data is sent back to the channel interfaces.
Write operation. As for a read operation, the cache controller first checks the contents of the cache memory. In the case of a cache hit, the track is modified in the cache and flagged as “dirty” (otherwise the track is “clean”). In the case of a cache miss, memory is allocated to the track and the track is written into that memory location. Two write strategies can be distinguished: 1) write-through and 2) fast-write. In the write-through strategy, the track is first written to the cache volatile memory and the write operation completion is signaled to the channel interfaces only after the track is written to the disks. In the fast-write strategy, the track is written to the volatile memory and to the nonvolatile memory and the write operation completion is signaled immediately. The modified track is later destaged to the disks according to a write-back strategy, which consists of transferring the dirty tracks from the cache memory to the disks, either periodically or when the number of dirty tracks in the cache exceeds a predefined threshold. When space for a new track is needed in the cache memory, for a read or a write operation, a cache replacement algorithm based on the Least-Recently-Used (LRU) strategy is applied to swap out one track from the cache memory. If the LRU track is clean, the space occupied by that track can be used immediately; if it is dirty, the track must first be destaged to the disks.
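The read and fast-write operations described above can be illustrated with a small sketch. This is only a toy model written for this discussion, not the commercial design: the class name `TrackCache`, its parameters, and the use of a Python dictionary in place of the disk array are all assumptions, and the nonvolatile memory copy made by a fast write is not modeled.

```python
from collections import OrderedDict


class TrackCache:
    """Toy track cache: LRU replacement, fast-write with write-back destaging."""

    def __init__(self, capacity, disk, dirty_threshold):
        self.capacity = capacity            # maximum number of cached tracks
        self.disk = disk                    # dict standing in for the disk array
        self.dirty_threshold = dirty_threshold
        self.tracks = OrderedDict()         # track id -> data, LRU track first
        self.dirty = set()                  # tracks modified but not yet destaged
        self.hits = 0
        self.misses = 0

    def _destage(self, track_id):
        # Write a dirty track back to the disks and mark it clean.
        self.disk[track_id] = self.tracks[track_id]
        self.dirty.discard(track_id)

    def _make_room(self):
        # Swap out LRU tracks; a dirty victim must be destaged first.
        while len(self.tracks) >= self.capacity:
            victim = next(iter(self.tracks))
            if victim in self.dirty:
                self._destage(victim)
            del self.tracks[victim]

    def read(self, track_id):
        if track_id in self.tracks:
            self.hits += 1
        else:
            self.misses += 1
            self._make_room()
            self.tracks[track_id] = self.disk[track_id]   # stage from disk
        self.tracks.move_to_end(track_id)                 # most recently used
        return self.tracks[track_id]

    def fast_write(self, track_id, data):
        if track_id in self.tracks:
            self.hits += 1
        else:
            self.misses += 1
            self._make_room()
        self.tracks[track_id] = data
        self.tracks.move_to_end(track_id)
        self.dirty.add(track_id)
        # Write-back: destage all dirty tracks once the threshold is exceeded.
        if len(self.dirty) > self.dirty_threshold:
            for tid in list(self.dirty):
                self._destage(tid)
```

In this sketch a write completes as soon as the track is in the cache, and the disk copy is updated only when the dirty track is evicted or when the dirty-track threshold is exceeded; periodic destaging would need a timer that is omitted here.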
Track transfer inside the cache. The transfer of a track between the cache memory, the cache controller and the channel interfaces is decomposed into several elementary data transfers. The track is broken down into several data blocks to accommodate the parallelism of the different devices involved in the transfer. This also makes it possible to overlap several track transfer operations over the data busses inside the cache subsystem. Arbitration algorithms are implemented to synchronize these transfers and avoid bus hogging by a single transfer.
2.2 Error detection mechanisms
The cache subsystem is designed to detect errors in the data, address and control paths by using, 
among other techniques, a combination of parity, error detection and correction codes in the memory 
and Cyclic Redundancy Checking (CRC). In the following, we explain how these error detection 
mechanisms are applied to detect errors in the data path.
Parity. Data transfers over Bus 1 between the cache controller interfaces (see Figure 1) are covered by parity. For each data symbol (i.e., data word) transferred on the bus, parity bits are appended and passed over separate wires. Parity is generated and checked in both directions. Parity is not stored in the cache memory; it is stripped after being checked. When a parity error is detected, the cache controller automatically retries the transfer to recover from the error. If the error persists, the internal cache operation is interrupted and a parity error is recorded.
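As a minimal illustration of the parity check, the sketch below uses one even-parity bit per data word. The word width, the single parity bit, and the helper names are assumptions made for this example; the actual number of parity bits per symbol is not given in the text. It shows why parity catches any odd number of bit errors in a word but misses an even number.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word contains an odd number of 1 bits."""
    return bin(word).count("1") % 2


def send(word: int):
    """Sender side: the parity bit travels on a separate wire from the data."""
    return word, parity_bit(word)


def check(word: int, parity: int) -> bool:
    """Receiver side: recompute parity and compare with the transmitted bit."""
    return parity_bit(word) == parity


def flip_bits(word: int, positions):
    """Inject bit errors at the given bit positions (hypothetical fault injector)."""
    for p in positions:
        word ^= 1 << p
    return word
```

With these helpers, a word with one or three flipped bits fails `check`, while a word with two flipped bits passes it and would have to be caught by another mechanism.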
Error detection and correction (EDAC) code. The data transmitted from the cache controller interfaces with the cache memory and stored in the volatile or the nonvolatile memory is protected by an error detection and correction code. The code produces redundancy bits which are appended to each data symbol transferred over Bus 2 and then stored in the cache. EDAC is capable of correcting on-the-fly all single or double bit errors per data symbol and detecting all triple bit data errors. These errors may occur during data transfer over Bus 2 or while the data is in the memory.
CRC. Several kinds of CRC checking are implemented in the array controller; only two of these are checked or generated within the cache subsystem: the frontend CRC and the physical sector CRC. The frontend CRC is computed by the channel interfaces and appended to the data sent to the cache subsystem during a write request. This CRC is checked by the cache controller. If the frontend CRC is valid, it is stored with the data in the cache memory. Otherwise, the cache operation is interrupted and a CRC error is recorded. The frontend CRC is checked again when a read request involving the corresponding data is later received from the channels. This provides extra detection to recover from errors that may have occurred while the data was in the cache or in the disks and that escaped the error detection mechanisms implemented in the cache subsystem or the disk array. The physical sector CRC is appended, by the cache controller interfaces with the disks, to each data block to be stored in a disk sector. This CRC is stored with the data until a read-from-disk operation occurs: the physical sector CRC is then checked and stripped before the data is stored in the cache memory. The same algorithm is implemented for frontend CRC and physical sector CRC computation and checking. The algorithm is guaranteed to detect three or fewer data symbols in error in a data record.
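The life cycle of the frontend CRC (computed by the channel interface, checked on write, stored with the data, checked again on read) can be sketched as follows. The CRC algorithm used by the real system is not disclosed, so `zlib.crc32` is used here purely as a stand-in; the function names and the dictionary used as cache memory are likewise assumptions.

```python
import zlib


class CrcError(Exception):
    """Raised when a stored or received CRC does not match the data."""


def channel_send(data: bytes):
    """Channel interface: compute the frontend CRC and append it to the data."""
    return data, zlib.crc32(data)


def check_crc(data: bytes, crc: int) -> None:
    if zlib.crc32(data) != crc:
        raise CrcError("frontend CRC mismatch")


cache_memory = {}   # track id -> (data, frontend CRC); stands in for cache memory


def cache_write(track_id, data: bytes, crc: int) -> None:
    """Cache controller: check the CRC, then store it together with the data."""
    check_crc(data, crc)
    cache_memory[track_id] = (bytearray(data), crc)


def cache_read(track_id) -> bytes:
    """Check the stored CRC again before returning data to the channels."""
    data, crc = cache_memory[track_id]
    check_crc(bytes(data), crc)
    return bytes(data)
```

Because the CRC stays with the data while it resides in the cache, a corruption that occurs after the write-time check is still caught by the read-time check.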
Error location | Frontend CRC | Parity | EDAC | Physical sector CRC
Transfer from channels to cache | x | | |
Cache controller interfaces with channels/disks | x | | |
Bus 1 | x | x | |
Cache controller interfaces with cache memory | x | | |
Bus 2 | x | | x |
Cache memory | x | | x |
Transfer from cache to disk | x | | | x
Disks | x | | | x
Error detection condition | less than four data symbols with errors | odd number of errors in each data symbol | less than four bit-errors in each data symbol | less than four data symbols with errors
Table 1 summarizes the conditions for error detection for each mechanism presented above, taking into account the component where the errors occur and the number of errors. The (x) symbol means that errors occurring in the corresponding component can be detected by the error detection mechanism indicated in the corresponding column. The number of check bits and the size of the data symbol mentioned in the error detection condition are different for parity, EDAC and CRC. The number of errors indicated by the error detection condition corresponds to the total number of uncorrected errors occurring between the computation of the error detection code and the checking of this code. Therefore, some errors which remain undetected by one mechanism, e.g., parity, can still be detected by another one, e.g., frontend CRC. The overlapping of error detection mechanisms thus aims at providing extra detection when errors escape the boundaries of one or more mechanisms.
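The benefit of overlapping mechanisms can be made concrete with a short sketch that reports which detectors flag a corrupted record. It assumes one even-parity bit per byte on the bus, an error-free parity wire, and `zlib.crc32` as a stand-in for the undisclosed frontend CRC; the function `detectors_triggered` is a hypothetical helper, not part of the system.

```python
import zlib


def parity(word: int) -> int:
    """Even-parity bit of one data word (here: one byte)."""
    return bin(word).count("1") % 2


def detectors_triggered(sent: bytes, received: bytes, crc: int):
    """Return the names of the mechanisms that flag the received record."""
    triggered = []
    # Parity check: the bit computed by the sender versus the one recomputed
    # by the receiver, word by word.
    if any(parity(s) != parity(r) for s, r in zip(sent, received)):
        triggered.append("parity")
    # Frontend CRC check over the whole record.
    if zlib.crc32(received) != crc:
        triggered.append("frontend CRC")
    return triggered
```

A single flipped bit is reported by both mechanisms, whereas two flipped bits in the same word escape parity and are reported only by the CRC, which is the situation the overlap is designed for.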
2.2.1 Error recovery and track reconstruction
Besides EDAC, which allows some errors to be automatically corrected by hardware, software error recovery procedures are invoked when errors are detected by the cache subsystem. Recovery actions mainly consist of retries, memory fencing and track reconstruction operations. When errors are detected during a read operation from the cache memory and the error persists after retries, an attempt is made to read the data from the disk array. (The success of this operation depends on whether the data on the disk is still valid or has been corrupted or lost due to errors or failures affecting the disks.) Figure 2 describes a simplified disk array composed of n data disks and 2 redundancy disks: P[1-n] and Q[1-n] are computed based on the data tracks T1 to Tn stored in the corresponding row. This architecture tolerates the loss of two tracks in each row (this condition will be referred to as the track reconstruction condition). The tracks that are lost due to disk failures or corrupted due to bit-errors can be reconstructed using the valid tracks in the row provided that the track reconstruction condition is satisfied; otherwise data is lost and cannot be retrieved from the disks. Additional information about reconstruction strategies can be found in [Holland et al. 1993].
Data Disks: D1, ..., Dn    Redundancy Disks: P, Q
Figure 2: A simplified RAID: n data disks and 2 redundancy disks
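The track reconstruction condition and the simplest recovery case can be sketched as follows. This is a deliberately reduced illustration: it assumes that P is a bytewise XOR parity of the data tracks in the row, and it only rebuilds a single lost data track. Recovering two lost tracks requires the Reed-Solomon redundancy Q, which is not implemented here. All function names are hypothetical.

```python
from functools import reduce

MAX_LOST_PER_ROW = 2   # track reconstruction condition stated in the text


def xor_tracks(tracks):
    """Bytewise XOR of equal-length tracks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*tracks))


def can_reconstruct(lost_count: int) -> bool:
    """A row is recoverable if at most two of its tracks are lost or corrupted."""
    return lost_count <= MAX_LOST_PER_ROW


def rebuild_single(row, p):
    """Rebuild the one missing data track (marked None) from the survivors and P."""
    missing = [i for i, track in enumerate(row) if track is None]
    if len(missing) != 1:
        raise ValueError("this sketch only rebuilds exactly one lost data track")
    survivors = [track for track in row if track is not None]
    rebuilt = list(row)
    rebuilt[missing[0]] = xor_tracks(survivors + [p])
    return rebuilt
```

For a row of three data tracks, `p = xor_tracks(row)` plays the role of P, and `rebuild_single` restores any one track replaced by `None`.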
3. Dependability Modeling, Analysis and Evaluation
Our objective is to analyze the dependability of the system presented in the previous sections using real workload patterns derived from field measurement. Three main objectives are considered: 1) analyze how the system responds to various error rates and error scenarios, 2) analyze the error latency duration (time between the occurrence of a fault and its detection, or its removal) and its impact on the system behavior, and 3) evaluate the coverage of each error detection mechanism. In our study, we analyze the system behavior based on its design and not on an experimental prototype. In this context, two complementary approaches might be used to address the dependability objectives identified above: analytical modeling and simulation-based behavioral modeling. Analyzing the system behavior under real workloads and detailed fault scenarios, and evaluating the error latency and the error detection coverage in this context, can only be conducted using a simulation approach. Indeed, analytical models are tractable only under simplifying assumptions. In this paper, we use the Depend simulation-based modeling environment developed at the University of Illinois [Goswami et al. 1997]. Depend provides facilities to inject faults into a functional behavior model, to simulate error detection and recovery mechanisms, and to evaluate quantitative measures.
To be able to perform the analyses mentioned above, we need to model the behavior of the system in the presence of faults, taking into account the interactions and the data flow: 1) between the cache subsystem, the disks and the other components of the array controller, and 2) inside the cache subsystem. Due to the complexity of the system and to the fact that we need to model the system’s functional behavior (cache operations) as well as rare events (fault occurrence and recovery), a detailed analysis and evaluation of this system cannot be conducted in a reasonable time unless efficient modeling techniques are deployed. In the next sections we suggest a hierarchical modeling and simulation approach, which is based on analyzing the system behavior at different levels of abstraction. At each level, we build a model which details a particular aspect of the system behavior, we analyze the impacts of faults and errors occurring at this level, and we characterize them quantitatively. Inputs and outputs must be carefully defined for each level, depending on that level’s abstraction. Also, each level defines a set of parameters characterizing the occurrence and the consequences of faults and errors (e.g., probability and number of errors occurring during data transfer between system components, probability that errors are detected, etc.) that need to be evaluated and validated from the simulation of lower-level models, and provides parameter values to be used in the upper-level models. Establishing the proper number of hierarchical levels and their boundaries is not a trivial problem. Several factors must be taken into account to find an optimal hierarchical decomposition: 1) the complexity of the system to be modeled, 2) the level of detail of the analysis and the dependability measures to be evaluated, and 3) the strength of system component interactions (weak interactions favor hierarchical decomposition, whereas strong coupling does not).
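The way a lower-level simulation supplies parameters to an upper-level one can be sketched with two small Monte Carlo routines. This is not the Depend tool and does not use its API; the function names, the independence of block errors, and every numeric value a caller might pass are assumptions chosen only to illustrate the parameter flow.

```python
import random


def simulate_level3(n_transfers, blocks_per_track, p_block_error, rng):
    """Lower level: estimate P(a track transfer contains at least one bad block)."""
    corrupted = 0
    for _ in range(n_transfers):
        # Each block of the track is hit independently (a simplifying assumption).
        if any(rng.random() < p_block_error for _ in range(blocks_per_track)):
            corrupted += 1
    return corrupted / n_transfers


def simulate_level2(n_requests, p_track_error, p_detect, rng):
    """Upper level: inject track errors with the probability estimated below."""
    outcome = {"ok": 0, "detected": 0, "undetected": 0}
    for _ in range(n_requests):
        if rng.random() >= p_track_error:
            outcome["ok"] += 1            # no error on this track operation
        elif rng.random() < p_detect:
            outcome["detected"] += 1      # error caught by a detection mechanism
        else:
            outcome["undetected"] += 1    # error escapes detection
    return outcome
```

A typical use would call `simulate_level3` once to obtain a track error probability and then pass that value as `p_track_error` to `simulate_level2`, so the upper level never has to simulate individual blocks.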
Hierarchical modeling has several advantages: 1) the complexity of the modeling is reduced, 2) the system behavior can be analyzed at different abstraction levels, 3) the simulation is sped up and the memory space required to run the simulation is reduced, and 4) the system model can be more easily modified and refined than without decomposition.
In the next sections, we present the hierarchical modeling methodology that we defined to analyze the cache-based storage architecture presented in this paper.
4. Hierarchical Modeling Methodology
We have defined three hierarchical levels (summarized in Figure 3) that enable us to model the behavior of the cache-based storage system in the presence of faults. At each level, the behavior of the shaded components is detailed in the lower-level model. We focus on the behavior of the cache subsystem and the RAID and their interactions. Each submodel corresponding to a hierarchical level is built in a modular fashion. It is characterized by:
• the components to be modeled and their behavior,
• a workload generator specifying the input distribution,
• a fault dictionary specifying the set of faults to be injected in the model, the distribution characterizing the occurrence of each fault, and finally the consequence of the fault with the corresponding probability of occurrence,
• the outputs derived from the simulation of this submodel.
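The four items that characterize a submodel map naturally onto a small data structure. The sketch below is one possible representation and is not taken from Depend: the class names `Fault` and `Submodel`, the exponential inter-arrival assumption, and the example fault names are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Fault:
    name: str
    rate: float           # mean occurrences per unit of simulated time
    consequences: dict    # consequence name -> probability (must sum to 1)

    def next_occurrence(self, rng):
        """Time to the next fault, assuming exponential inter-arrival times."""
        return rng.expovariate(self.rate)

    def draw_consequence(self, rng):
        """Pick one consequence according to its probability of occurrence."""
        names = list(self.consequences)
        weights = list(self.consequences.values())
        return rng.choices(names, weights=weights, k=1)[0]


@dataclass
class Submodel:
    level: int
    components: list                                       # modeled components
    workload: list                                         # trace or synthetic events
    fault_dictionary: dict = field(default_factory=dict)   # fault name -> Fault
    outputs: dict = field(default_factory=dict)            # measures from simulation

    def add_fault(self, fault: Fault) -> None:
        total = sum(fault.consequences.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError("consequence probabilities must sum to 1")
        self.fault_dictionary[fault.name] = fault
```

For example, a level 3 submodel could register a fault such as `Fault("bus transmission error", rate=0.5, consequences={"single error": 0.7, "burst": 0.3})`, where the rate and probabilities are placeholders rather than measured values.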
[Figure 3 content, recovered from the diagram]

Level 1: {Cache + RAID} modeled as a black box
  Workload generator: user requests Un{read, Fi, Si}, ..., U1{write, F1, S1} to read or write data of size Si from/to a file Fi (Si = number of tracks)
  Components: channel interfaces and control units, which send requests to read or write tracks ({write, Tk}, {read, Tt}) to {cache + RAID} and receive the result of each track transfer
  Fault injector: occurrence of faults in {cache + disk}, based on probabilities evaluated from level 2 (inputs from level 2)
  Level 1 outputs:
    1) Probability of failure of user requests
    2) Workload distribution of track read/write requests sent to {cache + RAID}

Level 2: interactions between cache controller, RAID and cache memory; tracks in cache memory and disks explicitly modeled; data unit = track
  Workload generator: track read/write requests {read, T1}, {write, Tn} to {cache + RAID}
  Fault injectors (user-specified injection parameters; inputs from level 3):
    • cache component/disk failure
    • memory/disk track errors
    • track errors during transfer
  Level 2 outputs:
    1) Probability of:
       Successful track operation: no errors, or errors detected and corrected
       Failed track operation: errors detected but not corrected, errors not detected, or cache or RAID unavailable
    2) Error detection coverage (CRC, parity, EDAC)
    3) Error latency distribution
    4) Frequency of track reconstructions

Level 3: details the transfer of a track inside the cache controller; each track decomposed into data blocks; data block = set of data symbols
  Workload generator: request to transfer a track T1{B1, B2, ..., Bn} composed of n data blocks, Bi = {di1, ..., dik} data symbols
  Components: cache controller interface with channels/disks (buffers), Bus 1, cache controller interface with cache memory (buffers), Bus 2; accesses to the disks and accesses to the cache memory
  Fault injectors (user-specified injection parameters):
    • track errors in the buffers
    • bus transmission errors
  Level 3 outputs: probability and number of errors during transfer
    • in cache controller interfaces with channels/disks
    • over Bus 1
    • in cache controller interfaces with cache memory
    • over Bus 2
Figure 3: Hierarchical modeling of the cache-based storage system
For all levels, the workload is a sequence of events which might either come from the monitoring of 
the system during operation (i.e., real traces) or be generated from a synthetic distribution.
In the model described in Figure 3, the system behavior, the granularity of the unit of data transfer 
and the quantitative measures evaluated are refined from one level to another. In the level 1 model, the
unit of data transfer is a set of tracks to be read or written from a user file; in level 2 it is a single
track; and in level 3 the track is decomposed into a set of data blocks, each one composed of a set of
data symbols. In the following, we describe the three levels. In this study, we address the level 2 and
level 3 models which describe the internal behavior of the cache and RAID subsystems in the presence 
of faults. Level 1 is included mainly to illustrate the flexibility of our approach: with the hierarchical 
methodology, additional models can be built on top of level 2 and level 3 models to study the behavior 
of other systems relying on the cache and RAID subsystems.
4.1 Level 1 Model
Level 1 models the interface between the users and the storage system, taking into account the
occurrence of faults in the cache and RAID subsystems. Its purpose is to translate user requests
(operations to be performed on a specified file) into requests to the storage system (sequences of
operations to be performed on the corresponding set of tracks), and to propagate the replies from the
storage system back to the users. A file request (read, write) results in a sequence of track requests
(read, fast-write, write-through). Concurrent requests involving the same file may be received from
different users. As a consequence, a single track operation failure might result in multiple file
operation failures. The user requests can be generated from a trace obtained from real measurements
(examples are given in [Baker et al. 1991, Ousterhout et al. 1985]) or simulated by a synthetic
distribution specifying the interarrival time between user read or write requests and the number of
tracks to be accessed in each request. In this model, the cache subsystem and the disk array are
described, at a high level of
abstraction, as a single entity, modeled as a black box. A fault dictionary specifying the results of
track operations is defined to characterize the external behavior of the black box taking into account 
the presence of faults in the cache and RAID subsystems. There are four possible results for a track 
operation (from the perspective of occurrence, detection and correction of errors): 1) successful 
read/write track operation (i.e., absence of errors, or errors detected and corrected), 2) errors detected 
but not corrected, 3) errors not detected, and 4) service unavailable. Parameter values representing the 
probability of occurrence of these events are provided by the simulation of level 2 model. Two types 
of outputs are derived from the simulation of level 1 model: 1) quantitative measures characterizing the 
probability of failure of user requests and 2) the workload distribution of read or write track requests 
received by the cache subsystem. This workload is used to feed level 2 model.
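
The black-box view of level 1 can be illustrated with a short sketch. It is not the simulator used in this study: the outcome probabilities below are placeholders (in the approach described above they would come from the level 2 simulation), the function names are hypothetical, and track operations are assumed to be independent.

```python
import random

# Hypothetical probabilities for the four possible results of a track operation.
# In the hierarchical approach these values are outputs of the level 2 model.
TRACK_OUTCOMES = {
    "success": 0.9990,
    "detected_not_corrected": 0.0006,
    "not_detected": 0.0001,
    "unavailable": 0.0003,
}


def sample_track_outcome(rng, outcomes=TRACK_OUTCOMES):
    """Draw the result of one track operation from the fault dictionary."""
    labels = list(outcomes)
    weights = [outcomes[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def user_request_fails(rng, n_tracks, outcomes=TRACK_OUTCOMES):
    """A file request fails if any one of its track operations fails."""
    return any(
        sample_track_outcome(rng, outcomes) != "success" for _ in range(n_tracks)
    )


def estimate_request_failure(n_requests, n_tracks, seed=0):
    """Monte Carlo estimate of the probability of failure of user requests."""
    rng = random.Random(seed)
    failures = sum(user_request_fails(rng, n_tracks) for _ in range(n_requests))
    return failures / n_requests
```

Calling `estimate_request_failure(10_000, 8)` would estimate the first level 1 output for requests of 8 tracks each under these placeholder probabilities.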
4.2 Level 2 Model
Level 2 model describes the behavior in the presence of faults of the cache subsystem and the disk 
array. Cache operations and the data flow between the cache controller, the cache memory and the 
disk array are described in order to identify the scenarios leading to the outputs described above in 
level 1 model and evaluate their probability of occurrence. At level 2, the data stored in the cache 
memory and the disks is explicitly modeled and structured into a set of tracks. A track transfer opera­
tion is seen at a high level of abstraction, in that a track is seen as an atomic piece of data, travelling 
between different subparts of the system (from user to cache, from cache to user, from disk to cache, 
from cache to disk), while errors are injected to it and to the different components of the system as 
well. Accordingly, when a track is to be transferred between two communication partners, for example,
from the disk to the cache memory, neither of the two needs to be aware of the disassembling,
buffering and reassembling procedures which occur during the transfer. This results in a significant
simulation speedup, since the number of events which need to be processed is reduced dramatically.
4.2.1. Workload distribution
Inputs to level 2 model correspond to requests to read or write tracks from the cache subsystem. Each 
request specifies the type of the access (Read, Write-through, Fast-write) and the track to be accessed. 
The distribution specifying these requests and their interarrival times can be derived from the
simulation of the level 1 model, from real measurements (i.e., real traces), or by generating
distributions characterizing various types of workloads.
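
As an illustration of the third option (a generated workload), the sketch below produces a stream of track requests. The request mix defaults to the percentages reported for the real trace in Section 5.1; the exponential interarrival times and the 1/rank track popularity are assumptions made only for this example, and `generate_requests` is a hypothetical name.

```python
import itertools
import random

# Request mix; the defaults mirror the trace statistics quoted in Section 5.1.
REQUEST_MIX = {"read": 0.86, "fast_write": 0.114, "write_through": 0.026}


def generate_requests(n_requests, n_tracks, mean_interarrival_ms, seed=0):
    """Yield (time_ms, request_type, track_id) tuples.

    Assumptions (not from the paper): exponential interarrival times and a
    Zipf-like track popularity with weight 1/rank.
    """
    rng = random.Random(seed)
    types = list(REQUEST_MIX)
    type_weights = [REQUEST_MIX[t] for t in types]
    # Cumulative weights are computed once so each draw stays cheap.
    cum_track_weights = list(
        itertools.accumulate(1.0 / rank for rank in range(1, n_tracks + 1))
    )
    now = 0.0
    for _ in range(n_requests):
        now += rng.expovariate(1.0 / mean_interarrival_ms)
        req_type = rng.choices(types, weights=type_weights, k=1)[0]
        track = rng.choices(range(n_tracks), cum_weights=cum_track_weights, k=1)[0]
        yield now, req_type, track
```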
4.2.2. Fault models
The main aspects developed in level 2 model concern the specification of the fault models to be ana­
lyzed for each system component and the modeling of error detection mechanisms (presented in Sec­
tion 2.2) in order to trace error scenarios leading to the outputs provided to level 1 model (see Section
4.1 and Figure 3). We distinguish between: 1) permanent faults leading to cache controller component 
failures, cache memory component failures or disk failures, 2) transient faults leading to track errors 
affecting single or multiple bits of the tracks while they are stored in the cache memory or in the disks, 
and 3) transient faults leading to track errors affecting single or multiple bits during the transfer of the 
tracks by the cache controller to the cache memory or to the disks.
Component failures. When a permanent fault is injected in a cache controller component, the re­
quests processed by this component are allocated to the other components of the cache controller 
which are still available. If there is no component available to perform the operation (due to failure), a 
cache subsystem failure is recorded. The reintegration of the failed component after repair does not 
interrupt the cache operations which are in progress. Permanent faults injected in a cache memory card 
or in a single disk lead to the loss of all tracks stored in these components. When a read request in­
volving the tracks stored on the faulty component is received by the cache, an attempt is made to read 
these tracks from the disks. If the tracks are still valid on the disks or can be reconstructed from the 
other disks, then the read operation is successful; otherwise, the data is lost.
Track errors in the cache memory and the disks. These correspond to the occurrence of sin­
gle or multiple bit-errors in the track due to transient faults. Two fault injection strategies might be 
considered to simulate the occurrence of track errors in the cache memory and in the disks: time de­
pendent and load dependent. Time dependent fault injection simulates faults occurring randomly inde­
pendent of the system activity. Load dependent fault injection aims to simulate the occurrence of faults 
due to stress during system operation. The rate of load dependent fault injection depends on the num­
ber of accesses to the memory or to the disk instead of the time. These errors are injected randomly 
into one or more bytes of a track. Multiple errors are less frequent than single errors. The rate of oc­
currence of fault injection is tuned to allow a single fault injection or consecutive multiple near- 
coincident fault injections (i.e. the rate of occurrence of faults is increased during a short period of 
time). This enables us to analyze the impact of isolated and bursty fault patterns.
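
The difference between the two injection strategies can be sketched as follows. This is a minimal illustration under stated assumptions: time dependent faults are modeled here as a Poisson process (the text only requires that they occur randomly, independent of activity), load dependent faults are modeled as an independent per-bit error probability, and both function names are hypothetical.

```python
import math
import random


def time_dependent_fault_times(rate_per_hour, horizon_hours, rng):
    """Fault instants drawn independently of the system activity.

    Assumption: exponential gaps between faults (Poisson process).
    """
    times = []
    now = rng.expovariate(rate_per_hour)
    while now < horizon_hours:
        times.append(now)
        now += rng.expovariate(rate_per_hour)
    return times


def load_dependent_fault(bits_accessed, errors_per_bit, rng):
    """Decide whether one access of `bits_accessed` bits triggers a fault.

    The probability depends on the amount of data accessed, not on time.
    expm1/log1p keep the result accurate for very small per-bit rates.
    """
    p_fault = -math.expm1(bits_accessed * math.log1p(-errors_per_bit))
    return rng.random() < p_fault
```

A burst of near-coincident faults could be obtained by temporarily raising `rate_per_hour` for a short window, which matches the tuning described above.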
Track errors during data transfer inside the cache. The track errors during the transfer of 
data between the cache controller, the cache memory and the disks may occur:
• in the cache controller interfaces to channels/disks before transmission over Bus 1 (see Figure 1),
(i.e., before parity or CRC computation or checking),
• during transmission over Bus 1 (i.e., after parity computation),
• in the cache controller interfaces to cache memory before transmission over Bus 2 (i.e., before
EDAC code computation or checking), or
• during transmission over Bus 2 (i.e., after EDAC code computation).
To be able to evaluate the probability of occurrence and the number of errors affecting the track during 
the transfer, a detailed simulation of cache operations during this transfer is required. Including this 
detailed behavior in level 2 model would be far too costly in terms of computation time and memory 
occupation. For that reason, this detailed simulation is performed in a lower level (level 3 model). In 
level 2 model, a distribution is associated with each event described above which specifies the
probability and the number of errors occurring during the track transfer, and errors are injected during
the transfer of a track according to these probabilities. These probabilities are evaluated from level 3.
4.2.3. Modeling of error detection mechanisms
Perfect coverage is assumed for permanent faults leading to the failure of cache components or the 
disks. The detection of track errors occurring in the cache memory or in the disks, or during the trans­
fer depends on the number of errors affecting each data symbol to which the error detection code is 
appended and when and where these errors occurred (see Table 1). The implementation of the error 
detection mechanisms in the model is done using a behavioral approach. The number of errors in each 
track is recorded and updated during the simulation. Each time a new error is injected in the track, the 
number of errors is incremented. When a request is sent to the cache controller to read a track, the 
number of errors affecting the track is checked and compared with the error detection conditions 
summarized in Table 1. Several scenarios of detection or non detection of errors are possible as dis­
cussed in Section 2.2 and illustrated in Figure 4. During a write operation, the track errors that have 
been accumulated during the previous operations are overwritten and the number of errors associated 
to the track is reset to zero. Figure 4 shows that errors may escape from some error confinement 
boundaries and propagate inside the cache subsystem. These errors may be finally detected by the 
frontend CRC or propagate outside the cache subsystem. Figure 5 presents example scenarios of error 
occurrence during a write to cache memory operation followed by a read from cache and shows that 
the latency of errors is variable depending on where the errors occur and also on the efficiency of the 
error detection mechanisms. For example, error E1 occurring in the cache controller interfaces to
channels/disks after the frontend CRC is computed can only be detected when this CRC is checked
upon a read operation from the cache memory following a request from the user involving the
corresponding track. The error latency in this case mainly depends on the input distribution, i.e., the
latency of errors injected in highly accessed tracks will be orders of magnitude shorter than the latency
of errors occurring in rarely accessed tracks. Errors injected over Bus 1 (for instance E2, E3, and E4)
will exhibit a short latency if they are detected by parity (E2); the latency of errors escaping parity (E3
and E4) will be of the same order of magnitude as the latency of errors injected in the cache controller
interfaces (e.g., E1).
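
The behavioral bookkeeping described in this section (count errors per track, reset on write, compare with a detection condition on read) can be sketched as below. The class and the `detects_odd_count` predicate are hypothetical; the real detection conditions are those of Table 1, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrackState:
    """Error bookkeeping for one track in a behavioral model."""

    errors: int = 0  # errors accumulated since the last write

    def inject(self, n_errors: int) -> None:
        """Each injected error increments the error count of the track."""
        self.errors += n_errors

    def write(self) -> None:
        """A write overwrites the track, so the error count is reset to zero."""
        self.errors = 0

    def read(self, detects: Callable[[int], bool]) -> str:
        """Compare the error count with a detection condition on a read."""
        if self.errors == 0:
            return "ok"
        return "detected" if detects(self.errors) else "undetected"


def detects_odd_count(n_errors: int) -> bool:
    """Placeholder condition (assumption): only odd error counts are detected."""
    return n_errors % 2 == 1
```

Passing the detection condition as a function keeps the track bookkeeping independent of the particular code (parity, EDAC or CRC) being modeled.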
2) data flow from cache memory to disk: steps 8 to 14
3) data flow from disk to cache memory: steps 15 to 21
4) data flow from cache memory to user: steps 22 to 29
Figure 4: Cache operations, data flow, fault occurrence, error detection boundaries and error propagation
[Figure 5 (timing diagram; text recovered from the figure)]
• Time axis: instants t0 to t11
• Operations along the time axis, in order: Append FE_CRC, Append Parity, Check Parity, Append EDAC,
  Check EDAC, Append Parity, Check Parity, Check FE_CRC
• Frontend CRC error confinement boundary: errors accumulated between t0 and t11 can be detected by
  FE_CRC
• E1: error latency = t11 - t1
• E3, E4: not detected by parity
• Other markers: E2, E5, error detection, error propagation
Figure 5: Write to cache and read from cache scenarios: error detection boundaries and error latency
4.2.4. Outputs and quantitative measures
The simulation of level 2 model enables us to trace several error scenarios and analyze the likelihood 
of errors to remain undetected by the cache subsystem or to cross the boundaries of several error de­
tection and recovery mechanisms before being finally detected. Moreover, using the fault injection 
functions implemented in the model, we analyze how the system responds to various error rates
(especially burst errors) and input distributions and how the accumulation of errors in the cache or in 
the disk and the error latency affect the behavior of the overall system. Statistics are recorded from the 
simulation to evaluate the coverage factors for each error detection mechanism, the error latency dis­
tribution, and the frequency of track reconstruction operations. Other quantitative measures such as 
the availability of the system and the mean time to data loss can also be recorded.
4.3 Level 3 Model
Level 3 model details cache operations during the transfer of tracks from user to cache, from cache to 
user, from disk to cache and from cache to disk, in order to evaluate the probabilities and number of 
errors occurring during these transfers (these parameters are used to feed level 2 model as discussed 
in Section 4.2). Unlike level 2 which models a track transfer at a high level of abstraction as an atomic 
operation, in level 3, each track is decomposed into a set of data blocks which in turn are broken 
down into data symbols (each one corresponding to a set of bytes). The transfer of a track is
performed in several steps and spans several cycles. CRC, parity or EDAC bits are appended to the data
transferred inside the cache or over the busses (Bus 1 and Bus 2). Errors during the transfer may
affect the data bits as well as the checking bits. At this level, we assume that the data stored in the
cache memory and in the disk array is error free, as the impact of these errors is considered in the
level 2 model. Therefore, we only need to model the cache controller interfaces with the channels/disks
and with the cache memory and the data transfer busses. The input distribution used by the workload
generator associated with the level 3 model, which defines the tracks to be accessed and the interarrival
times between track requests, is derived from the level 2 model.
The cache controller interfaces are described by a set of buffers where the data to be transmitted to, or
received from, the busses is temporarily stored. The cache controller interface to the channels/disks
checks the frontend CRC received on read operations and appends parity to each data symbol
transferred over Bus 1. The cache controller interface to the cache memory: 1) checks the parity received
from Bus 1, strips the parity and then appends EDAC bits to each data symbol transferred over Bus 2,
when data is transferred to the cache memory (write operation), and 2) checks the EDAC bits received
from the cache memory, strips the EDAC bits and then appends parity bits to each data symbol
transferred over Bus 1 (read operation).
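
The append/check pattern on Bus 1 can be illustrated with a single even-parity bit per data symbol. This is an assumption made for the example only: the actual parity organization of the system is given by Table 1 and is not reproduced here, and the helper names are hypothetical. The sketch shows why an even number of bit-flips in a symbol escapes this kind of check.

```python
def parity_bit(symbol: int) -> int:
    """Even-parity bit: 1 when the symbol holds an odd number of 1 bits."""
    return bin(symbol).count("1") % 2


def check_parity(symbol: int, stored_parity: int) -> bool:
    """True when the received symbol is consistent with its parity bit."""
    return parity_bit(symbol) == stored_parity


def flip_bits(symbol: int, positions) -> int:
    """Return the symbol with the bits at the given positions inverted."""
    for pos in positions:
        symbol ^= 1 << pos
    return symbol


symbol = 0b10110010
parity = parity_bit(symbol)                                  # appended by the sender
single_flip_detected = not check_parity(flip_bits(symbol, [0]), parity)
double_flip_escapes = check_parity(flip_bits(symbol, [0, 5]), parity)
```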
In the level 3 model, only transient faults are injected into the cache components (buffers and busses).
During each operation, it is assumed that a healthy component will perform its task correctly, i.e., it
will execute the operation without increasing the number of errors in the data it is currently handling.
For example, the cache controller interfaces will successfully load their own buffers, unless they are
affected by errors while performing the load operation. Similarly, the busses (Bus 1 and Bus 2) will
transfer a data symbol and the associated information without errors, unless they are faulty while doing
so. In contrast, when a transient fault occurs, single or multiple bit-flips are continuously injected,
for the duration of the transient, into the track data symbols which are being processed. Since a single
track transfer is a sequence of operations which span several cycles, single errors due to transients in
the cache components may lead to a burst of errors in the track currently being transferred. Due to the
high operational speed of the components, even a short transient (a few microseconds) may result in an
error burst which affects a large number of bits.
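
The relation between transient duration, cycle time and burst length can be sketched as follows. The model is deliberately simplified and is an assumption of this example: one data symbol is handled per cycle, and each symbol handled during the transient receives one bit-flip with a hypothetical probability `flip_prob`.

```python
import random


def burst_errors(transient_ns, cycle_ns, flip_prob, rng):
    """Count the bit-flips injected while a transient overlaps a track transfer.

    One data symbol is handled per cycle; each symbol handled during the
    transient is hit by a single bit-flip with probability `flip_prob`.
    """
    symbols_exposed = int(transient_ns // cycle_ns)
    return sum(rng.random() < flip_prob for _ in range(symbols_exposed))
```

With a transient of 5000 ns and an assumed cycle of 5 ns, 1000 symbols are exposed, so even a modest `flip_prob` produces a burst of many bit errors from a single transient.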
5. Simulation Experiments and Results
In this section, we present results obtained from the simulation of level 2 and level 3 models to
highlight the advantages of using a hierarchical simulation-based approach for system dependability
analysis. We focus on the behavior of the cache and the disks when the system is stressed with high
transient fault rates leading to bursts of errors (i.e., sequences of errors in rapid succession during a
short period of time). The occurrence of burst errors during data transmission over busses, in the memory
and the disks is common in computing systems as observed, e.g., in [Campbell et al. 1992]. It is well
known that the CRC and EDAC error detection mechanisms provide a high error detection coverage
of single bit errors. However, to the best of our knowledge, the impact of burst errors has not been 
extensively explored. In this section, we analyze the coverage of the error detection mechanisms, the 
distribution of the error detection latency and the error accumulation in the cache memory and the 
disks, and finally the evolution of the frequency of track reconstruction operations in the disks.
5.1 Experiment Set-up
Input Distribution. Real traces of user I/O requests are used to derive inputs for the simulation.
Information provided by the traces includes the tracks processed by the cache subsystem, the type of
the request (read, write), and the interarrival times between the requests. Using a real trace gave us
the opportunity to analyze the system under a real workload. This is one of the advantages of using a
simulation approach compared to analytical approaches, which make simplifying assumptions about
the input distribution in order to obtain tractable models. The input trace that we processed is very
large and describes accesses to more than 127,000 different tracks. As illustrated by Figure 6, the
distribution of the number of accesses per track is not uniform; rather, a few tracks are accessed much
more frequently than the rest (the well-known track skew phenomenon). For instance,
the 100 most frequently accessed tracks account for 80 percent of the total number of accesses in
the trace, and the leading track of the input trace is accessed 26,224 times whereas only 200 accesses
are counted for the rank-100 track. The interarrival time between track accesses is on the order of a few
milliseconds, leading to a high activity of the cache subsystem. Figure 7 plots the probability density
function of the interarrival times between track requests. Regarding the type of the requests, the
distribution is: 86% reads, 11.4% fast-write and 2.6% write-through operations.
Figure 6: Track skew phenomenon (Log-Log curve)
Figure 7: Interarrival time probability density function (in milliseconds)
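
Track skew statistics of the kind quoted above (share of accesses taken by the most popular tracks, rank-frequency curve) can be computed from a trace with a few lines. The sketch assumes the trace has already been reduced to a sequence of track identifiers; the function names and the toy trace are hypothetical.

```python
from collections import Counter


def access_share_of_top(track_ids, k):
    """Fraction of all accesses that go to the k most frequently accessed tracks."""
    counts = Counter(track_ids)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total


def rank_frequency(track_ids):
    """Access counts sorted from the most to the least accessed track."""
    return [n for _, n in Counter(track_ids).most_common()]


# Toy trace: track 1 is accessed 6 times, track 2 three times, track 3 once.
toy_trace = [1] * 6 + [2] * 3 + [3]
```

Plotting `rank_frequency(trace)` against the rank on log-log axes gives a curve of the same kind as Figure 6.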
Simulation parameters. We simulated a large disk array composed of 13 data disks and 2 redundancy
disks. The data capacity of the RAID is 128,000 data tracks. The capacity of the simulated
cache memory is 5% of the capacity of the disk array. The rate of occurrence of permanent faults is of
the order of magnitude of 10^-4 per hour for cache components (which is what is generally observed for
hardware components) and 10^-6 per hour for the disks [Chen et al. 1994]. The mean time to repair is 72
hours. Transient faults leading to track errors occur more frequently than permanent faults. As our
objective is to analyze how the system responds to high fault rates and bursts of errors, the value
assumed for the transient fault rate in the simulation experiments is 100 per hour. Errors are more
likely to occur during data transmission over the busses than in the cache controller interfaces, the
cache memory and the disks. Therefore, the fault injection rate over the busses is assumed
to be 100 times the rate of injection in the other components (100/hour for the busses and
1/hour for the other components). Regarding the load dependent fault injection strategy, the rate of
injection in the disk corresponds to the occurrence of one error for every 10^14 bits accessed, as
observed in
[Chen et al. 1994]. The same injection rate is assumed for the cache memory. The length of the error
burst occurring in the cache memory and in the disks is sampled from a normal distribution with a 
mean of 100 and a standard deviation of 10, whereas the length of the error burst occurring during the 
transfer of the track inside the cache is evaluated from the simulation of level 3 model as discussed in 
Section 4.3. The results presented in the following subsections correspond to the simulation of 24 
hours of system operation.
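
To give a feel for these parameters, the sketch below converts the quoted rates into expected fault counts per component over the 24 simulated hours and samples burst lengths from the stated normal distribution. Treating the rates as constant, and rounding and truncating the burst length to at least one error, are assumptions of this example.

```python
import random

HOURS_SIMULATED = 24

# Fault rates per hour, per component, as quoted in the text above.
RATES_PER_HOUR = {
    "permanent_cache_component": 1e-4,
    "permanent_disk": 1e-6,
    "transient_bus": 100.0,
    "transient_other_component": 1.0,
}


def expected_faults(rate_per_hour, hours=HOURS_SIMULATED):
    """Expected number of faults for a constant rate over the simulated period."""
    return rate_per_hour * hours


def sample_burst_length(rng, mean=100.0, std=10.0):
    """Burst length for cache memory and disk errors, drawn from N(mean, std).

    Rounding to an integer and the lower bound of 1 are assumptions.
    """
    return max(1, round(rng.gauss(mean, std)))
```

Under these rates a bus is expected to see about 2400 transient faults in 24 hours, whereas a permanent fault in a given cache component or disk is a rare event over the same period.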
5.2 Level 3 model simulation results
As discussed in Section 4.3, Level 3 model aims at evaluating the number of errors occurring during 
the transfer of the tracks inside the cache due to transient faults over the busses and in the cache con­
troller interfaces to the channels/disks and to the cache memory. We assumed that the duration of a 
transient fault is 5 microseconds. During the duration of the transient, single or multiple bit flips are 
continuously injected in the track data symbols which are processed during that time. The cache op­
erational cycle for the transfer of a single data symbol is of the order of magnitude of a few nanosec­
onds, therefore, the occurrence of a transient fault might affect a large number of bits in a track. This 
is illustrated by Figure 8 which plots the conditional probability density function of the number of er­
rors (i.e., number of bit-flips) occurring during the transfer over Bus 1 and Bus 2 (Figure 8-a) and 
inside the cache controller interfaces (Figure 8-b), given that a transient fault occurred. The
distribution is the same for Bus 1 and Bus 2 because these busses have the same speed. The mean length
of the error burst measured from the simulation is around 100 bits during transfer over the busses,
800 bits when the track is temporarily stored in the cache controller interfaces to cache memory, and
1000 bits when the track is temporarily stored in the cache controller interfaces to channels/disks. The
difference between the results is related to the difference between the track transfer time over the
busses and inside the cache controller interfaces.
[Figure 8 panels (x-axis: number of errors during transfer): a) transfer over the busses (Bus 1, Bus 2); b) errors in cache controller interfaces (CCI)]
Figure 8: Probability of the number of errors during track transfer given that a transient fault is injected
5.3 Level 2 model simulation results
We used the burst errors distributions obtained from the simulation of level 3 model to feed level 2 
model as explained in Section 4.2. In the following subsections we present and discuss the results
obtained from the simulation of level 2, which concern: 1) the efficiency of the cache error detection
mechanisms, 2) the error latency distribution, and finally 3) the error accumulation in the cache
memory and the disks and the evolution of track reconstructions.
5.3.1. Coverage of the error detection mechanisms
For all the simulation experiments that we performed, the coverage factor measured for the frontend 
CRC and the physical sector CRC was equal to 100 percent. This is related to the very high probabil­
ity of detection of error patterns characterizing the CRC algorithm implemented in the system (see 
Section 2.2). Regarding EDAC and parity, the coverage factors tend to stabilize as the simulation time
increases (see Figures 9-a and 9-b, respectively). Each unit of time in Figures 9-a and 9-b
corresponds to 15 minutes of system operation. It can be noticed that the EDAC coverage remains high
even though the system is stressed with bursty and high fault rates, and more than 98% of the errors
detected by EDAC are automatically corrected on-the-fly. The reason is that errors are injected
randomly in the track and the probability of having more than three errors in a single data symbol is low
(the size of a data symbol is around 10^-3 times the size of the track). All the errors which escaped
EDAC or parity have been detected by the frontend CRC upon a read request from the hosts. This result
illustrates the advantages of storing the CRC with the data in the cache memory in order to provide extra
detection of errors escaping EDAC and parity, and to compensate for the relatively low parity
coverage.
[Figure 9 panels (x-axis: time, 0 to 75 units): a) EDAC detection coverage and correction coverage; b) Parity detection coverage]
Figure 9: EDAC and Parity coverage evolution during simulation time
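
The argument that more than three errors rarely fall in the same data symbol can be checked with a short binomial computation. The sketch assumes that each error of a burst lands independently and uniformly on one of the symbols of the track, and that a track holds about 1000 symbols (from the 10^-3 ratio quoted above); the function name is hypothetical.

```python
from math import comb


def prob_symbol_exceeds(n_errors, n_symbols, threshold):
    """P(a given symbol receives more than `threshold` of the `n_errors` errors).

    Assumption: every error lands independently and uniformly on one symbol,
    so the count in a given symbol is Binomial(n_errors, 1 / n_symbols).
    """
    p = 1.0 / n_symbols
    at_most = sum(
        comb(n_errors, k) * p**k * (1.0 - p) ** (n_errors - k)
        for k in range(threshold + 1)
    )
    return 1.0 - at_most


# Burst of 100 errors spread over roughly 1000 symbols, threshold of three errors.
p_more_than_three = prob_symbol_exceeds(100, 1000, 3)
```

Under these assumptions the probability for a given symbol is on the order of 10^-6, which is consistent with the high EDAC coverage observed despite bursty faults.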
5.3.2. Error latency distribution and error propagation
When an error is injected in a track, the time of occurrence of the error and a code identifying which 
component caused the error are recorded. This allows us to monitor the error propagation in the
system. Six error codes are defined: CCI (error in the cache controller interfaces to the channels/disks
and to the cache memory), CM (error in the cache memory), D (error in the disk), B1 (error during
transmission over Bus 1), and B2 (error during transmission over Bus 2). The time for an error to be
overwritten (during a write operation) or detected (upon a read operation) is defined as error latency.
Since the size of tracks is large and several errors may be present in each track, it is impractical to
record the latency for each error of the track. Instead, we record the latency associated with the first error
injected in the track. This means that the error latency that we measure corresponds to the time from 
when the track becomes faulty until errors are overwritten or detected. Therefore, the error latency 
measured for each track is the maximum latency for errors present in the track. Figure 10 plots the 
error latency probability density function for errors as categorized above, and error latency for all 
samples without taking into account the origin of errors (the unit of time is 0.1 ms). It can be noticed 
that the latter distribution is bi-modal. The first mode corresponds to a very short latency which 
mainly results from errors occurring over Bus 1 and detected by parity. The second mode corresponds
to higher latencies due to errors occurring in the cache memory or the disks, or to the propagation of
errors occurring during data transfer inside the cache. Note that most of the errors escaping parity
(error code B1) remain latent for a long period of time (as discussed in Section 4.2.3). The value of the
latency depends on the input distribution: if the track is not frequently accessed, then errors present in
the track might remain latent for a long period of time. Figure 10-b shows that the latency of errors
injected in the cache memory is slightly lower than the latency of errors injected in the disk due to the
fact that the disks are less frequently accessed than the cache memory. Finally, it is important to notice
that the difference between the error latency distribution for error codes B1 and B2 (Figure 10-a) is
due to the fact that data transfers over Bus 1 (during read and write operations) are covered by parity, 
whereas errors occurring during write operations over Bus 2 are detected later by EDAC or by the 
frontend CRC when the data is read from the cache memory. Consequently, it would be useful to 
check EDAC before data is written to the cache memory in order to reduce the latency of errors due to 
Bus 2.
[Figure 10 panels (x-axis: latency, ticks 1E+4 to 1E+8; legend: all, CM, D, CCI, B1, B2): a) all error codes; b) Bus 1, cache memory and disks]
Figure 10: Error latency distribution
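
The latency measurement described in this section (timestamp only the first error of a track, close the measurement when the errors are detected or overwritten) can be sketched as follows. `LatencyRecorder` and its method names are hypothetical and are not taken from the simulator used in this study.

```python
class LatencyRecorder:
    """Record, per track, the latency from the first error to its clearing."""

    def __init__(self):
        self._first_error = {}  # track id -> (time of first error, origin code)
        self.samples = []       # list of (origin code, latency)

    def on_error(self, track, now, origin):
        """Timestamp only the first error injected in a fault-free track."""
        self._first_error.setdefault(track, (now, origin))

    def on_clear(self, track, now):
        """Call when errors are detected (read) or overwritten (write)."""
        entry = self._first_error.pop(track, None)
        if entry is not None:
            start, origin = entry
            self.samples.append((origin, now - start))
```

Grouping `samples` by origin code (CCI, CM, D, B1, B2) gives per-origin latency distributions of the kind plotted in Figure 10.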
5.3.3. Distribution of errors in the cache memory and in the disks
The injection of bursts of errors during data transfer inside the cache increases the likelihood of error 
propagation to the cache memory and to the disks. Analysis of error accumulation in the cache mem­
ory and disks provides valuable feedback, especially for scrubbing policy. Figure 11 plots the evolu­
tion in time of the percentage of faulty tracks in the cache memory and in disks (the unit of time is 
15 minutes). An increasing trend is observed for the disk whereas, in the cache memory, we observe 
a periodic behavior such that the percentage of faulty tracks first increases and then decreases when 
errors are detected upon read operations, or are overwritten when tracks become dirty. Since the cache 
memory is accessed very frequently (every 5 milliseconds on average), and the cache hit rate is high
(more than 60%), errors are more frequently detected and overwritten in the cache memory than in the
disks. The increase in the number of faulty tracks in the cache affects the rate of track
reconstruction, as illustrated in Figure 12. The average rate of track reconstruction is around
8.7 x 10^-5 per millisecond. It is noteworthy that the detection of errors in the cache memory does not
necessarily lead to the reconstruction of a track (the track might still be valid in the disks).
Nevertheless, the detection of errors in the cache has an impact on performance due to the increase in
the number of accesses to the disk. Figure 11 indicates that different strategies should be considered
for disk and cache memory scrubbing. The disk should be scrubbed more frequently than the cache memory
to prevent error accumulation, which can lead to the inability to reconstruct a faulty track.
Figure 11: Percentage of faulty tracks in the cache and in disks
Figure 12: Time evolution of the frequency of track reconstruction
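
For orientation, the average reconstruction rate quoted above can be converted to coarser time units. The sketch assumes a constant rate, whereas Figure 12 shows that the rate evolves during the simulation, so the result is only an order of magnitude.

```python
MS_PER_HOUR = 60 * 60 * 1000


def reconstructions(rate_per_ms, hours):
    """Expected number of track reconstructions over `hours` at a constant rate."""
    return rate_per_ms * MS_PER_HOUR * hours


per_hour = reconstructions(8.7e-5, 1)   # about 313 reconstructions per hour
per_day = reconstructions(8.7e-5, 24)   # about 7.5 thousand over 24 hours
```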
6. Summary and Discussion of the Simulation Results
In this section we discuss key results of employing a hierarchical, simulation-based methodology for 
analysis and evaluation of highly dependable storage systems.
A complex and sophisticated cache-based storage architecture is modeled and simulated to demon­
strate the capabilities of the proposed method. To ensure reliable operation and to prevent data loss, 
the system employs a number of error detection mechanisms and recovery strategies, including parity 
checking, EDAC, CRC checking, and support of redundant disks for data reconstruction. Due to the
complex interrelationship and interactions among these detection and recovery strategies, it is not a 
trivial task to capture accurately the behavior of the overall system in the presence of faults. To enable 
an efficient and detailed dependability analysis, we propose a hierarchical, behavioral-simulation- 
based approach in which the system is decomposed into several abstraction levels and a correspond­
ing simulation model is associated with each level. In this approach, the impact of low-level faults is 
used in a higher-level analysis.
We demonstrated that via an appropriate hierarchical system decomposition, the complexity of
individual models can be significantly reduced without reducing the ability of the models to capture
detailed system behavior. Additional system details can be easily incorporated by introducing new
abstraction levels and corresponding simulation models.
To demonstrate the capabilities of the methodology we have conducted an extensive analysis of the 
design of a real, commercial cached RAID storage system. To our knowledge, this kind of analysis of 
a cached-RAID system has not been accomplished either in academia or in industry. The 
dependability measures used to characterize the storage system include coverage of different error detection 
mechanisms employed in the system, error latency distribution classified according to the origin of an 
error, error accumulation in the cache memory and disks, and frequency of data reconstruction in the 
cache memory. To analyze the system under realistic operational conditions, we use real input traces 
to drive the simulations. The study revealed that the distribution of user accesses to the storage system 
is neither uniform nor exponential, as is typically assumed in published studies (including analytical 
modeling) that address the dependability of RAID storage systems. It is important to emphasize that 
analytical modeling of the system is not appropriate in our context due to the complexity of the 
architecture, the overlapping of various error detection and recovery mechanisms, and the necessity of 
capturing the latent errors in the cache and the disks. Hierarchical simulation offers an efficient 
method to accomplish the above task and allows detailed analysis of the system to be performed using 
real input traces and elaborate fault models.
The specific results of the study are presented in the previous sections. It is, however, important to 
emphasize the key points that demonstrate the usefulness of the proposed methodology. First, we 
focused on analysis of the behavior of the system when it is stressed with high fault rates. In particular, 
we demonstrated that transient faults occurring during a short period of time, either during data transfer 
over the buses or while the data is stored in the cache controller interface, may lead to bursts of errors 
affecting a large number of bits of the track (more than 100 bits and even 1000 bits 
may be corrupted due to a transient fault of a few microseconds). Moreover, despite the high rate of 
injected faults, high error detection coverage was observed for the EDAC and CRC mechanisms. The 
relatively low coverage of parity was compensated for by the extra detection provided by CRC, which 
is stored with the data in the cache memory.
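The complementary roles of parity and CRC can be illustrated with a small example. This is a sketch under stated assumptions, not a model of the system's actual checkers: it uses a single even-parity bit over a whole 64-byte block and the CRC-32 from Python's zlib module as a stand-in, since the parity granularity and CRC polynomial of the real system are not specified here. The helper names are hypothetical.

```python
import zlib

def parity_bit(data: bytes) -> int:
    """Single even-parity bit over all bits of the block (assumption)."""
    return sum(bin(b).count("1") for b in data) % 2

def flip_bits(data: bytes, positions):
    """Return a copy of data with the given bit positions inverted."""
    buf = bytearray(data)
    for p in positions:
        buf[p // 8] ^= 1 << (p % 8)
    return bytes(buf)

original = bytes(range(64))
stored_parity = parity_bit(original)
stored_crc = zlib.crc32(original)        # CRC-32 as a stand-in for the system's CRC

# A single-bit error changes the parity, so parity detects it.
single = flip_bits(original, [10])
parity_detects_single = parity_bit(single) != stored_parity

# A two-bit error leaves the parity unchanged, so parity misses it,
# while the CRC still flags the block as corrupted.
double = flip_bits(original, [10, 13])
parity_detects_double = parity_bit(double) != stored_parity
crc_detects_double = zlib.crc32(double) != stored_crc

print(parity_detects_single, parity_detects_double, crc_detects_double)
# prints: True False True
```

Any error pattern that flips an even number of bits escapes a single parity bit, which is consistent with parity showing lower coverage than CRC under bursts of errors.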
The hierarchical simulation approach allowed us to perform a detailed analysis of error latency with 
respect to the origin of an error. The error latency distribution measured from the simulation, 
regardless of the origin of the errors, is bi-modal (similar behavior was observed in other experimental 
studies, e.g., [Arlat et al. 1990, Silva et al. 1996]): short latencies are mainly related to errors occurring 
and detected during data transfer over the bus protected by parity, and the highest error latency was 
observed for errors injected into the disks. The analysis of the evolution during the simulation of the 
percentage of faulty tracks in the cache memory and the disks showed that, despite the high rate of 
injected faults, there is no error accumulation in the cache memory, i.e., the percentage of faulty 
tracks in the cache varies within a small range (0.5% to 2.5%, see section 5.3.3), whereas an in­
creasing trend was observed for the disks (see Figure 10, section 5.3.3). This is related to the fact that 
the cache memory is accessed very frequently, and errors are more frequently detected and overwrit­
ten in the cache memory than in the disks. The primary implication of this result, together with the 
results of the error latency analysis, is the need for a carefully designed scrubbing policy capable of 
reducing the error latency with acceptable performance overhead. The simulation results suggest that 
the disks should be scrubbed more frequently than the cache memory in order to prevent error accu­
mulation, which may lead to an inability to reconstruct faulty tracks. However, we should emphasize 
that the results presented in this paper are derived from the simulation of the system using a single, 
real trace to generate the input patterns for the simulation. Additional experiments with different input 
traces and over longer simulation times should be performed to confirm these results.
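The link between access frequency, scrubbing, and error accumulation can be explored with a toy model such as the one below. It is a deliberately simplified sketch and not the simulator used in this study: the function name, all parameter values, the uniform access pattern, and the round-robin scrubber are assumptions chosen for illustration (the text above notes that real accesses are not uniform).

```python
import random

def simulate(n_tracks, steps, fault_prob, access_prob, scrub_period=None, seed=0):
    """Toy model (hypothetical parameters). At each step a fault is injected
    into a random track with probability fault_prob, and with probability
    access_prob a random track is accessed, which detects and repairs any
    fault in it. If scrub_period is set, every scrub_period steps a scrubber
    checks and repairs the next track in round-robin order.
    Returns the fraction of tracks left faulty at the end."""
    rng = random.Random(seed)
    faulty = set()
    cursor = 0
    for step in range(1, steps + 1):
        if rng.random() < fault_prob:              # fault injection
            faulty.add(rng.randrange(n_tracks))
        if rng.random() < access_prob:             # user access repairs on detection
            faulty.discard(rng.randrange(n_tracks))
        if scrub_period and step % scrub_period == 0:
            faulty.discard(cursor)                 # scrubber visits one track
            cursor = (cursor + 1) % n_tracks
    return len(faulty) / n_tracks

# Small, frequently accessed store (cache-like) versus a large, rarely
# accessed store (disk-like), the latter with and without scrubbing.
cache_like = simulate(n_tracks=200, steps=20000, fault_prob=0.05, access_prob=1.0)
disk_unscrubbed = simulate(n_tracks=2000, steps=20000, fault_prob=0.05, access_prob=0.02)
disk_scrubbed = simulate(n_tracks=2000, steps=20000, fault_prob=0.05, access_prob=0.02,
                         scrub_period=1)

print(f"cache-like: {cache_like:.3f}  disk, no scrub: {disk_unscrubbed:.3f}  "
      f"disk, scrubbed: {disk_scrubbed:.3f}")
```

Because the same seed yields the same sequence of injected faults and accesses, the scrubbed run can never end with more faulty tracks than the unscrubbed one; how large the gap is depends entirely on the assumed rates.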
7. Conclusions
A storage system is a heavily used, integral component of any computing environment and conse­
quently the reliability and consistency of stored data are critical in achieving a high dependability of 
the overall computing system. In this paper, we proposed a hierarchical methodology for dependabil­
ity analysis of a commercial cache-based RAID storage system. The analyzed system is complex and 
employs several layers of overlapping error detection and recovery mechanisms to ensure high cover­
age against potential operational errors occurring in the cache and in the disks. To perform a detailed 
dependability analysis of the system, we defined a simulation-based modeling approach to describe 
the system behavior at several hierarchical abstraction levels. A distinct simulation model is associated 
with each abstraction level and is simulated independently while faults are injected to evoke fault 
tolerance activities in the system. Low-level fault effects are propagated to the higher simulation levels 
where their impact on the overall system is analyzed.
Our simulation experiments, performed under real access traces, focused on the analysis of coverage 
factors for different error detection mechanisms, the error latency classified with respect to the origin 
of an error, and error accumulation in the cache memory and the disks. Permanent faults and high-rate 
transient faults leading to bursts of errors are injected into the system. The results from 
the simulation demonstrate that the system is able to tolerate high fault rates. The results presented in 
the paper are preliminary as we addressed only the impact of errors affecting the data. Continuation of 
this work will include modeling of errors affecting the control flow in cache operations. The proposed 
approach is flexible enough to easily incorporate these aspects of system behavior.
References
[Alvarez et al. 1997] G. A. Alvarez, W. A. Burkhard and F. Cristian, “Tolerating Multiple Failures in RAID 
Architectures with Optimal Storage and Uniform Declustering”, 24th Annual ACM/IEEE Int. Symposium on Computer 
Architecture (ISCA'97), Denver, CO, USA, 1997.
[Arlat et al. 1990] J. Arlat, M. Aguera, Y. Crouzet, J.C. Fabre, E. Martins and D. Powell, “Experimental Evaluation of 
the Fault Tolerance of an Atomic Multicast System”, IEEE Transactions on Reliability, 39 (4), pp.455-467, 1990. 
[Baker et al. 1991] M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shiriff and J. K. Ousterthout, “Measurements of 
a Distributed File System”, 13th ACM Symposium on Operating Systems Principles, pp.198-212, 1991.
[Blaum et al. 1995] M. Blaum, J. Brady, J. Bruck and J. Menon, “EVENODD: An Efficient Scheme for Tolerating 
Double Disk Failures in RAID Architectures”, IEEE Transactions on Computers, 44 (2), pp. 192-202, 1995.
[Burkhard and Menon 1993] W. A. Burkhard and J. Menon, “Disk Array Storage System Reliability”, 23rd Int. Sympo­
sium on Fault-Tolerant Computing (FTCS-23), pp.432-441, Toulouse, France, June 1993.
[Campbell et al. 1992] A. Campbell, P. McDonald and K. Ray, “Single Event Upset Rates in Space”, IEEE 
Transactions on Nuclear Science, 39 (6), pp. 1828-1835, December 1992.
[Chen and Somani 1994] C.-H. Chen and A. K. Somani, “A Cache Protocol for Error Detection and Recovery in Fault- 
Tolerant Computing Systems”, 24th Int. Symp. on Fault-Tolerant Computing, pp.278-287, Austin, TX, 1994.
[Chen et al. 1994] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz and D. A. Patterson, “RAID: High-Performance, 
Reliable Secondary Storage”, ACM Computing Surveys, 26 (2), pp. 145-185, June 1994.
[Chillarege & Iyer 1989] R. Chillarege and R. K. Iyer, “Measurement-Based Analysis of Error Latency”, IEEE 
Transactions on Computers, 36 (5), pp.529-537, May 1987.
[Friedman 1995] M. B. Friedman, “The Performance and Tuning of a StorageTek RAID 6 Disk Subsystem”, CMG 
Transactions, 87, pp.77-88, Winter 1995.
[Gibson 1992] G. A. Gibson, Redundant Disk Arrays: Reliable, Parallel Secondary Storage, The MIT Press, Cam­
bridge, MA, USA, 1992.
[Goswami et al. 1997] K. K. Goswami, R. K. Iyer and L. Young, “DEPEND: A Simulation-Based Environment for 
System Level Dependability Analysis”, IEEE Transactions on Computers, 46 (1), pp.60-74, January 1997.
[Holland et al. 1993] M. Holland, G. A. Gibson and D. P. Siewiorek, “Fast, On-Line Failure Recovery in Redundant 
Disk Arrays”, 23rd Int. Symp. on Fault-Tolerant Computing (FTCS-23), pp.422-431, Toulouse, France, 1993.
[Hou & Patt 1997] R. Y. Hou and Y. N. Patt, “Using Non-Volatile Storage to Improve the Reliability of RAID5 Disk 
Arrays”, 27th Int. Symposium on Fault-Tolerant Computing (FTCS-27), pp.206-215, Seattle, WA, June 1997.
[Houtekamer 1995] G. E. Houtekamer, “RAID System: The Berkeley and MVS Perspectives”, CMG'95, pp.46-61, 
Nashville, Tennessee, USA, 1995.
[Karedla et al. 1994] R. Karedla, J. S. Love and B. G. Wherry, “Caching Strategies to Improve Disk System Perform­
ance”, IEEE Computer, pp.38-46, March 1994.
[Katz et al. 1993] R. H. Katz, P. M. Chen, A. L. Drapeau, E. K. Lee, K. Lutz, E. L. Miller, S. Seshan and D. A. Pat­
terson, “RAID-II: Design and Implementation of a Large Scale Disk Array Controller”, 1993 Symposium on Inte­
grated Systems, Cambridge, MA, USA, 1993.
[Malhotra & Trivedi 1993] M. Malhotra and K. S. Trivedi, “Reliability of Redundant Arrays of Inexpensive Disks”, 
Journal of Parallel and Distributed Computing, 17, pp. 146-151, 1993.
[Menon 1994] J. Menon, “Performance of RAID5 Disk Arrays with Read and Write Caching”, International Journal on 
Distributed and Parallel Databases, 2 (3), pp.261-293, July 1994.
[Menon & Cortney 1993] J. Menon and J. Cortney, “The Architecture of a Fault-Tolerant Cached RAID Controller”, 
20th Annual International Symposium on Computer Architecture, pp.76-86, San Diego, CA, USA, 1993.
[Merchant & Yu 1992] A. Merchant and P. S. Yu, “Design and Modeling of Clustered RAID”, 22nd IEEE Int. Sympo­
sium on Fault-Tolerant Computing (FTCS-22), pp. 140-149, Boston, MA, USA, 1992.
[Mourad et al. 1993] A. N. Mourad, W. K. Fuchs and D. G. Saab, “Recovery Issues in Databases Using Redundant 
Disk Arrays”, Journal of Parallel and Distributed Computing, 17, pp.75-89, 1993.
[Muntz & Lui 1990] R. R. Muntz and J. C. S. Lui, “Performance Analysis of Disk Arrays Under Failure”, 16th 
International Conference on Very Large Data Bases, pp.162-173, Brisbane, Australia, 1990.
[Ousterthout et al. 1985] J. K. Ousterthout, H. Da Costa, D. Harrison, et al., “A Trace-Driven Analysis of the Unix 4.2 
BSD File System”, 10th Symp. Operating Systems Principles, pp. 15-24, 1985.
[Patterson et al. 1988] D. A. Patterson, G. A. Gibson and R. H. Katz, “A Case for Redundant Arrays of Inexpensive 
Disks (RAID)”, ACM International Conference on Management of Data (SIGMOD), pp. 109-116, New York, 1988.
[Silva et al. 1996] J. G. Silva, J. Carreira, H. Madeira, D. Costa and F. Moreira “Experimental Assessment of Parallel 
Systems”, 26th IEEE Int. Symposium on Fault-Tolerant Computing (FTCS-26), pp.415-424, Sendai, Japan, 1996.
[Somani & Kim 1997] A. K. Somani and S. Kim, “Transient Fault Detection in Cache Memories by Employing a 
Small Shadow Cache”, DCCA-6, pp. 17-37, Garmisch, Germany, 1997.
[Tang et al. 1990] D. Tang, R. K. Iyer and S. S. Subramani, “Failure Analysis and Modeling of a VAXcluster System”, 
20th IEEE Int. Sym. on Fault-Tolerant Computing (FTCS-20), pp.244-251, Newcastle upon Tyne, UK, 1990.
[Tsao & Siewiorek 1983] M. M. Tsao and D. P. Siewiorek, “Trend Analysis on System Error Files”, 13th IEEE Int. 
Symp. on Fault-Tolerant Computing (FTCS-13), pp.116-119, Milano, Italy, 1983.
