RAID systems (Redundant Arrays of Inexpensive Disks) have dominated backend
storage systems for more than two decades and have grown continuously in size
and complexity. Currently they face unprecedented challenges from data-intensive
applications such as image processing, transaction processing and data warehousing.
As the size of RAID systems increases, designers are faced with both performance and
reliability challenges. These challenges include limited back-end network bandwidth,
physical interconnect failures, correlated disk failures and long disk reconstruction
time.
This thesis studies the scalability of RAID systems in terms of both performance
and reliability through simulation, using a discrete-event simulator for RAID
systems (SIMRAID) developed as part of this project. SIMRAID incorporates two
benchmark workload generators, based on the SPC-1 and Iometer benchmark specifications.
Each component of SIMRAID is highly parameterised, enabling it to explore
a large design space. To improve simulation speed, SIMRAID uses a set of
abstraction techniques that capture the behaviour of the interconnection protocol without
losing accuracy. Finally, to follow the technology trend toward heterogeneous storage
architectures, SIMRAID provides a framework that allows easy modelling of different
types of device and interconnection technique.
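As an illustration of the discrete-event approach, the following is a minimal sketch of an event loop for a single disk serving requests in arrival order. It is not SIMRAID code: the function name `simulate`, the fixed `service_time` and the single FIFO disk are all assumptions made for illustration only.

```python
import heapq


def simulate(arrivals, service_time):
    """Toy discrete-event simulation of one FIFO disk (hypothetical model).

    arrivals: arrival times of the requests, one per request.
    service_time: assumed constant time the disk needs per request.
    Returns the response time of each request, in request order.
    """
    # The event queue holds (time, request id, kind) and is ordered by time.
    events = [(t, i, "arrive") for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    disk_free_at = 0.0
    response = {}
    while events:
        time, req, kind = heapq.heappop(events)
        if kind == "arrive":
            # Service starts when the request has arrived and the disk is idle.
            start = max(time, disk_free_at)
            disk_free_at = start + service_time
            heapq.heappush(events, (disk_free_at, req, "complete"))
        else:
            # Response time is completion time minus arrival time.
            response[req] = time - arrivals[req]
    return [response[i] for i in range(len(arrivals))]
```

For example, `simulate([0.0, 1.0, 2.0], 2.0)` gives response times that grow as requests queue behind one another. A full RAID simulator would add further event types for the cache, controller and interconnect.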
Simulation experiments were first carried out on performance aspects of scalability.
They were designed to answer two questions: (1) given a number of disks, which
factors affect back-end network bandwidth requirements; (2) given an interconnection
network, how many disks can be connected to the system. The results show that
the bandwidth requirement per disk is primarily determined by workload features and
stripe unit size (a smaller stripe unit size has better scalability than a larger one), with
cache size and RAID algorithm having very little effect on this value. The maximum
number of disks is limited, as would be expected, by the back-end network bandwidth.
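The relationship between these two questions can be sketched as a simple upper bound, assuming that back-end bandwidth is the only limiting resource and that every disk has the same bandwidth requirement. The function name `max_disks` and the figures in the usage note are hypothetical and are not results from the thesis.

```python
def max_disks(network_bandwidth, per_disk_bandwidth):
    """Upper bound on the number of disks a back-end network can support.

    Both arguments must use the same unit (for example MB/s). The bound
    assumes the network is the only bottleneck (an illustrative assumption).
    """
    if network_bandwidth < 0:
        raise ValueError("network bandwidth must be non-negative")
    if per_disk_bandwidth <= 0:
        raise ValueError("per-disk bandwidth requirement must be positive")
    # Only whole disks can be attached, so round down.
    return int(network_bandwidth // per_disk_bandwidth)
```

With an assumed 400 MB/s network and an assumed requirement of 12.5 MB/s per disk, `max_disks(400.0, 12.5)` gives 32. Since the per-disk requirement depends on workload and stripe unit size, the bound changes with both.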
Studies of reliability have led to three proposals to improve the reliability and scalability
of RAID systems. Firstly, a novel data layout called PCDSDF is proposed.
PCDSDF combines the advantages of orthogonal data layouts and parity declustering
data layouts, so that it not only survives multiple disk failures caused by physical interconnect
failures or correlated disk failures, but also has good degraded and rebuild
performance. The generating process of PCDSDF is deterministic and time-efficient.
The number of stripes per rotation (namely the number of stripes to achieve rebuild workload balance) is small. Analysis shows that the PCDSDF data layout can significantly
improve the system reliability. Simulations performed on SIMRAID confirm
the good performance of PCDSDF, which is comparable to other parity declustering
data layouts, such as RELPR.
Secondly, a system architecture and rebuilding mechanism have been designed,
aimed at fast disk reconstruction. This architecture is based on parity declustering data
layouts and a disk-oriented reconstruction algorithm. It uses stripe groups instead of
stripes as the basic distribution unit so that it can make use of the sequential nature of
the rebuilding workload. The design space of system factors such as parity declustering
ratio, chunk size, private buffer size of surviving disks and free buffer size is explored
to provide guidelines for storage system design.
Thirdly, an efficient distributed hot spare allocation and assignment algorithm for
general parity declustering data layouts has been developed. This algorithm avoids
conflict problems in the process of assigning distributed spare space for the units on
the failed disk. Simulation results show that it effectively solves the write bottleneck
problem and, at the same time, there is only a small increase in the average response
time to user requests.