A Logical Model and Data Placement Strategies for MEMS Storage Devices
MEMS storage devices are new non-volatile secondary storages that have
outstanding advantages over magnetic disks. However, MEMS storage devices differ
greatly from magnetic disks in structure and access characteristics.
They have thousands of heads called probe tips and provide the following two
major access facilities: (1) flexibility: freely selecting a set of probe tips
for accessing data, (2) parallelism: simultaneously reading and writing data
with the set of probe tips selected. Due to these characteristics, it is
nontrivial to find data placements that fully utilize the capability of MEMS
storage devices. In this paper, we propose a simple logical model called the
Region-Sector (RS) model that abstracts major characteristics affecting data
retrieval performance, such as flexibility and parallelism, from the physical
MEMS storage model. We also suggest heuristic data placement strategies based
on the RS model and derive new data placements for relational data and
two-dimensional spatial data by using those strategies. Experimental results
show that the proposed data placements improve the data retrieval performance
by up to 4.0 times for relational data and by up to 4.8 times for
two-dimensional spatial data of approximately 320 Mbytes compared with those of
existing data placements. Further, these improvements are expected to be more
marked as the database size grows.
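The flexibility and parallelism described above can be illustrated with a minimal sketch. The names below (RSModel, read_parallel) and the grid dimensions are illustrative assumptions, not the paper's actual model:

```python
# Hypothetical sketch of a Region-Sector style abstraction
# (the paper's actual RS model may differ in detail).

class RSModel:
    """A MEMS device abstracted as a grid of regions x sectors.

    flexibility: any subset of regions (probe-tip groups) can be selected;
    parallelism: all selected regions are accessed in one step.
    """

    def __init__(self, num_regions, num_sectors):
        self.store = [[None] * num_sectors for _ in range(num_regions)]

    def write(self, region, sector, value):
        self.store[region][sector] = value

    def read_parallel(self, regions, sector):
        # One access step: read the same sector from every selected region.
        return [self.store[r][sector] for r in regions]


dev = RSModel(num_regions=4, num_sectors=8)
for r in range(4):
    dev.write(r, 0, f"attr{r}")

# Column-wise placement: one parallel step fetches all four attributes.
print(dev.read_parallel([0, 1, 2, 3], 0))  # ['attr0', 'attr1', 'attr2', 'attr3']
```

A placement that aligns co-accessed data across regions turns one logical request into a single parallel step, which is the effect the proposed strategies exploit.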
Scalability of RAID systems
RAID systems (Redundant Arrays of Inexpensive Disks) have dominated backend
storage systems for more than two decades and have grown continuously in size
and complexity. Currently they face unprecedented challenges from data intensive
applications such as image processing, transaction processing and data warehousing.
As the size of RAID systems increases, designers are faced with both performance and
reliability challenges. These challenges include limited back-end network bandwidth,
physical interconnect failures, correlated disk failures and long disk reconstruction
time.
This thesis studies the scalability of RAID systems in terms of both performance
and reliability through simulation, using a discrete event driven simulator for RAID
systems (SIMRAID) developed as part of this project. SIMRAID incorporates two
benchmark workload generators, based on the SPC-1 and Iometer benchmark specifications.
Each component of SIMRAID is highly parameterised, enabling it to explore
a large design space. To improve the simulation speed, SIMRAID develops a set of
abstraction techniques to extract the behaviour of the interconnection protocol without
losing accuracy. Finally, to meet the technology trend toward heterogeneous storage
architectures, SIMRAID develops a framework that allows easy modelling of different
types of devices and interconnection techniques.
Simulation experiments were first carried out on performance aspects of scalability.
They were designed to answer two questions: (1) given a number of disks, which
factors affect back-end network bandwidth requirements; (2) given an interconnection
network, how many disks can be connected to the system. The results show that
the bandwidth requirement per disk is primarily determined by workload features and
stripe unit size (a smaller stripe unit size has better scalability than a larger one), with
cache size and RAID algorithm having very little effect on this value. The maximum
number of disks is limited, as would be expected, by the back-end network bandwidth.
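The stripe-unit effect reported above can be sketched with a back-of-envelope model. The function and its parameters are illustrative assumptions, not SIMRAID's actual model:

```python
import math

def backend_bandwidth_per_disk(request_kb, stripe_unit_kb, iops):
    """Rough per-disk back-end bandwidth requirement (KB/s).

    A request of request_kb spans ceil(request_kb / stripe_unit_kb) stripe
    units, so a smaller stripe unit spreads each request over more disks
    and each participating disk moves less data per request.
    """
    units = math.ceil(request_kb / stripe_unit_kb)
    per_disk_kb = request_kb / units      # data moved by one participating disk
    return per_disk_kb * iops             # KB/s demanded of that disk

# 64 KB requests at 200 IOPS: a 16 KB stripe unit demands far less
# bandwidth per disk than a 64 KB one, so a fixed back-end network
# can accommodate more disks.
print(backend_bandwidth_per_disk(64, 16, 200))   # 3200.0
print(backend_bandwidth_per_disk(64, 64, 200))   # 12800.0
```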
Studies of reliability have led to three proposals to improve the reliability and scalability
of RAID systems. Firstly, a novel data layout called PCDSDF is proposed.
PCDSDF combines the advantages of orthogonal data layouts and parity declustering
data layouts, so that it can not only survive multiple disk failures caused by physical
interconnect failures or correlated disk failures, but also deliver good degraded-mode and rebuild
performance. The generating process of PCDSDF is deterministic and time-efficient.
The number of stripes per rotation (namely the number of stripes to achieve rebuild workload balance) is small. Analysis shows that the PCDSDF data layout can significantly
improve the system reliability. Simulations performed on SIMRAID confirm
the good performance of PCDSDF, which is comparable to other parity declustering
data layouts, such as RELPR.
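PCDSDF's generating process is not reproduced here, but the parity-declustering idea it builds on can be sketched with a simple block-design layout (a hypothetical illustration, not PCDSDF itself):

```python
from itertools import combinations
from collections import Counter

def declustered_layout(num_disks, stripe_width):
    """Place each stripe on a distinct subset of disks (a complete block
    design), so a failed disk's rebuild reads spread over all survivors
    instead of hammering the members of one parity group."""
    return list(combinations(range(num_disks), stripe_width))

layout = declustered_layout(5, 3)          # C(5,3) = 10 stripes
load = Counter(d for stripe in layout for d in stripe)
print(len(layout), dict(load))             # every disk appears in 6 stripes
```

The balanced per-disk stripe count is what lets rebuild work fan out evenly; layouts like PCDSDF aim for this balance with far fewer stripes per rotation than the complete design used here.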
Secondly, a system architecture and rebuilding mechanism have been designed,
aimed at fast disk reconstruction. This architecture is based on parity declustering data
layouts and a disk-oriented reconstruction algorithm. It uses stripe groups instead of
stripes as the basic distribution unit so that it can make use of the sequential nature of
the rebuilding workload. The design space of system factors such as the parity declustering
ratio, chunk size, private buffer size of surviving disks, and free buffer size is explored
to provide guidelines for storage system design.
Thirdly, an efficient distributed hot spare allocation and assignment algorithm for
general parity declustering data layouts has been developed. This algorithm avoids
conflict problems in the process of assigning distributed spare space for the units on
the failed disk. Simulation results show that it effectively solves the write bottleneck
problem and, at the same time, there is only a small increase in the average response
time to user requests.
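The conflict the algorithm avoids, namely rebuilding a unit onto a disk that already holds another unit of the same stripe, can be illustrated with a greedy sketch (a hypothetical illustration, not the thesis's actual algorithm):

```python
from itertools import combinations

def assign_spares(stripes, failed_disk, survivors):
    """For each stripe touching the failed disk, pick a surviving disk that
    holds no unit of that stripe (avoiding the conflict), preferring the
    survivor with the least spare-write load so rebuild writes spread out."""
    load = {d: 0 for d in survivors}
    assignment = {}
    for sid, disks in enumerate(stripes):
        if failed_disk not in disks:
            continue
        candidates = [d for d in survivors if d not in disks]
        target = min(candidates, key=lambda d: load[d])
        assignment[sid] = target
        load[target] += 1
    return assignment

stripes = list(combinations(range(5), 3))
plan = assign_spares(stripes, failed_disk=0, survivors=[1, 2, 3, 4])
# Every affected stripe gets a conflict-free spare target.
assert all(t not in stripes[s] for s, t in plan.items())
```

Spreading the spare targets across survivors is what relieves the write bottleneck during rebuild, at the cost the abstract notes: a small increase in average user response time.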
RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)
The original RAID proposal advocated replacing large disks with arrays of PC disks, but as
the capacity of small disks increased 100-fold in the 1990s, the production of large
disks was discontinued. Storage dependability is increased via replication or
erasure coding. Cloud storage providers store multiple copies of data, obviating
the need for further redundancy. Variations of RAID based on local recovery
codes and partial MDS codes reduce recovery cost. NAND flash Solid State Disks (SSDs)
have lower latency and higher bandwidth than Hard Disk Drives, are more reliable,
consume less power, and have a lower TCO, making them the more viable option for
hyperscalers. Comment: Submitted to ACM Computing Surveys.
Fourth NASA Goddard Conference on Mass Storage Systems and Technologies
This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage Systems and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information Infrastructure (Infobahn), the future of storage technology, and lessons learned from various projects. There will also be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994.