Stochastic Analysis on RAID Reliability for Solid-State Drives
Solid-state drives (SSDs) have been widely deployed in desktops and data
centers. However, SSDs suffer from bit errors, and the bit error rate is time
dependent since it increases as an SSD wears down. Traditional storage systems
mainly use parity-based RAID to provide reliability guarantees by striping
redundancy across multiple devices, but the effectiveness of RAID in SSDs
remains debatable as parity updates aggravate the wearing and bit error rates
of SSDs. In particular, an open problem is how different parity
distributions over multiple devices, such as the even distribution suggested by
conventional wisdom, or the uneven distributions proposed in recent RAID schemes
for SSDs, may influence the reliability of an SSD RAID array. To address this
fundamental problem, we propose the first analytical model to quantify the
reliability dynamics of an SSD RAID array. Specifically, we develop a
"non-homogeneous" continuous time Markov chain model, and derive the transient
reliability solution. We validate our model via trace-driven simulations and
conduct numerical analysis to provide insights into the reliability dynamics of
SSD RAID arrays under different parity distributions and subject to different
bit error rates and array configurations. Designers can use our model to decide
the appropriate parity distribution based on their reliability requirements.
Comment: 12 pages
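The transient analysis described in this abstract can be illustrated numerically. The sketch below integrates the Kolmogorov forward equations of a toy non-homogeneous CTMC whose device failure rate grows with time; the array size, rates, and linear wear model are invented for illustration and are not the paper's actual model:

```python
import numpy as np

def transient_reliability(t_end, dt=0.01, lam0=0.001, wear=0.0005, mu=0.1):
    """Toy non-homogeneous CTMC for a RAID-5-like array of n devices.

    States: 0 = all devices healthy, 1 = one device failed (degraded),
    2 = data loss (absorbing). The per-device failure rate grows
    linearly with time to mimic wear-dependent bit error rates; mu is
    the repair rate. Illustrative sketch, not the paper's exact model.
    """
    n = 4                             # hypothetical array size
    p = np.array([1.0, 0.0, 0.0])     # start with all devices healthy
    steps = int(t_end / dt)
    for k in range(steps):
        lam = lam0 * (1.0 + wear * k * dt)  # time-dependent failure rate
        Q = np.array([
            [-n * lam,  n * lam,                0.0],
            [mu,        -(mu + (n - 1) * lam), (n - 1) * lam],
            [0.0,       0.0,                   0.0],   # absorbing state
        ])
        p = p + dt * (p @ Q)   # Euler step of dp/dt = p Q(t)
    return 1.0 - p[2]          # reliability = P(no data loss by t_end)
```

Stepping dp/dt = p·Q(t) through time is what makes the "non-homogeneous" rates tractable: a single closed-form matrix exponential no longer applies once Q varies with t.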
Redundancy and Aging of Efficient Multidimensional MDS-Parity Protected Distributed Storage Systems
The effect of redundancy on the aging of an efficient Maximum Distance
Separable (MDS) parity-protected distributed storage system that consists of
multidimensional arrays of storage units is explored. In light of
experimental evidence and survey data, this paper develops generalized
expressions for the reliability of array storage systems based on more
realistic time to failure distributions such as Weibull. For instance, a
distributed disk array system is considered in which the array components are
disseminated across the network and are subject to independent failure rates.
From these assumptions, generalized closed-form hazard rate expressions are derived.
These expressions are extended to estimate the asymptotic reliability
behavior of large scale storage networks equipped with MDS parity-based
protection. Unlike previous studies, a generic hazard rate function is assumed,
a generic MDS code for parity generation is used, and an evaluation of the
implications of adjustable redundancy level for an efficient distributed
storage system is presented. Results of this study are applicable to any
erasure correction code as long as it is accompanied by a suitable structure
and an appropriate encoding/decoding algorithm such that the MDS property is
maintained.
Comment: 11 pages, 6 figures. Accepted for publication in IEEE Transactions on
Device and Materials Reliability (TDMR), Nov. 201
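As a rough illustration of hazard-rate analysis under Weibull lifetimes, the sketch below computes the reliability and a numerical hazard rate of an (n, k) MDS-protected group with independent, non-repaired units. The k-out-of-n survival formula and all parameter values are simplifying assumptions for illustration, not the paper's generalized expressions:

```python
import math

def weibull_R(t, beta=1.5, eta=100.0):
    """Component reliability under a Weibull time-to-failure model
    (shape beta, scale eta); beta > 1 models wear-out."""
    return math.exp(-((t / eta) ** beta))

def mds_system_reliability(t, n=14, k=10, beta=1.5, eta=100.0):
    """An (n, k) MDS code survives while at least k of its n units are
    alive (independent failures, no repair)."""
    R = weibull_R(t, beta, eta)
    return sum(math.comb(n, i) * R**i * (1 - R)**(n - i)
               for i in range(k, n + 1))

def system_hazard_rate(t, dt=1e-4, **kw):
    """Numerical hazard rate h(t) = -d/dt ln R_sys(t)."""
    r0 = mds_system_reliability(t, **kw)
    r1 = mds_system_reliability(t + dt, **kw)
    return -(math.log(r1) - math.log(r0)) / dt
```

With beta > 1 the system hazard rate rises over time, which is the aging behavior the abstract contrasts against constant-rate (exponential) models.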
RAIDX: RAID Extended for Heterogeneous Arrays
The computer hard drive market has diversified with the establishment of solid state disks (SSDs) as an alternative to magnetic hard disks (HDDs). Each drive technology has its advantages: SSDs are faster than HDDs, but HDDs are cheaper. Our goal is to construct a parallel storage system from HDDs and SSDs such that the parallel system is as fast as the SSDs. Achieving this goal is challenging since the slow HDDs store more data and become bottlenecks, while the SSDs remain idle. RAIDX is a parallel storage system designed for disks of different speeds, capacities and technologies. The RAIDX hardware consists of an array of disks; the RAIDX software consists of data structures and algorithms that allow the disks to be viewed as a single storage unit whose capacity equals the sum of the capacities of its disks, whose failure rate is lower than that of its individual disks, and whose speed is close to that of its faster disks. RAIDX achieves its performance goals with the aid of a novel parallel data organization technique that allows storage data to be moved on the fly without impacting the upper-level file system. We show that storage data accesses satisfy the locality of reference principle, whereby only a small fraction of storage data is accessed frequently. RAIDX has a monitoring program that identifies frequently accessed blocks and a migration program that moves frequently accessed blocks to faster disks. The faster disks are caches that store the solo copy of frequently accessed data. Experimental evaluation has shown that an HDD+SSD RAIDX array is as fast as an all-SSD array when the workload shows locality of reference.
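The monitor/migrate loop described above can be reduced to a minimal sketch. The class name, the frequency-counting scheme, and the migration trigger below are hypothetical stand-ins for the actual RAIDX data structures:

```python
from collections import Counter

class TieredArray:
    """Toy sketch of RAIDX-style hot-block migration: a monitor counts
    block accesses and a migrator promotes the most frequently accessed
    blocks to the fast (SSD) tier. Names and policies are illustrative
    assumptions, not the real RAIDX implementation."""

    def __init__(self, ssd_capacity):
        self.ssd_capacity = ssd_capacity  # blocks the fast tier can hold
        self.access_counts = Counter()    # monitoring program's state
        self.on_ssd = set()               # blocks currently on the SSD tier

    def access(self, block):
        """Record an access and report which tier served it."""
        self.access_counts[block] += 1
        return "ssd" if block in self.on_ssd else "hdd"

    def migrate(self):
        """Move the hottest blocks to the SSD tier, which holds the solo
        copy of hot data (a cache that owns the data, not a duplicate)."""
        hottest = self.access_counts.most_common(self.ssd_capacity)
        self.on_ssd = {block for block, _ in hottest}
```

Under a workload with locality of reference, most accesses after a migration pass land on the SSD tier, which is the effect the experimental evaluation reports.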
Analysis of a Gluonic Penguin Decay with the BaBar Detector
This thesis presents a branching fraction analysis of the neutral B meson decay channel B → ϕK0s where the K0s decays to π0π0. The decay is dominated by gluonic penguin transitions, which have been very important for the main program of BABAR: the search for physics beyond the Standard Model. The decay channel has been established and is included in the CP analysis, which is sensitive to new physics. The data set consists of 227 million BB̅ pairs recorded by the BABAR detector at the Stanford Linear Accelerator Center. Sophisticated analysis techniques have been applied primarily to suppress background from e+e- → quark/anti-quark reactions. The analysis of such rare decay channels with BABAR relies on the availability of a large set of computer-simulated data. For that purpose a computer cluster has been built at the University of Tennessee as part of the distributed computing support work for BABAR. The design and performance of the cluster are a central subject of this thesis work.
Dependence-driven techniques in system design
Burstiness in workloads is often found in multi-tier architectures, storage systems, and communication networks. This feature is extremely important in system design because it can significantly degrade system performance and availability. This dissertation focuses on how to use knowledge of burstiness to develop new techniques and tools for performance prediction, scheduling, and resource allocation under bursty workload conditions.
For multi-tier enterprise systems, burstiness in the service times is catastrophic for performance. Via detailed experimentation, we identify the cause of performance degradation as persistent bottleneck switching among the various servers. This results in an unstable behavior that cannot be captured by existing capacity planning models. In this dissertation, beyond identifying the cause and effects of bottleneck switching in multi-tier systems, we also propose modifications to the classic TPC-W benchmark to emulate bursty arrivals in multi-tier systems.
This dissertation also demonstrates how burstiness can be used to improve system performance. Two dependence-driven scheduling policies, SWAP and ALoC, are developed. These general scheduling policies counteract burstiness in workloads and maintain high availability by delaying selected requests that contribute to burstiness. Extensive experiments show that both SWAP and ALoC achieve good estimates of service times based on the knowledge of burstiness in the service process. As a result, SWAP successfully approximates shortest-job-first (SJF) scheduling without requiring a priori information about job service times. ALoC adaptively controls system load by indefinitely delaying only a small fraction of the incoming requests.
The knowledge of burstiness can also be used to forecast the length of idle intervals in storage systems. In practice, background activities are scheduled during system idle times. The scheduling of background jobs is crucial in terms of the performance degradation of foreground jobs and the utilization of idle times. In this dissertation, new background scheduling schemes are designed to determine when and for how long idle times can be used for serving background jobs without violating predefined performance targets of foreground jobs. Extensive trace-driven simulation results illustrate that the proposed schemes are effective and robust in a wide range of system conditions. Furthermore, if there is burstiness within idle times, then maintenance features such as disk scrubbing and intra-disk data redundancy can be successfully scheduled as background activities during idle times.
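An idle-time background scheduler of the general kind described might decide, from observed idle-period statistics, whether the current idle interval is likely long enough to absorb a background job. The conditional-mean estimate and safety factor below are assumptions for illustration, not the dissertation's actual schemes:

```python
import statistics

def should_run_background(idle_history, elapsed_idle, job_duration, safety=1.5):
    """Decide whether to start a background job now.

    Estimates the expected remaining idle time from a history of past
    idle-period lengths, conditioned on the current idle period having
    already lasted `elapsed_idle`. The safety factor hedges against
    overruns that would degrade foreground performance. Purely
    illustrative; the dissertation's schemes are derived from
    busy-period statistics."""
    # Idle periods that lasted longer than the current elapsed time tell
    # us how much longer such a period tends to continue.
    longer = [x - elapsed_idle for x in idle_history if x > elapsed_idle]
    if not longer:
        return False  # no evidence the idle period will continue
    expected_remaining = statistics.mean(longer)
    return expected_remaining >= safety * job_duration
```

The conditioning step matters: for bursty (heavy-tailed) idle-time distributions, an idle period that has already lasted a while is likely to last longer still, which is exactly the structure such schedulers exploit.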
Stochastic Modeling of Hybrid Cache Systems
In recent years, there is an increasing demand for big-memory systems to
perform large-scale data analytics. Since DRAM memories are expensive, some
researchers suggest using other memory technologies, such as non-volatile
memory (NVM), to build large-memory computing systems. However,
whether the NVM technology can be a viable alternative (either economically or
technically) to DRAM remains an open question. To answer this question, it is
important to consider how to design a memory system from a "system
perspective", that is, incorporating the different performance characteristics
and price ratios of hybrid memory devices.
This paper presents an analytical model of a "hybrid page cache system" to
understand the diverse design space and performance impact of a hybrid cache
system. We consider (1) various architectural choices, (2) design strategies,
and (3) configurations of different memory devices. Using this model, we provide
guidelines on how to design a hybrid page cache to reach a good trade-off between
high system throughput (in I/Os per second, or IOPS) and fast cache reactivity,
which is defined by the time to fill the cache. We also show how one can
configure the DRAM capacity and NVM capacity under a fixed budget. We pick PCM
as an example of NVM and conduct numerical analysis. Our analysis indicates that
incorporating PCM in a page cache system significantly improves the system
performance, and it also shows that allocating more PCM to the page cache brings
larger benefits in some cases. Besides, for the common setting of the
performance-price ratio of PCM, the "flat architecture" is the better choice,
but the "layered architecture" outperforms it if PCM write performance can be
significantly improved in the future.
Comment: 14 pages; MASCOTS 201
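A toy version of such a budget-split analysis for a layered DRAM+PCM cache is sketched below. The saturating hit-ratio curve, the latencies, and the prices are invented numbers for illustration, not measurements or formulas from the paper:

```python
def avg_latency_us(f_dram, budget=1000.0, price_dram=8.0, price_pcm=2.0,
                   lat_dram=0.1, lat_pcm=1.0, lat_disk=100.0, wset=500.0):
    """Average access latency (microseconds) of a layered DRAM+PCM page
    cache under a fixed budget; f_dram is the budget fraction spent on
    DRAM. Hit ratio uses a simple saturating curve h(C) = C / (C + wset).
    All parameter values are illustrative assumptions."""
    c_dram = f_dram * budget / price_dram          # GB of DRAM bought
    c_pcm = (1 - f_dram) * budget / price_pcm      # GB of PCM bought
    h = lambda c: c / (c + wset)                   # toy hit-ratio curve
    h_dram = h(c_dram)                             # served from DRAM tier
    h_pcm = h(c_dram + c_pcm) - h_dram             # extra hits from PCM tier
    miss = 1.0 - h_dram - h_pcm                    # go to backing storage
    return h_dram * lat_dram + h_pcm * lat_pcm + miss * lat_disk

# Sweep the budget split to find a good DRAM/PCM configuration.
best_f = min((f / 10 for f in range(11)), key=avg_latency_us)
```

With these particular toy prices, cheap PCM capacity beats extra DRAM, which mirrors the abstract's observation that allocating more PCM to the page cache can bring larger benefits; throughput (IOPS) is roughly the reciprocal of this average latency.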
Durability and Availability of Erasure-Coded Storage Systems with Concurrent Maintenance
The initial version of this document was written back in 2014 for the sole
purpose of providing the fundamentals of reliability theory and identifying
the theoretical machinery for the prediction of the
durability/availability of erasure-coded storage systems. Since the definition
of a "system" is too broad, we specifically focus on warm/cold storage systems
where the data is stored in a distributed fashion across different storage
units with or without continuous operation. The contents of this document are
dedicated to a review of fundamentals, a few major improved stochastic models,
and several contributions of my work relevant to the field. One of the
contributions of this document is the introduction of the most general form of
Markov models for the estimation of mean time to failure. Parts of this work
were later published in IEEE Transactions on Reliability. Very good
approximations for the closed-form solutions for this general model are also
investigated. Various storage configurations under different policies are
compared using such advanced models. Later in a subsequent chapter, we have
also considered multi-dimensional Markov models to address detached
drive-medium combinations such as those found in optical disk and tape storage
systems. It is not hard to anticipate that such a system structure would most
likely be part of future DNA storage libraries. Parts of this work are published
in Elsevier's Reliability Engineering and System Safety. Topics that include
simulation models for more accurate estimations are included towards the end of
the document, noting the deficiencies of the simplified canonical as well as
more complex Markov models, due mainly to the stationary and static nature of
Markovianity. Throughout the document, we shall focus on concurrently maintained
systems although the discussions will only slightly change for the systems
repaired one device at a time.
Comment: 58 pages, 20 figures, 9 tables. arXiv admin note: substantial text
overlap with arXiv:1911.0032
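The canonical Markov MTTF computation that such general models extend can be sketched in a few lines. The chain below is the textbook repair model with concurrent repairs (mean time to absorption via a linear solve), not the document's most general form:

```python
import numpy as np

def mttdl(n=10, k=8, lam=1e-4, mu=1.0):
    """Mean time to data loss for an (n, k) erasure-coded group under
    the canonical repair Markov chain: state i = number of failed units,
    data loss (absorbing) once more than n - k units are down. Repairs
    proceed concurrently at rate mu per failed unit. A textbook sketch
    with illustrative rates, not the document's generalized model."""
    m = n - k + 1                     # transient states 0 .. n-k
    Q = np.zeros((m, m))              # generator over transient states
    for i in range(m):
        fail = (n - i) * lam          # any surviving unit can fail
        if i + 1 < m:
            Q[i, i + 1] = fail        # one more failure (still alive)
        if i > 0:
            Q[i, i - 1] = i * mu      # concurrent repair of failed units
        Q[i, i] = -(fail + i * mu)    # exit to data loss is implicit
    # Expected absorption times solve Q t = -1; start from state 0.
    t = np.linalg.solve(Q, -np.ones(m))
    return t[0]
```

The same linear-solve recipe works for any finite transient state space, which is why it generalizes naturally to the multi-dimensional chains (e.g. detached drive-medium combinations) discussed in the document.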
Workload Interleaving with Performance Guarantees in Data Centers
In the era of global, large-scale data centers residing in clouds, many applications and users share the same pool of resources for the purposes of reducing energy and operating costs, and of improving availability and reliability. Along with the above benefits, resource sharing also introduces performance challenges: when multiple workloads access the same resources concurrently, contention may occur and introduce delays in the performance of individual workloads. Providing performance isolation to individual workloads needs effective management methodologies. The challenges of deriving effective management methodologies lie in finding accurate, robust, compact metrics and models to drive algorithms that can meet different performance objectives while achieving efficient utilization of resources. This dissertation proposes a set of methodologies aiming at solving the challenging performance isolation problem of workload interleaving in data centers, focusing on both storage components and computing components.
At the storage node level, we focus on methodologies for better interleaving user traffic with background workloads, such as tasks for improving reliability, availability, and power savings. More specifically, a scheduling policy for background workloads based on the statistical characteristics of the system busy periods and a methodology that quantitatively estimates the performance impact of power savings are developed.
At the storage cluster level, we consider methodologies for efficiently conducting work consolidation and scheduling asynchronous updates without violating user performance targets. More specifically, we develop a framework that can estimate beforehand the benefits and overheads of each option in order to automate the process of reaching intelligent consolidation decisions while achieving faster eventual consistency.
At the computing node level, we focus on improving workload interleaving at off-the-shelf servers, as they are the basic building blocks of large-scale data centers. We develop priority scheduling middleware that employs different policies to schedule background tasks based on the instantaneous resource requirements of the high-priority applications running on the server node. Finally, at the computing cluster level, we investigate popular computing frameworks for large-scale data-intensive distributed processing, such as MapReduce and its Hadoop implementation. We develop a new Hadoop scheduler called DyScale to exploit capabilities offered by heterogeneous cores in order to achieve a variety of performance objectives.
Data Management Strategies for Relative Quality of Service in Virtualised Storage Systems
The amount of data managed by organisations continues to grow relentlessly.
Driven by the high costs of maintaining multiple local storage systems, there
is a well-established trend towards storage consolidation using multi-tier
Virtualised Storage Systems (VSSs). At the same time, storage infrastructures
are increasingly subject to stringent Quality of Service (QoS) demands.
Within a VSS, it is challenging to match desired QoS with delivered QoS,
considering the latter can vary dramatically both across and within tiers.
Manual efforts to achieve this match require extensive and ongoing human
intervention. Automated efforts are based on workload analysis, which ignores
the business importance of infrequently accessed data.
This thesis presents our design, implementation and evaluation of data
maintenance strategies in an enhanced version of the popular Linux Extended
3 (ext3) filesystem, which features support for the elegant specification
of QoS metadata while maintaining compatibility with stock kernels. Users
and applications specify QoS requirements using a chmod-like interface. System
administrators are provided with a character device kernel interface
that allows for profiling of the QoS delivered by the underlying storage. We
propose a novel score-based metric, together with associated visualisation
resources, to evaluate the degree of QoS matching achieved by any given
data layout. We also design and implement new inode and datablock allocation
and migration strategies which exploit this metric in seeking to match
the QoS attributes set by users and/or applications on files and directories
with the QoS actually delivered by each of the filesystem’s block groups.
To create realistic test filesystems we have included QoS metadata support
in the Impressions benchmarking framework. The effectiveness of the
resulting data layout in terms of QoS matching is evaluated using a special
kernel module that is capable of inspecting detailed filesystem data on-the-fly.
We show that our implementations of the proposed inode and datablock
allocation strategies are capable of dramatically improving data placement
with respect to QoS requirements when compared to the default allocators.
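A score-based QoS-matching metric of the general flavour described can be illustrated with a toy function. The particular scoring rule and the data shapes below are assumptions for illustration; the thesis defines its own metric:

```python
def qos_match_score(layout, delivered):
    """Toy score for how well a data layout matches QoS requirements.

    `layout` maps file -> (required_qos, block_group) and `delivered`
    maps block_group -> delivered_qos (higher = better). A file placed
    on a block group at or above its requirement scores 1.0; otherwise
    it is penalized in proportion to the shortfall. The result is the
    mean over all files, so 1.0 means a perfect QoS match. This scoring
    rule is a hypothetical stand-in for the metric in the thesis."""
    score = 0.0
    for required, group in layout.values():
        got = delivered[group]
        score += 1.0 if got >= required else got / required
    return score / len(layout)
```

An allocator or migrator can then be framed as searching for the file-to-block-group assignment that maximizes this score, which is the role the proposed inode and datablock strategies play against the profiled per-block-group QoS.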