12 research outputs found

    Modelling and Validation of Response Times in Zoned RAID

    We present and validate an enhanced analytical queueing network model of zoned RAID. The model focuses on RAID levels 01 and 5, and yields the distribution of I/O request response time. Whereas our previous work could only support arrival streams of I/O requests of the same type, the model presented here supports heterogeneous streams with a mixture of read and write requests. This improved realism is made possible through multiclass extensions to our existing model. When combined with priority queueing, this development also enables more accurate modelling of the way subtasks of RAID 5 write requests are scheduled. In all cases we derive analytical results for calculating not only the mean but also higher moments and the full distribution of I/O request response time. We validate our model against device measurements.
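    The paper's multiclass derivations are not reproduced here, but the flavour of moment results for a single queue can be seen in the classical M/G/1 queue that underlies this family of models. Below is a minimal Python sketch, assuming Poisson arrivals and illustrative service-time moments (not values from the paper), that computes waiting-time moments via the Takács recurrence (whose first step is the Pollaczek-Khinchine formula) and lifts them to response-time moments.

```python
from math import comb

def mg1_waiting_moments(lam, s_moments, k_max=2):
    """Takacs recurrence for M/G/1 FCFS waiting-time moments.
    lam: Poisson arrival rate; s_moments[j] = E[S^(j+1)]."""
    rho = lam * s_moments[0]
    assert rho < 1.0, "queue must be stable (rho < 1)"
    w = [1.0]                                   # E[W^0] = 1
    for k in range(1, k_max + 1):
        w.append(lam / (1.0 - rho) * sum(
            comb(k, j) * s_moments[j] / (j + 1) * w[k - j]
            for j in range(1, k + 1)))
    return w[1:]                                # [E[W], E[W^2], ...]

# Illustrative (assumed) disk service-time moments, in seconds:
s = [0.008, 8.1e-5, 9.5e-7]                     # E[S], E[S^2], E[S^3]
lam = 50.0                                      # requests per second
ew, ew2 = mg1_waiting_moments(lam, s)
et = ew + s[0]                                  # E[T] = E[W] + E[S]
et2 = ew2 + 2.0 * ew * s[0] + s[1]              # W and S independent
print("mean response %.2f ms, second moment %.3e s^2" % (1e3 * et, et2))
```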

    Queueing network models of zoned RAID system performance

    RAID systems are widely deployed, both as standalone storage solutions and as the building blocks of modern virtualised storage platforms. An accurate model of RAID system performance is therefore critical to fulfilling quality of service constraints for fast, reliable storage. This thesis presents techniques and tools that model response times in zoned RAID systems. The inputs to this analysis are a specified I/O request arrival rate, an I/O request access profile, a given RAID configuration and physical disk parameters. The primary output of this analysis is an approximation to the cumulative distribution function of I/O request response time, from which it is straightforward to calculate response time quantiles, as well as the mean, variance and higher moments of I/O request response time. The model supports RAID levels 0, 01, 10 and 5 and a variety of workload types. Our RAID model is developed in a bottom-up hierarchical fashion. We begin by modelling each zoned disk drive in the array as a single M/G/1 queue, whose service time is the sum of the random variables of seek time, rotational latency and data transfer time; in doing so, we take into account the properties of zoned disks. We then abstract a RAID system as a fork-join queueing network comprising several queues, each of which represents one disk drive in the array. We tailor our basic fork-join approximation to account for the I/O request patterns associated with particular request types and request sizes under different RAID levels. We extend the RAID and disk models to support bulk arrivals, requests of different sizes and scheduling algorithms that reorder queued requests to minimise disk head positioning time. Finally, we develop a corresponding simulation to improve and validate the model. To test the accuracy of all our models, we validate them against disk drive and RAID device measurements throughout.
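    The disk-level building block described above lends itself to a short illustration. Here is a Monte Carlo sketch of a zoned disk's service time as the sum of seek time, rotational latency and zone-dependent transfer time; the drive geometry, seek curve and zone transfer rates are illustrative assumptions, not the thesis's measured parameters.

```python
import random

# Service time = seek + rotational latency + zoned transfer time.
ROT_TIME = 0.006                      # one revolution at 10,000 rpm (s)
CYLINDERS = 60_000
ZONES = [(0.30, 90e6), (0.60, 75e6), (1.00, 55e6)]  # (outer edge, bytes/s)

def seek_time(frac_of_full_stroke):
    # common a + b*sqrt(d) seek curve; coefficients assumed
    f = frac_of_full_stroke
    return 0.0 if f == 0 else 0.0008 + 0.004 * f ** 0.5

def service_time(request_bytes):
    head = random.randrange(CYLINDERS)            # current head position
    target = random.randrange(CYLINDERS)          # uniformly random target
    seek = seek_time(abs(target - head) / CYLINDERS)
    latency = random.uniform(0.0, ROT_TIME)       # rotational latency
    rate = next(r for edge, r in ZONES if target / CYLINDERS <= edge)
    return seek + latency + request_bytes / rate  # outer zones are faster

samples = [service_time(128 * 1024) for _ in range(100_000)]
print("mean disk service time: %.2f ms" % (1e3 * sum(samples) / len(samples)))
```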

    Validation of Large Zoned RAID Systems

    Building on our prior work, we present an improved model for large partial stripe following full stripe writes in RAID 5. This was necessary because we observed that our previous model tended to underestimate measured results. To date, we have only validated these models against RAID systems with at most four disks. Here we validate our improved model, and also our existing models for other read and write configurations, against measurements taken from an eight-disk RAID array.

    Simulation and Modelling of RAID 0 System Performance

    RAID systems are fundamental components of modern storage infrastructures. It is therefore important to model their performance effectively. This paper describes a simulation model which predicts the cumulative distribution function of I/O request response time in a RAID 0 system consisting of homogeneous zoned disk drives. The model is constructed in a bottom-up manner, starting by abstracting a single disk drive as an M/G/1 queue. This is then extended to model a RAID 0 system using a split-merge queueing network. Simulation results of I/O request response time for RAID 0 systems with various numbers of disks are computed and compared against device measurements.
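    The split-merge abstraction has a convenient consequence: the array serves one logical request at a time, so it reduces to a single FCFS queue whose effective service time is the maximum of the per-disk subtask times. A minimal simulation sketch, with an assumed arrival rate and subtask distribution rather than measured drive parameters:

```python
import random

K = 4                      # disks in the RAID 0 array (assumed)
LAM = 60.0                 # logical request arrival rate (per second)

def subtask_time():        # stand-in for a per-disk service time model
    return random.uniform(0.002, 0.012)

def simulate(n_jobs=200_000):
    clock, free_at, resp = 0.0, 0.0, []
    for _ in range(n_jobs):
        clock += random.expovariate(LAM)             # Poisson arrivals
        start = max(clock, free_at)                  # FCFS queueing
        svc = max(subtask_time() for _ in range(K))  # fork, then merge
        free_at = start + svc
        resp.append(free_at - clock)
    return resp

r = sorted(simulate())
print("mean %.2f ms, 95th percentile %.2f ms"
      % (1e3 * sum(r) / len(r), 1e3 * r[int(0.95 * len(r))]))
```

    Because split-merge holds all disks until the slowest subtask completes, this abstraction is pessimistic relative to a full fork-join network in which each disk can start its next subtask independently.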

    Dynamic Control of the Join-Queue Lengths in Saturated Fork-Join Stations

    The analysis of fork-join queueing systems has played an important role in the performance evaluation of distributed systems where parallel computations associated with the same job are carried out, and a job is considered served only when all the parallel tasks it consists of are served and then joined. The fork-join nodes that we consider consist of K >= 2 parallel servers, each of which is equipped with two FCFS queues, namely the service-queue and the join-queue. The former stores tasks waiting to be served, while the latter stores served tasks waiting to be joined. When the queueing station is saturated, i.e., the service-queues are never empty, we observe that the join-queue sizes tend to grow without bound even if the expected service times at the servers are the same; this is due to the variance of the service time distribution. To tackle this problem, we propose a simple service-rate control mechanism and show that, under the exponential assumption on the service times, we can analytically study a set of relevant performance indices. We show that by selectively reducing the speed of some servers, significant energy savings can be achieved.
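    The unbounded growth of the join-queues is easy to reproduce. The following sketch assumes exponential service times and saturated (always busy) servers, as in the paper's setting, and tracks how large the completed-but-unjoined backlog grows even with identical rates; slowing selected servers, as the proposed control mechanism does, caps this gap at the cost of throughput. The code is an illustration, not the paper's analytical model.

```python
import random
from bisect import bisect_right

K, N = 4, 100_000
rates = [1.0] * K          # identical mean service rates; try slowing some

# Completion times of consecutive tasks at each always-busy server.
finish = []
for k in range(K):
    t, col = 0.0, []
    for _ in range(N):
        t += random.expovariate(rates[k])
        col.append(t)
    finish.append(col)

# When job n joins (all K siblings finished), count how many further
# tasks each server has already completed: that is its join-queue size.
worst = 0
for n in range(0, N, 1000):
    join_t = max(finish[k][n] for k in range(K))
    for k in range(K):
        qlen = bisect_right(finish[k], join_t) - (n + 1)
        worst = max(worst, qlen)
print("largest observed join-queue length:", worst)
```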

    Data allocation in disk arrays with multiple raid levels

    There has been an explosion in the amount of generated data, which has to be stored reliably because it is not easily reproducible. Some datasets require frequent read and write access, like online transaction processing applications. Others just need to be stored safely and read once in a while, as in data mining. These different access requirements can be satisfied by using the RAID (redundant array of inexpensive disks) paradigm, i.e., RAID1 for the first situation and RAID5 for the second. Furthermore, rather than providing two disk arrays with RAID1 and RAID5 capabilities, a controller can be postulated to emulate both. This is referred to as a heterogeneous disk array (HDA). Dedicating a subset of disks to RAID1 results in poor disk utilization, since RAID1 vs RAID5 capacity and bandwidth requirements are not known a priori. Balancing disk loads when disk space is shared among allocation requests, referred to as virtual arrays (VAs), poses a difficult problem. RAID1 disk arrays have a higher access rate per gigabyte than RAID5 disk arrays. Allocating more VAs while keeping disk utilizations balanced and within acceptable bounds is the goal of this study. Given its size and access rate, a VA's width, or the number of its virtual disks (VDs), is determined. VD allocations on physical disks using vector-packing heuristics, with disk capacity and bandwidth as the two dimensions, are shown to be the best. An allocation is acceptable if it does not exceed disk capacity or overload disks, even in the presence of disk failures. When disk bandwidth rather than capacity is the bottleneck, the clustered RAID paradigm is applied, which offers a tradeoff between disk space and bandwidth. Another scenario is also considered, where the RAID level is determined by a classification algorithm utilizing the access characteristics of the VA, i.e., the fractions of small versus large accesses and of write versus read accesses. The effect of RAID1 organization on its reliability and performance is studied too. The effect of disk failures on the X-code two-disk-failure-tolerant array is analyzed, and it is shown that the load across disks is highly unbalanced unless, in an N×N array, groups of N stripes are randomly rotated.
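    To make the vector-packing idea concrete, here is a toy sketch that packs VDs onto physical disks with capacity and bandwidth as the two dimensions. The greedy, balance-seeking heuristic and all the numbers are illustrative assumptions, not the heuristics evaluated in the dissertation.

```python
# Two-dimensional vector packing of virtual disks onto physical disks.
DISK_CAP, DISK_BW = 1000.0, 100.0   # per-disk capacity (GB), bandwidth (MB/s)

def allocate(vds, n_disks):
    used = [[0.0, 0.0] for _ in range(n_disks)]
    placement = []
    # place the most demanding VDs first
    for cap, bw in sorted(vds, key=lambda v: -(v[0] / DISK_CAP + v[1] / DISK_BW)):
        best, best_score = None, None
        for d in range(n_disks):
            c, b = used[d][0] + cap, used[d][1] + bw
            if c > DISK_CAP or b > DISK_BW:
                continue                             # would overload this disk
            score = abs(c / DISK_CAP - b / DISK_BW)  # keep both dims balanced
            if best_score is None or score < best_score:
                best, best_score = d, score
        if best is None:
            return None                              # no feasible disk
        used[best][0] += cap
        used[best][1] += bw
        placement.append(((cap, bw), best))
    return placement

vds = [(120, 30), (300, 10), (80, 45), (200, 20), (150, 35)]  # (GB, MB/s)
print(allocate(vds, n_disks=2))
```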

    Large-scale simulator for global data infrastructure optimization

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, February 2012. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 165-172). By Sergio Herrero-López.

    Companies depend on information systems to control their operations. During the last decade, Information Technology (IT) infrastructures have grown in scale and complexity. Any large company runs many enterprise applications that serve data to thousands of users who, in turn, consume this information in different locations, concurrently and collaboratively. The understanding by the enterprise of its own systems is often limited. No one person in the organization has a complete picture of the way in which applications share and move data files between data centers. In this dissertation an IT infrastructure simulator is developed to evaluate the performance, availability and reliability of large-scale computer systems. The goal is to provide data center operators with a tool to understand the consequences of infrastructure updates. These alterations can include the deployment of new network topologies, hardware configurations or software applications. The simulator was constructed using a multilayered approach and was optimized for multicore scalability. The results produced by the simulator were validated against the real system of a Fortune 500 company. This work pioneers the simulation of large-scale IT infrastructures. It not only reproduces the behavior of data centers at a macroscopic scale, but also allows operators to navigate down to the detail of individual elements, such as processors or network links. The combination of queueing networks representing hardware components with message sequences modeling enterprise software enabled reaching a scale and complexity not available in previous research in this area.
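    The core modelling idea above, queueing stations for hardware driven by message sequences for application behaviour, can be sketched compactly. The station names, per-message demands and arrival rate below are invented for illustration; the thesis's simulator is far more detailed and multicore-optimized.

```python
import random

# A message sequence models one application transaction; each station
# models a hardware resource as a single FCFS server.
SEQUENCE = [("app_cpu", 0.002), ("lan", 0.001),
            ("db_cpu", 0.004), ("disk", 0.008)]   # (station, mean demand s)

def run(n_jobs=100_000, lam=40.0):
    free_at = {st: 0.0 for st, _ in SEQUENCE}     # when each station frees up
    clock, resp = 0.0, []
    for _ in range(n_jobs):
        clock += random.expovariate(lam)          # Poisson job arrivals
        t = clock
        for st, d in SEQUENCE:                    # walk the message sequence
            start = max(t, free_at[st])           # queue at the station
            t = start + random.expovariate(1.0 / d)
            free_at[st] = t
        resp.append(t - clock)
    return sum(resp) / len(resp)

print("mean end-to-end response: %.1f ms" % (1e3 * run()))
```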

    Data Management Strategies for Relative Quality of Service in Virtualised Storage Systems

    The amount of data managed by organisations continues to grow relentlessly. Driven by the high costs of maintaining multiple local storage systems, there is a well established trend towards storage consolidation using multi-tier Virtualised Storage Systems (VSSs). At the same time, storage infrastructures are increasingly subject to stringent Quality of Service (QoS) demands. Within a VSS, it is challenging to match desired QoS with delivered QoS, considering the latter can vary dramatically both across and within tiers. Manual efforts to achieve this match require extensive and ongoing human intervention. Automated efforts are based on workload analysis, which ignores the business importance of infrequently accessed data. This thesis presents our design, implementation and evaluation of data maintenance strategies in an enhanced version of the popular Linux Extended 3 Filesystem which features support for the elegant specification of QoS metadata while maintaining compatibility with stock kernels. Users and applications specify QoS requirements using a chmod-like interface. System administrators are provided with a character device kernel interface that allows for profiling of the QoS delivered by the underlying storage. We propose a novel score-based metric, together with associated visualisation resources, to evaluate the degree of QoS matching achieved by any given data layout. We also design and implement new inode and datablock allocation and migration strategies which exploit this metric in seeking to match the QoS attributes set by users and/or applications on files and directories with the QoS actually delivered by each of the filesystem's block groups. To create realistic test filesystems we have included QoS metadata support in the Impressions benchmarking framework. The effectiveness of the resulting data layout in terms of QoS matching is evaluated using a special kernel module that is capable of inspecting detailed filesystem data on-the-fly. We show that our implementations of the proposed inode and datablock allocation strategies are capable of dramatically improving data placement with respect to QoS requirements when compared to the default allocators.
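    To illustrate the flavour of a score-based QoS matching metric, here is a toy sketch in which each file carries a requested QoS level, each block group delivers one, and the score accumulates the mismatch. The levels, file names and layout are hypothetical, not the thesis's actual metric or data structures.

```python
# QoS delivered by each block group (0 = slow tier, 2 = fast tier); assumed.
group_qos = [2, 2, 1, 1, 0, 0]

# (file, requested_qos, block_group) triples for a hypothetical layout.
layout = [("db.ibd", 2, 0), ("logs/app.log", 0, 4),
          ("archive.tar", 0, 1), ("index.db", 2, 3)]

def layout_score(layout, group_qos):
    """0 is a perfect QoS match; larger values mean worse placement."""
    return sum(abs(req - group_qos[bg]) for _, req, bg in layout)

print("QoS mismatch score:", layout_score(layout, group_qos))
# A migration strategy would move 'archive.tar' off fast group 1 and
# 'index.db' onto a fast group, driving the score towards zero.
```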