
    Elevating commodity storage with the SALSA host translation layer

    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their mediums and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design, and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)

    RAIDX: RAID EXTENDED FOR HETEROGENEOUS ARRAYS

    The computer hard drive market has diversified with the establishment of solid state disks (SSDs) as an alternative to magnetic hard disks (HDDs). Each hard drive technology has its advantages: SSDs are faster than HDDs, but HDDs are cheaper. Our goal is to construct a parallel storage system with HDDs and SSDs such that the parallel system is as fast as the SSDs. Achieving this goal is challenging since the slow HDDs store more data and become bottlenecks, while the SSDs remain idle. RAIDX is a parallel storage system designed for disks of different speeds, capacities and technologies. The RAIDX hardware consists of an array of disks; the RAIDX software consists of data structures and algorithms that allow the disks to be viewed as a single storage unit that has capacity equal to the sum of the capacities of its disks, a failure rate lower than the failure rate of its individual disks, and speeds close to that of its faster disks. RAIDX achieves its performance goals with the aid of its novel parallel data organization technique, which allows storage data to be moved on the fly without impacting the upper level file system. We show that storage data accesses satisfy the locality of reference principle, whereby only a small fraction of storage data are accessed frequently. RAIDX has a monitoring program that identifies frequently accessed blocks and a migration program that moves frequently accessed blocks to faster disks. The faster disks are caches that store the sole copy of frequently accessed data. Experimental evaluation has shown that an HDD+SSD RAIDX array is as fast as an all-SSD array when the workload shows locality of reference.
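The monitoring and migration idea described in the RAIDX abstract can be illustrated with a minimal sketch. The abstract does not say how RAIDX selects blocks, so the function name `pick_blocks_to_migrate`, the trace format, and the simple "most frequently accessed first" policy below are all assumptions made for illustration.

```python
from collections import Counter


def pick_blocks_to_migrate(access_trace, fast_capacity):
    """Return up to `fast_capacity` block IDs, most frequently accessed first.

    Hypothetical helper: `access_trace` is an iterable of block IDs in the
    order they were accessed, and `fast_capacity` is how many blocks fit on
    the faster disks. This is not the RAIDX algorithm, only a frequency count.
    """
    counts = Counter(access_trace)
    return [block for block, _ in counts.most_common(fast_capacity)]


# Example trace with locality of reference: block 7 dominates.
trace = [7, 3, 7, 9, 7, 3]
hot_blocks = pick_blocks_to_migrate(trace, fast_capacity=2)
```

With a workload that shows locality of reference, a small `fast_capacity` already covers most accesses, which is the property the abstract relies on.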

    Studies of disk arrays tolerating two disk failures and a proposal for a heterogeneous disk array

    There has been an explosion in the amount of generated data in the past decade. Online access to these data is made possible by large disk arrays, especially in the RAID (Redundant Array of Independent Disks) paradigm. Depending on the RAID level, a disk array can tolerate one or more disk failures, so that the storage subsystem can continue operating with disk failure(s). RAID5 is a single disk failure tolerant array which dedicates the capacity of one disk to parity information. The content on the failed disk can be reconstructed on demand and written onto a spare disk. However, RAID5 does not provide enough protection for data, since data loss may occur when there is a media failure (unreadable sectors) or a second disk failure during the rebuild process. Due to the high cost of downtime in many applications, two disk failure tolerant arrays, such as RAID6 and EVENODD, have become popular. These schemes use 2/N of the capacity of the array for redundant information in order to tolerate two disk failures. RM2 is another scheme that can tolerate two disk failures, with a slightly higher redundancy ratio. However, the performance of these two disk failure tolerant RAID schemes is impaired, since there are two check disks to be updated for each write request. Therefore, their performance, especially when there are disk failure(s), is of interest. In the first part of the dissertation, the operations for the RAID5, RAID6, EVENODD and RM2 schemes are described. A cost model is developed for these RAID schemes by analyzing the operations in various operating modes. This cost model offers a measure of the volume of data being transmitted, and provides a device-independent comparison of the efficiency of these RAID schemes. Based on this cost model, the maximum throughput of a RAID scheme can be obtained given detailed disk characteristics and the RAID configuration. Utilizing the M/G/1 queuing model and other favorable modeling assumptions, a queuing analysis to obtain the mean read response time is described. Simulation is used to validate analytic results, as well as to evaluate the RAID systems in analytically intractable cases. The second part of this dissertation describes a new disk array architecture, namely the Heterogeneous Disk Array (HDA). The HDA is motivated by a few observations of the trends in storage technology. The HDA architecture allows a disk array to have two forms of heterogeneity: (1) device heterogeneity, i.e., disks of different types can be incorporated in a single HDA; and (2) RAID level heterogeneity, i.e., various RAID schemes can coexist in the same array. The goal of this architecture is (1) utilizing the extra resources (i.e., bandwidth and capacity) introduced by new disk drives in an automated and efficient way; and (2) using appropriate RAID levels to meet the varying availability requirements of different applications. In HDA, each new object is associated with an appropriate RAID level, and the allocation is carried out in a way that keeps disk bandwidth and capacity utilizations balanced. Design considerations for the data structures of HDA metadata are described, followed by the actual design of the data structures and flowcharts for the most frequent operations. Then a data allocation algorithm is described in detail. Finally, the HDA architecture is prototyped based on the DASim simulation toolkit developed at NJIT, and simulation results of an HDA with two RAID levels (RAID1 and RAID5) are presented.
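As a rough illustration of the M/G/1 analysis mentioned in this abstract, the sketch below evaluates the textbook Pollaczek-Khinchine mean response time for a single disk treated as an M/G/1 queue. The dissertation's own model and its additional assumptions are not given here, so the function name and parameters are hypothetical and this is only the standard formula, not the author's derivation.

```python
def mg1_mean_response_time(arrival_rate, mean_service, second_moment_service):
    """Mean response time of an M/G/1 queue (Pollaczek-Khinchine formula).

    arrival_rate          -- Poisson arrival rate (requests per second)
    mean_service          -- E[S], mean service time in seconds
    second_moment_service -- E[S^2], second moment of the service time
    """
    utilization = arrival_rate * mean_service
    if not 0 <= utilization < 1:
        raise ValueError("queue is unstable unless utilization is in [0, 1)")
    # Mean waiting time in the queue, then add the service time itself.
    mean_wait = arrival_rate * second_moment_service / (2 * (1 - utilization))
    return mean_service + mean_wait


# Illustrative numbers only: 50 requests/s, 10 ms mean service time.
# Exponential service has E[S^2] = 2 * E[S]^2; fixed service has E[S^2] = E[S]^2.
exponential_case = mg1_mean_response_time(50.0, 0.01, 2 * 0.01 ** 2)
fixed_case = mg1_mean_response_time(50.0, 0.01, 0.01 ** 2)
```

The two cases differ only in the second moment, which shows why service time variability matters for disk response time at the same utilization.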

    Rebuild performance enhancement using onboard caching and delayed vacation termination in clustered RAID 5

    The Clustered RAID 5 (CRAID5) architecture, with a parity group size (G) smaller than the number of disks (N), increases the load by the declustering ratio α = (G - 1)/(N - 1) while switching to, and subsequently operating in, rebuild mode; this increase can be lower than that in RAID 5. The Nearly Random Permutation (NRP) layout provides the flexibility to vary the declustering ratio (α) for a given N, and the Vacationing Server Model (VSM) of processing the rebuild requests provides acceptable rebuild and user response times. The rebuild performance and the user response time can be improved by introducing an onboard buffer in the disks, which caches a single track upon arrival of a rebuild request while in rebuild mode. Such an enhancement is proposed, and the architecture is described along with an analysis using the DASim simulation toolkit developed at NJIT. Also proposed is the delayed termination of vacations with two user requests, as this improves the rebuild performance with a negligible negative impact on user response time. Finally, the effect of limiting the rebuild buffer on the rebuild performance is presented in the context of three different disk utilizations and declustering ratios.
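The declustering ratio defined in this abstract is simple to compute. The sketch below uses a hypothetical helper name and made-up array sizes, and shows that setting G = N gives a ratio of 1, which is the plain RAID 5 case.

```python
def declustering_ratio(group_size, num_disks):
    """Return alpha = (G - 1) / (N - 1) for parity group size G and N disks."""
    if not 2 <= group_size <= num_disks:
        raise ValueError("need 2 <= G <= N")
    return (group_size - 1) / (num_disks - 1)


# Illustrative sizes only (not taken from the paper).
alpha_clustered = declustering_ratio(group_size=5, num_disks=21)
alpha_raid5 = declustering_ratio(group_size=21, num_disks=21)
```

A smaller parity group on the same number of disks yields a smaller α, and therefore a smaller load increase on each surviving disk during rebuild.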

    The effects of limited rebuild buffer and track buffers on rebuild time in RAID5

    Redundant Arrays of Independent Disks (RAID) are very popular for creating large, reliable storage systems. A RAID array consists of multiple independent disks that achieve fault tolerance by parity coding. The contents of a failed disk can be reconstructed on demand by reading and exclusive-ORing the corresponding blocks on surviving disks. Upon disk failure, the array enters rebuild mode, in which it begins to systematically reconstruct the data of the failed disk on a spare disk, provided one is available. The fundamental element of rebuild is the Rebuild Unit (RU). Surviving disks engaged in rebuild process user requests at a higher priority. Since not all RUs are available at the same time, available RUs must be stored in a buffer, called the rebuild buffer, which is a part of the disk array controller cache. Most studies assume that this buffer is infinite. However, with the advent of large disks, it is increasingly difficult to provide buffers large enough to avoid becoming bottlenecks. This work studies the effect of a limited rebuild buffer on the rebuild time in an effort to estimate its effect on the Mean Time to Data Loss (MTTDL) of the array. Finally, this work studies the idea of using track buffers, which aim to improve the rebuild time by reducing the number of times a track has to be read in order to be completely rebuilt.
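The reconstruction by exclusive-ORing described in this abstract can be shown with a short sketch. The helper name `xor_blocks` and the two-byte blocks are illustrative assumptions; a real array works on much larger rebuild units and also handles buffering and scheduling, which are omitted here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe, plus the parity block computed from them.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the surviving blocks and parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, the same routine computes the parity and rebuilds any single missing block of the stripe.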