14 research outputs found

    RAID Level 6 and Level 6+ Reliability

    Get PDF
    Storage systems are built of fallible components but have to provide high degrees of reliability. Besides mirroring and triplicating data, redundant storage of information using erasure-correcting codes is the only possibility to have data survive device failure.We provide here exact formula for the data-loss probability of a disk array composed of several RAID Level 6 stripes. This two-failure tolerant is not only used in practice but can also provide a reference point for the assessment of other data organizations

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Full text link
    Data centres that use consumer-grade disks drives and distributed peer-to-peer systems are unreliable environments to archive data without enough redundancy. Most redundancy schemes are not completely effective for providing high availability, durability and integrity in the long-term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large scale storage system. Our motivation is to design flexible and practical erasure codes with high fault-tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault-tolerance even further without the need of additional storage. As a result, an entangled storage system can provide high availability, durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality, hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios.Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN

    Efficient data mappings for parity-declustered data layouts

    Get PDF
    AbstractThe joint demands of high performance and fault tolerance in a large array of disks can be satisfied by a parity-declustered data layout. Such a data layout is generated by partitioning the data on the disks into stripes and choosing a part of each stripe to hold redundant information. Thus the data layout can be represented as a table of stripes. The data mapping problem is the problem of translating a data address into a disk identifier and an offset on that disk. Recent work has yielded mappings that compute disks and offsets directly from data addresses without the need to store tables. In this paper, we show that parity-declustered data layouts based on commutative rings yield mappings with improved computational efficiency and wider applicability

    Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads,”

    Get PDF
    Abstract To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This process has revealed new dimensions on which to evaluate the performance of different coding schemes: the amount of data used in recovery and when performing degraded reads. We present an algorithm that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasure code and produces recovery schedules that use a minimum amount of data. We differentiate popular erasure codes based on this criterion and demonstrate that the differences improve I/O performance in practice for the large block sizes used in cloud file systems. Several cloud systems [Ford10,Calder11] have adopted Reed-Solomon (RS) codes, because of their generality and their ability to tolerate larger numbers of failures. We define a new class of rotated ReedSolomon codes that perform degraded reads more efficiently than all known codes, but otherwise inherit the reliability and performance properties of Reed-Solomon codes. The online home for this paper may be found at: http://web.eecs.utk.edu/~plank/plank/papers/ FAST-2012 Abstract To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This process has revealed new dimensions on which to evaluate the performance of different coding schemes: the amount of data used in recovery and when performing degraded reads. We present an algorithm that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasure code and produces recovery schedules that use a minimum amount of data. We differentiate popular erasure codes based on this criterion and demonstrate that the differences improve I/O performance in practice for the large block sizes used in cloud file systems. Several cloud system

    Parity Declustering for Continuous Operation in Redundant Disk Arrays

    No full text
    We describe and evaluate a strategy for declustering the parity encoding in a redundant disk array. This declustered parity organization balances cost against data reliability and performance during failure recovery. It is targeted at highly-available parity-based arrays for use in continuousoperation systems. It improves on standard parity organizations by reducing the additional load on surviving disks during the reconstruction of a failed disk's contents. This yields higher user throughput during recovery, and/or shorter recovery time. We first address the generalized parity layout problem, basing our solution on balanced incomplete and complete block designs. A software implementation of declustering is then evaluated using a disk array simulator under a highly concurrent workload comprised of small user accesses. We show that declustered parity penalizes user response time while a disk is being repaired (before and during its recovery) less than comparable non-declustered (RAID5) ..
    corecore