20,797 research outputs found

    CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems

    Data availability is critical in distributed storage systems, especially because node failures are prevalent in practice. A key requirement is to minimize the amount of data transferred among nodes when recovering the lost or unavailable data of failed nodes. This paper explores recovery solutions based on regenerating codes, which are shown to provide fault-tolerant storage and minimum recovery bandwidth. Existing optimal regenerating codes are designed for single node failures. We build a system called CORE, which augments existing optimal regenerating codes to support a general number of failures, including single and concurrent failures. We theoretically show that CORE achieves the minimum possible recovery bandwidth for most cases. We implement CORE and evaluate our prototype atop a Hadoop HDFS cluster testbed with up to 20 storage nodes. We demonstrate that our CORE prototype conforms to our theoretical findings and achieves recovery bandwidth savings when compared to the conventional recovery approach based on erasure codes. Comment: 25 pages
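
The bandwidth gap the abstract refers to can be illustrated for the single-failure case with the standard minimum-storage regenerating (MSR) repair formula from the regenerating-code literature. This is only a sketch: the function names and the (n, k) = (20, 10) parameters are assumptions chosen for illustration, and it does not reproduce CORE's concurrent-failure scheme or the paper's measurements.

```python
from fractions import Fraction

def conventional_repair_traffic(file_size, k):
    """Repair traffic for one failed node under an (n, k) MDS erasure code:
    k fragments of size file_size / k are downloaded, i.e. the whole file."""
    return Fraction(file_size)

def msr_repair_traffic(file_size, k, d):
    """Single-failure repair traffic at the MSR point when d surviving
    nodes (k <= d <= n - 1) each send a small coded piece."""
    if d < k:
        raise ValueError("need at least k helper nodes")
    return Fraction(file_size * d, k * (d - k + 1))

# Hypothetical setting: n = 20 nodes, k = 10, all d = 19 survivors help.
file_size = 100
conventional = conventional_repair_traffic(file_size, 10)
regenerating = msr_repair_traffic(file_size, 10, 19)
saving = 1 - regenerating / conventional
print(conventional, regenerating, float(saving))
```

With d = k the two quantities coincide, which is why the saving only appears when more than k helpers are contacted.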

    Construction of Near-Optimum Burst Erasure Correcting Low-Density Parity-Check Codes

    In this paper, a simple, general-purpose and effective tool for the design of low-density parity-check (LDPC) codes for iterative correction of bursts of erasures is presented. The design method consists of starting from the parity-check matrix of an LDPC code and developing an optimized parity-check matrix, with the same performance on the memory-less erasure channel, and suitable also for the iterative correction of single bursts of erasures. The parity-check matrix optimization is performed by an algorithm called the pivot searching and swapping (PSS) algorithm, which executes permutations of carefully chosen columns of the parity-check matrix, after a local analysis of particular variable nodes called stopping set pivots. This algorithm can in principle be applied to any LDPC code. If the input parity-check matrix is designed for achieving good performance on the memory-less erasure channel, then the code obtained after the application of the PSS algorithm provides good joint correction of independent erasures and single erasure bursts. Numerical results are provided in order to show the effectiveness of the PSS algorithm when applied to different categories of LDPC codes. Comment: 15 pages, 4 figures. IEEE Trans. on Communications, accepted (submitted in Feb. 2007)
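
To make "iterative correction of erasures" and "stopping set" concrete, here is a minimal peeling-decoder sketch. It is not the PSS algorithm from the paper. The small parity-check matrix `H` (a (7,4) Hamming code rather than a real LDPC code) and the function name `peel_erasures` are assumptions for illustration only.

```python
def peel_erasures(H, word):
    """Iteratively fill erased positions (None) in `word`.

    H is a list of parity-check rows over GF(2). Whenever a check involves
    exactly one erased bit, that bit is the XOR of the other bits in the check.
    Returns (word, success); success is False if decoding gets stuck, which
    means the remaining erasures cover a stopping set.
    """
    word = list(word)
    progress = True
    while progress:
        progress = False
        for row in H:
            erased = [j for j, h in enumerate(row) if h and word[j] is None]
            if len(erased) == 1:
                j = erased[0]
                word[j] = sum(word[i] for i, h in enumerate(row) if h and i != j) % 2
                progress = True
    return word, all(bit is not None for bit in word)

# Illustrative parity-check matrix (assumption, not taken from the paper).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

# A length-3 burst at the end is recovered: each check sees one erasure.
tail_burst = peel_erasures(H, [1, 0, 1, 1, None, None, None])
# The same burst length at the start stalls: every check sees two erasures.
head_burst = peel_erasures(H, [None, None, None, 1, 0, 1, 0])
print(tail_burst, head_burst)
```

The two calls show that whether a burst is correctable depends on which columns it covers, which is the property a column permutation of the parity-check matrix can change.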

    Enhanced Recursive Reed-Muller Erasure Decoding

    Recent work has shown that Reed-Muller (RM) codes achieve the capacity of the erasure channel. However, this performance is obtained with maximum-likelihood decoding, which can be costly for practical applications. In this paper, we propose an encoding/decoding scheme for Reed-Muller codes on the packet erasure channel based on the Plotkin construction. We present several improvements over the generic decoding. At low cost, they make it possible to compete with maximum-likelihood decoding performance, especially on high-rate codes, while significantly outperforming it in terms of speed.
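
The Plotkin construction mentioned above builds RM(r, m) from two shorter codes as (u | u+v), with u in RM(r, m-1) and v in RM(r-1, m-1). The following sketch covers the encoding side only, under that textbook definition. The helper names are assumptions, and the paper's recursive erasure decoder and its improvements are not reproduced here.

```python
from itertools import product

def rm_generator(r, m):
    """Rows of a generator matrix for RM(r, m) via the Plotkin recursion."""
    if r < 0:
        return []                     # empty code contributes no rows
    if m == 0:
        return [[1]]                  # length-1 code
    half = 2 ** (m - 1)
    top = [u + u for u in rm_generator(r, m - 1)]                  # (u | u)
    bottom = [[0] * half + v for v in rm_generator(r - 1, m - 1)]  # (0 | v)
    return top + bottom

def encode(msg, G):
    """Multiply the message bits by G over GF(2)."""
    n = len(G[0])
    return [sum(b * row[j] for b, row in zip(msg, G)) % 2 for j in range(n)]

def min_distance(G):
    """Brute-force minimum weight over all nonzero codewords (small codes only)."""
    k = len(G)
    return min(sum(encode(msg, G)) for msg in product([0, 1], repeat=k) if any(msg))

G = rm_generator(1, 3)   # first-order RM code of length 8
print(len(G), len(G[0]), min_distance(G))
```

For RM(1, 3) this yields the expected dimension 4, length 8 and minimum distance 4.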

    Low-Complexity Codes for Random and Clustered High-Order Failures in Storage Arrays

    RC (Random/Clustered) codes are a new efficient array-code family for recovering from 4-erasures. RC codes correct most 4-erasures, and essentially all 4-erasures that are clustered. Clustered erasures are introduced as a new erasure model for storage arrays. This model draws its motivation from correlated device failures that are caused by physical proximity of devices, or by age proximity of endurance-limited solid-state drives. The reliability of storage arrays that employ RC codes is analyzed and compared to known codes. The new RC code is significantly more efficient, in all practical implementation factors, than the best known 4-erasure correcting MDS code. These factors include small-write update complexity, full-device update complexity, decoding complexity, and the number of supported devices in the array.

    Asymmetric LOCO Codes: Constrained Codes for Flash Memories

    In data storage and data transmission, certain patterns are more likely to be subject to error when written (transmitted) onto the media. In magnetic recording systems with binary data and bipolar non-return-to-zero signaling, patterns that have insufficient separation between consecutive transitions exacerbate inter-symbol interference. Constrained codes are used to eliminate such error-prone patterns. A recent example is a new family of capacity-achieving constrained codes, named lexicographically-ordered constrained codes (LOCO codes). LOCO codes are symmetric, that is, the set of forbidden patterns is closed under taking pattern complements. LOCO codes are suboptimal in terms of rate when used in Flash devices where block erasure is employed, since the complement of an error-prone pattern is not detrimental in these devices. This paper introduces asymmetric LOCO codes (A-LOCO codes), which are lexicographically-ordered constrained codes that forbid only those patterns that are detrimental for Flash performance. A-LOCO codes are also capacity-achieving, and at finite lengths, they offer higher rates than the available state-of-the-art constrained codes designed for the same goal. The mapping-demapping between the index and the codeword in A-LOCO codes allows low-complexity encoding and decoding algorithms that are simpler than their LOCO counterparts. Comment: 9 pages (double column), 0 figures, accepted at the Annual Allerton Conference on Communication, Control, and Computing